OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	421ec9017f	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu changes from Ingo Molnar: "Various x86 FPU handling cleanups, refactorings and fixes (Borislav Petkov, Oleg Nesterov, Rik van Riel)" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/fpu: Kill eager_fpu_init_bp() x86/fpu: Don't allocate fpu->state for swapper/0 x86/fpu: Rename drop_init_fpu() to fpu_reset_state() x86/fpu: Fold __drop_fpu() into its sole user x86/fpu: Don't abuse drop_init_fpu() in flush_thread() x86/fpu: Use restore_init_xstate() instead of math_state_restore() on kthread exec x86/fpu: Introduce restore_init_xstate() x86/fpu: Document user_fpu_begin() x86/fpu: Factor out memset(xstate, 0) in fpu_finit() paths x86/fpu: Change xstateregs_get()/set() to use ->xsave.i387 rather than ->fxsave x86/fpu: Don't abuse FPU in kernel threads if use_eager_fpu() x86/fpu: Always allow FPU in interrupt if use_eager_fpu() x86/fpu: __kernel_fpu_begin() should clear fpu_owner_task even if use_eager_fpu() x86/fpu: Also check fpu_lazy_restore() when use_eager_fpu() x86/fpu: Use task_disable_lazy_fpu_restore() helper x86/fpu: Use an explicit if/else in switch_fpu_prepare() x86/fpu: Introduce task_disable_lazy_fpu_restore() helper x86/fpu: Move lazy restore functions up a few lines x86/fpu: Change math_error() to use unlazy_fpu(), kill (now) unused save_init_fpu() x86/fpu: Don't do __thread_fpu_end() if use_eager_fpu() ...	2015-04-13 13:24:23 -07:00
Linus Torvalds	64f004a2ab	Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 debug changes from Ingo Molnar: "Stack printing fixlets" * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kernel: Use kstack_end() in dumpstack_64.c x86/kernel: Fix output of show_stack_log_lvl()	2015-04-13 13:23:34 -07:00
Linus Torvalds	b48488d109	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cacheinfo sysfs changes from Ingo Molnar: "This tree converts the x86 cacheinfo sysfs code to use the generic code in drivers/base/cacheinfo.c. It's not intended to change the sysfs ABI: 'This patch neither alters any existing sysfs entries nor their formating, however since the generic cacheinfo has switched to use the device attributes instead of the traditional raw kobjects, a directory named 'power' along with its standard attributes are added similar to any other device'" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/cacheinfo: Fix cache_get_priv_group() for Intel processors x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure	2015-04-13 13:21:51 -07:00
Linus Torvalds	9f3252f1ad	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Various cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/iommu: Fix header comments regarding standard and _FINISH macros x86/earlyprintk: Put CONFIG_PCI-only functions under the #ifdef x86: Fix up obsolete __cpu_set() function usage	2015-04-13 13:20:54 -07:00
Linus Torvalds	5945fba8c5	Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build changes from Ingo Molnar: "Small cleanups and fixes" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kexec: Cleanup KEXEC_VERIFY_SIG Kconfig help text x86/build/defconfig: Enable USB_EHCI_TT_NEWSCHED=y x86/build: Fix mkcapflags.sh bash-ism x86/Kconfig: Simplify X86_UP_APIC handling x86/Kconfig: Simplify X86_IO_APIC dependencies x86/Kconfig: Avoid issuing pointless turned off entries to .config	2015-04-13 13:19:59 -07:00
Linus Torvalds	8f74bc5ff0	Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot changes from Ingo Molnar: "A number of cleanups" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Standardize strcmp() x86/boot/64: Remove pointless early_printk() message x86/boot/video: Move the 'video_segment' variable to video.c	2015-04-13 13:19:10 -07:00
Linus Torvalds	60f898eeaa	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm changes from Ingo Molnar: "There were lots of changes in this development cycle: - over 100 separate cleanups, restructuring changes, speedups and fixes in the x86 system call, irq, trap and other entry code, part of a heroic effort to deobfuscate a decade old spaghetti asm code and its C code dependencies (Denys Vlasenko, Andy Lutomirski) - alternatives code fixes and enhancements (Borislav Petkov) - simplifications and cleanups to the compat code (Brian Gerst) - signal handling fixes and new x86 testcases (Andy Lutomirski) - various other fixes and cleanups By their nature many of these changes are risky - we tried to test them well on many different x86 systems (there are no known regressions), and they are split up finely to help bisection - but there's still a fair bit of residual risk left so caveat emptor" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (148 commits) perf/x86/64: Report regs_user->ax too in get_regs_user() perf/x86/64: Simplify regs_user->abi setting code in get_regs_user() perf/x86/64: Do report user_regs->cx while we are in syscall, in get_regs_user() perf/x86/64: Do not guess user_regs->cs, ss, sp in get_regs_user() x86/asm/entry/32: Tidy up JNZ instructions after TESTs x86/asm/entry/64: Reduce padding in execve stubs x86/asm/entry/64: Remove GET_THREAD_INFO() in ret_from_fork x86/asm/entry/64: Simplify jumps in ret_from_fork x86/asm/entry/64: Remove a redundant jump x86/asm/entry/64: Optimize [v]fork/clone stubs x86/asm/entry: Zero EXTRA_REGS for stub32_execve() too x86/asm/entry/64: Move stub_x32_execvecloser() to stub_execveat() x86/asm/entry/64: Use common code for rt_sigreturn() epilogue x86/asm/entry/64: Add forgotten CFI annotation x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout x86/asm/entry/64: Move opportunistic sysret code to syscall code path x86, selftests: Add sigreturn selftest x86/alternatives: Guard NOPs optimization x86/asm/entry: Clear EXTRA_REGS for all executable formats x86/signal: Remove pax argument from restore_sigcontext ...	2015-04-13 13:16:36 -07:00
Linus Torvalds	977e1ba508	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic changes from Ingo Molnar: "Changes: - SGI UV APIC driver updates - dead code removal" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/uv: Update the UV APIC HUB check x86/apic/uv: Update the UV APIC driver check x86/apic/uv: Update the APIC UV OEM check x86/apic: Remove verify_local_APIC()	2015-04-13 13:15:09 -07:00
Andrey Ryabinin	fc9bea0e28	x86, UML: fix integer overflow in ELF_ET_DYN_BASE Almost all arches define ELF_ET_DYN_BASE as 2/3 of TASK_SIZE. Though it seems that some architectures do this in a wrong way. The problem is that 2TASK_SIZE may overflow 32-bits so the real ELF_ET_DYN_BASE becomes wrong. Fix this overflow by dividing TASK_SIZE prior to multiplying: (TASK_SIZE / 3 2) Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2015-04-13 21:16:12 +02:00
Richard Weinberger	a98a6d864d	um: Remove broken highmem support Highmem was always buggy and experimental on UML(i386). In times where 64 bit computers are default we can remove that experimental code. Signed-off-by: Richard Weinberger <richard@nod.at>	2015-04-13 21:01:02 +02:00
Richard Weinberger	28fa468f53	um: Remove broken SMP support At times where UML used the TT mode to operate it had kind of SMP support. It never got finished nor was stable. Let's rip out that cruft and stop confusing developers which do tree-wide SMP cleanups. If someone wants SMP support UML it has do be done from scratch. Signed-off-by: Richard Weinberger <richard@nod.at>	2015-04-13 21:00:58 +02:00
Richard Weinberger	d0b5e15f0c	um: Remove SKAS3/4 support Before we had SKAS0 UML had two modes of operation TT (tracing thread) and SKAS3/4 (separated kernel address space). TT was known to be insecure and got removed a long time ago. SKAS3/4 required a few (3 or 4) patches on the host side which never went mainline. The last host patch is 10 years old. With SKAS0 mode (separated kernel address space using 0 host patches), default since 2005, SKAS3/4 is obsolete and can be removed. Signed-off-by: Richard Weinberger <richard@nod.at>	2015-04-13 21:00:53 +02:00
Linus Torvalds	7fd56474db	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Ingo Molnar: "The main changes in this cycle were: - clockevents state machine cleanups and enhancements (Viresh Kumar) - clockevents broadcast notifier horror to state machine conversion and related cleanups (Thomas Gleixner, Rafael J Wysocki) - clocksource and timekeeping core updates (John Stultz) - clocksource driver updates and fixes (Ben Dooks, Dmitry Osipenko, Hans de Goede, Laurent Pinchart, Maxime Ripard, Xunlei Pang) - y2038 fixes (Xunlei Pang, John Stultz) - NMI-safe ktime_get_raw_fast() and general refactoring of the clock code, in preparation to perf's per event clock ID support (Peter Zijlstra) - generic sched/clock fixes, optimizations and cleanups (Daniel Thompson) - clockevents cpu_down() race fix (Preeti U Murthy)" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits) timers/PM: Drop unnecessary braces from tick_freeze() timers/PM: Fix up tick_unfreeze() timekeeping: Get rid of stale comment clockevents: Cleanup dead cpu explicitely clockevents: Make tick handover explicit clockevents: Remove broadcast oneshot control leftovers sched/idle: Use explicit broadcast oneshot control function ARM: Tegra: Use explicit broadcast oneshot control function ARM: OMAP: Use explicit broadcast oneshot control function intel_idle: Use explicit broadcast oneshot control function ACPI/idle: Use explicit broadcast control function ACPI/PAD: Use explicit broadcast oneshot control function x86/amd/idle, clockevents: Use explicit broadcast oneshot control functions clockevents: Provide explicit broadcast oneshot control functions clockevents: Remove the broadcast control leftovers ARM: OMAP: Use explicit broadcast control function intel_idle: Use explicit broadcast control function cpuidle: Use explicit broadcast control function ACPI/processor: Use explicit broadcast control function ACPI/PAD: Use explicit broadcast control function ...	2015-04-13 11:08:28 -07:00
Linus Torvalds	49d2953c72	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "Major changes: - Reworked CPU capacity code, for better SMP load balancing on systems with assymetric CPUs. (Vincent Guittot, Morten Rasmussen) - Reworked RT task SMP balancing to be push based instead of pull based, to reduce latencies on large CPU count systems. (Steven Rostedt) - SCHED_DEADLINE support updates and fixes. (Juri Lelli) - SCHED_DEADLINE task migration support during CPU hotplug. (Wanpeng Li) - x86 mwait-idle optimizations and fixes. (Mike Galbraith, Len Brown) - sched/numa improvements. (Rik van Riel) - various cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) sched/core: Drop debugging leftover trace_printk call sched/deadline: Support DL task migration during CPU hotplug sched/core: Check for available DL bandwidth in cpuset_cpu_inactive() sched/deadline: Always enqueue on previous rq when dl_task_timer() fires sched/core: Remove unused argument from init_[rt\|dl]_rq() sched/deadline: Fix rt runtime corruption when dl fails its global constraints sched/deadline: Avoid a superfluous check sched: Improve load balancing in the presence of idle CPUs sched: Optimize freq invariant accounting sched: Move CFS tasks to CPUs with higher capacity sched: Add SD_PREFER_SIBLING for SMT level sched: Remove unused struct sched_group_capacity::capacity_orig sched: Replace capacity_factor by usage sched: Calculate CPU's usage statistic and put it into struct sg_lb_stats::group_usage sched: Add struct rq::cpu_capacity_orig sched: Make scale_rt invariant with frequency sched: Make sched entity usage tracking scale-invariant sched: Remove frequency scaling from cpu_capacity sched: Track group sched_entity usage contributions sched: Add sched_avg::utilization_avg_contrib ...	2015-04-13 10:47:34 -07:00
Linus Torvalds	cc76ee75a9	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: "Main changes: - jump label asm preparatory work for PowerPC (Anton Blanchard) - rwsem optimizations and cleanups (Davidlohr Bueso) - mutex optimizations and cleanups (Jason Low) - futex fix (Oleg Nesterov) - remove broken atomicity checks from {READ,WRITE}_ONCE() (Peter Zijlstra)" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: powerpc, jump_label: Include linux/jump_label.h to get HAVE_JUMP_LABEL define jump_label: Allow jump labels to be used in assembly jump_label: Allow asm/jump_label.h to be included in assembly locking/mutex: Further simplify mutex_spin_on_owner() locking: Remove atomicy checks from {READ,WRITE}_ONCE locking/rtmutex: Rename argument in the rt_mutex_adjust_prio_chain() documentation as well locking/rwsem: Fix lock optimistic spinning when owner is not running locking: Remove ACCESS_ONCE() usage locking/rwsem: Check for active lock before bailing on spinning locking/rwsem: Avoid deceiving lock spinners locking/rwsem: Set lock ownership ASAP locking/rwsem: Document barrier need when waking tasks locking/futex: Check PF_KTHREAD rather than !p->mm to filter out kthreads locking/mutex: Refactor mutex_spin_on_owner() locking/mutex: In mutex_spin_on_owner(), return true when owner changes	2015-04-13 10:27:28 -07:00
Linus Torvalds	9c65e12a55	Merge branch 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI update from Ingo Molnar: "This tree includes various fixes, cleanups, a new efi=debug boot option and EFI boot stub memory allocation optimizations" * 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub: Retrieve FDT size when loaded from UEFI config table efi: Clean up the efi_call_phys_[prolog\|epilog]() save/restore interaction efi: Disable interrupts around EFI calls, not in the epilog/prolog calls x86/efi: Add a "debug" option to the efi= cmdline firmware: dmi_scan: Use direct access to static vars firmware: dmi_scan: Use full dmi version for SMBIOS3	2015-04-13 10:22:30 -07:00
Linus Torvalds	9003601310	The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64. ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling vhost, too), page aging s390: interrupt handling rework, allowing to inject all local interrupts via new ioctl and to get/set the full local irq state for migration and introspection. New ioctls to access memory by virtual address, and to get/set the guest storage keys. SIMD support. MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some patches from Ralf Baechle's MIPS tree. x86: bugfixes (notably for pvclock, the others are small) and cleanups. Another small latency improvement for the TSC deadline timer. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJVJ9vmAAoJEL/70l94x66DoMEH/R3rh8IMf4jTiWRkcqohOMPX k1+NaSY/lCKayaSgggJ2hcQenMbQoXEOdslvaA/H0oC+VfJGK+lmU6E63eMyyhjQ Y+Px6L85NENIzDzaVu/TIWWuhil5PvIRr3VO8cvntExRoCjuekTUmNdOgCvN2ObW wswN2qRdPIeEj2kkulbnye+9IV4G0Ne9bvsmUdOdfSSdi6ZcV43JcvrpOZT++mKj RrKB+3gTMZYGJXMMLBwMkdl8mK1ozriD+q0mbomT04LUyGlPwYLl4pVRDBqyksD7 KsSSybaK2E4i5R80WEljgDMkNqrCgNfg6VZe4n9Y+CfAAOToNnkMJaFEi+yuqbs= =yu2b -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "First batch of KVM changes for 4.1 The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64. Summary: ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling vhost, too), page aging s390: interrupt handling rework, allowing to inject all local interrupts via new ioctl and to get/set the full local irq state for migration and introspection. New ioctls to access memory by virtual address, and to get/set the guest storage keys. SIMD support. MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some patches from Ralf Baechle's MIPS tree. x86: bugfixes (notably for pvclock, the others are small) and cleanups. Another small latency improvement for the TSC deadline timer" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) KVM: use slowpath for cross page cached accesses kvm: mmu: lazy collapse small sptes into large sptes KVM: x86: Clear CR2 on VCPU reset KVM: x86: DR0-DR3 are not clear on reset KVM: x86: BSP in MSR_IA32_APICBASE is writable KVM: x86: simplify kvm_apic_map KVM: x86: avoid logical_map when it is invalid KVM: x86: fix mixed APIC mode broadcast KVM: x86: use MDA for interrupt matching kvm/ppc/mpic: drop unused IRQ_testbit KVM: nVMX: remove unnecessary double caching of MAXPHYADDR KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu KVM: vmx: pass error code with internal error #2 x86: vdso: fix pvclock races with task migration KVM: remove kvm_read_hva and kvm_read_hva_atomic KVM: x86: optimize delivery of TSC deadline timer interrupt KVM: x86: extract blocking logic from __vcpu_run kvm: x86: fix x86 eflags fixed bit KVM: s390: migrate vcpu interrupt state ...	2015-04-13 09:47:01 -07:00
Richard Weinberger	3050a35fba	x86: Remove signal translation and exec_domain As execution domain support is gone we can remove signal translation from the signal code and remove exec_domain from thread_info. Signed-off-by: Richard Weinberger <richard@nod.at>	2015-04-12 21:03:28 +02:00
Richard Weinberger	fd223849f1	um: Remove signal translation and exec_domain As execution domain support is gone we can remove signal translation from the signal code and remove exec_domain from thread_info. Signed-off-by: Richard Weinberger <richard@nod.at>	2015-04-12 21:03:28 +02:00
Ingo Molnar	066450be41	perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init() Dan Carpenter pointed out that the control flow in pt_pmu_hw_init() is a bit messy: for example the kfree(de_attrs) is entirely superfluous. Another problem is the inconsistent mixing of label based and direct return error handling. Add modern, label based error handling instead and clean up the code a bit as well. Note that we'll still do a kfree(NULL) in the normal case - this does not matter as this is an init path and kfree() returns early if it sees a NULL. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20150409090805.GG17605@mwanda Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-12 11:21:15 +02:00
Denys Vlasenko	3b75232d55	perf/x86/64: Report regs_user->ax too in get_regs_user() I don't see why we report e.g. orix_ax, which is not always meaningful, but don't report ax, which is meaningful. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1428671219-29341-4-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-11 13:08:53 +02:00
Denys Vlasenko	32caa06091	perf/x86/64: Simplify regs_user->abi setting code in get_regs_user() user_64bit_mode(regs) basically checks regs->cs to point to a 64-bit segment. This check used to be unreliable here because regs->cs was not always correct in syscalls. Now regs->cs is always correct: in syscalls, in interrupts, in exceptions. No need to emply heuristics here. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1428671219-29341-3-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-11 13:08:53 +02:00
Denys Vlasenko	5df71b396b	perf/x86/64: Do report user_regs->cx while we are in syscall, in get_regs_user() Yes, it is true that cx contains return address. It's not clear why we trash it. Stop doing that. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1428671219-29341-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-11 13:08:53 +02:00
Denys Vlasenko	aa21df0424	perf/x86/64: Do not guess user_regs->cs, ss, sp in get_regs_user() After recent changes to syscall entry points, user_regs->{cs,ss,sp} are always correct. (They used to be undefined while in syscalls). We can report them reliably, without guessing. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1428671219-29341-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-11 13:08:52 +02:00
Denys Vlasenko	54cfa458b4	x86/asm/entry/32: Tidy up JNZ instructions after TESTs After TESTs, use logically correct JNZ mnemonic instead of JNE. This doesn't change code: md5: c3005b39a11fe582b7df7908561ad4ee entry_32.o.before.asm c3005b39a11fe582b7df7908561ad4ee entry_32.o.after.asm Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428689620-21881-1-git-send-email-dvlasenk@redhat.com [ Added object file comparison. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-11 10:40:02 +02:00
Ard Biesheuvel	e68410ebf6	crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer This removes all the boilerplate from the existing implementation, and replaces it with calls into the base layer. It also changes the prototypes of the core asm functions to be compatible with the base prototype void (sha512_block_fn)(struct sha256_state sst, u8 const src, int blocks) so that they can be passed to the base layer directly. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-04-10 21:39:48 +08:00
Ard Biesheuvel	1631030ae6	crypto: x86/sha256_ssse3 - move SHA-224/256 SSSE3 implementation to base layer This removes all the boilerplate from the existing implementation, and replaces it with calls into the base layer. It also changes the prototypes of the core asm functions to be compatible with the base prototype void (sha256_block_fn)(struct sha256_state sst, u8 const src, int blocks) so that they can be passed to the base layer directly. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-04-10 21:39:47 +08:00
Ard Biesheuvel	824b43763c	crypto: x86/sha1_ssse3 - move SHA-1 SSSE3 implementation to base layer This removes all the boilerplate from the existing implementation, and replaces it with calls into the base layer. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-04-10 21:39:47 +08:00
Rafael J. Wysocki	be77002101	Merge back earlier suspend/hibernate material for v4.1.	2015-04-10 12:01:59 +02:00
Mike Travis	1912c7afa3	x86/apic/uv: Update the UV APIC HUB check Update the check for UV2000/3000. Note when the HUB is not recognized. Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Hedi Berriche <hedi@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/20150409182629.267239403@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-10 10:16:08 +02:00
Mike Travis	379b97e280	x86/apic/uv: Update the UV APIC driver check Fix a bug in the OEM check function that determines if the system is a UV system and the BIOS is compatible with the kernel's UV apic driver. This prevents some possibly obscure panics and guards the system against being started on SGI hardware that does not have the required kernel support. Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Hedi Berriche <hedi@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/20150409182629.112998930@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-10 10:16:07 +02:00
Mike Travis	7a4e017041	x86/apic/uv: Update the APIC UV OEM check Optimize the first "SGI" OEM check to return faster if the system is not an SGI or UV system. Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Hedi Berriche <hedi@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/20150409182628.952357922@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-10 10:16:07 +02:00
Aravind Gopalakrishnan	b44915927c	x86/iommu: Fix header comments regarding standard and _FINISH macros The comment line regarding IOMMU_INIT and IOMMU_INIT_FINISH macros is incorrect: "The standard vs the _FINISH differs in that the _FINISH variant will continue detecting other IOMMUs in the call list..." It should be "..the standard variant will continue detecting..." Fix that. Also, make it readable while at it. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: konrad.wilk@oracle.com Fixes: `6e96366933` ("x86, iommu: Update header comments with appropriate naming") Link: http://lkml.kernel.org/r/1428508017-5316-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-09 10:56:31 +02:00
Denys Vlasenko	a37f34a325	x86/asm/entry/64: Reduce padding in execve stubs execve stubs are 7 bytes only. Padding them to 16 bytes is a waste. text data bss dec hex filename 12594 0 0 12594 3132 entry_64.o.before 12530 0 0 12530 30f2 entry_64.o Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-8-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-09 10:31:26 +02:00
Denys Vlasenko	54a81e914b	x86/asm/entry/64: Remove GET_THREAD_INFO() in ret_from_fork It used to be used to check for _TIF_IA32, but the check has been removed. Remove GET_THREAD_INFO() too. Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-7-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-09 10:31:26 +02:00
Denys Vlasenko	66ad4efa51	x86/asm/entry/64: Simplify jumps in ret_from_fork Replace test jz 1f jmp label 1: with test jnz label Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-6-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-09 10:31:25 +02:00
Denys Vlasenko	a30b0085f5	x86/asm/entry/64: Remove a redundant jump Jumping to the very next instruction is not very useful: jmp label label: Removing the jump. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-5-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-09 10:31:25 +02:00
Denys Vlasenko	772951c4e4	x86/asm/entry/64: Optimize [v]fork/clone stubs Replace "call func; ret" with "jmp func". Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-4-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-09 10:31:25 +02:00
Denys Vlasenko	0f90fb979d	x86/asm/entry: Zero EXTRA_REGS for stub32_execve() too The change which affected how execve clears EXTRA_REGS missed 32-bit execve syscalls. Fix this by using 64-bit execve stub epilogue for them too. Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-3-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-09 10:31:24 +02:00
Denys Vlasenko	05f1752d19	x86/asm/entry/64: Move stub_x32_execvecloser() to stub_execveat() This is a preparatory patch for moving stub32_execve[at]() to this file. It makes sense to have all execve stubs in one place, so that they can reuse code. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-09 10:31:24 +02:00
Denys Vlasenko	31f0119b81	x86/asm/entry/64: Use common code for rt_sigreturn() epilogue Similarly to stub_execve, we can reuse the epilogue in stub_rt_sigreturn() and stub_x32_rt_sigreturn(). Add a comment explaining why we can't eliminage SAVE_EXTRA_REGS here. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428439424-7258-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-09 10:31:24 +02:00
Anton Blanchard	55dd0df781	jump_label: Allow asm/jump_label.h to be included in assembly Wrap asm/jump_label.h for all archs with #ifndef __ASSEMBLY__. Since these are kernel only headers, we don't need #ifdef __KERNEL__ so can simplify things a bit. If an architecture wants to use jump labels in assembly, it will still need to define a macro to create the __jump_table entries (see ARCH_STATIC_BRANCH in the powerpc asm/jump_label.h for an example). Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: catalin.marinas@arm.com Cc: davem@davemloft.net Cc: heiko.carstens@de.ibm.com Cc: jbaron@akamai.com Cc: linux@arm.linux.org.uk Cc: linuxppc-dev@lists.ozlabs.org Cc: liuj97@gmail.com Cc: mgorman@suse.de Cc: mmarek@suse.cz Cc: mpe@ellerman.id.au Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: rostedt@goodmis.org Cc: schwidefsky@de.ibm.com Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1428551492-21977-1-git-send-email-anton@samba.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-09 09:40:23 +02:00
Linus Torvalds	cae2a173fe	x86: clean up/fix 'copy_in_user()' tail zeroing The rule for 'copy_from_user()' is that it zeroes the remaining kernel buffer even when the copy fails halfway, just to make sure that we don't leave uninitialized kernel memory around. Because even if we check for errors, some kernel buffers stay around after thge copy (think page cache). However, the x86-64 logic for user copies uses a copy_user_generic() function for all the cases, that set the "zerorest" flag for any fault on the source buffer. Which meant that it didn't just try to clear the kernel buffer after a failure in copy_from_user(), it also tried to clear the destination user buffer for the "copy_in_user()" case. Not only is that pointless, it also means that the clearing code has to worry about the tail clearing taking page faults for the user buffer case. Which is just stupid, since that case shouldn't happen in the first place. Get rid of the whole "zerorest" thing entirely, and instead just check if the destination is in kernel space or not. And then just use memset() to clear the tail of the kernel buffer if necessary. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-08 14:28:45 -07:00
Wanpeng Li	3ea3b7fa9a	kvm: mmu: lazy collapse small sptes into large sptes Dirty logging tracks sptes in 4k granularity, meaning that large sptes have to be split. If live migration is successful, the guest in the source machine will be destroyed and large sptes will be created in the destination. However, the guest continues to run in the source machine (for example if live migration fails), small sptes will remain around and cause bad performance. This patch introduce lazy collapsing of small sptes into large sptes. The rmap will be scanned in ioctl context when dirty logging is stopped, dropping those sptes which can be collapsed into a single large-page spte. Later page faults will create the large-page sptes. Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Message-Id: <1428046825-6905-1-git-send-email-wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:04 +02:00
Nadav Amit	1119022c71	KVM: x86: Clear CR2 on VCPU reset CR2 is not cleared as it should after reset. See Intel SDM table named "IA-32 Processor States Following Power-up, Reset, or INIT". Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427933438-12782-5-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:03 +02:00
Nadav Amit	ae561edeb4	KVM: x86: DR0-DR3 are not clear on reset DR0-DR3 are not cleared as they should during reset and when they are set from userspace. It appears to be caused by `c77fb5fe6f` ("KVM: x86: Allow the guest to run with dirty debug registers"). Force their reload on these situations. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427933438-12782-4-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:03 +02:00
Nadav Amit	58d269d8cc	KVM: x86: BSP in MSR_IA32_APICBASE is writable After reset, the CPU can change the BSP, which will be used upon INIT. Reset should return the BSP which QEMU asked for, and therefore handled accordingly. To quote: "If the MP protocol has completed and a BSP is chosen, subsequent INITs (either to a specific processor or system wide) do not cause the MP protocol to be repeated." [Intel SDM 8.4.2: MP Initialization Protocol Requirements and Restrictions] Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427933438-12782-3-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:02 +02:00
Radim Krčmář	3b5a5ffa92	KVM: x86: simplify kvm_apic_map recalculate_apic_map() uses two passes over all VCPUs. This is a relic from time when we selected a global mode in the first pass and set up the optimized table in the second pass (to have a consistent mode). Recent changes made mixed mode unoptimized and we can do it in one pass. Format of logical MDA is a function of the mode, so we encode it in apic_logical_id() and drop obsoleted variables from the struct. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1423766494-26150-5-git-send-email-rkrcmar@redhat.com> [Add lid_bits temporary in apic_logical_id. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:01 +02:00
Radim Krčmář	3548a259f6	KVM: x86: avoid logical_map when it is invalid We want to support mixed modes and the easiest solution is to avoid optimizing those weird and unlikely scenarios. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1423766494-26150-4-git-send-email-rkrcmar@redhat.com> [Add comment above KVM_APIC_MODE_* defines. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:01 +02:00
Radim Krčmář	9ea369b032	KVM: x86: fix mixed APIC mode broadcast Broadcast allowed only one global APIC mode, but mixed modes are theoretically possible. x2APIC IPI doesn't mean 0xff as broadcast, the rest does. x2APIC broadcasts are accepted by xAPIC. If we take SDM to be logical, even addreses beginning with 0xff should be accepted, but real hardware disagrees. This patch aims for simple code by considering most of real behavior as undefined. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1423766494-26150-3-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:00 +02:00
Radim Krčmář	03d2249ea6	KVM: x86: use MDA for interrupt matching In mixed modes, we musn't deliver xAPIC IPIs like x2APIC and vice versa. Instead of preserving the information in apic_send_ipi(), we regain it by converting all destinations into correct MDA in the slow path. This allows easier reasoning about subsequent matching. Our kvm_apic_broadcast() had an interesting design decision: it didn't consider IOxAPIC 0xff as broadcast in x2APIC mode ... everything worked because IOxAPIC can't set that in physical mode and logical mode considered it as a message for first 8 VCPUs. This patch interprets IOxAPIC 0xff as x2APIC broadcast. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1423766494-26150-2-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:46:59 +02:00
Eugene Korenevsky	92d71bc695	KVM: nVMX: remove unnecessary double caching of MAXPHYADDR After speed-up of cpuid_maxphyaddr() it can be called frequently: instead of heavyweight enumeration of CPUID entries it returns a cached pre-computed value. It is also inlined now. So caching its result became unnecessary and can be removed. Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com> Message-Id: <20150329205644.GA1258@gnote> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:46:58 +02:00
Eugene Korenevsky	9090422f1c	KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry On each VM-entry CPU should check the following VMCS fields for zero bits beyond physical address width: - APIC-access address - virtual-APIC address - posted-interrupt descriptor address This patch adds these checks required by Intel SDM. Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com> Message-Id: <20150329205627.GA1244@gnote> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:46:57 +02:00
Eugene Korenevsky	5a4f55cde8	KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu cpuid_maxphyaddr(), which performs lot of memory accesses is called extensively across KVM, especially in nVMX code. This patch adds a cached value of maxphyaddr to vcpu.arch to reduce the pressure onto CPU cache and simplify the code of cpuid_maxphyaddr() callers. The cached value is initialized in kvm_arch_vcpu_init() and reloaded every time CPUID is updated by usermode. It is obvious that these reloads occur infrequently. Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com> Message-Id: <20150329205612.GA1223@gnote> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:46:56 +02:00
Radim Krčmář	80f0e95d1b	KVM: vmx: pass error code with internal error #2 Exposing the on-stack error code with internal error is cheap and potentially useful. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1428001865-32280-1-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:46:56 +02:00
Radim Krčmář	80f7fdb1c7	x86: vdso: fix pvclock races with task migration If we were migrated right after __getcpu, but before reading the migration_count, we wouldn't notice that we read TSC of a different VCPU, nor that KVM's bug made pvti invalid, as only migration_count on source VCPU is increased. Change vdso instead of updating migration_count on destination. Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Fixes: `0a4e6be9ca` ("x86: kvm: Revert "remove sched notifier for cross-cpu migrations"") Message-Id: <1428000263-11892-1-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:46:55 +02:00
Paolo Bonzini	9c8fd1ba22	KVM: x86: optimize delivery of TSC deadline timer interrupt The newly-added tracepoint shows the following results on the tscdeadline_latency test: qemu-kvm-8387 [002] 6425.558974: kvm_vcpu_wakeup: poll time 10407 ns qemu-kvm-8387 [002] 6425.558984: kvm_vcpu_wakeup: poll time 0 ns qemu-kvm-8387 [002] 6425.561242: kvm_vcpu_wakeup: poll time 10477 ns qemu-kvm-8387 [002] 6425.561251: kvm_vcpu_wakeup: poll time 0 ns and so on. This is because we need to go through kvm_vcpu_block again after the timer IRQ is injected. Avoid it by polling once before entering kvm_vcpu_block. On my machine (Xeon E5 Sandy Bridge) this removes about 500 cycles (7%) from the latency of the TSC deadline timer. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:46:54 +02:00
Paolo Bonzini	362c698f82	KVM: x86: extract blocking logic from __vcpu_run Rename the old __vcpu_run to vcpu_run, and extract part of it to a new function vcpu_block. The next patch will add a new condition in vcpu_block, avoid extra indentation. Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:46:53 +02:00
Wanpeng Li	35fd68a38d	kvm: x86: fix x86 eflags fixed bit Guest can't be booted w/ ept=0, there is a message dumped as below: If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors. EAX=00000011 EBX=f000d2f6 ECX=00006cac EDX=000f8956 ESI=bffbdf62 EDI=00000000 EBP=00006c68 ESP=00006c68 EIP=0000d187 EFL=00000004 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =e000 000e0000 ffffffff 00809300 DPL=0 DS16 [-WA] CS =f000 000f0000 ffffffff 00809b00 DPL=0 CS16 [-RA] SS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA] DS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA] FS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA] GS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA] LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy GDT= 000f6a80 00000037 IDT= 000f6abe 00000000 CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 EFER=0000000000000000 Code=01 1e b8 6a 2e 0f 01 16 74 6a 0f 20 c0 66 83 c8 01 0f 22 c0 <66> ea 8f d1 0f 00 08 00 b8 10 00 00 00 8e d8 8e c0 8e d0 8e e0 8e e8 89 c8 ff e2 89 c1 b8X X86 eflags bit 1 is fixed set, which means that 1 << 1 is set instead of 1, this patch fix it. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Message-Id: <1428473294-6633-1-git-send-email-wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:46:52 +02:00
Denys Vlasenko	8b3607b5b8	x86/asm/entry/64: Add forgotten CFI annotation Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428424967-14460-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-08 09:18:36 +02:00
Denys Vlasenko	3304c9c37b	x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout Interrupt entry points are handled with the following code, each 32-byte code block contains seven entry points: ... [push][jump 22] // 4 bytes [push][jump 18] // 4 bytes [push][jump 14] // 4 bytes [push][jump 10] // 4 bytes [push][jump 6] // 4 bytes [push][jump 2] // 4 bytes [push][jump common_interrupt][padding] // 8 bytes [push][jump] [push][jump] [push][jump] [push][jump] [push][jump] [push][jump] [push][jump common_interrupt][padding] [padding_2] common_interrupt: And there is a table which holds pointers to every entry point, IOW: to every push. In cold cache, two jumps are still costlier than one, even though we get the benefit of them residing in the same cacheline. This change replaces short jumps with near ones to 'common_interrupt', and pads every push+jump pair to 8 bytes. This way, each interrupt takes only one jump. This change replaces ".p2align CONFIG_X86_L1_CACHE_SHIFT" before dispatch table with ".align 8" - we do not need anything stronger than that. The table of entry addresses (the interrupt[] array) is no longer necessary, the address of entries can be easily calculated as (irq_entries_start + i*8). text data bss dec hex filename 12546 0 0 12546 3102 entry_64.o.before 11626 0 0 11626 2d6a entry_64.o The size decrease is because 1656 bytes of .init.rodata are gone. That's initdata, though. The resident size does go up a bit. Run-tested (32 and 64 bits). Acked-and-Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428090553-7283-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-08 09:02:13 +02:00
Denys Vlasenko	fffbb5dcfd	x86/asm/entry/64: Move opportunistic sysret code to syscall code path This change does two things: Copy-pastes "retint_swapgs:" code into syscall handling code, the copy is under "syscall_return:" label. The code is unchanged apart from some label renames. Removes "opportunistic sysret" code from "retint_swapgs:" code block, since now it won't be reached by syscall return. This in fact removes most of the code in question. text data bss dec hex filename 12530 0 0 12530 30f2 entry_64.o.before 12562 0 0 12562 3112 entry_64.o Run-tested. Acked-and-Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427993219-7291-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-08 09:02:12 +02:00
Ingo Molnar	4bcc7827b0	Linux 4.0-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVIws/AAoJEHm+PkMAQRiGwEcH/1GCBqrBzXaKwDdCPMRcYVUb MYkXmGkCGRYWe5MXI8QNAaa/CdG6mAFMHWN6CaMMpLTxnM1m87uBg01fQMsh73BO mRVLKE/soiJDnR1gYzBBDBYV/AUvytN5PhgeNaA95YIJvU3T1f3iTnV8vs30Dp0L YpxSqwr3C0k7C9IE0VcgfzvWJPCnQ9IWHuX3jn5s1XjGKVNbBYHMt6FusHdyXMfT dp8ksuGHwm30mTFI5xJpKOrRzfi+P5EsEUrsnFRPRM/iFTVrM5R7eaUhsRZb2+Wo YApnbYhUYz7om1AuQ+UZ/+S6y7ZLlGWegI1lWI754GIsczG5vPHEYhhgkzMhTsc= =kR1V -----END PGP SIGNATURE----- Merge tag 'v4.0-rc7' into x86/asm, to resolve conflicts Conflicts: arch/x86/kernel/entry_64.S Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-08 09:01:54 +02:00
Paolo Bonzini	bf0fb67cf9	KVM/ARM changes for v4.1: - fixes for live migration - irqfd support - kvm-io-bus & vgic rework to enable ioeventfd - page ageing for stage-2 translation - various cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVHQ0kAAoJECPQ0LrRPXpDHKQQALjw6STaZd7n20OFopNgHd4P qVeWYEKBxnsiSvL4p3IOSlZlEul+7x08aZqtyxWQRQcDT4ggTI+3FKKfc+8yeRpH WV6YJP0bGqz7039PyMLuIgs48xkSZtntePw69hPJfHZh4C1RBlP5T2SfE8mU8VZX fWToiU3W12QfKnmN7JFgxZopnGhrYCrG0EexdTDziAZu0GEMlDrO4wnyTR60WCvT 4TEF73R0kpAz4yplKuhcDHuxIG7VFhQ4z7b09M1JtR0gQ3wUvfbD3Wqqi49SwHkv NQOStcyLsIlDosSRcLXNCwb3IxjObXTBcAxnzgm2Aoc1xMMZX1ZPQNNs6zFZzycb 2c6QMiQ35zm7ellbvrG+bT+BP86JYWcAtHjWcaUFgqSJjb8MtqcMtsCea/DURhqx /kictqbPYBBwKW6SKbkNkisz59hPkuQnv35fuf992MRCbT9LAXLPRLbcirucCzkE p1MOotsWoO3ldJMZaVn0KYk3sQf6mCIfbYPEdOcw3fhJlvyy3NdjVkLOFbA5UUg1 rQ7Ru2rTemBc0ExVrymngNTMpMB4XcEeJzXfhcgMl3DWbDj60Ku/O26sDtZ6bsFv JuDYn8FVDHz9gpEQHgiUi1YMsBKXLhnILa1ppaa6AflykU3BRfYjAk1SXmX84nQK mJUJEdFuxi6pHN0UKxUI =avA4 -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next' KVM/ARM changes for v4.1: - fixes for live migration - irqfd support - kvm-io-bus & vgic rework to enable ioeventfd - page ageing for stage-2 translation - various cleanups	2015-04-07 18:09:20 +02:00
Dave Young	22ef882e6b	x86/mm/numa: Fix kernel stack corruption in numa_init()->numa_clear_kernel_node_hotplug() I got below kernel panic during kdump test on Thinkpad T420 laptop: [ 0.000000] No NUMA configuration found [ 0.000000] Faking a node at [mem 0x0000000000000000-0x0000000037ba4fff] [ 0.000000] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81d21910 ... [ 0.000000] Call Trace: [ 0.000000] [<ffffffff817c2a26>] dump_stack+0x45/0x57 [ 0.000000] [<ffffffff817bc8d2>] panic+0xd0/0x204 [ 0.000000] [<ffffffff81d21910>] ? numa_clear_kernel_node_hotplug+0xe6/0xf2 [ 0.000000] [<ffffffff8107741b>] __stack_chk_fail+0x1b/0x20 [ 0.000000] [<ffffffff81d21910>] numa_clear_kernel_node_hotplug+0xe6/0xf2 [ 0.000000] [<ffffffff81d21e5d>] numa_init+0x1a5/0x520 [ 0.000000] [<ffffffff81d222b1>] x86_numa_init+0x19/0x3d [ 0.000000] [<ffffffff81d22460>] initmem_init+0x9/0xb [ 0.000000] [<ffffffff81d0d00c>] setup_arch+0x94f/0xc82 [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120 [ 0.000000] [<ffffffff817bd0bb>] ? printk+0x55/0x6b [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120 [ 0.000000] [<ffffffff81d05d9b>] start_kernel+0xe8/0x4d6 [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120 [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120 [ 0.000000] [<ffffffff81d055ee>] x86_64_start_reservations+0x2a/0x2c [ 0.000000] [<ffffffff81d05751>] x86_64_start_kernel+0x161/0x184 [ 0.000000] ---[ end Kernel panic - not syncing: stack-protector: Kernel sta This is caused by writing over the end of numa mask bitmap in numa_clear_kernel_node(). numa_clear_kernel_node() tries to set the node id in a mask bitmap, by iterating all reserved regions and assuming that every region has a valid nid. This assumption is not true because there's an exception for some graphic memory quirks. See trim_snb_memory() in arch/x86/kernel/setup.c It is easily to reproduce the bug in the kdump kernel because kdump kernel use pre-reserved memory instead of the whole memory, but kexec pass other reserved memory ranges to 2nd kernel as well. like below in my test: kdump kernel ram 0x2d000000 - 0x37bfffff One of the reserved regions: 0x40000000 - 0x40100000 which includes 0x40004000, a page excluded in trim_snb_memory(). For this memblock reserved region the nid is not set, it is still default value MAX_NUMNODES. later node_set will set bit MAX_NUMNODES thus stack corruption happen. This also happens when booting with mem= kernel commandline during my test. Fixing it by adding a check, do not call node_set in case nid is MAX_NUMNODES. Signed-off-by: Dave Young <dyoung@redhat.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bhe@redhat.com Cc: qiuxishi@huawei.com Link: http://lkml.kernel.org/r/20150407134132.GA23522@dhcp-16-198.nay.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-07 16:01:19 +02:00
Greg Kroah-Hartman	b3e3bf2ef2	Merge 4.0-rc7 into tty-next We want the fixes in here as well, also to help out with merge issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-04-07 11:07:20 +02:00
Borislav Petkov	69df353ff3	x86/alternatives: Guard NOPs optimization Take a look at the first instruction byte before optimizing the NOP - there might be something else there already, like the ALTERNATIVE_2() in rdtsc_barrier() which NOPs out on AMD even though we just patched in an MFENCE. This happens because the alternatives sees X86_FEATURE_MFENCE_RDTSC, AMD CPUs set it, we patch in the MFENCE and right afterwards it sees X86_FEATURE_LFENCE_RDTSC which AMD CPUs don't set and we blindly optimize the NOP. Checking whether at least the first byte is 0x90 prevents that. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1428181662-18020-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-06 09:24:09 +02:00
Denys Vlasenko	fc3e958a2b	x86/asm/entry: Clear EXTRA_REGS for all executable formats On failure, sys_execve() does not clobber EXTRA_REGS, so we can just return to userpsace without saving/restoring them. On success, ELF_PLAT_INIT() in sys_execve() clears all these registers. On other executable formats: - binfmt_flat.c has similar FLAT_PLAT_INIT, but x86 (and everyone else except sh) doesn't define it. - binfmt_elf_fdpic.c has ELF_FDPIC_PLAT_INIT, but x86 (and most others) doesn't define it. - There are no such hooks in binfmt_aout.c et al. We inherit EXTRA_REGS from the prior executable. This inconsistency was not intended. This change removes SAVE/RESTORE_EXTRA_REGS in stub_execve, removes register clearing in ELF_PLAT_INIT(), and instead simply clears them on success in stub_execve. Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428173719-7637-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-06 09:24:08 +02:00
Brian Gerst	6a3713f001	x86/signal: Remove pax argument from restore_sigcontext The 'pax' argument is unnecesary. Instead, store the RAX value directly in regs. This pattern goes all the way back to 2.1.106pre1, when restore_sigcontext() was changed to return an error code instead of EAX directly: https://git.kernel.org/cgit/linux/kernel/git/history/history.git/diff/arch/i386/kernel/signal.c?id=9a8f8b7ca3f319bd668298d447bdf32730e51174 In 2007 sigaltstack syscall support was added, where the return value of restore_sigcontext() was changed to carry the memory-copying failure code. But instead of putting 'ax' into regs->ax directly, it was carried in via a pointer and then returned, where the generic syscall return code copied it to regs->ax. So there was never any deeper reason for this suboptimal pattern, it was simply never noticed after being introduced. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1428152303-17154-1-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-06 09:06:39 +02:00
Borislav Petkov	dbe4058a6a	x86/alternatives: Fix ALTERNATIVE_2 padding generation properly Quentin caught a corner case with the generation of instruction padding in the ALTERNATIVE_2 macro: if len(orig_insn) < len(alt1) < len(alt2), then not enough padding gets added and that is not good(tm) as we could overwrite the beginning of the next instruction. Luckily, at the time of this writing, we don't have ALTERNATIVE_2() invocations which have that problem and even if we did, a simple fix would be to prepend the instructions with enough prefixes so that that corner case doesn't happen. However, best it would be if we fixed it properly. See below for a simple, abstracted example of what we're doing. So what we ended up doing is, we compute the max(len(alt1), len(alt2)) - len(orig_insn) and feed that value to the .skip gas directive. The max() cannot have conditionals due to gas limitations, thus the fancy integer math. With this patch, all ALTERNATIVE_2 sites get padded correctly; generating obscure test cases pass too: #define alt_max_short(a, b) ((a) ^ (((a) ^ (b)) & -(-((a) < (b))))) #define gen_skip(orig, alt1, alt2, marker) \ .skip -((alt_max_short(alt1, alt2) - (orig)) > 0) * \ (alt_max_short(alt1, alt2) - (orig)),marker .pushsection .text, "ax" .globl main main: gen_skip(1, 2, 4, 0x09) gen_skip(4, 1, 2, 0x10) ... .popsection Thanks to Quentin for catching it and double-checking the fix! Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150404133443.GE21152@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-04 15:58:23 +02:00
Linus Torvalds	567cfea99a	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: a SYSRET single-stepping fix, a dmi-scan robustization fix, a reboot quirk and a kgdb fixlet" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kgdb/x86: Fix reporting of 'si' in kgdb on x86_64 x86/asm/entry/64: Disable opportunistic SYSRET if regs->flags has TF set x86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk MAINTAINERS: Change the x86 microcode loader maintainer firmware: dmi_scan: Prevent dmi_num integer overflow	2015-04-03 10:42:32 -07:00
Linus Torvalds	ec2e76b4c7	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Two x86 Intel PMU constraint handling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix Haswell CYCLE_ACTIVITY.* counter constraints perf/x86/intel: Filter branches for PEBS event	2015-04-03 10:38:36 -07:00
Borislav Petkov	6b51311c97	x86/asm/entry/64: Use a define for an invalid segment selector ... instead of a naked number, for better readability. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428054130-25847-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 15:29:13 +02:00
Borislav Petkov	7c74d5b7b7	x86/asm/entry/64: Fix MSR_IA32_SYSENTER_CS MSR value Commit: `d56fe4bf5f` ("x86/asm/entry/64: Always set up SYSENTER MSRs") missed to add "ULL" to the 0 and wrmsrl_safe() complains: arch/x86/kernel/cpu/common.c: In function ‘syscall_init’: arch/x86/kernel/cpu/common.c:1226:2: warning: right shift count >= width of type wrmsrl_safe(MSR_IA32_SYSENTER_CS, 0); Fix it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428054130-25847-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 15:29:12 +02:00
Borislav Petkov	78cac48c04	x86/mm/KASLR: Propagate KASLR status to kernel proper Commit: `e2b32e6785` ("x86, kaslr: randomize module base load address") made module base address randomization unconditional and didn't regard disabled KKASLR due to CONFIG_HIBERNATION and command line option "nokaslr". For more info see (now reverted) commit: `f47233c2d3` ("x86/mm/ASLR: Propagate base load address calculation") In order to propagate KASLR status to kernel proper, we need a single bit in boot_params.hdr.loadflags and we've chosen bit 1 thus leaving the top-down allocated bits for bits supposed to be used by the bootloader. Originally-From: Jiri Kosina <jkosina@suse.cz> Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 15:26:15 +02:00
Mark Einon	7f99f8b94c	x86/earlyprintk: Put CONFIG_PCI-only functions under the #ifdef Two static functions are only used if CONFIG_PCI is defined, so only build them if this is the case. Fixes the build warnings: arch/x86/kernel/early_printk.c:98:13: warning: ‘mem32_serial_out’ defined but not used [-Wunused-function] static void mem32_serial_out(unsigned long addr, int offset, int value) ^ arch/x86/kernel/early_printk.c:105:21: warning: ‘mem32_serial_in’ defined but not used [-Wunused-function] static unsigned int mem32_serial_in(unsigned long addr, int offset) ^ Also convert a few related instances of uintXX_t to kernel specific uXX defines. Signed-off-by: Mark Einon <mark.einon@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stuart.r.anderson@intel.com Link: http://lkml.kernel.org/r/1427923924-22653-1-git-send-email-mark.einon@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 15:21:56 +02:00
Aravind Gopalakrishnan	cee8f5a6c8	x86/mce/severity: Fix warning about indented braces Dan reported compiler warnings about missing curly braces in mce_severity_amd(). Reindent the catch-all "return MCE_AR_SEVERITY" correctly to single tab. While at it, chain ctx == IN_KERNEL check with mcgstatus check to make it cleaner, as suggested by Boris. No functional changes are introduced by this patch. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1427814281-18192-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 15:20:38 +02:00
Borislav Petkov	47091e3c5b	x86/asm/entry: Drop now unused ENABLE_INTERRUPTS_SYSEXIT32 Commit: `4214a16b02` ("x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER") removed the last user of ENABLE_INTERRUPTS_SYSEXIT32. Kill the macro now too. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1428049714-829-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 10:34:19 +02:00
Andy Lutomirski	4214a16b02	x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER SYSEXIT is scary on 64-bit kernels -- SYSEXIT must be invoked with usergs and IRQs on. That means that we rely on STI to correctly mask interrupts for one instruction. This is okay by itself, but the semantics with respect to NMIs are unclear. Avoid the whole issue by using SYSRETL instead. For background, Intel CPUs don't allow SYSCALL from compat mode, but they do allow SYSRETL back to compat mode. Go figure. To avoid doing too much at once, this doesn't revamp the calling convention. We still return with EBP, EDX, and ECX on the user stack. Oddly this seems to be 30 cycles or so faster. Avoiding POPFQ and STI will account for under half of that, I think, so my best guess is that Intel just optimizes SYSRET much better than SYSEXIT. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/57a0bf1b5230b2716a64ebe48e9bc1110f7ab433.1428019097.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 09:14:00 +02:00
Thomas Gleixner	435c350e81	x86/amd/idle, clockevents: Use explicit broadcast oneshot control functions Replace the clockevents_notify() call with an explicit function call. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/8569669.lgxIty9PKW@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 08:44:34 +02:00
Thomas Gleixner	162a688e84	x86/amd/idle, clockevents: Use explicit broadcast control function Replace the clockevents_notify() call with an explicit function call. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1528188.S1pjqkSL1P@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 08:44:31 +02:00
Andy Lutomirski	cf9328cc99	x86/asm/entry/32: Stop caching MSR_IA32_SYSENTER_ESP in tss.sp1 We write a stack pointer to MSR_IA32_SYSENTER_ESP exactly once, and we unnecessarily cache the value in tss.sp1. We never read the cached value. Remove all of the caching. It serves no purpose. Suggested-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/05a0163eb33ef5208363f0015496855da7cebadd.1428002830.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 08:30:44 +02:00
Andy Lutomirski	ff8287f363	x86/asm/entry/32: Improve a TOP_OF_KERNEL_STACK_PADDING comment At Denys' request, clean up the comment describing stack padding in the 32-bit sysenter path. No code changes. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/41fee7bb8490ae840fe7ef2699f9c2feb932e729.1428002830.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 08:30:44 +02:00
Ross Zwisler	d9dc64f30a	x86/asm: Add support for the CLWB instruction Add support for the new CLWB (cache line write back) instruction. This instruction was announced in the document "Intel Architecture Instruction Set Extensions Programming Reference" with reference number 319433-022. https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf The CLWB instruction is used to write back the contents of dirtied cache lines to memory without evicting the cache lines from the processor's cache hierarchy. This should be used in favor of clflushopt or clflush in cases where you require the cache line to be written to memory but plan to access the data again in the near future. One of the main use cases for this is with persistent memory where CLWB can be used with PCOMMIT to ensure that data has been accepted to memory and is durable on the DIMM. This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH and PCOMMIT with appropriate fencing: void flush_and_commit_buffer(void vaddr, unsigned int size) { void vend = vaddr + size - 1; for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size) clwb(vaddr); /* Flush any possible final partial cacheline / clwb(vend); / * Use SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes. * (MFENCE via mb() also works) / wmb(); / PCOMMIT and the required SFENCE for ordering */ pcommit_sfence(); } After this function completes the data pointed to by vaddr is has been accepted to memory and will be durable if the vaddr points to persistent memory. Regarding the details of how the alternatives assembly is set up, we need one additional byte at the beginning of the CLFLUSH so that we can flip it into a CLFLUSHOPT by changing that byte into a 0x66 prefix. Two options are to either insert a 1 byte ASM_NOP1, or to add a 1 byte NOP_DS_PREFIX. Both have no functional effect with the plain CLFLUSH, but I've been told that executing a CLFLUSH + prefix should be faster than executing a CLFLUSH + NOP. We had to hard code the assembly for CLWB because, lacking the ability to assemble the CLWB instruction itself, the next closest thing is to have an xsaveopt instruction with a 0x66 prefix. Unfortunately XSAVEOPT itself is also relatively new, and isn't included by all the GCC versions that the kernel needs to support. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1422377631-8986-3-git-send-email-ross.zwisler@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 06:56:38 +02:00
Linus Torvalds	b0838b1501	xen: regression fixes for 4.0-rc6 - Fix two regressions in the balloon driver's use of memory hotplug when used in a PV guest. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJVHWimAAoJEFxbo/MsZsTRSFkH/A1SYwcqUADPVZYDSc2T10Km 1kF7UifizlJ8/gCbEbhzLMDSPTwEENzN5ONWXQ+a3EXuK0UwLBnic9uKWAg2Q8cp +uIpU1P15yM7rJFRaf6PZ2BqFdvWpXUJgNnof4+r0uqqa57hdmVOmVMKFenlf/h7 2jj0b19wS6jhuwl9PzgKTKQyjnpWrmkhfg5tDQzn1qUCHoKDIhM7ROUP7wsYfIu8 iAT3JV9lmNwdGT5jwf4MgRmBpqGZzQs03xaAz2E9/fnWWv7u4EuZcmbDcooUmi7H luHQ2Dds1ajhjgtRwPoxT0tHFuRl7s0jsyJ8GOLsdmoMkJTAQnQA21egI7Bpdfo= =HHbI -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-4.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen regression fixes from David Vrabel: "Fix two regressions in the balloon driver's use of memory hotplug when used in a PV guest" * tag 'stable/for-linus-4.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/balloon: before adding hotplugged memory, set frames to invalid x86/xen: prepare p2m list for memory hotplug	2015-04-02 13:53:53 -07:00
Ingo Molnar	2e54a5bdba	perf/x86/intel/pt: Fix the 32-bit build On a 32-bit build I got: arch/x86/kernel/cpu/perf_event_intel_pt.c:413:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] arch/x86/kernel/cpu/perf_event_intel_bts.c:162:24: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] Fix it. The code should probably be (re-)tested on 32-bit systems to make sure all is fine. Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: linux-kernel@vger.kernel.org Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:58:45 +02:00
Andi Kleen	cd1f11de69	perf/x86/intel: Avoid rewriting DEBUGCTL with the same value for LBRs perf with LBRs on has a tendency to rewrite the DEBUGCTL MSR with the same value. Add a little optimization to skip the unnecessary write. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1426871484-21285-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:20 +02:00
Andi Kleen	1a78d93750	perf/x86/intel: Streamline LBR MSR handling in PMI The perf PMI currently does unnecessary MSR accesses when LBRs are enabled. We use LBR freezing, or when in callstack mode force the LBRs to only filter on ring 3. So there is no need to disable the LBRs explicitely in the PMI handler. Also we always unnecessarily rewrite LBR_SELECT in the LBR handler, even though it can never change. 5) \| /* write_msr: MSR_LBR_SELECT(1c8), value 0 / 5) \| / read_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 / 5) \| / write_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 / 5) \| / write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f / 5) \| / write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 / 5) \| / write_msr: MSR_LBR_SELECT(1c8), value 0 / 5) \| / read_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 / 5) \| / write_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */ This patch: - Avoids disabling already frozen LBRs unnecessarily in the PMI - Avoids changing LBR_SELECT in the PMI Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1426871484-21285-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:19 +02:00
Andi Kleen	15fde1101a	perf/x86: Only dump PEBS register when PEBS has been detected Technically PEBS_ENABLED is only guaranteed to exist when we detected PEBS. So add a check for this to the PMU dump function. I don't think it can happen on a real CPU, but could in a VM. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1425059312-18217-4-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:17 +02:00
Andi Kleen	da3e606d88	perf/x86: Dump DEBUGCTL in PMU dump LBRs and LBR freezing are controlled through the DEBUGCTL MSR. So dump the state of DEBUGCTL too when dumping the PMU state. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1425059312-18217-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:17 +02:00
Andi Kleen	8882edf735	perf/x86/intel: Reset more state in PMU reset The PMU reset code didn't quite keep up with newer PMU features. Improve it a bit to really reset a modern PMU: - Clear all overflow status - Clear LBRs and freezing state - Disable fixed counters too Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1425059312-18217-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:16 +02:00
Stephane Eranian	b37609c30e	perf/x86/intel: Make the HT bug workaround conditional on HT enabled This patch disables the PMU HT bug when Hyperthreading (HT) is disabled. We cannot do this test immediately when perf_events is initialized. We need to wait until the topology information is setup properly. As such, we register a later initcall, check the topology and potentially disable the workaround. To do this, we need to ensure there is no user of the PMU. At this point of the boot, the only user is the NMI watchdog, thus we disable it during the switch and re-enable it right after. Having the workaround disabled when it is not needed provides some benefits by limiting the overhead is time and space. The workaround still ensures correct scheduling of the corrupting memory events (0xd0, 0xd1, 0xd2) when HT is off. Those events can only be measured on counters 0-3. Something else the current kernel did not handle correctly. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-13-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:15 +02:00
Stephane Eranian	c02cdbf60b	perf/x86/intel: Limit to half counters when the HT workaround is enabled, to avoid exclusive mode starvation This patch limits the number of counters available to each CPU when the HT bug workaround is enabled. This is necessary to avoid situation of counter starvation. Such can arise from configuration where one HT thread, HT0, is using all 4 counters with corrupting events which require exclusion the the sibling HT, HT1. In such case, HT1 would not be able to schedule any event until HT0 is done. To mitigate this problem, this patch artificially limits the number of counters to 2. That way, we can gurantee that at least 2 counters are not in exclusive mode and therefore allow the sibling thread to schedule events of the same type (system vs. per-thread). The 2 counters are not determined in advance. We simply set the limit to two events per HT. This helps mitigate starvation in case of events with specific counter constraints such a PREC_DIST. Note that this does not elimintate the starvation is all cases. But it is better than not having it. (Solution suggested by Peter Zjilstra.) Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-11-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:14 +02:00
Stephane Eranian	a90738c2cb	perf/x86/intel: Fix intel_get_event_constraints() for dynamic constraints With dynamic constraint, we need to restart from the static constraints each time the intel_get_event_constraints() is called. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-10-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:14 +02:00
Maria Dimakopoulou	b63b4b459a	perf/x86/intel: Enforce HT bug workaround with PEBS for SNB/IVB/HSW This patch modifies the PEBS constraint tables for SNB/IVB/HSW such that corrupting events supporting PEBS activate the HT workaround. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-9-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:13 +02:00
Maria Dimakopoulou	93fcf72cc0	perf/x86/intel: Enforce HT bug workaround for SNB/IVB/HSW This patches activates the HT bug workaround for the SNB/IVB/HSW processors. This covers non-PEBS mode. Activation is done thru the constraint tables. Both client and server processors needs this workaround. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-8-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:12 +02:00
Maria Dimakopoulou	e979121b1b	perf/x86/intel: Implement cross-HT corruption bug workaround This patch implements a software workaround for a HW erratum on Intel SandyBridge, IvyBridge and Haswell processors with Hyperthreading enabled. The errata are documented for each processor in their respective specification update documents: - SandyBridge: BJ122 - IvyBridge: BV98 - Haswell: HSD29 The bug causes silent counter corruption across hyperthreads only when measuring certain memory events (0xd0, 0xd1, 0xd2, 0xd3). Counters measuring those events may leak counts to the sibling counter. For instance, counter 0, thread 0 measuring event 0xd0, may leak to counter 0, thread 1, regardless of the event measured there. The size of the leak is not predictible. It all depends on the workload and the state of each sibling hyper-thread. The corrupting events do undercount as a consequence of the leak. The leak is compensated automatically only when the sibling counter measures the exact same corrupting event AND the workload is on the two threads is the same. Given, there is no way to guarantee this, a work-around is necessary. Furthermore, there is a serious problem if the leaked count is added to a low-occurrence event. In that case the corruption on the low occurrence event can be very large, e.g., orders of magnitude. There is no HW or FW workaround for this problem. The bug is very easy to reproduce on a loaded system. Here is an example on a Haswell client, where CPU0, CPU4 are siblings. We load the CPUs with a simple triad app streaming large floating-point vector. We use 0x81d0 corrupting event (MEM_UOPS_RETIRED:ALL_LOADS) and 0x20cc (ROB_MISC_EVENTS:LBR_INSERTS). Given we are not using the LBR, the 0x20cc event should be zero. $ taskset -c 0 triad & $ taskset -c 4 triad & $ perf stat -a -C 0 -e r81d0 sleep 100 & $ perf stat -a -C 4 -r20cc sleep 10 Performance counter stats for 'system wide': 139 277 291 r20cc 10,000969126 seconds time elapsed In this example, 0x81d0 and r20cc ar eusing sinling counters on CPU0 and CPU4. 0x81d0 leaks into 0x20cc and corrupts it from 0 to 139 millions occurrences. This patch provides a software workaround to this problem by modifying the way events are scheduled onto counters by the kernel. The patch forces cross-thread mutual exclusion between counters in case a corrupting event is measured by one of the hyper-threads. If thread 0, counter 0 is measuring event 0xd0, then nothing can be measured on counter 0, thread 1. If no corrupting event is measured on any hyper-thread, event scheduling proceeds as before. The same example run with the workaround enabled, yield the correct answer: $ taskset -c 0 triad & $ taskset -c 4 triad & $ perf stat -a -C 0 -e r81d0 sleep 100 & $ perf stat -a -C 4 -r20cc sleep 10 Performance counter stats for 'system wide': 0 r20cc 10,000969126 seconds time elapsed The patch does provide correctness for all non-corrupting events. It does not "repatriate" the leaked counts back to the leaking counter. This is planned for a second patch series. This patch series makes this repatriation more easy by guaranteeing the sibling counter is not measuring any useful event. The patch introduces dynamic constraints for events. That means that events which did not have constraints, i.e., could be measured on any counters, may now be constrained to a subset of the counters depending on what is going on the sibling thread. The algorithm is similar to a cache coherency protocol. We call it XSU in reference to Exclusive, Shared, Unused, the 3 possible states of a PMU counter. As a consequence of the workaround, users may see an increased amount of event multiplexing, even in situtations where there are fewer events than counters measured on a CPU. Patch has been tested on all three impacted processors. Note that when HT is off, there is no corruption. However, the workaround is still enabled, yet not costing too much. Adding a dynamic detection of HT on turned out to be complex are requiring too much to code to be justified. This patch addresses the issue when PEBS is not used. A subsequent patch fixes the problem when PEBS is used. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> [spinlock_t -> raw_spinlock_t] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-7-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:12 +02:00
Maria Dimakopoulou	6f6539cad9	perf/x86/intel: Add cross-HT counter exclusion infrastructure This patch adds a new shared_regs style structure to the per-cpu x86 state (cpuc). It is used to coordinate access between counters which must be used with exclusion across HyperThreads on Intel processors. This new struct is not needed on each PMU, thus is is allocated on demand. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> [peterz: spinlock_t -> raw_spinlock_t] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-6-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:11 +02:00
Stephane Eranian	79cba82244	perf/x86: Add 'index' param to get_event_constraint() callback This patch adds an index parameter to the get_event_constraint() x86_pmu callback. It is expected to represent the index of the event in the cpuc->event_list[] array. When the callback is used for fake_cpuc (evnet validation), then the index must be -1. The motivation for passing the index is to use it to index into another cpuc array. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-5-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:10 +02:00
Maria Dimakopoulou	c5362c0c37	perf/x86: Add 3 new scheduling callbacks This patch adds 3 new PMU model specific callbacks during the event scheduling done by x86_schedule_events(). ->start_scheduling(): invoked when entering the schedule routine. ->stop_scheduling(): invoked at the end of the schedule routine ->commit_scheduling(): invoked for each committed event To be used optionally by model-specific code. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-4-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:09 +02:00

1 2 3 4 5 ...

21411 Commits