OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Borislav Petkov	8feaa64a9a	x86/microcode/AMD: Make find_proper_container() sane again Fixup signature and retvals, return the container struct through the passed in pointer, not as a function return value. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jürgen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/20161218164414.9649-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-19 10:46:19 +01:00
Linus Torvalds	8421c60446	platform-drivers-x86 for 4.10-2 Move and add registration for the mlx-platform driver. Introduce button and lid drivers for the surface3 (different from the surface3-pro). Add BXT PMIC TMU support. Add Y700 to existing ideapad-laptop quirk. ideapad-laptop: - Add Y700 15-ACZ to no_hw_rfkill DMI list surface3_button: - Introduce button support for the Surface 3 surface3-wmi: - Add custom surface3 platform device for controlling LID - Balance locking on error path mlx-platform: - Add mlxcpld-hotplug driver registration - Fix semicolon.cocci warnings - Move module from arch/x86 platform/x86: - Add Whiskey Cove PMIC TMU support -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYVxXxAAoJEKbMaAwKp364mDAIAL750zoO/liY1qfFGktkY2zS xknJqh4UeIVW6iN9hEq7jQD3jkFl7hNuXgFyogLZgJQge8gfufsOD7ljkhp2wVF2 3Ir437f4GdcUeuhO6PwuPZnn++Y6oH2rLZc1d+5anZRikW470tnXitrXs5hsqG57 74KiSFa+v+Uj1kqU4Gjt0QDSfmQc185Wn5xieHPpZrcjDaut+N4t06+MDMcH7fjk 6nJ+tvo/dreZojA+AklljaNB2BioIGbEOGamnxOJ9Rjs0jV2d0A1c/6XBHDe+Xtu SkWqvjdrLOSxAjErBpB97VYnYihjCDXfl+DIABOdumGO6hJz01DvbZ+VVLwGM0E= =YH31 -----END PGP SIGNATURE----- Merge tag 'platform-drivers-x86-v4.10-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull more x86 platform driver updates from Darren Hart: "Move and add registration for the mlx-platform driver. Introduce button and lid drivers for the surface3 (different from the surface3-pro). Add BXT PMIC TMU support. Add Y700 to existing ideapad-laptop quirk. Summary: ideapad-laptop: - Add Y700 15-ACZ to no_hw_rfkill DMI list surface3_button: - Introduce button support for the Surface 3 surface3-wmi: - Add custom surface3 platform device for controlling LID - Balance locking on error path mlx-platform: - Add mlxcpld-hotplug driver registration - Fix semicolon.cocci warnings - Move module from arch/x86 platform/x86: - Add Whiskey Cove PMIC TMU support" * tag 'platform-drivers-x86-v4.10-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: platform/x86: surface3-wmi: Balance locking on error path platform/x86: Add Whiskey Cove PMIC TMU support platform/x86: ideapad-laptop: Add Y700 15-ACZ to no_hw_rfkill DMI list platform/x86: Introduce button support for the Surface 3 platform/x86: Add custom surface3 platform device for controlling LID platform/x86: mlx-platform: Add mlxcpld-hotplug driver registration platform/x86: mlx-platform: Fix semicolon.cocci warnings platform/x86: mlx-platform: Move module from arch/x86	2016-12-18 15:45:33 -08:00
Linus Torvalds	f7dd3b1734	Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "This is the last functional update from the tip tree for 4.10. It got delayed due to a newly reported and anlyzed variant of BIOS bug and the resulting wreckage: - Seperation of TSC being marked realiable and the fact that the platform provides the TSC frequency via CPUID/MSRs and making use for it for GOLDMONT. - TSC adjust MSR validation and sanitizing: The TSC adjust MSR contains the offset to the hardware counter. The sum of the adjust MSR and the counter is the TSC value which is read via RDTSC. On at least two machines from different vendors the BIOS sets the TSC adjust MSR to negative values. This happens on cold and warm boot. While on cold boot the offset is a few milliseconds, on warm boot it basically compensates the power on time of the system. The BIOSes are not even using the adjust MSR to set all CPUs in the package to the same offset. The offsets are different which renders the TSC unusable, What's worse is that the TSC deadline timer has a HW feature^Wbug. It malfunctions when the TSC adjust value is negative or greater equal 0x80000000 resulting in silent boot failures, hard lockups or non firing timers. This looks like some hardware internal 32/64bit issue with a sign extension problem. Intel has been silent so far on the issue. The update contains sanity checks and keeps the adjust register within working limits and in sync on the package. As it looks like this disease is spreading via BIOS crapware, we need to address this urgently as the boot failures are hard to debug for users" * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsc: Limit the adjust value further x86/tsc: Annotate printouts as firmware bug x86/tsc: Force TSC_ADJUST register to value >= zero x86/tsc: Validate TSC_ADJUST after resume x86/tsc: Validate cpumask pointer before accessing it x86/tsc: Fix broken CONFIG_X86_TSC=n build x86/tsc: Try to adjust TSC if sync test fails x86/tsc: Prepare warp test for TSC adjustment x86/tsc: Move sync cleanup to a safe place x86/tsc: Sync test only for the first cpu in a package x86/tsc: Verify TSC_ADJUST from idle x86/tsc: Store and check TSC ADJUST MSR x86/tsc: Detect random warps x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art() x86/tsc: Finalize the split of the TSC_RELIABLE flag x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs x86/tsc: Mark Intel ATOM_GOLDMONT TSC reliable x86/tsc: Mark TSC frequency determined by CPUID as known x86/tsc: Add X86_FEATURE_TSC_KNOWN_FREQ flag	2016-12-18 13:59:10 -08:00
Linus Torvalds	1bbb05f520	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes and cleanups from Thomas Gleixner: "This set of updates contains: - Robustification for the logical package managment. Cures the AMD and virtualization issues. - Put the correct start_cpu() return address on the stack of the idle task. - Fixups for the fallout of the nodeid <-> cpuid persistent mapping modifciations - Move the x86/MPX specific mm_struct member to the arch specific mm_context where it belongs - Cleanups for C89 struct initializers and useless function arguments" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/floppy: Use designated initializers x86/mpx: Move bd_addr to mm_context_t x86/mm: Drop unused argument 'removed' from sync_global_pgds() ACPI/NUMA: Do not map pxm to node when NUMA is turned off x86/acpi: Use proper macro for invalid node x86/smpboot: Prevent false positive out of bounds cpumask access warning x86/boot/64: Push correct start_cpu() return address x86/boot/64: Use 'push' instead of 'call' in start_cpu() x86/smpboot: Make logical package management more robust	2016-12-18 11:12:53 -08:00
Thomas Gleixner	8c9b9d87b8	x86/tsc: Limit the adjust value further Adjust value 0x80000000 and other values larger than that render the TSC deadline timer disfunctional. We have not yet any information about this from Intel, but experimentation clearly proves that this is a 32/64 bit and sign extension issue. If adjust values larger than that are actually required, which might be the case for physical CPU hotplug, then we need to disable the deadline timer on the affected package/CPUs and use the local APIC timer instead. That requires some surgery in the APIC setup code, so we just limit the ADJUST register value into the known to work range for now and revisit this when Intel comes forth with proper information. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Roland Scheidegger <rscheidegger_lists@hispeed.ch> Cc: Bruce Schlobohm <bruce.schlobohm@intel.com> Cc: Kevin Stanton <kevin.b.stanton@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de>	2016-12-18 16:37:04 +01:00
Thomas Gleixner	16588f6592	x86/tsc: Annotate printouts as firmware bug Make it more obvious that the BIOS is screwed up. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Roland Scheidegger <rscheidegger_lists@hispeed.ch> Cc: Bruce Schlobohm <bruce.schlobohm@intel.com> Cc: Kevin Stanton <kevin.b.stanton@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de>	2016-12-18 16:35:15 +01:00
Kees Cook	ffc7dc8d83	x86/floppy: Use designated initializers Prepare to mark sensitive kernel structures for randomization by making sure they're using designated initializers. These were identified during allyesconfig builds of x86, arm, and arm64, with most initializer fixes extracted from grsecurity. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161217213705.GA1248@beast Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-18 09:25:38 +01:00
Linus Torvalds	41e0e24b45	Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild updates from Michal Marek: - prototypes for x86 asm-exported symbols (Adam Borowski) and a warning about missing CRCs (Nick Piggin) - asm-exports fix for LTO (Nicolas Pitre) - thin archives improvements (Nick Piggin) - linker script fix for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION (Nick Piggin) - genksyms support for __builtin_va_list keyword - misc minor fixes * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: x86/kbuild: enable modversions for symbols exported from asm kbuild: fix scripts/adjust_autoksyms.sh* for the no modules case scripts/kallsyms: remove last remnants of --page-offset option make use of make variable CURDIR instead of calling pwd kbuild: cmd_export_list: tighten the sed script kbuild: minor improvement for thin archives build kbuild: modpost warn if export version crc is missing kbuild: keep data tables through dead code elimination kbuild: improve linker compatibility with lib-ksyms.o build genksyms: Regenerate parser kbuild/genksyms: handle va_list type kbuild: thin archives for multi-y targets kbuild: kallsyms allow 3-pass generation if symbols size has changed	2016-12-17 16:24:13 -08:00
Mark Rutland	cb02de96ec	x86/mpx: Move bd_addr to mm_context_t Currently bd_addr lives in mm_struct, which is otherwise architecture independent. Architecture-specific data is supposed to live within mm_context_t (itself contained in mm_struct). Other x86-specific context like the pkey accounting data lives in mm_context_t, and there's no readon the MPX data can't also live there. So as to keep the arch-specific data togather, and to set a good example for others, this patch moves bd_addr into x86's mm_context_t. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1481892055-24596-1-git-send-email-mark.rutland@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-17 12:29:56 +01:00
Vadim Pasternak	6613d18e90	platform/x86: mlx-platform: Move module from arch/x86 Since mlx-platform is not an architectural driver, it is moved out of arch/x86/platform to drivers/platform/x86. Relevant Makefile and Kconfig are updated. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>	2016-12-16 23:30:24 +02:00
Linus Torvalds	de399813b5	powerpc updates for 4.10 Highlights include: - Support for the kexec_file_load() syscall, which is a prereq for secure and trusted boot. - Prevent kernel execution of userspace on P9 Radix (similar to SMEP/PXN). - Sort the exception tables at build time, to save time at boot, and store them as relative offsets to save space in the kernel image & memory. - Allow building the kernel with thin archives, which should allow us to build an allyesconfig once some other fixes land. - Build fixes to allow us to correctly rebuild when changing the kernel endian from big to little or vice versa. - Plumbing so that we can avoid doing a full mm TLB flush on P9 Radix. - Initial stack protector support (-fstack-protector). - Support for dumping the radix (aka. Linux) and hash page tables via debugfs. - Fix an oops in cxl coredump generation when cxl_get_fd() is used. - Freescale updates from Scott: "Highlights include 8xx hugepage support, qbman fixes/cleanup, device tree updates, and some misc cleanup." - Many and varied fixes and minor enhancements as always. Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Bartlomiej Zolnierkiewicz, Christophe Jaillet, Christophe Leroy, Denis Kirjanov, Elimar Riesebieter, Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand, Jack Miller, Johan Hovold, Lars-Peter Clausen, Libin, Madhavan Srinivasan, Michael Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Pan Xinhui, Peter Senna Tschudin, Rashmica Gupta, Rui Teng, Russell Currey, Scott Wood, Simon Guo, Suraj Jitindar Singh, Thiago Jung Bauermann, Tobias Klauser, Vaibhav Jain. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYU4YSAAoJEFHr6jzI4aWAC4gQALtIAqqPon0Cd5b/FVVcMbW7 mMqB2b/0FGEl5GoRTzGUDaQqElilm6AEVfHO86C7DFji/a6olneFfw87iz+mtWuZ JvrNq68ZiSnoeszdUy4MgtXFLb5sTzNMev4skaHfjI9E5CepWBoR0zH4G+kNVnd5 WSgudv8Cq4Px+MEuTOigt3QYjHzZ3cw/XNOOm9c+oGj+PDW4O9UItVI+S1WLoey4 rAB2nRcLMDPuwfRQC9XsF3zEbkv4h1dEXo/EBRuRpcF+0lLTzFw1lv1WE8OxlUmS kAXbty3dIytBfSbtJT0c0Ps6sfQ4HFhu6ZV2fjnxNTz2KDkBIN7LBYHmBYiqY9oZ 9zvbUWtfiTu5ocfRtTq7rC/Hcj4Kbr9S9F/FvXR0WyDsKgu4xxAovqC3gcn6YjYK Rr1tcCI4nUzyhVJVmd+OEhUvc5JbFy9aGage+YeOyejfvvSbXIunaxWlPjoDkvim Vjl+UKU8gw51XFssqY5ZBi/HNlMFKYedLpMFp/fItnLglhj50V0eFWkpDgdSCYom vo9ifPLZx8n8m8De3H7TV4E0F4gCHcTeqZdu7tW9AAUVM6iLJcDLm3asGmtNh21t snOHNOJ5QSIno6ezUUg29T6VBjbPh46fdJJSlIZrEe8OzLZ1haGyttf0tD00PQvY Z2W/m3gxafnOeGgBqvyv =xOzf -----END PGP SIGNATURE----- Merge tag 'powerpc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights include: - Support for the kexec_file_load() syscall, which is a prereq for secure and trusted boot. - Prevent kernel execution of userspace on P9 Radix (similar to SMEP/PXN). - Sort the exception tables at build time, to save time at boot, and store them as relative offsets to save space in the kernel image & memory. - Allow building the kernel with thin archives, which should allow us to build an allyesconfig once some other fixes land. - Build fixes to allow us to correctly rebuild when changing the kernel endian from big to little or vice versa. - Plumbing so that we can avoid doing a full mm TLB flush on P9 Radix. - Initial stack protector support (-fstack-protector). - Support for dumping the radix (aka. Linux) and hash page tables via debugfs. - Fix an oops in cxl coredump generation when cxl_get_fd() is used. - Freescale updates from Scott: "Highlights include 8xx hugepage support, qbman fixes/cleanup, device tree updates, and some misc cleanup." - Many and varied fixes and minor enhancements as always. Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Bartlomiej Zolnierkiewicz, Christophe Jaillet, Christophe Leroy, Denis Kirjanov, Elimar Riesebieter, Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand, Jack Miller, Johan Hovold, Lars-Peter Clausen, Libin, Madhavan Srinivasan, Michael Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Pan Xinhui, Peter Senna Tschudin, Rashmica Gupta, Rui Teng, Russell Currey, Scott Wood, Simon Guo, Suraj Jitindar Singh, Thiago Jung Bauermann, Tobias Klauser, Vaibhav Jain" [ And thanks to Michael, who took time off from a new baby to get this pull request done. - Linus ] * tag 'powerpc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (174 commits) powerpc/fsl/dts: add FMan node for t1042d4rdb powerpc/fsl/dts: add sg_2500_aqr105_phy4 alias on t1024rdb powerpc/fsl/dts: add QMan and BMan nodes on t1024 powerpc/fsl/dts: add QMan and BMan nodes on t1023 soc/fsl/qman: test: use DEFINE_SPINLOCK() powerpc/fsl-lbc: use DEFINE_SPINLOCK() powerpc/8xx: Implement support of hugepages powerpc: get hugetlbpage handling more generic powerpc: port 64 bits pgtable_cache to 32 bits powerpc/boot: Request no dynamic linker for boot wrapper soc/fsl/bman: Use resource_size instead of computation soc/fsl/qe: use builtin_platform_driver powerpc/fsl_pmc: use builtin_platform_driver powerpc/83xx/suspend: use builtin_platform_driver powerpc/ftrace: Fix the comments for ftrace_modify_code powerpc/perf: macros for power9 format encoding powerpc/perf: power9 raw event format encoding powerpc/perf: update attribute_group data structure powerpc/perf: factor out the event format field powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown ...	2016-12-16 09:26:42 -08:00
Paolo Bonzini	3f5ad8be37	KVM: hyperv: fix locking of struct kvm_hv fields Introduce a new mutex to avoid an AB-BA deadlock between kvm->lock and vcpu->mutex. Protect accesses in kvm_hv_setup_tsc_page too, as suggested by Roman. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-12-16 17:53:38 +01:00
Linus Torvalds	179a7ba680	This release has a few updates: o STM can hook into the function tracer o Function filtering now supports more advance glob matching o Ftrace selftests updates and added tests o Softirq tag in traces now show only softirqs o ARM nop added to non traced locations at compile time o New trace_marker_raw file that allows for binary input o Optimizations to the ring buffer o Removal of kmap in trace_marker o Wakeup and irqsoff tracers now adhere to the set_graph_notrace file o Other various fixes and clean ups Note, there are two patches marked for stable. These were discovered near the end of the 4.9 rc release cycle. By the time I had them tested it was just a matter of days before 4.9 would be released, and I figured I would just submit them in the merge window. They are old bugs and not critical. Nothing non-root could abuse. -----BEGIN PGP SIGNATURE----- iQExBAABCAAbBQJYUrFHFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L 2+AIAIr20kSQV/nA5htGAeCTobVk3WUxY6bvjd9mIJDKPP19akNLyREW0G3KnfCr yhx4aFRZG98fRu/6F8qieRosyN36lADDVYHelMFHMpcTOpE2aZGjaaOuNGxOEA9v FmMPTX+K3+dzKyFP4l68R3+5JuQ1/AqLTioTWeLW8IDQ2OOVsjD8+0BuXrNKMJDY o6U4Hk5U/vn+zHc6BmgBzloAXemBd7iJ1t5V3FRRGvm8yv3HU85Twc5ofGeYTWvB J8PboEywRlIzxg0Kd8mxnMI5PgaKZSEc2ub8E7cY/CZ5PYpDE2xDA2hJmJgfYp00 1VW+DHRpRZfElsCcya6S6P4bs5Y= =MGZ/ -----END PGP SIGNATURE----- Merge tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This release has a few updates: - STM can hook into the function tracer - Function filtering now supports more advance glob matching - Ftrace selftests updates and added tests - Softirq tag in traces now show only softirqs - ARM nop added to non traced locations at compile time - New trace_marker_raw file that allows for binary input - Optimizations to the ring buffer - Removal of kmap in trace_marker - Wakeup and irqsoff tracers now adhere to the set_graph_notrace file - Other various fixes and clean ups" * tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits) selftests: ftrace: Shift down default message verbosity kprobes/trace: Fix kprobe selftest for newer gcc tracing/kprobes: Add a helper method to return number of probe hits tracing/rb: Init the CPU mask on allocation tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too fgraph: Handle a case where a tracer ignores set_graph_notrace tracing: Replace kmap with copy_from_user() in trace_marker writing ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it tracing: Allow benchmark to be enabled at early_initcall() tracing: Have system enable return error if one of the events fail tracing: Do not start benchmark on boot up tracing: Have the reg function allow to fail ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline ring-buffer: Froce rb_update_write_stamp() to be inlined ring-buffer: Force inline of hotpath helper functions tracing: Make __buffer_unlock_commit() always_inline tracing: Make tracepoint_printk a static_key ring-buffer: Always inline rb_event_data() ring-buffer: Make rb_reserve_next_event() always inlined ...	2016-12-15 13:49:34 -08:00
Yi Sun	83781d180b	KVM: x86: Expose Intel AVX512IFMA/AVX512VBMI/SHA features to guest. Expose AVX512IFMA/AVX512VBMI/SHA features to guest. AVX512 spec can be found at: https://software.intel.com/sites/default/files/managed/26/40/319433-026.pdf SHA spec can be found at: https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf This patch depends on below patch. http://marc.info/?l=linux-kernel&m=147932800828178&w=2 Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-12-15 15:02:44 +01:00
GanShun	37b9a671f3	kvm: nVMX: Correct a VMX instruction error code for VMPTRLD When the operand passed to VMPTRLD matches the address of the VMXON region, the VMX instruction error code should be VMXERR_VMPTRLD_VMXON_POINTER rather than VMXERR_VMCLEAR_VMXON_POINTER. Signed-off-by: GanShun <ganshun@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-12-15 15:02:44 +01:00
Kirill A. Shutemov	5372e155a2	x86/mm: Drop unused argument 'removed' from sync_global_pgds() Since commit `af2cf278ef` ("x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()") there are no callers of sync_global_pgds() which set the 'removed' argument to 1. Remove the argument and the related conditionals in the function. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/20161214234403.137556-1-kirill.shutemov@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-15 12:46:07 +01:00
Thomas Gleixner	5bae156241	x86/tsc: Force TSC_ADJUST register to value >= zero Roland reported that his DELL T5810 sports a value add BIOS which completely wreckages the TSC. The squirmware [(TM) Ingo Molnar] boots with random negative TSC_ADJUST values, different on all CPUs. That renders the TSC useless because the sycnchronization check fails. Roland tested the new TSC_ADJUST mechanism. While it manages to readjust the TSCs he needs to disable the TSC deadline timer, otherwise the machine just stops booting. Deeper investigation unearthed that the TSC deadline timer is sensitive to the TSC_ADJUST value. Writing TSC_ADJUST to a negative value results in an interrupt storm caused by the TSC deadline timer. This does not make any sense and it's hard to imagine what kind of hardware wreckage is behind that misfeature, but it's reliably reproducible on other systems which have TSC_ADJUST and TSC deadline timer. While it would be understandable that a big enough negative value which moves the resulting TSC readout into the negative space could have the described effect, this happens even with a adjust value of -1, which keeps the TSC readout definitely in the positive space. The compare register for the TSC deadline timer is set to a positive value larger than the TSC, but despite not having reached the deadline the interrupt is raised immediately. If this happens on the boot CPU, then the machine dies silently because this setup happens before the NMI watchdog is armed. Further experiments showed that any other adjustment of TSC_ADJUST works as expected as long as it stays in the positive range. The direction of the adjustment has no influence either. See the lkml link for further analysis. Yet another proof for the theory that timers are designed by janitors and the underlying (obviously undocumented) mechanisms which allow BIOSes to wreckage them are considered a feature. Well done Intel - NOT! To address this wreckage add the following sanity measures: - If the TSC_ADJUST value on the boot cpu is not 0, set it to 0 - If the TSC_ADJUST value on any cpu is negative, set it to 0 - Prevent the cross package synchronization mechanism from setting negative TSC_ADJUST values. Reported-and-tested-by: Roland Scheidegger <rscheidegger_lists@hispeed.ch> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bruce Schlobohm <bruce.schlobohm@intel.com> Cc: Kevin Stanton <kevin.b.stanton@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Allen Hung <allen_hung@dell.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161213131211.397588033@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-15 11:44:29 +01:00
Thomas Gleixner	6a36958317	x86/tsc: Validate TSC_ADJUST after resume Some 'feature' BIOSes fiddle with the TSC_ADJUST register during suspend/resume which renders the TSC unusable. Add sanity checks into the resume path and restore the original value if it was adjusted. Reported-and-tested-by: Roland Scheidegger <rscheidegger_lists@hispeed.ch> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bruce Schlobohm <bruce.schlobohm@intel.com> Cc: Kevin Stanton <kevin.b.stanton@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Allen Hung <allen_hung@dell.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161213131211.317654500@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-15 11:44:29 +01:00
Boris Ostrovsky	aec03f89e9	ACPI/NUMA: Do not map pxm to node when NUMA is turned off acpi_map_pxm_to_node() unconditially maps nodes even when NUMA is turned off. So acpi_get_node() might return a node > 0, which is fatal when NUMA is disabled as the rest of the kernel assumes that only node 0 exists. Expose numa_off to the acpi code and return NUMA_NO_NODE when it's set. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: linux-ia64@vger.kernel.org Cc: catalin.marinas@arm.com Cc: rjw@rjwysocki.net Cc: will.deacon@arm.com Cc: linux-acpi@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1481602709-18260-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-15 11:32:32 +01:00
Boris Ostrovsky	4370a3ef39	x86/acpi: Use proper macro for invalid node Use NUMA_NO_NODE instead of -1. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: len.brown@intel.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: pavel@ucw.cz Link: http://lkml.kernel.org/r/1481570993-13941-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-15 11:32:32 +01:00
Thomas Gleixner	427d77a323	x86/smpboot: Prevent false positive out of bounds cpumask access warning prefill_possible_map() reinitializes the cpu_possible_map by setting the possible cpu bits and clearing all other bits up to NR_CPUS. This is technically always correct because cpu_possible_map is statically allocated and sized NR_CPUS. With CPUMASK_OFFSTACK and DEBUG_PER_CPU_MAPS enabled the bounds check of cpu masks happens on nr_cpu_ids. nr_cpu_ids is initialized to NR_CPUS and only limited after the set/clear bit loops have been executed. But if the system was booted with "nr_cpus=N" on the command line, where N is < NR_CPUS then nr_cpu_ids is limited in the parameter parsing function before prefill_possible_map() is invoked. As a consequence the cpumask bounds check triggers when clearing the bits past nr_cpu_ids. Add a helper which allows to reset cpu_possible_map w/o the bounds check and then set only the possible bits which are well inside bounds. Reported-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: 0x7f454c46@gmail.com Cc: Jan Beulich <JBeulich@novell.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612131836050.3415@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-15 11:32:31 +01:00
Linus Torvalds	a57cb1c1d7	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: - a few misc things - kexec updates - DMA-mapping updates to better support networking DMA operations - IPC updates - various MM changes to improve DAX fault handling - lots of radix-tree changes, mainly to the test suite. All leading up to reimplementing the IDA/IDR code to be a wrapper layer over the radix-tree. However the final trigger-pulling patch is held off for 4.11. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) radix tree test suite: delete unused rcupdate.c radix tree test suite: add new tag check radix-tree: ensure counts are initialised radix tree test suite: cache recently freed objects radix tree test suite: add some more functionality idr: reduce the number of bits per level from 8 to 6 rxrpc: abstract away knowledge of IDR internals tpm: use idr_find(), not idr_find_slowpath() idr: add ida_is_empty radix tree test suite: check multiorder iteration radix-tree: fix replacement for multiorder entries radix-tree: add radix_tree_split_preload() radix-tree: add radix_tree_split radix-tree: add radix_tree_join radix-tree: delete radix_tree_range_tag_if_tagged() radix-tree: delete radix_tree_locate_item() radix-tree: improve multiorder iterators btrfs: fix race in btrfs_free_dummy_fs_info() radix-tree: improve dump output radix-tree: make radix_tree_find_next_bit more useful ...	2016-12-14 17:25:18 -08:00
Jan Kara	1a29d85eb0	mm: use vmf->address instead of of vmf->virtual_address Every single user of vmf->virtual_address typed that entry to unsigned long before doing anything with it so the type of virtual_address does not really provide us any additional safety. Just use masked vmf->address which already has the appropriate type. Link: http://lkml.kernel.org/r/1479460644-25076-3-git-send-email-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-14 16:04:09 -08:00
Baoquan He	401721ecd1	kexec: export the value of phys_base instead of symbol address Currently in x86_64, the symbol address of phys_base is exported to vmcoreinfo. Dave Anderson complained this is really useless for his Crash implementation. Because in user-space utility Crash and Makedumpfile which exported vmcore information is mainly used for, value of phys_base is needed to covert virtual address of exported kernel symbol to physical address. Especially init_level4_pgt, if we want to access and go over the page table to look up a PA corresponding to VA, firstly we need calculate page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base; Now in Crash and Makedumpfile, we have to analyze the vmcore elf program header to get value of phys_base. As Dave said, it would be preferable if it were readily availabl in vmcoreinfo rather than depending upon the PT_LOAD semantics. Hence in this patch change to export the value of phys_base instead of its virtual address. And people also complained that KERNEL_IMAGE_SIZE exporting is x86_64 only, should be moved into arch dependent function arch_crash_save_vmcoreinfo. Do the moving in this patch. Link: http://lkml.kernel.org/r/1478568596-30060-2-git-send-email-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Thomas Garnier <thgarnie@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Kees Cook <keescook@chromium.org> Cc: Eugene Surovegin <surovegin@google.com> Cc: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-14 16:04:07 -08:00
Baoquan He	69f5838479	Revert "kdump, vmcoreinfo: report memory sections virtual addresses" This reverts commit `0549a3c02e` ("kdump, vmcoreinfo: report memory sections virtual addresses"). Commit `0549a3c02e` tells the userspace utility makedumpfile the randomized base address of these memmory sections when mm kaslr is enabled. However the following patch "kexec: export the value of phys_base instead of symbol address" makes makedumpfile not need these addresses any more. Besides we should use VMCOREINFO_NUMBER to export the value of the variable so that we can use the existing number_table mechanism of Makedumpfile to fetch it. So revert it now. If needed we can add it later. http://lists.infradead.org/pipermail/kexec/2016-October/017540.html Link: http://lkml.kernel.org/r/1478568596-30060-1-git-send-email-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Thomas Garnier <thgarnie@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Kees Cook <keescook@chromium.org> Cc: Eugene Surovegin <surovegin@google.com> Cc: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-14 16:04:07 -08:00
Linus Torvalds	0f1d6dfe03	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Here is the crypto update for 4.10: API: - add skcipher walk interface - add asynchronous compression (acomp) interface - fix algif_aed AIO handling of zero buffer Algorithms: - fix unaligned access in poly1305 - fix DRBG output to large buffers Drivers: - add support for iMX6UL to caam - fix givenc descriptors (used by IPsec) in caam - accelerated SHA256/SHA512 for ARM64 from OpenSSL - add SSE CRCT10DIF and CRC32 to ARM/ARM64 - add AEAD support to Chelsio chcr - add Armada 8K support to omap-rng" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (148 commits) crypto: testmgr - fix overlap in chunked tests again crypto: arm/crc32 - accelerated support based on x86 SSE implementation crypto: arm64/crc32 - accelerated support based on x86 SSE implementation crypto: arm/crct10dif - port x86 SSE implementation to ARM crypto: arm64/crct10dif - port x86 SSE implementation to arm64 crypto: testmgr - add/enhance test cases for CRC-T10DIF crypto: testmgr - avoid overlap in chunked tests crypto: chcr - checking for IS_ERR() instead of NULL crypto: caam - check caam_emi_slow instead of re-lookup platform crypto: algif_aead - fix AIO handling of zero buffer crypto: aes-ce - Make aes_simd_algs static crypto: algif_skcipher - set error code when kcalloc fails crypto: caam - make aamalg_desc a proper module crypto: caam - pass key buffers with typesafe pointers crypto: arm64/aes-ce-ccm - Fix AEAD decryption length MAINTAINERS: add crypto headers to crypto entry crypt: doc - remove misleading mention of async API crypto: doc - fix header file name crypto: api - fix comment typo crypto: skcipher - Add separate walker for AEAD decryption ..	2016-12-14 13:31:29 -08:00
Linus Torvalds	a9042defa2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: NTB: correct ntb_spad_count comment typo misc: ibmasm: fix typo in error message Remove references to dead make variable LINUX_INCLUDE Remove last traces of ikconfig.h treewide: Fix printk() message errors Documentation/device-mapper: s/getsize/getsz/	2016-12-14 11:12:25 -08:00
Paul Bolle	846221cfb8	Remove references to dead make variable LINUX_INCLUDE Commit `4fd06960f1` ("Use the new x86 setup code for i386") introduced a reference to the make variable LINUX_INCLUDE. That reference got moved around a bit and copied twice and now there are three references to it. There has never been a definition of that variable. (Presumably that is because it started out as a mistyped reference to LINUXINCLUDE.) So this reference has always been an empty string. Let's remove it before it spreads any further. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2016-12-14 10:54:28 +01:00
Josh Poimboeuf	31dcfec11f	x86/boot/64: Push correct start_cpu() return address start_cpu() pushes a text address on the stack so that stack traces from idle tasks will show start_cpu() at the end. But it currently shows the wrong function offset. It's more correct to show the address immediately after the 'lretq' instruction. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2cadd9f16c77da7ee7957bfc5e1c67928c23ca48.1481685203.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-14 08:48:05 +01:00
Josh Poimboeuf	ec2d86a9b6	x86/boot/64: Use 'push' instead of 'call' in start_cpu() start_cpu() pushes a text address on the stack so that stack traces from idle tasks will show start_cpu() at the end. But it uses a call instruction to do that, which is rather obtuse. Use a straightforward push instead. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4d8a1952759721d42d1e62ba9e4a7e3ac5df8574.1481685203.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-14 08:48:05 +01:00
Linus Torvalds	aa3ecf388a	xen: features and fixes for 4.10 rc0 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABAgAGBQJYT5HMAAoJELDendYovxMvhNQH/1g/3ahM4JKN8Z0SbjKBEdQm yj2xOj6cE3l6wMSUblKjZD2DLLhpmcHT/E97Xro/lZQEfQJoMXXWWDFowMU/P1LA mJxb7Fzq5Wr+6eGSAlIQB270MrpNi/luf+CWHMwVA3V7R3KRXwonOdGQSkISIzCd tgIydEA3a9r2+HgeIBpZFZ4GcSrJQU75krMyl2tjD1C+jeYVd+zdoj2OnDsZQDZQ hDWApMpNbpSBAn7JtSSdXWSTBsGH0lUECebeYPhPQ2sX2P6Y8+UCGwA7i6FFdbTa agXfVSdRz8dCe3k19VcKDAw6nK9BTTMnEeEHmkmygIh6wuHPP44CzigTXIbJoXI= =zjfm -----END PGP SIGNATURE----- Merge tag 'for-linus-4.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "Xen features and fixes for 4.10 These are some fixes, a move of some arm related headers to share them between arm and arm64 and a series introducing a helper to make code more readable. The most notable change is David stepping down as maintainer of the Xen hypervisor interface. This results in me sending you the pull requests for Xen related code from now on" * tag 'for-linus-4.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (29 commits) xen/balloon: Only mark a page as managed when it is released xenbus: fix deadlock on writes to /proc/xen/xenbus xen/scsifront: don't request a slot on the ring until request is ready xen/x86: Increase xen_e820_map to E820_X_MAX possible entries x86: Make E820_X_MAX unconditionally larger than E820MAX xen/pci: Bubble up error and fix description. xen: xenbus: set error code on failure xen: set error code on failures arm/xen: Use alloc_percpu rather than __alloc_percpu arm/arm64: xen: Move shared architecture headers to include/xen/arm xen/events: use xen_vcpu_id mapping for EVTCHNOP_status xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing xen-scsifront: Add a missing call to kfree MAINTAINERS: update XEN HYPERVISOR INTERFACE xenfs: Use proc_create_mount_point() to create /proc/xen xen-platform: use builtin_pci_driver xen-netback: fix error handling output xen: make use of xenbus_read_unsigned() in xenbus xen: make use of xenbus_read_unsigned() in xen-pciback xen: make use of xenbus_read_unsigned() in xen-fbfront ...	2016-12-13 16:07:55 -08:00
Linus Torvalds	b5cab0da75	Merge branch 'stable/for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb updates from Konrad Rzeszutek Wilk: - minor fixes (rate limiting), remove certain functions - support for DMA_ATTR_SKIP_CPU_SYNC which is an optimization in the DMA API * 'stable/for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: swiotlb: Minor fix-ups for DMA_ATTR_SKIP_CPU_SYNC support swiotlb: Add support for DMA_ATTR_SKIP_CPU_SYNC swiotlb-xen: Enforce return of DMA_ERROR_CODE in mapping function swiotlb: Drop unused functions swiotlb_map_sg and swiotlb_unmap_sg swiotlb: Rate-limit printing when running out of SW-IOMMU space	2016-12-13 15:52:23 -08:00
Linus Torvalds	93173b5bf2	Small release, the most interesting stuff is x86 nested virt improvements. x86: userspace can now hide nested VMX features from guests; nested VMX can now run Hyper-V in a guest; support for AVX512_4VNNIW and AVX512_FMAPS in KVM; infrastructure support for virtual Intel GPUs. PPC: support for KVM guests on POWER9; improved support for interrupt polling; optimizations and cleanups. s390: two small optimizations, more stuff is in flight and will be in 4.11. ARM: support for the GICv3 ITS on 32bit platforms. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQExBAABCAAbBQJYTkP0FBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D lZIH/iT1n9OQXcuTpYYnQhuCenzI3GZZOIMTbCvK2i5bo0FIJKxVn0EiAAqZSXvO nO185FqjOgLuJ1AD1kJuxzye5suuQp4HIPWWgNHcexLuy43WXWKZe0IQlJ4zM2Xf u31HakpFmVDD+Cd1qN3yDXtDrRQ79/xQn2kw7CWb8olp+pVqwbceN3IVie9QYU+3 gCz0qU6As0aQIwq2PyalOe03sO10PZlm4XhsoXgWPG7P18BMRhNLTDqhLhu7A/ry qElVMANT7LSNLzlwNdpzdK8rVuKxETwjlc1UP8vSuhrwad4zM2JJ1Exk26nC2NaG D0j4tRSyGFIdx6lukZm7HmiSHZ0= =mkoB -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "Small release, the most interesting stuff is x86 nested virt improvements. x86: - userspace can now hide nested VMX features from guests - nested VMX can now run Hyper-V in a guest - support for AVX512_4VNNIW and AVX512_FMAPS in KVM - infrastructure support for virtual Intel GPUs. PPC: - support for KVM guests on POWER9 - improved support for interrupt polling - optimizations and cleanups. s390: - two small optimizations, more stuff is in flight and will be in 4.11. ARM: - support for the GICv3 ITS on 32bit platforms" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits) arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest KVM: arm/arm64: timer: Check for properly initialized timer on init KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUs KVM: x86: Handle the kthread worker using the new API KVM: nVMX: invvpid handling improvements KVM: nVMX: check host CR3 on vmentry and vmexit KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry KVM: nVMX: propagate errors from prepare_vmcs02 KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation KVM: nVMX: support restore of VMX capability MSRs KVM: nVMX: generate non-true VMX MSRs based on true versions KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs. KVM: x86: Add kvm_skip_emulated_instruction and use it. KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12 KVM: VMX: Reorder some skip_emulated_instruction calls KVM: x86: Add a return value to kvm_emulate_cpuid KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h ...	2016-12-13 15:47:02 -08:00
Adam Borowski	334bb77387	x86/kbuild: enable modversions for symbols exported from asm Commit `4efca4ed` ("kbuild: modversions for EXPORT_SYMBOL() for asm") adds modversion support for symbols exported from asm files. Architectures must include C-style declarations for those symbols in asm/asm-prototypes.h in order for them to be versioned. Add these declarations for x86, and an architecture-independent file that can be used for common symbols. With `f27c2f6` reverting `8ab2ae6` ("default exported asm symbols to zero") we produce a scary warning on x86, this commit fixes that. Signed-off-by: Adam Borowski <kilobyte@angband.pl> Tested-by: Kalle Valo <kvalo@codeaurora.org> Acked-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Peter Wu <peter@lekensteyn.nl> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Michal Marek <mmarek@suse.com>	2016-12-14 00:35:35 +01:00
Linus Torvalds	c11a6cfb01	Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Mostly patches to initialize workqueue subsystem earlier and get rid of keventd_up(). The patches were headed for the last merge cycle but got delayed due to a bug found late minute, which is fixed now. Also, to help debugging, destroy_workqueue() is more chatty now on a sanity check failure." * 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: move wq_numa_init() to workqueue_init() workqueue: remove keventd_up() debugobj, workqueue: remove keventd_up() usage slab, workqueue: remove keventd_up() usage power, workqueue: remove keventd_up() usage tty, workqueue: remove keventd_up() usage mce, workqueue: remove keventd_up() usage workqueue: make workqueue available early during boot workqueue: dump workqueue state on sanity check failures in destroy_workqueue()	2016-12-13 12:59:57 -08:00
Linus Torvalds	098c30557a	Driver core patches for 4.10-rc1 Here's the new driver core patches for 4.10-rc1. Big thing here is the nice addition of "functional dependencies" to the driver core. The idea has been talked about for a very long time, great job to Rafael for stepping up and implementing it. It's been tested for longer than the 4.9-rc1 date, we held off on merging it earlier in order to feel more comfortable about it. Other than that, it's just a handful of small other patches, some good cleanups to the mess that is the firmware class code, and we have a test driver for the deferred probe logic. All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWFAvPQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ym3NgCgmhFeWEkp9SDt17YGGavmnzQUlBQAoJlUipJp PHeQkq15ZWw3wWC9FEvM =91M1 -----END PGP SIGNATURE----- Merge tag 'driver-core-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here's the new driver core patches for 4.10-rc1. Big thing here is the nice addition of "functional dependencies" to the driver core. The idea has been talked about for a very long time, great job to Rafael for stepping up and implementing it. It's been tested for longer than the 4.9-rc1 date, we held off on merging it earlier in order to feel more comfortable about it. Other than that, it's just a handful of small other patches, some good cleanups to the mess that is the firmware class code, and we have a test driver for the deferred probe logic. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (30 commits) firmware: Correct handling of fw_state_wait() return value driver core: Silence device links sphinx warning firmware: remove warning at documentation generation time drivers: base: dma-mapping: Fix typo in dmam_alloc_non_coherent comments driver core: test_async: fix up typo found by 0-day firmware: move fw_state_is_done() into UHM section firmware: do not use fw_lock for fw_state protection firmware: drop bit ops in favor of simple state machine firmware: refactor loading status firmware: fix usermode helper fallback loading driver core: firmware_class: convert to use class_groups driver core: devcoredump: convert to use class_groups driver core: class: add class_groups support kernfs: Declare two local data structures static driver-core: fix platform_no_drv_owner.cocci warnings drivers/base/memory.c: Remove unused 'first_page' variable driver core: add CLASS_ATTR_WO() drivers: base: cacheinfo: support DT overrides for cache properties drivers: base: cacheinfo: add pr_fmt logging drivers: base: cacheinfo: fix boot error message when acpi is enabled ...	2016-12-13 11:42:18 -08:00
Linus Torvalds	5266e70335	TTY/Serial patches for 4.10-rc1 Here's the tty/serial patchset for 4.10-rc1. It's been a quiet kernel cycle for this subsystem, just a small number of changes. A few new serial drivers, and some cleanups to the old vgacon logic, and other minor serial driver changes as well. All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWFAwDQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ylcngCgko5+aPLnHENLNIaHhHlfdMbhy+EAn2H8wkzY bEf+BG4CJDb6nZWERcUQ =STpQ -----END PGP SIGNATURE----- Merge tag 'tty-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial updates from Greg KH: "Here's the tty/serial patchset for 4.10-rc1. It's been a quiet kernel cycle for this subsystem, just a small number of changes. A few new serial drivers, and some cleanups to the old vgacon logic, and other minor serial driver changes as well. All of these have been in linux-next for a while with no reported issues" * tag 'tty-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (54 commits) serial: 8250_mid fix calltrace when hotplug 8250 serial controller console: Move userspace I/O out of console_lock to fix lockdep warning tty: nozomi: avoid sprintf buffer overflow serial: 8250_pci: Detach low-level driver during PCI error recovery serial: core: don't check port twice in a row mxs-auart: count FIFO overrun errors serial: 8250_dw: Add support for IrDA SIR mode serial: 8250: Expose set_ldisc function serial: 8250: Add IrDA to UART capabilities serial: 8250_dma: power off device after TX is done serial: 8250_port: export serial8250_rpm_{get\|put}_tx() serial: sunsu: Free memory when probe fails serial: sunhv: Free memory when remove() is called vt: fix Scroll Lock LED trigger name tty: typo in comments in drivers/tty/vt/keyboard.c tty: amba-pl011: Add earlycon support for SBSA UART tty: nozomi: use permission-specific DEVICE_ATTR variants tty: serial: Make the STM32 serial port depend on it's arch serial: ifx6x60: Free memory when probe fails serial: ioc4_serial: Free memory when kzalloc fails during probe ...	2016-12-13 11:18:24 -08:00
Linus Torvalds	a67485d4bf	ACPI material for v4.10-rc1 - ACPICA update including upstream revision 20160930 and several commits beyond it (Bob Moore, Lv Zheng). - Initial support for ACPI APEI on ARM64 (Tomasz Nowicki). - New document describing the handling of _OSI and _REV in Linux (Len Brown). - New document describing the usage rules for _DSD properties (Rafael Wysocki). - Update of the ACPI properties-parsing code to reflect recent changes in the (external) documentation it is based on (Rafael Wysocki). - Updates of the ACPI LPSS and ACPI APD SoC drivers for additional hardware support (Andy Shevchenko, Nehal Shah). - New blacklist entries for _REV and video handling (Alex Hung, Hans de Goede, Michael Pobega). - ACPI battery driver fix to fall back to _BIF if _BIX fails (Dave Lambley). - NMI notifications handling fix for APEI (Prarit Bhargava). - Error code path fix for the ACPI CPPC library (Dan Carpenter). - Assorted cleanups (Andy Shevchenko, Longpeng Mike). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJYTx6WAAoJEILEb/54YlRxFksP/0oZUm4dxHJFT6ED1ogBLid6 o+T7PA46i7VpyyT64tq3YcBqccFAYq9jHvK0FasK6WA3GKF+fj8cc5FsFM0lfdlw pMFfkdVTVajzFAM1QcxxeNr+TNuAGhx1ENf3us4xOP1Nt++kESBMwA112emoqEJL kzb2M3sCWyHNUxLtbis5CpYXLNFifFf8PP+LgmfRk0u2EYYW2nOShd6A7w5USmDh cYsfKcrBHs+nmNh6uZrQbGg+6zTcQT7XORyqcIsgT2JoWooVfwOrBjgLymFvuLUc ShZ1dHqR+RwIu1ZTIWImpDcBz/dALGIDuGAxad1YRhx7N7Eg4jmmht3hASYKWabG lqU4PWMBERonIW0MCFJ7Pg8+Ny7+kAF/rZjDyw09P2DGGQjsG4aJGAdoG5Dtjidc 1W+OAJC6SY494U+r/kHnsR0/JWTX24H7sVP5IBCFxHkByhe5daSngtknrYzIV4kE dV4h8JJATrSyvdgwAEHmVSpTCR0tmFvsc5J87Mg/g/b6NM3tPVxb70eE9tRr4xw1 oW0X9YI9M8NFnRP6RbCVg6uO06xDD2SMfb0e8fiiAp+/eGGyjp1PVR9SreuUdqaJ XJwntAWxKOXBPXMRuCeOuXBUNe5mT+WkMF6AuQyfBoM7rIhkqJb328buVAsyAKBx 74gsPkkeA6/Z1n7HWUFn =Nzrb -----END PGP SIGNATURE----- Merge tag 'acpi-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "The ACPICA code in the kernel gets updated as usual (included is upstream revision 20160930 and a few commits from the next one, with the rest waiting for an issue discovered in linux-next to be addressed) which brings in a couple of fixes and cleanups On top of that initial support for APEI on ARM64 is added, two new pieces of documentation are introduced, the properties-parsing code is updated to follow changes in the (external) documentation it is based on and there are a few updates of SoC drivers, some new blacklist entries, plus some assorted fixes and cleanups Specifics: - ACPICA update including upstream revision 20160930 and several commits beyond it (Bob Moore, Lv Zheng) - Initial support for ACPI APEI on ARM64 (Tomasz Nowicki) - New document describing the handling of _OSI and _REV in Linux (Len Brown) - New document describing the usage rules for _DSD properties (Rafael Wysocki) - Update of the ACPI properties-parsing code to reflect recent changes in the (external) documentation it is based on (Rafael Wysocki) - Updates of the ACPI LPSS and ACPI APD SoC drivers for additional hardware support (Andy Shevchenko, Nehal Shah) - New blacklist entries for _REV and video handling (Alex Hung, Hans de Goede, Michael Pobega) - ACPI battery driver fix to fall back to _BIF if _BIX fails (Dave Lambley) - NMI notifications handling fix for APEI (Prarit Bhargava) - Error code path fix for the ACPI CPPC library (Dan Carpenter) - Assorted cleanups (Andy Shevchenko, Longpeng Mike)" * tag 'acpi-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (31 commits) ACPICA: Utilities: Add new decode function for parser values ACPI / osl: Refactor acpi_os_get_root_pointer() to drop 'else':s ACPI / osl: Propagate actual error code for kstrtoul() ACPI / property: Document usage rules for _DSD properties ACPI: Document _OSI and _REV for Linux BIOS writers ACPI / APEI / ARM64: APEI initial support for ARM64 ACPI / APEI: Fix NMI notification handling ACPICA: Tables: Add an error message complaining driver bugs ACPICA: Tables: Add acpi_tb_unload_table() ACPICA: Tables: Cleanup acpi_tb_install_and_load_table() ACPICA: Events: Fix acpi_ev_initialize_region() return value ACPICA: Back port of "ACPICA: Dispatcher: Tune interpreter lock around AcpiEvInitializeRegion()" ACPICA: Namespace: Add acpi_ns_handle_to_name() ACPI / CPPC: set an error code on probe error path ACPI / video: Add force_native quirk for HP Pavilion dv6 ACPI / video: Add force_native quirk for Dell XPS 17 L702X ACPI / property: Hierarchical properties support update ACPI / LPSS: enable hard LLP for DMA ACPI / battery: If _BIX fails, retry with _BIF ACPI / video: Move ACPI_VIDEO_NOTIFY_* defines to acpi/video.h ..	2016-12-13 11:06:21 -08:00
Linus Torvalds	7b9dc3f75f	Power management material for v4.10-rc1 - New cpufreq driver for Broadcom STB SoCs and a Device Tree binding for it (Markus Mayer). - Support for ARM Integrator/AP and Integrator/CP in the generic DT cpufreq driver and elimination of the old Integrator cpufreq driver (Linus Walleij). - Support for the zx296718, r8a7743 and r8a7745, Socionext UniPhier, and PXA SoCs in the the generic DT cpufreq driver (Baoyou Xie, Geert Uytterhoeven, Masahiro Yamada, Robert Jarzmik). - cpufreq core fix to eliminate races that may lead to using inactive policy objects and related cleanups (Rafael Wysocki). - cpufreq schedutil governor update to make it use SCHED_FIFO kernel threads (instead of regular workqueues) for doing delayed work (to reduce the response latency in some cases) and related cleanups (Viresh Kumar). - New cpufreq sysfs attribute for resetting statistics (Markus Mayer). - cpufreq governors fixes and cleanups (Chen Yu, Stratos Karafotis, Viresh Kumar). - Support for using generic cpufreq governors in the intel_pstate driver (Rafael Wysocki). - Support for per-logical-CPU P-state limits and the EPP/EPB (Energy Performance Preference/Energy Performance Bias) knobs in the intel_pstate driver (Srinivas Pandruvada). - New CPU ID for Knights Mill in intel_pstate (Piotr Luc). - intel_pstate driver modification to use the P-state selection algorithm based on CPU load on platforms with the system profile in the ACPI tables set to "mobile" (Srinivas Pandruvada). - intel_pstate driver cleanups (Arnd Bergmann, Rafael Wysocki, Srinivas Pandruvada). - cpufreq powernv driver updates including fast switching support (for the schedutil governor), fixes and cleanus (Akshay Adiga, Andrew Donnellan, Denis Kirjanov). - acpi-cpufreq driver rework to switch it over to the new CPU offline/online state machine (Sebastian Andrzej Siewior). - Assorted cleanups in cpufreq drivers (Wei Yongjun, Prashanth Prakash). - Idle injection rework (to make it use the regular idle path instead of a home-grown custom one) and related powerclamp thermal driver updates (Peter Zijlstra, Jacob Pan, Petr Mladek, Sebastian Andrzej Siewior). - New CPU IDs for Atom Z34xx and Knights Mill in intel_idle (Andy Shevchenko, Piotr Luc). - intel_idle driver cleanups and switch over to using the new CPU offline/online state machine (Anna-Maria Gleixner, Sebastian Andrzej Siewior). - cpuidle DT driver update to support suspend-to-idle properly (Sudeep Holla). - cpuidle core cleanups and misc updates (Daniel Lezcano, Pan Bian, Rafael Wysocki). - Preliminary support for power domains including CPUs in the generic power domains (genpd) framework and related DT bindings (Lina Iyer). - Assorted fixes and cleanups in the generic power domains (genpd) framework (Colin Ian King, Dan Carpenter, Geert Uytterhoeven). - Preliminary support for devices with multiple voltage regulators and related fixes and cleanups in the Operating Performance Points (OPP) library (Viresh Kumar, Masahiro Yamada, Stephen Boyd). - System sleep state selection interface rework to make it easier to support suspend-to-idle as the default system suspend method (Rafael Wysocki). - PM core fixes and cleanups, mostly related to the interactions between the system suspend and runtime PM frameworks (Ulf Hansson, Sahitya Tummala, Tony Lindgren). - Latency tolerance PM QoS framework imorovements (Andrew Lutomirski). - New Knights Mill CPU ID for the Intel RAPL power capping driver (Piotr Luc). - Intel RAPL power capping driver fixes, cleanups and switch over to using the new CPU offline/online state machine (Jacob Pan, Thomas Gleixner, Sebastian Andrzej Siewior). - Fixes and cleanups in the exynos-ppmu, exynos-nocp, rk3399_dmc, rockchip-dfi devfreq drivers and the devfreq core (Axel Lin, Chanwoo Choi, Javier Martinez Canillas, MyungJoo Ham, Viresh Kumar). - Fix for false-positive KASAN warnings during resume from ACPI S3 (suspend-to-RAM) on x86 (Josh Poimboeuf). - Memory map verification during resume from hibernation on x86 to ensure a consistent address space layout (Chen Yu). - Wakeup sources debugging enhancement (Xing Wei). - rockchip-io AVS driver cleanup (Shawn Lin). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJYTx4+AAoJEILEb/54YlRx9f8P/2SlNHUENW5qh6FtCw00oC2u UqJerQJ2L38UgbgxbE/0VYblma9rFABDWC1eO2xN2XdcdW5UPBKPVvNcOgNe1Clh gjy3RxZXVpmjfzt2kGfsTLEuGnHqwvx51hTUkeA2LwvkOal45xb8ZESmy8opCtiv iG4LwmPHoxdX5Za5nA9ItFKzxyO1EoyNSnBYAVwALDHxmNOfxEcRevfurASt/0M9 brCCZJA0/sZxeL0lBdy8fNQPIBTUfCoTJG/MtmzGrObJ9wMFvEDfXrVEyZiWs/zA AAZ4kQL77enrIKgrLN8e0G6LzTLHoVcvn38Xjf24dKUqhd7ACBhYcnW+jK3+7EAd gjZ8efObQsiuyK/EDLUNw35tt96CHOqfrQCj2tIwRVvk9EekLqAGXdIndTCr2kYW RpefmP5kMljnm/nQFOVLwMEUQMuVkvUE7EgxADy7DoDmepBFC4ICRDWPye70R2kC 0O1Tn2PAQq4Fd1tyI9TYYz0YQQkRoaRb5rfYUSzbRbeCdsphUopp4Vhsiyn6IcnF XnLbg6pRAat82MoS9n4pfO/VCo8vkErKA8tut9G7TDakkrJoEE7l31PdKW0hP3f6 sBo6xXy6WTeivU/o/i8TbM6K4mA37pBaj78ooIkWLgg5fzRaS2+0xSPVy2H9x1m5 LymHcobCK9rSZ1l208Fe =vhxI -----END PGP SIGNATURE----- Merge tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "Again, cpufreq gets more changes than the other parts this time (one new driver, one old driver less, a bunch of enhancements of the existing code, new CPU IDs, fixes, cleanups) There also are some changes in cpuidle (idle injection rework, a couple of new CPU IDs, online/offline rework in intel_idle, fixes and cleanups), in the generic power domains framework (mostly related to supporting power domains containing CPUs), and in the Operating Performance Points (OPP) library (mostly related to supporting devices with multiple voltage regulators) In addition to that, the system sleep state selection interface is modified to make it easier for distributions with unchanged user space to support suspend-to-idle as the default system suspend method, some issues are fixed in the PM core, the latency tolerance PM QoS framework is improved a bit, the Intel RAPL power capping driver is cleaned up and there are some fixes and cleanups in the devfreq subsystem Specifics: - New cpufreq driver for Broadcom STB SoCs and a Device Tree binding for it (Markus Mayer) - Support for ARM Integrator/AP and Integrator/CP in the generic DT cpufreq driver and elimination of the old Integrator cpufreq driver (Linus Walleij) - Support for the zx296718, r8a7743 and r8a7745, Socionext UniPhier, and PXA SoCs in the the generic DT cpufreq driver (Baoyou Xie, Geert Uytterhoeven, Masahiro Yamada, Robert Jarzmik) - cpufreq core fix to eliminate races that may lead to using inactive policy objects and related cleanups (Rafael Wysocki) - cpufreq schedutil governor update to make it use SCHED_FIFO kernel threads (instead of regular workqueues) for doing delayed work (to reduce the response latency in some cases) and related cleanups (Viresh Kumar) - New cpufreq sysfs attribute for resetting statistics (Markus Mayer) - cpufreq governors fixes and cleanups (Chen Yu, Stratos Karafotis, Viresh Kumar) - Support for using generic cpufreq governors in the intel_pstate driver (Rafael Wysocki) - Support for per-logical-CPU P-state limits and the EPP/EPB (Energy Performance Preference/Energy Performance Bias) knobs in the intel_pstate driver (Srinivas Pandruvada) - New CPU ID for Knights Mill in intel_pstate (Piotr Luc) - intel_pstate driver modification to use the P-state selection algorithm based on CPU load on platforms with the system profile in the ACPI tables set to "mobile" (Srinivas Pandruvada) - intel_pstate driver cleanups (Arnd Bergmann, Rafael Wysocki, Srinivas Pandruvada) - cpufreq powernv driver updates including fast switching support (for the schedutil governor), fixes and cleanus (Akshay Adiga, Andrew Donnellan, Denis Kirjanov) - acpi-cpufreq driver rework to switch it over to the new CPU offline/online state machine (Sebastian Andrzej Siewior) - Assorted cleanups in cpufreq drivers (Wei Yongjun, Prashanth Prakash) - Idle injection rework (to make it use the regular idle path instead of a home-grown custom one) and related powerclamp thermal driver updates (Peter Zijlstra, Jacob Pan, Petr Mladek, Sebastian Andrzej Siewior) - New CPU IDs for Atom Z34xx and Knights Mill in intel_idle (Andy Shevchenko, Piotr Luc) - intel_idle driver cleanups and switch over to using the new CPU offline/online state machine (Anna-Maria Gleixner, Sebastian Andrzej Siewior) - cpuidle DT driver update to support suspend-to-idle properly (Sudeep Holla) - cpuidle core cleanups and misc updates (Daniel Lezcano, Pan Bian, Rafael Wysocki) - Preliminary support for power domains including CPUs in the generic power domains (genpd) framework and related DT bindings (Lina Iyer) - Assorted fixes and cleanups in the generic power domains (genpd) framework (Colin Ian King, Dan Carpenter, Geert Uytterhoeven) - Preliminary support for devices with multiple voltage regulators and related fixes and cleanups in the Operating Performance Points (OPP) library (Viresh Kumar, Masahiro Yamada, Stephen Boyd) - System sleep state selection interface rework to make it easier to support suspend-to-idle as the default system suspend method (Rafael Wysocki) - PM core fixes and cleanups, mostly related to the interactions between the system suspend and runtime PM frameworks (Ulf Hansson, Sahitya Tummala, Tony Lindgren) - Latency tolerance PM QoS framework imorovements (Andrew Lutomirski) - New Knights Mill CPU ID for the Intel RAPL power capping driver (Piotr Luc) - Intel RAPL power capping driver fixes, cleanups and switch over to using the new CPU offline/online state machine (Jacob Pan, Thomas Gleixner, Sebastian Andrzej Siewior) - Fixes and cleanups in the exynos-ppmu, exynos-nocp, rk3399_dmc, rockchip-dfi devfreq drivers and the devfreq core (Axel Lin, Chanwoo Choi, Javier Martinez Canillas, MyungJoo Ham, Viresh Kumar) - Fix for false-positive KASAN warnings during resume from ACPI S3 (suspend-to-RAM) on x86 (Josh Poimboeuf) - Memory map verification during resume from hibernation on x86 to ensure a consistent address space layout (Chen Yu) - Wakeup sources debugging enhancement (Xing Wei) - rockchip-io AVS driver cleanup (Shawn Lin)" * tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (127 commits) devfreq: rk3399_dmc: Don't use OPP structures outside of RCU locks devfreq: rk3399_dmc: Remove dangling rcu_read_unlock() devfreq: exynos: Don't use OPP structures outside of RCU locks Documentation: intel_pstate: Document HWP energy/performance hints cpufreq: intel_pstate: Support for energy performance hints with HWP cpufreq: intel_pstate: Add locking around HWP requests PM / sleep: Print active wakeup sources when blocking on wakeup_count reads PM / core: Fix bug in the error handling of async suspend PM / wakeirq: Fix dedicated wakeirq for drivers not using autosuspend PM / Domains: Fix compatible for domain idle state PM / OPP: Don't WARN on multiple calls to dev_pm_opp_set_regulators() PM / OPP: Allow platform specific custom set_opp() callbacks PM / OPP: Separate out _generic_set_opp() PM / OPP: Add infrastructure to manage multiple regulators PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage() PM / OPP: Manage supply's voltage/current in a separate structure PM / OPP: Don't use OPP structure outside of rcu protected section PM / OPP: Reword binding supporting multiple regulators per device PM / OPP: Fix incorrect cpu-supply property in binding cpuidle: Add a kerneldoc comment to cpuidle_use_deepest_state() ..	2016-12-13 10:41:53 -08:00
Linus Torvalds	9439b3710d	Main pull request for drm for 4.10 kernel -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJYT3qqAAoJEAx081l5xIa+dLMP/2dqBybSAeWlPmAwVenIHRtS KFNktISezFSY/LBcIP2mHkFJmjTKBMZFxWnyEJL9NmFUD1cS2WMyNnC1282h/+rD +P8Bsmzmt/daV4UTFxVDpzlmVlavAyakNi6FnSQfAfmf+3PB1yzU3gn8ld9pU/if h7KEp9fDn9eYZreTRfCUloI2yoVpD9d0DG3uaGDN/N0kGUnCC6TZT5ig5j2JO016 fYf/DqoYAk3ItWF9WK/uG7qJIGi37afCpQq+kbSSJk+p3HjJqu8JUe9jzqYdl7j9 26TGSY5o9WLhZkxDgbcCIJzcFJhMmXgMdhjil9lqaHmnNG5FPFU7g8DK1CZqbel9 m8+aRPn1EgxIahMgdl8NblW1pfO2Kco0tZmoP5vXx1uqhivd67h0hiQqp66WxOJd i2yMLncaCEv8M161CVEgtzuI5a7nCfaZv7J9ArzbkD/huBwu51IZgTs7Dz4njgvz VPB5FBTB/ZYteErUNoh6gjF0hLngWvvJSPvuzT+EFO7yypek0IJ28GTdbxYSP+jR 13697s5Itigf/D3KUdRRGsWRzyVVN9n+djkl//sy5ddL9eOlKSKEga4ujOUjTWaW hTvAxpK9GmJS/Iun5jIP6f75zDbi+e8FWUeB/OI2lPtnApaSKdXBTPXsco2RnTEV +G6XrH8IMEIsTxOk7hWU =7s/c -----END PGP SIGNATURE----- Merge tag 'drm-for-v4.10' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "This is the main pull request for drm for 4.10 kernel. New drivers: - ZTE VOU display driver (zxdrm) - Amlogic Meson Graphic Controller GXBB/GXL/GXM SoCs (meson) - MXSFB support (mxsfb) Core: - Format handling has been reworked - Better atomic state debugging - drm_mm leak debugging - Atomic explicit fencing support - fbdev helper ops - Documentation updates - MST fbcon fixes Bridge: - Silicon Image SiI8620 driver Panel: - Add support for new simple panels i915: - GVT Device model - Better HDMI2.0 support on skylake - More watermark fixes - GPU idling rework for suspend/resume - DP Audio workarounds - Scheduler prep-work - Opregion CADL handling - GPU scheduler and priority boosting amdgfx/radeon: - Support for virtual devices - New VM manager for non-contig VRAM buffers - UVD powergating - SI register header cleanup - Cursor fixes - Powermanagement fixes nouveau: - Powermangement reworks for better voltage/clock changes - Atomic modesetting support - Displayport Multistream (MST) support. - GP102/104 hang and cursor fixes - GP106 support hisilicon: - hibmc support (BMC chip for aarch64 servers) armada: - add tracing support for overlay change - refactor plane support - de-midlayer the driver omapdrm: - Timing code cleanups rcar-du: - R8A7792/R8A7796 support - Misc fixes. sunxi: - A31 SoC display engine support imx-drm: - YUV format support - Cleanup plane atomic update mali-dp: - Misc fixes dw-hdmi: - Add support for HDMI i2c master controller tegra: - IOMMU support fixes - Error handling fixes tda998x: - Fix connector registration - Improved robustness - Fix infoframe/audio compliance virtio: - fix busid issues - allocate more vbufs qxl: - misc fixes and cleanups. vc4: - Fragment shader threading - ETC1 support - VEC (tv-out) support msm: - A5XX GPU support - Lots of atomic changes tilcdc: - Misc fixes and cleanups. etnaviv: - Fix dma-buf export path - DRAW_INSTANCED support - fix driver on i.MX6SX exynos: - HDMI refactoring fsl-dcu: - fbdev changes" * tag 'drm-for-v4.10' of git://people.freedesktop.org/~airlied/linux: (1343 commits) drm/nouveau/kms/nv50: fix atomic regression on original G80 drm/nouveau/bl: Do not register interface if Apple GMUX detected drm/nouveau/bl: Assign different names to interfaces drm/nouveau/bios/dp: fix handling of LevelEntryTableIndex on DP table 4.2 drm/nouveau/ltc: protect clearing of comptags with mutex drm/nouveau/gr/gf100-: handle GPC/TPC/MPC trap drm/nouveau/core: recognise GP106 chipset drm/nouveau/ttm: wait for bo fence to signal before unmapping vmas drm/nouveau/gr/gf100-: FECS intr handling is not relevant on proprietary ucode drm/nouveau/gr/gf100-: properly ack all FECS error interrupts drm/nouveau/fifo/gf100-: recover from host mmu faults drm: Add fake controlD* symlinks for backwards compat drm/vc4: Don't use drm_put_dev drm/vc4: Document VEC DT binding drm/vc4: Add support for the VEC (Video Encoder) IP drm: Add TV connector states to drm_connector_state drm: Turn DRM_MODE_SUBCONNECTOR_xx definitions into an enum drm/vc4: Fix ->clock_select setting for the VEC encoder drm/amdgpu/dce6: Set MASTER_UPDATE_MODE to 0 in resume_mc_access as well drm/amdgpu: use pin rather than pin_restricted in a few cases ...	2016-12-13 09:35:09 -08:00
Thomas Gleixner	9d85eb9119	x86/smpboot: Make logical package management more robust The logical package management has several issues: - The APIC ids provided by ACPI are not required to be the same as the initial APIC id which can be retrieved by CPUID. The APIC ids provided by ACPI are those which are written by the BIOS into the APIC. The initial id is set by hardware and can not be changed. The hardware provided ids contain the real hardware package information. Especially AMD sets the effective APIC id different from the hardware id as they need to reserve space for the IOAPIC ids starting at id 0. As a consequence those machines trigger the currently active firmware bug printouts in dmesg, These are obviously wrong. - Virtual machines have their own interesting of enumerating APICs and packages which are not reliably covered by the current implementation. The sizing of the mapping array has been tweaked to be generously large to handle systems which provide a wrong core count when HT is disabled so the whole magic which checks for space in the physical hotplug case is not needed anymore. Simplify the whole machinery and do the mapping when the CPU starts and the CPUID derived physical package information is available. This solves the observed problems on AMD machines and works for the virtualization issues as well. Remove the extra call from XEN cpu bringup code as it is not longer required. Fixes: `d49597fd3b` ("x86/cpu: Deal with broken firmware (VMWare/XEN)") Reported-and-tested-by: Borislav Petkov <bp@suse.de> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: M. Vefa Bicakci <m.v.b@runbox.com> Cc: xen-devel <xen-devel@lists.xen.org> Cc: Charles (Chas) Williams <ciwillia@brocade.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Alok Kataria <akataria@vmware.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612121102260.3429@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-13 10:22:39 +01:00
Linus Torvalds	e7aa8c2eb1	These are the documentation changes for 4.10. It's another busy cycle for the docs tree, as the sphinx conversion continues. Highlights include: - Further work on PDF output, which remains a bit of a pain but should be more solid now. - Five more DocBook template files converted to Sphinx. Only 27 to go... Lots of plain-text files have also been converted and integrated. - Images in binary formats have been replaced with more source-friendly versions. - Various bits of organizational work, including the renaming of various files discussed at the kernel summit. - New documentation for the device_link mechanism. ...and, of course, lots of typo fixes and small updates. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJYTbl7AAoJEI3ONVYwIuV63NIP/REwzThnGWFJMRSuq8Ieq2r9 sFSQsaGTGlhyKiDoEooo+SO/Za3uTonjK+e7WZg8mhdiEdamta5aociU/71C1Yy/ T9ur0FhcGblrvZ1NidSDvCLwuECZOMMei7mgLZ9a+KCpc4ANqqTVZSUm1blKcqhF XelhVXxBa0ar35l/pVzyCxkdNXRWXv+MJZE8hp5XAdTdr11DS7UY9zrZdH31axtf BZlbYJrvB8WPydU6myTjRpirA17Hu7uU64MsL3bNIEiRQ+nVghEzQC8uxeUCvfVx r0H5AgGGQeir+e8GEv2T20SPZ+dumXs+y/HehKNb3jS3gV0mo+pKPeUhwLIxr+Zh QY64gf+jYf5ISHwAJRnU0Ima72ehObzSbx9Dko10nhq2OvbR5f83gjz9t9jKYFU7 RDowICA8lwqyRbHRoVfyoW8CpVhWFpMFu3yNeJMckeTish3m7ANqzaWslbsqIP5G zxgFMIrVVSbeae+sUeygtEJAnWI09aZ4tuaUXYtGWwu6ikC/3aV6DryP4bthG2LF A19uV4nMrLuuh8g2wiTHHjMfjYRwvSn+f9yaolwJhwyNDXQzRPy+ZJ3W/6olOkXC bAxTmVRCW5GA/fmSrfXmW1KbnxlWfP2C62hzZQ09UHxzTHdR97oFLDQdZhKo1uwf pmSJR0hVeRUmA4uw6+Su =A0EV -----END PGP SIGNATURE----- Merge tag 'docs-4.10' of git://git.lwn.net/linux Pull documentation update from Jonathan Corbet: "These are the documentation changes for 4.10. It's another busy cycle for the docs tree, as the sphinx conversion continues. Highlights include: - Further work on PDF output, which remains a bit of a pain but should be more solid now. - Five more DocBook template files converted to Sphinx. Only 27 to go... Lots of plain-text files have also been converted and integrated. - Images in binary formats have been replaced with more source-friendly versions. - Various bits of organizational work, including the renaming of various files discussed at the kernel summit. - New documentation for the device_link mechanism. ... and, of course, lots of typo fixes and small updates" * tag 'docs-4.10' of git://git.lwn.net/linux: (193 commits) dma-buf: Extract dma-buf.rst Update Documentation/00-INDEX docs: 00-INDEX: document directories/files with no docs docs: 00-INDEX: remove non-existing entries docs: 00-INDEX: add missing entries for documentation files/dirs docs: 00-INDEX: consolidate process/ and admin-guide/ description scripts: add a script to check if Documentation/00-INDEX is sane Docs: change sh -> awk in REPORTING-BUGS Documentation/core-api/device_link: Add initial documentation core-api: remove an unexpected unident ppc/idle: Add documentation for powersave=off Doc: Correct typo, "Introdution" => "Introduction" Documentation/atomic_ops.txt: convert to ReST markup Documentation/local_ops.txt: convert to ReST markup Documentation/assoc_array.txt: convert to ReST markup docs-rst: parse-headers.pl: cleanup the documentation docs-rst: fix media cleandocs target docs-rst: media/Makefile: reorganize the rules docs-rst: media: build SVG from graphviz files docs-rst: replace bayer.png by a SVG image ...	2016-12-12 21:58:13 -08:00
Linus Torvalds	e34bac726d	Merge branch 'akpm' (patches from Andrew) Merge updates from Andrew Morton: - various misc bits - most of MM (quite a lot of MM material is awaiting the merge of linux-next dependencies) - kasan - printk updates - procfs updates - MAINTAINERS - /lib updates - checkpatch updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (123 commits) init: reduce rootwait polling interval time to 5ms binfmt_elf: use vmalloc() for allocation of vma_filesz checkpatch: don't emit unified-diff error for rename-only patches checkpatch: don't check c99 types like uint8_t under tools checkpatch: avoid multiple line dereferences checkpatch: don't check .pl files, improve absolute path commit log test scripts/checkpatch.pl: fix spelling checkpatch: don't try to get maintained status when --no-tree is given lib/ida: document locking requirements a bit better lib/rbtree.c: fix typo in comment of ____rb_erase_color lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM MAINTAINERS: add drm and drm/i915 irc channels MAINTAINERS: add "C:" for URI for chat where developers hang out MAINTAINERS: add drm and drm/i915 bug filing info MAINTAINERS: add "B:" for URI where to file bugs get_maintainer: look for arbitrary letter prefixes in sections printk: add Kconfig option to set default console loglevel printk/sound: handle more message headers printk/btrfs: handle more message headers printk/kdb: handle more message headers ...	2016-12-12 20:50:02 -08:00
Linus Torvalds	9465d9cc31	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The time/timekeeping/timer folks deliver with this update: - Fix a reintroduced signed/unsigned issue and cleanup the whole signed/unsigned mess in the timekeeping core so this wont happen accidentaly again. - Add a new trace clock based on boot time - Prevent injection of random sleep times when PM tracing abuses the RTC for storage - Make posix timers configurable for real tiny systems - Add tracepoints for the alarm timer subsystem so timer based suspend wakeups can be instrumented - The usual pile of fixes and updates to core and drivers" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) timekeeping: Use mul_u64_u32_shr() instead of open coding it timekeeping: Get rid of pointless typecasts timekeeping: Make the conversion call chain consistently unsigned timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion alarmtimer: Add tracepoints for alarm timers trace: Update documentation for mono, mono_raw and boot clock trace: Add an option for boot clock as trace clock timekeeping: Add a fast and NMI safe boot clock timekeeping/clocksource_cyc2ns: Document intended range limitation timekeeping: Ignore the bogus sleep time if pm_trace is enabled selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous" clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map() arm64: dts: rockchip: Arch counter doesn't tick in system suspend clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend posix-timers: Make them configurable posix_cpu_timers: Move the add_device_randomness() call to a proper place timer: Move sys_alarm from timer.c to itimer.c ptp_clock: Allow for it to be optional Kconfig: Regenerate *.c_shipped files after previous changes ...	2016-12-12 19:56:15 -08:00
Linus Torvalds	e71c3978d6	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp hotplug updates from Thomas Gleixner: "This is the final round of converting the notifier mess to the state machine. The removal of the notifiers and the related infrastructure will happen around rc1, as there are conversions outstanding in other trees. The whole exercise removed about 2000 lines of code in total and in course of the conversion several dozen bugs got fixed. The new mechanism allows to test almost every hotplug step standalone, so usage sites can exercise all transitions extensively. There is more room for improvement, like integrating all the pointlessly different architecture mechanisms of synchronizing, setting cpus online etc into the core code" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) tracing/rb: Init the CPU mask on allocation soc/fsl/qbman: Convert to hotplug state machine soc/fsl/qbman: Convert to hotplug state machine zram: Convert to hotplug state machine KVM/PPC/Book3S HV: Convert to hotplug state machine arm64/cpuinfo: Convert to hotplug state machine arm64/cpuinfo: Make hotplug notifier symmetric mm/compaction: Convert to hotplug state machine iommu/vt-d: Convert to hotplug state machine mm/zswap: Convert pool to hotplug state machine mm/zswap: Convert dst-mem to hotplug state machine mm/zsmalloc: Convert to hotplug state machine mm/vmstat: Convert to hotplug state machine mm/vmstat: Avoid on each online CPU loops mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead() tracing/rb: Convert to hotplug state machine oprofile/nmi timer: Convert to hotplug state machine net/iucv: Use explicit clean up labels in iucv_init() x86/pci/amd-bus: Convert to hotplug state machine x86/oprofile/nmi: Convert to hotplug state machine ...	2016-12-12 19:25:04 -08:00
Andrey Ryabinin	8d5341a626	x86/ldt: use vfree_atomic() to free ldt entries vfree() is going to use sleeping lock. free_ldt_struct() may be called with disabled preemption, therefore we must use vfree_atomic() here. E.g. call trace: vfree() free_ldt_struct() destroy_context_ldt() __mmdrop() finish_task_switch() schedule_tail() ret_from_fork() Link: http://lkml.kernel.org/r/1479474236-4139-7-git-send-email-hch@lst.de Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Jisheng Zhang <jszhang@marvell.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: John Dias <joaodias@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-12 18:55:08 -08:00
Reza Arbab	39fa104d5b	mm: remove x86-only restriction of movable_node In commit `c5320926e3` ("mem-hotplug: introduce movable_node boot option"), the memblock allocation direction is changed to bottom-up and then back to top-down like this: 1. memblock_set_bottom_up(true), called by cmdline_parse_movable_node(). 2. memblock_set_bottom_up(false), called by x86's numa_init(). Even though (1) occurs in generic mm code, it is wrapped by #ifdef CONFIG_MOVABLE_NODE, which depends on X86_64. This means that when we extend CONFIG_MOVABLE_NODE to non-x86 arches, things will be unbalanced. (1) will happen for them, but (2) will not. This toggle was added in the first place because x86 has a delay between adding memblocks and marking them as hotpluggable. Since other arches do this marking either immediately or not at all, they do not require the bottom-up toggle. So, resolve things by moving (1) from cmdline_parse_movable_node() to x86's setup_arch(), immediately after the movable_node parameter has been parsed. Link: http://lkml.kernel.org/r/1479160961-25840-3-git-send-email-arbab@linux.vnet.ibm.com Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alistair Popple <apopple@au1.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: Frank Rowand <frowand.list@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-12 18:55:07 -08:00
Linus Torvalds	f797484c26	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "Two changes: - implement various VMWare guest OS improvements/fixes (Alexey Makhalov) - unexport a spurious export from the intel-mid platform driver (Lukas Wunner)" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vmware: Add paravirt sched clock x86/vmware: Add basic paravirt ops support x86/vmware: Use tsc_khz value for calibrate_cpu() x86/platform/intel-mid: Unexport intel_mid_pci_set_power_state() x86/vmware: Read tsc_khz only once at boot time	2016-12-12 15:29:06 -08:00
Linus Torvalds	991bc36254	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode update from Ingo Molnar: "The biggest change (by Borislav Petkov) is a thorough rewrite of the Intel microcode loader and its interactions with the core code. The biggest conceptual change is the decoupling of the microcode loading on boot and application processors (which load the microcode in different scenarios), so that both parse the input patches with as few assumptions as possible - this also fixes various kernel address space randomization bugs. (The AP side then goes on and caches the result to improve boot performance.) Since the AMD side already did this, this change also opened up the path towards more unification/simplification of the core microcode loading infrastructure: 10 files changed, 647 insertions(+), 940 deletions(-) which speaks for itself" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Bump driver version, update copyrights x86/microcode: Rework microcode loading x86/microcode/intel: Remove intel_lib.c x86/microcode/amd: Move private inlines to .c and mark local functions static x86/microcode: Collect CPU info on resume x86/microcode: Issue the debug printk on resume only on success x86/microcode/amd: Hand down the CPU family x86/microcode: Export the microcode cache linked list x86/microcode: Remove one #ifdef clause x86/microcode/intel: Simplify generic_load_microcode() x86/microcode: Move driver authors to CREDITS x86/microcode: Run the AP-loading routine only on the application processors	2016-12-12 15:23:02 -08:00
Linus Torvalds	212f30008a	Merge branch 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 idle updates from Ingo Molnar: "There were two bigger changes in this development cycle: - remove idle notifiers: 32 files changed, 74 insertions(+), 803 deletions(-) These notifiers were of questionable value and the main usecase, the i7300 driver, was essentially unmaintained and can be removed, plus modern power management concepts don't need the callback - so use this golden opportunity and get rid of this opaque and fragile callback from a latency sensitive code path. (Len Brown, Thomas Gleixner) - improve the AMD Erratum 400 workaround that used high overhead MSR polling in the idle loop (Borisla Petkov, Thomas Gleixner)" * 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Remove empty idle.h header x86/amd: Simplify AMD E400 aware idle routine x86/amd: Check for the C1E bug post ACPI subsystem init x86/bugs: Separate AMD E400 erratum and C1E bug x86/cpufeature: Provide helper to set bugs bits x86/idle: Remove enter_idle(), exit_idle() x86: Remove x86_test_and_clear_bit_percpu() x86/idle: Remove is_idle flag x86/idle: Remove idle_notifier i7300_idle: Remove this driver	2016-12-12 14:55:04 -08:00
Linus Torvalds	6f3be0f043	Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 header fixlet from Ingo Molnar: "Remove unnecessary module.h inclusion from core code (Paul Gortmaker)" * 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/percpu: Remove unnecessary include of module.h, add asm/desc.h	2016-12-12 14:53:24 -08:00
Linus Torvalds	518bacf5a5	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FPU updates from Ingo Molnar: "The main changes in this cycle were: - do a large round of simplifications after all CPUs do 'eager' FPU context switching in v4.9: remove CR0 twiddling, remove leftover eager/lazy bts, etc (Andy Lutomirski) - more FPU code simplifications: remove struct fpu::counter, clarify nomenclature, remove unnecessary arguments/functions and better structure the code (Rik van Riel)" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Remove clts() x86/fpu: Remove stts() x86/fpu: Handle #NM without FPU emulation as an error x86/fpu, lguest: Remove CR0.TS support x86/fpu, kvm: Remove host CR0.TS manipulation x86/fpu: Remove irq_ts_save() and irq_ts_restore() x86/fpu: Stop saving and restoring CR0.TS in fpu__init_check_bugs() x86/fpu: Get rid of two redundant clts() calls x86/fpu: Finish excising 'eagerfpu' x86/fpu: Split old_fpu & new_fpu handling into separate functions x86/fpu: Remove 'cpu' argument from __cpu_invalidate_fpregs_state() x86/fpu: Split old & new FPU code paths x86/fpu: Remove __fpregs_(de)activate() x86/fpu: Rename lazy restore functions to "register state valid" x86/fpu, kvm: Remove KVM vcpu->fpu_counter x86/fpu: Remove struct fpu::counter x86/fpu: Remove use_eager_fpu() x86/fpu: Remove the XFEATURE_MASK_EAGER/LAZY distinction x86/fpu: Hard-disable lazy FPU mode x86/crypto, x86/fpu: Remove X86_FEATURE_EAGER_FPU #ifdef from the crc32c code	2016-12-12 14:27:49 -08:00
Linus Torvalds	535b2f73f6	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CPU updates from Ingo Molnar: "The changes in this development cycle were: - AMD CPU topology enhancements that are cleanups on current CPUs but which enable future Fam17 hardware. (Yazen Ghannam) - unify bugs.c and bugs_64.c (Borislav Petkov) - remove the show_msr= boot option (Borislav Petkov) - simplify a boot message (Borislav Petkov)" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/AMD: Clean up cpu_llc_id assignment per topology feature x86/cpu: Get rid of the show_msr= boot option x86/cpu: Merge bugs.c and bugs_64.c x86/cpu: Remove the printk format specifier in "CPU0: "	2016-12-12 14:25:21 -08:00
Linus Torvalds	ef486c599a	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Two cleanups in the LDT handling code, by Dan Carpenter and Thomas Gleixner" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ldt: Make all size computations unsigned x86/ldt: Make a size argument unsigned	2016-12-12 14:20:14 -08:00
Linus Torvalds	5fc0363d43	Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: "The main changes in this cycle were: - Makefile improvements (Paul Bolle) - KConfig cleanups to better separate 32-bit only, 64-bit only and generic feature enablement sections (Ingo Molnar)" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Remove three unneeded genhdr-y entries x86/build: Don't use $(LINUXINCLUDE) twice x86/kconfig: Sort the 'config X86' selects alphabetically x86/kconfig: Clean up 32-bit compat options x86/kconfig: Clean up IA32_EMULATION select x86/kconfig, x86/pkeys: Move pkeys selects to X86_INTEL_MEMORY_PROTECTION_KEYS x86/kconfig: Move 64-bit only arch Kconfig selects to 'config X86_64' x86/kconfig: Move 32-bit only arch Kconfig selects to 'config X86_32'	2016-12-12 14:16:19 -08:00
Linus Torvalds	06cc6b969c	Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "Misc cleanups/simplifications by Borislav Petkov, Paul Bolle and Wei Yang" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/64: Optimize fixmap page fixup x86/boot: Simplify the GDTR calculation assembly code a bit x86/boot/build: Remove always empty $(USERINCLUDE)	2016-12-12 14:13:30 -08:00
Linus Torvalds	5645688f9d	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "The main changes in this development cycle were: - a large number of call stack dumping/printing improvements: higher robustness, better cross-context dumping, improved output, etc. (Josh Poimboeuf) - vDSO getcpu() performance improvement for future Intel CPUs with the RDPID instruction (Andy Lutomirski) - add two new Intel AVX512 features and the CPUID support infrastructure for it: AVX512IFMA and AVX512VBMI. (Gayatri Kammela, He Chen) - more copy-user unification (Borislav Petkov) - entry code assembly macro simplifications (Alexander Kuleshov) - vDSO C/R support improvements (Dmitry Safonov) - misc fixes and cleanups (Borislav Petkov, Paul Bolle)" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) scripts/decode_stacktrace.sh: Fix address line detection on x86 x86/boot/64: Use defines for page size x86/dumpstack: Make stack name tags more comprehensible selftests/x86: Add test_vdso to test getcpu() x86/vdso: Use RDPID in preference to LSL when available x86/dumpstack: Handle NULL stack pointer in show_trace_log_lvl() x86/cpufeatures: Enable new AVX512 cpu features x86/cpuid: Provide get_scattered_cpuid_leaf() x86/cpuid: Cleanup cpuid_regs definitions x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants x86/unwind: Ensure stack grows down x86/vdso: Set vDSO pointer only after success x86/prctl/uapi: Remove #ifdef for CHECKPOINT_RESTORE x86/unwind: Detect bad stack return address x86/dumpstack: Warn on stack recursion x86/unwind: Warn on bad frame pointer x86/decoder: Use stderr if insn sanity test fails x86/decoder: Use stdout if insn decoder test is successful mm/page_alloc: Remove kernel address exposure in free_reserved_area() x86/dumpstack: Remove raw stack dump ...	2016-12-12 13:49:57 -08:00
Linus Torvalds	4ade5b2268	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Ingo Molnar: "Misc changes: - optimize (reduce) IRQ handler tracing overhead (Wanpeng Li) - clean up MSR helpers (Borislav Petkov) - fix build warning on some configs (Sebastian Andrzej Siewior)" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/msr: Cleanup/streamline MSR helpers x86/apic: Prevent tracing on apic_msr_write_eoi() x86/msr: Add wrmsr_notrace() x86/apic: Get rid of "warning: 'acpi_ioapic_lock' defined but not used"	2016-12-12 13:24:04 -08:00
Linus Torvalds	df5f0f0a02	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Ingo Molnar: "The main changes in this development cycle were: - more AMD northbridge support work, mostly in preparation for Fam17h CPUs (Yazen Ghannam, Borislav Petkov) - cleanups/refactorings and fixes (Borislav Petkov, Tony Luck, Yinghai Lu)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Include the PPIN in MCE records when available x86/mce/AMD: Add system physical address translation for AMD Fam17h x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h x86/amd_nb: Add Fam17h Data Fabric as "Northbridge" x86/amd_nb: Make all exports EXPORT_SYMBOL_GPL x86/amd_nb: Make amd_northbridges internal to amd_nb.c x86/mce/AMD: Reset Threshold Limit after logging error x86/mce/AMD: Fix HWID_MCATYPE calculation by grouping arguments x86/MCE: Correct TSC timestamping of error records x86/RAS: Hide SMCA bank names x86/RAS: Rename smca_bank_names to smca_names x86/RAS: Simplify SMCA HWID descriptor struct x86/RAS: Simplify SMCA bank descriptor struct x86/MCE: Dump MCE to dmesg if no consumers x86/RAS: Add TSC timestamp to the injected MCE x86/MCE: Do not look at panic_on_oops in the severity grading	2016-12-12 12:58:50 -08:00
Linus Torvalds	92c020d08d	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main scheduler changes in this cycle were: - support Intel Turbo Boost Max Technology 3.0 (TBM3) by introducig a notion of 'better cores', which the scheduler will prefer to schedule single threaded workloads on. (Tim Chen, Srinivas Pandruvada) - enhance the handling of asymmetric capacity CPUs further (Morten Rasmussen) - improve/fix load handling when moving tasks between task groups (Vincent Guittot) - simplify and clean up the cputime code (Stanislaw Gruszka) - improve mass fork()ed task spread a.k.a. hackbench speedup (Vincent Guittot) - make struct kthread kmalloc()ed and related fixes (Oleg Nesterov) - add uaccess atomicity debugging (when using access_ok() in the wrong context), under CONFIG_DEBUG_ATOMIC_SLEEP=y (Peter Zijlstra) - implement various fixes, cleanups and other enhancements (Daniel Bristot de Oliveira, Martin Schwidefsky, Rafael J. Wysocki)" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) sched/core: Use load_avg for selecting idlest group sched/core: Fix find_idlest_group() for fork kthread: Don't abuse kthread_create_on_cpu() in __kthread_create_worker() kthread: Don't use to_live_kthread() in kthread_[un]park() kthread: Don't use to_live_kthread() in kthread_stop() Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function" kthread: Make struct kthread kmalloc'ed x86/uaccess, sched/preempt: Verify access_ok() context sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO x86/sched: Use #include <linux/mutex.h> instead of #include <asm/mutex.h> cpufreq/intel_pstate: Use CPPC to get max performance acpi/bus: Set _OSC for diverse core support acpi/bus: Enable HWP CPPC objects x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU x86/sysctl: Add sysctl for ITMT scheduling feature x86: Enable Intel Turbo Boost Max Technology 3.0 x86/topology: Define x86's arch_update_cpu_topology sched: Extend scheduler's asym packing sched/fair: Clean up the tunable parameter definitions ...	2016-12-12 12:15:10 -08:00
Rafael J. Wysocki	d2c2ba6901	Merge branches 'acpi-soc', 'acpi-battery', 'acpi-video', 'acpi-cppc' and 'acpi-apei' * acpi-soc: ACPI / LPSS: enable hard LLP for DMA ACPI / APD: Add clock frequency for future AMD I2C controller * acpi-battery: ACPI / battery: If _BIX fails, retry with _BIF * acpi-video: ACPI / video: Add force_native quirk for HP Pavilion dv6 ACPI / video: Add force_native quirk for Dell XPS 17 L702X ACPI / video: Move ACPI_VIDEO_NOTIFY_* defines to acpi/video.h * acpi-cppc: ACPI / CPPC: set an error code on probe error path * acpi-apei: ACPI / APEI / ARM64: APEI initial support for ARM64 ACPI / APEI: Fix NMI notification handling	2016-12-12 20:48:01 +01:00
Rafael J. Wysocki	631ddaba59	Merge branches 'pm-sleep' and 'powercap' * pm-sleep: PM / sleep: Print active wakeup sources when blocking on wakeup_count reads x86/suspend: fix false positive KASAN warning on suspend/resume PM / sleep / ACPI: Use the ACPI_FADT_LOW_POWER_S0 flag PM / sleep: System sleep state selection interface rework PM / hibernate: Verify the consistent of e820 memory map by md5 digest * powercap: powercap / RAPL: Add Knights Mill CPUID powercap/intel_rapl: fix and tidy up error handling powercap/intel_rapl: Track active CPUs internally powercap/intel_rapl: Cleanup duplicated init code powercap/intel rapl: Convert to hotplug state machine powercap/intel_rapl: Propagate error code when registration fails powercap/intel_rapl: Add missing domain data update on hotplug	2016-12-12 20:46:35 +01:00
Linus Torvalds	bca13ce455	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "This update is pretty big and almost exclusively includes tooling changes, because v4.9's LTS status forced to completion most of the pending kernel side hardware enablement work and because we tried to freeze core perf work a bit to give a time window for the fuzzing efforts. The diff is large mostly due to the JSON hardware event tables added for Intel and Power8 CPUs. This was a popular feature request from people working close to hardware and from the HPC community. Tree size is big because this added the CPU event tables for over a decade of Intel CPUs. Future changes for a CPU vendor alrady support should be much smaller, as events for new models are added. The new events are listed in 'perf list', for the CPU model the tool is running on. If you find an interesting event it can be used as-is: $ perf stat -a -e l2_lines_out.pf_clean sleep 1 Performance counter stats for 'system wide': 7,860,403 l2_lines_out.pf_clean 1.000624918 seconds time elapsed The event lists can be searched the usual 'perf list' fashion for (case insensitive) substrings as well: $ perf list l2_lines_out List of pre-defined events (to be used in -e): cache: l2_lines_out.demand_clean [Clean L2 cache lines evicted by demand] l2_lines_out.demand_dirty [Dirty L2 cache lines evicted by demand] l2_lines_out.dirty_all [Dirty L2 cache lines filling the L2] l2_lines_out.pf_clean [Clean L2 cache lines evicted by L2 prefetch] l2_lines_out.pf_dirty [Dirty L2 cache lines evicted by L2 prefetch] etc. There's a few high level categories as well that can be listed: 'cache', 'floating point', 'frontend', 'memory', 'pipeline', 'virtual memory'. Existing generic events and workflows should work as-is. The only kernel side change is a late breaking fix for an older regression, related to Intel BTS, LBR and PT feature interaction. On the tooling side there are three new tools / major features: - The new 'perf c2c' tool provides means for Shared Data C2C/HITM analysis. This allows you to track down cacheline contention. The tool is based on x86's load latency and precise store facility events provided by Intel CPUs. It was tested by Joe Mario and has proven to be useful, finding some cacheline contentions. Joe also wrote a blog about c2c tool with examples: https://joemario.github.io/blog/2016/09/01/c2c-blog/ excerpt of the content on this site: At a high level, “perf c2c” will show you: * The cachelines where false sharing was detected. * The readers and writers to those cachelines, and the offsets where those accesses occurred. * The pid, tid, instruction addr, function name, binary object name for those readers and writers. * The source file and line number for each reader and writer. * The average load latency for the loads to those cachelines. * Which numa nodes the samples a cacheline came from and which CPUs were involved. Using perf c2c is similar to using the Linux perf tool today. First collect data with “perf c2c record”, then generate a report output with “perf c2c report” There one finds extensive details on using the tool, with tips on reducing the volume of samples while still capturing enough to do its job. (Dick Fowles, Joe Mario, Don Zickus, Jiri Olsa) - The new 'perf sched timehist' tool provides tailored analysis of scheduling events. Example usage: perf sched record -- sleep 1 perf sched timehist By default it shows the individual schedule events, including the wait time (time between sched-out and next sched-in events for the task), the task scheduling delay (time between wakeup and actually running) and run time for the task: time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) -------- ------ ---------------- --------- --------- -------- 1.874569 [0011] gcc[31949] 0.014 0.000 1.148 1.874591 [0010] gcc[31951] 0.000 0.000 0.024 1.874603 [0010] migration/10[59] 3.350 0.004 0.011 1.874604 [0011] <idle> 1.148 0.000 0.035 1.874723 [0005] <idle> 0.016 0.000 1.383 1.874746 [0005] gcc[31949] 0.153 0.078 0.022 ... Times are in msec.usec. (David Ahern, Namhyung Kim) - Add CPU vendor hardware event tables: Add JSON files with vendor event naming for Intel and Power8 processors, allowing users of tools like oprofile to keep using the event names they are used to, as well as people reading vendor documentation, where such naming is used. (Andi Kleen, Sukadev Bhattiprolu) You should see all the new events with 'perf list' and you should be able to search them, for example 'perf list miss' will list all the myriads of miss events. Other tooling features added were: - Cross-arch annotation support: o Improve ARM support in the annotation code, affecting 'perf annotate', 'perf report' and live annotation in 'perf top' (Kim Phillips) o Initial support for PowerPC in the annotation code (Ravi Bangoria) o Support AArch64 in the 'annotate' code, native/local and cross-arch/remote (Kim Phillips) - Allow considering just events in a given time interval, via the '--time start.s.ms,end.s.ms' command line, added to 'perf kmem', 'perf report', 'perf sched timehist' and 'perf script' (David Ahern) - Add option to stop printing a callchain at one of a given group of symbol names (David Ahern) - Track memory freed in 'perf kmem stat' (David Ahern) - Allow querying and setting .perfconfig variables (Taeung Song) - Show branch information in callchains (predicted, TSX aborts, loop iteractions, etc) (Jin Yao) - Dynamicly change verbosity level by pressing 'V' in the 'perf top/report' hists TUI browser (Alexis Berlemont) - Implement 'perf trace --delay' in the same fashion as in 'perf record --delay', to skip sampling workload initialization events (Alexis Berlemont) - Make vendor named events case insensitive in 'perf list', i.e. 'perf list LONGEST_LAT' works just the same as 'perf list longest_lat' (Andi Kleen) - Add unwinding support for jitdump (Stefano Sanfilippo) Tooling infrastructure changes: - Support linking perf with clang and LLVM libraries, initially statically, but this limitation will be lifted and shared libraries, when available, will be preferred to the static build, that should, as with other features, be enabled explicitly (Wang Nan) - Add initial support (and perf test entry) for tooling hooks, starting with 'record_start' and 'record_end', that will have as its initial user the eBPF infrastructure, where perf_ prefixed functions will be JITed and run when such hooks are called (Wang Nan) - Implement assorted libbpf improvements (Wang Nan)" ... and lots of other changes, features, cleanups and refactorings I did not list, see the shortlog and the git log for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (220 commits) perf/x86: Fix exclusion of BTS and LBR for Goldmont perf tools: Explicitly document that --children is enabled by default perf sched timehist: Cleanup idle_max_cpu handling perf sched timehist: Handle zero sample->tid properly perf callchain: Introduce callchain_cursor__copy() perf sched: Cleanup option processing perf sched timehist: Improve error message when analyzing wrong file perf tools: Move perf build related variables under non fixdep leg perf tools: Force fixdep compilation at the start of the build perf tools: Move PERF-VERSION-FILE target into rules area perf build: Check LLVM version in feature check perf annotate: Show raw form for jump instruction with indirect target perf tools: Add non config targets perf tools: Cleanup build directory before each test perf tools: Move python/perf.so target into rules area perf tools: Move install-gtk target into rules area tools build: Move tabs to spaces where suitable tools build: Make the .cmd file more readable perf clang: Compile BPF script using builtin clang support perf clang: Support compile IR to BPF object and add testcase ...	2016-12-12 11:46:21 -08:00
Linus Torvalds	0719dbf5e1	Merge branch 'mm-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull mm/PAT cleanup from Ingo Molnar: "A single cleanup for a generic interface that was originally introduced for PAT" * 'mm-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pat, mm: Make track_pfn_insert() return void	2016-12-12 11:14:52 -08:00
Linus Torvalds	6cdf89b1ca	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The tree got pretty big in this development cycle, but the net effect is pretty good: 115 files changed, 673 insertions(+), 1522 deletions(-) The main changes were: - Rework and generalize the mutex code to remove per arch mutex primitives. (Peter Zijlstra) - Add vCPU preemption support: add an interface to query the preemption status of vCPUs and use it in locking primitives - this optimizes paravirt performance. (Pan Xinhui, Juergen Gross, Christian Borntraeger) - Introduce cpu_relax_yield() and remov cpu_relax_lowlatency() to clean up and improve the s390 lock yielding machinery and its core kernel impact. (Christian Borntraeger) - Micro-optimize mutexes some more. (Waiman Long) - Reluctantly add the to-be-deprecated mutex_trylock_recursive() interface on a temporary basis, to give the DRM code more time to get rid of its locking hacks. Any other users will be NAK-ed on sight. (We turned off the deprecation warning for the time being to not pollute the build log.) (Peter Zijlstra) - Improve the rtmutex code a bit, in light of recent long lived bugs/races. (Thomas Gleixner) - Misc fixes, cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) x86/paravirt: Fix bool return type for PVOP_CALL() x86/paravirt: Fix native_patch() locking/ww_mutex: Use relaxed atomics locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked() locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted() locking/mutex: Break out of expensive busy-loop on {mutex,rwsem}_spin_on_owner() when owner vCPU is preempted locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU in osq_lock() Documentation/virtual/kvm: Support the vCPU preemption check x86/xen: Support the vCPU preemption check x86/kvm: Support the vCPU preemption check x86/kvm: Support the vCPU preemption check kvm: Introduce kvm_write_guest_offset_cached() locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests locking/spinlocks, s390: Implement vcpu_is_preempted(cpu) locking/core, powerpc: Implement vcpu_is_preempted(cpu) sched/core: Introduce the vcpu_is_preempted(cpu) interface sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q locking/core: Provide common cpu_relax_yield() definition locking/mutex: Don't mark mutex_trylock_recursive() as deprecated, temporarily ...	2016-12-12 10:48:02 -08:00
Linus Torvalds	3940cf0b3d	Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes in this development cycle were: - Implement EFI dev path parser and other changes to fully support thunderbolt devices on Apple Macbooks (Lukas Wunner) - Add RNG seeding via the EFI stub, on ARM/arm64 (Ard Biesheuvel) - Expose EFI framebuffer configuration to user-space, to improve tooling (Peter Jones) - Misc fixes and cleanups (Ivan Hu, Wei Yongjun, Yisheng Xie, Dan Carpenter, Roy Franz)" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit thunderbolt: Compile on x86 only thunderbolt, efi: Fix Kconfig dependencies harder thunderbolt, efi: Fix Kconfig dependencies thunderbolt: Use Device ROM retrieved from EFI x86/efi: Retrieve and assign Apple device properties efi: Allow bitness-agnostic protocol calls efi: Add device path parser efi/arm/libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table efi/libstub: Add random.c to ARM build efi: Add support for seeding the RNG from a UEFI config table MAINTAINERS: Add ARM and arm64 EFI specific files to EFI subsystem efi/libstub: Fix allocation size calculations efi/efivar_ssdt_load: Don't return success on allocation failure efifb: Show framebuffer layout as device attributes efi/efi_test: Use memdup_user() as a cleanup efi/efi_test: Fix uninitialized variable 'rv' efi/efi_test: Fix uninitialized variable 'datasize' efi/arm: Fix efi_init() error handling efi: Remove unused include of <linux/version.h>	2016-12-12 10:03:44 -08:00
Linus Torvalds	9ad1aeecdb	Merge branch 'core-smp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP bootup updates from Ingo Molnar: "Three changes to unify/standardize some of the bootup message printing in kernel/smp.c between architectures" * 'core-smp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kernel/smp: Tell the user we're bringing up secondary CPUs kernel/smp: Make the SMP boot message common on all arches kernel/smp: Define pr_fmt() for smp.c	2016-12-12 10:02:01 -08:00
Ingo Molnar	6643aab30f	Merge branch 'linus' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-11 13:10:40 +01:00
Peter Zijlstra	11f254dbb3	x86/paravirt: Fix bool return type for PVOP_CALL() Commit: `3cded41794` ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()") introduced a paravirt op with bool return type [] It turns out that the PVOP_CALL() macros miscompile when rettype is bool. Code that looked like: 83 ef 01 sub $0x1,%edi ff 15 32 a0 d8 00 callq 0xd8a032(%rip) # ffffffff81e28120 <pv_lock_ops+0x20> 84 c0 test %al,%al ended up looking like so after PVOP_CALL1() was applied: 83 ef 01 sub $0x1,%edi 48 63 ff movslq %edi,%rdi ff 14 25 20 81 e2 81 callq 0xffffffff81e28120 48 85 c0 test %rax,%rax Note how it tests the whole of %rax, even though a typical bool return function only sets %al, like: 0f 95 c0 setne %al c3 retq This is because ____PVOP_CALL() does: __ret = (rettype)__eax; and while regular integer type casts truncate the result, a cast to bool tests for any !0 value. Fix this by explicitly truncating to sizeof(rettype) before casting. [*] The actual bug should've been exposed in commit: `446f3dc8cc` ("locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests") but that didn't properly implement the paravirt call. Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `3cded41794` ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()") Link: http://lkml.kernel.org/r/20161208154349.346057680@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-11 13:09:20 +01:00
Peter Zijlstra	45dbea5f55	x86/paravirt: Fix native_patch() While chasing a regression I noticed we potentially patch the wrong code in native_patch(). If we do not select the native code sequence, we must use the default patcher, not fall-through the switch case. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel test robot <xiaolong.ye@intel.com> Fixes: `3cded41794` ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()") Link: http://lkml.kernel.org/r/20161208154349.270616999@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-11 13:09:19 +01:00
Ingo Molnar	6f38751510	Merge branch 'linus' into locking/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-11 13:07:13 +01:00
Andi Kleen	b0c1ef5295	perf/x86: Fix exclusion of BTS and LBR for Goldmont An earlier patch allowed enabling PT and LBR at the same time on Goldmont. However it also allowed enabling BTS and LBR at the same time, which is still not supported. Fix this by bypassing the check only for PT. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: alexander.shishkin@intel.com Cc: kan.liang@intel.com Cc: <stable@vger.kernel.org> Fixes: `ccbebba4c6` ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it") Link: http://lkml.kernel.org/r/20161209001417.4713-1-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-11 13:06:09 +01:00
David S. Miller	821781a9f4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-12-10 16:21:55 -05:00
Thomas Gleixner	990e9dc381	x86/ldt: Make all size computations unsigned ldt->size can never be negative. The helper functions take 'unsigned int' arguments which are assigned from ldt->size. The related user space user_desc struct member entry_number is unsigned as well. But ldt->size itself and a few local variables which are related to ldt->size are type 'int' which makes no sense whatsoever and results in typecasts which make the eyes bleed. Clean it up and convert everything which is related to ldt->size to unsigned it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dan Carpenter <dan.carpenter@oracle.com>	2016-12-10 00:24:39 +01:00
Dan Carpenter	296dc5806d	x86/ldt: Make a size argument unsigned My static checker complains that we put an upper bound on the "size" argument but not a lower bound. The checker is not smart enough to know the possible ranges of "old_mm->context.ldt->size" from init_new_context_ldt() so it thinks maybe it could be negative. Let's make it unsigned to silence the warning and future proof the code a bit. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: kernel-janitors@vger.kernel.org Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20161208105602.GA11382@elgon.mountain Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-10 00:24:39 +01:00
Thomas Gleixner	34bc3560c6	x86: Remove empty idle.h header One include less is always a good thing(tm). Good riddance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-6-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-09 21:23:22 +01:00
Borislav Petkov	07c94a3812	x86/amd: Simplify AMD E400 aware idle routine Reorganize the E400 detection now that we have everything in place: switch the CPUs to broadcast mode after the LAPIC has been initialized and remove the facilities that were used previously on the idle path. Unfortunately static_cpu_has_bug() cannpt be used in the E400 idle routine because alternatives have been applied when the actual detection happens, so the static switching does not take effect and the test will stay false. Use boot_cpu_has_bug() instead which is definitely an improvement over the RDMSR and the cpumask handling. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-5-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-09 21:23:21 +01:00
Thomas Gleixner	e7ff3a4763	x86/amd: Check for the C1E bug post ACPI subsystem init AMD CPUs affected by the E400 erratum suffer from the issue that the local APIC timer stops when the CPU goes into C1E. Unfortunately there is no way to detect the affected CPUs on early boot. It's only possible to determine the range of possibly affected CPUs from the family/model range. The actual decision whether to enter C1E and thus cause the bug is done by the firmware and we need to detect that case late, after ACPI has been initialized. The current solution is to check in the idle routine whether the CPU is affected by reading the MSR_K8_INT_PENDING_MSG MSR and checking for the K8_INTP_C1E_ACTIVE_MASK bits. If one of the bits is set then the CPU is affected and the system is switched into forced broadcast mode. This is ineffective and on non-affected CPUs every entry to idle does the extra RDMSR. After doing some research it turns out that the bits are visible on the boot CPU right after the ACPI subsystem is initialized in the early boot process. So instead of polling for the bits in the idle loop, add a detection function after acpi_subsystem_init() and check for the MSR bits. If set, then the X86_BUG_AMD_APIC_C1E is set on the boot CPU and the TSC is marked unstable when X86_FEATURE_NONSTOP_TSC is not set as it will stop in C1E state as well. The switch to broadcast mode cannot be done at this point because the boot CPU still uses HPET as a clockevent device and the local APIC timer is not yet calibrated and installed. The switch to broadcast mode on the affected CPUs needs to be done when the local APIC timer is actually set up. This allows to cleanup the amd_e400_idle() function in the next step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-4-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-09 21:23:21 +01:00
Thomas Gleixner	3344ed3079	x86/bugs: Separate AMD E400 erratum and C1E bug The workaround for the AMD Erratum E400 (Local APIC timer stops in C1E state) is a two step process: - Selection of the E400 aware idle routine - Detection whether the platform is affected The idle routine selection happens for possibly affected CPUs depending on family/model/stepping information. These range of CPUs is not necessarily affected as the decision whether to enable the C1E feature is made by the firmware. Unfortunately there is no way to query this at early boot. The current implementation polls a MSR in the E400 aware idle routine to detect whether the CPU is affected. This is inefficient on non affected CPUs because every idle entry has to do the MSR read. There is a better way to detect this before going idle for the first time which requires to seperate the bug flags: X86_BUG_AMD_E400 - Selects the E400 aware idle routine and enables the detection X86_BUG_AMD_APIC_C1E - Set when the platform is affected by E400 Replace the current X86_BUG_AMD_APIC_C1E usage by the new X86_BUG_AMD_E400 bug bit to select the idle routine which currently does an unconditional detection poll. X86_BUG_AMD_APIC_C1E is going to be used in later patches to remove the MSR polling and simplify the handling of this misfeature. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-3-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-09 21:23:20 +01:00
Borislav Petkov	a588b98364	x86/cpufeature: Provide helper to set bugs bits Will be used in a later patch to set bug bits for bugs which need late detection. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-09 21:23:20 +01:00
Steven Rostedt (Red Hat)	847fa1a6d3	ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it With new binutils, gcc may get smart with its optimization and change a jmp from a 5 byte jump to a 2 byte one even though it was jumping to a global function. But that global function existed within a 2 byte radius, and gcc was able to optimize it. Unfortunately, that jump was also being modified when function graph tracing begins. Since ftrace expected that jump to be 5 bytes, but it was only two, it overwrote code after the jump, causing a crash. This was fixed for x86_64 with commit `8329e818f1`, with the same subject as this commit, but nothing was done for x86_32. Cc: stable@vger.kernel.org Fixes: `d61f82d066` ("ftrace: use dynamic patching for updating mcount calls") Reported-by: Colin Ian King <colin.king@canonical.com> Tested-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-12-09 09:17:10 -05:00
Steven Rostedt (Red Hat)	8cf868affd	tracing: Have the reg function allow to fail Some tracepoints have a registration function that gets enabled when the tracepoint is enabled. There may be cases that the registraction function must fail (for example, can't allocate enough memory). In this case, the tracepoint should also fail to register, otherwise the user would not know why the tracepoint is not working. Cc: David Howells <dhowells@redhat.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Anton Blanchard <anton@samba.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-12-09 09:13:30 -05:00
Shaohua Li	76ae054c69	x86/intel_rdt: Implement show_options() for resctrlfs Implement show_options() callback for intel resource control filesystem to expose the active mount options in /proc/ Signed-off-by: Shaohua Li <shli@fb.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/7dce7c1886ac9289442d254ea18322c92bd968da.1480717072.git.shli@fb.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-09 14:12:18 +01:00
Alex Thorlton	738662c35c	xen/x86: Increase xen_e820_map to E820_X_MAX possible entries On systems with sufficiently large e820 tables, and several IOAPICs, it is possible for the XENMEM_machine_memory_map callback (and its counterpart, XENMEM_memory_map) to attempt to return an e820 table with more than 128 entries. This callback adds entries to the BIOS-provided e820 table to account for IOAPIC registers, which, on sufficiently large systems, can result in an e820 table that is too large to copy back into xen_e820_map. This change simply increases the size of xen_e820_map to E820_X_MAX to ensure that there is enough room to store the entire e820 map returned from this callback. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>	2016-12-09 10:59:08 +01:00
Alex Thorlton	9d2f86c6ca	x86: Make E820_X_MAX unconditionally larger than E820MAX It's really not necessary to limit E820_X_MAX to 128 in the non-EFI case. This commit drops E820_X_MAX's dependency on CONFIG_EFI, so that E820_X_MAX is always at least slightly larger than E820MAX. The real motivation behind this is actually to prevent some issues in the Xen kernel, where the XENMEM_machine_memory_map hypercall can produce an e820 map larger than 128 entries, even on systems where the original e820 table was quite a bit smaller than that, depending on how many IOAPICs are installed on the system. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Suggested-by: Ingo Molnar <mingo@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>	2016-12-09 10:59:04 +01:00
Martin KaFai Lau	17bedab272	bpf: xdp: Allow head adjustment in XDP prog This patch allows XDP prog to extend/remove the packet data at the head (like adding or removing header). It is done by adding a new XDP helper bpf_xdp_adjust_head(). It also renames bpf_helper_changes_skb_data() to bpf_helper_changes_pkt_data() to better reflect that XDP prog does not work on skb. This patch adds one "xdp_adjust_head" bit to bpf_prog for the XDP-capable driver to check if the XDP prog requires bpf_xdp_adjust_head() support. The driver can then decide to error out during XDP_SETUP_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-08 14:25:13 -05:00
Petr Mladek	36da91bdf5	KVM: x86: Handle the kthread worker using the new API Use the new API to create and destroy the "kvm-pit" kthread worker. The API hides some implementation details. In particular, kthread_create_worker() allocates and initializes struct kthread_worker. It runs the kthread the right way and stores task_struct into the worker structure. kthread_destroy_worker() flushes all pending works, stops the kthread and frees the structure. This patch does not change the existing behavior except for dynamically allocating struct kthread_worker and storing only the pointer of this structure. It is compile tested only because I did not find an easy way how to run the code. Well, it should be pretty safe given the nature of the change. Signed-off-by: Petr Mladek <pmladek@suse.com> Message-Id: <1476877847-11217-1-git-send-email-pmladek@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-12-08 15:31:11 +01:00
Jan Dakinevich	16c2aec6a2	KVM: nVMX: invvpid handling improvements - Expose all invalidation types to the L1 - Reject invvpid instruction, if L1 passed zero vpid value to single context invalidations Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-12-08 15:31:11 +01:00
Ladi Prosek	1dc35dacc1	KVM: nVMX: check host CR3 on vmentry and vmexit This commit adds missing host CR3 checks. Before entering guest mode, the value of CR3 is checked for reserved bits. After returning, nested_vmx_load_cr3 is called to set the new CR3 value and check and load PDPTRs. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-08 15:31:10 +01:00
Ladi Prosek	9ed38ffad4	KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry Loading CR3 as part of emulating vmentry is different from regular CR3 loads, as implemented in kvm_set_cr3, in several ways. * different rules are followed to check CR3 and it is desirable for the caller to distinguish between the possible failures * PDPTRs are not loaded if PAE paging and nested EPT are both enabled * many MMU operations are not necessary This patch introduces nested_vmx_load_cr3 suitable for CR3 loads as part of nested vmentry and vmexit, and makes use of it on the nested vmentry path. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-08 15:31:10 +01:00
Ladi Prosek	ee146c1c10	KVM: nVMX: propagate errors from prepare_vmcs02 It is possible that prepare_vmcs02 fails to load the guest state. This patch adds the proper error handling for such a case. L1 will receive an INVALID_STATE vmexit with the appropriate exit qualification if it happens. A failure to set guest CR3 is the only error propagated from prepare_vmcs02 at the moment. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-08 15:31:09 +01:00
Ladi Prosek	7ca29de213	KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT KVM does not correctly handle L1 hypervisors that emulate L2 real mode with PAE and EPT, such as Hyper-V. In this mode, the L1 hypervisor populates guest PDPTE VMCS fields and leaves guest CR3 uninitialized because it is not used (see 26.3.2.4 Loading Page-Directory-Pointer-Table Entries). KVM always dereferences CR3 and tries to load PDPTEs if PAE is on. This leads to two related issues: 1) On the first nested vmentry, the guest PDPTEs, as populated by L1, are overwritten in ept_load_pdptrs because the registers are believed to have been loaded in load_pdptrs as part of kvm_set_cr3. This is incorrect. L2 is running with PAE enabled but PDPTRs have been set up by L1. 2) When L2 is about to enable paging and loads its CR3, we, again, attempt to load PDPTEs in load_pdptrs called from kvm_set_cr3. There are no guarantees that this will succeed (it's just a CR3 load, paging is not enabled yet) and if it doesn't, kvm_set_cr3 returns early without persisting the CR3 which is then lost and L2 crashes right after it enables paging. This patch replaces the kvm_set_cr3 call with a simple register write if PAE and EPT are both on. CR3 is not to be interpreted in this case. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-08 15:31:09 +01:00
David Matlack	5a6a9748b4	KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry vmx_set_cr0() modifies GUEST_EFER and "IA-32e mode guest" in the current VMCS. Call vmx_set_efer() after vmx_set_cr0() so that emulated VM-entry is more faithful to VMCS12. This patch correctly causes VM-entry to fail when "IA-32e mode guest" is 1 and GUEST_CR0.PG is 0. Previously this configuration would succeed and "IA-32e mode guest" would silently be disabled by KVM. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-08 15:31:08 +01:00
David Matlack	8322ebbb24	KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID MSR_IA32_CR{0,4}_FIXED1 define which bits in CR0 and CR4 are allowed to be 1 during VMX operation. Since the set of allowed-1 bits is the same in and out of VMX operation, we can generate these MSRs entirely from the guest's CPUID. This lets userspace avoiding having to save/restore these MSRs. This patch also initializes MSR_IA32_CR{0,4}_FIXED1 from the CPU's MSRs by default. This is a saner than the current default of -1ull, which includes bits that the host CPU does not support. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-08 15:31:08 +01:00
David Matlack	3899152ccb	KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning all CR0 and CR4 bits are allowed to be 1 during VMX operation. This does not match real hardware, which disallows the high 32 bits of CR0 to be 1, and disallows reserved bits of CR4 to be 1 (including bits which are defined in the SDM but missing according to CPUID). A guest can induce a VM-entry failure by setting these bits in GUEST_CR0 and GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are valid. Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing checks on these registers do not verify must-be-0 bits. Fix these checks to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1. This patch should introduce no change in behavior in KVM, since these MSRs are still -1ULL. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-08 15:31:07 +01:00
David Matlack	62cc6b9dc6	KVM: nVMX: support restore of VMX capability MSRs The VMX capability MSRs advertise the set of features the KVM virtual CPU can support. This set of features varies across different host CPUs and KVM versions. This patch aims to addresses both sources of differences, allowing VMs to be migrated across CPUs and KVM versions without guest-visible changes to these MSRs. Note that cross-KVM- version migration is only supported from this point forward. When the VMX capability MSRs are restored, they are audited to check that the set of features advertised are a subset of what KVM and the CPU support. Since the VMX capability MSRs are read-only, they do not need to be on the default MSR save/restore lists. The userspace hypervisor can set the values of these MSRs or read them from KVM at VCPU creation time, and restore the same value after every save/restore. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-08 15:31:07 +01:00
David Matlack	0115f9cbac	KVM: nVMX: generate non-true VMX MSRs based on true versions The "non-true" VMX capability MSRs can be generated from their "true" counterparts, by OR-ing the default1 bits. The default1 bits are fixed and defined in the SDM. Since we can generate the non-true VMX MSRs from the true versions, there's no need to store both in struct nested_vmx. This also lets userspace avoid having to restore the non-true MSRs. Note this does not preclude emulating MSR_IA32_VMX_BASIC[55]=0. To do so, we simply need to set all the default1 bits in the true MSRs (such that the true MSRs and the generated non-true MSRs are equal). Signed-off-by: David Matlack <dmatlack@google.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-08 15:31:06 +01:00
Kyle Huey	ea07e42dec	KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs. The trap flag stays set until software clears it. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-08 15:31:06 +01:00
Kyle Huey	6affcbedca	KVM: x86: Add kvm_skip_emulated_instruction and use it. kvm_skip_emulated_instruction calls both kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep, skipping the emulated instruction and generating a trap if necessary. Replacing skip_emulated_instruction calls with kvm_skip_emulated_instruction is straightforward, except for: - ICEBP, which is already inside a trap, so avoid triggering another trap. - Instructions that can trigger exits to userspace, such as the IO insns, MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a KVM_GUESTDBG_SINGLESTEP exit, and the handling code for IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will take precedence. The singlestep will be triggered again on the next instruction, which is the current behavior. - Task switch instructions which would require additional handling (e.g. the task switch bit) and are instead left alone. - Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction, which do not trigger singlestep traps as mentioned previously. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-08 15:31:05 +01:00
Kyle Huey	eb27756217	KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12 We can't return both the pass/fail boolean for the vmcs and the upcoming continue/exit-to-userspace boolean for skip_emulated_instruction out of nested_vmx_check_vmcs, so move skip_emulated_instruction out of it instead. Additionally, VMENTER/VMRESUME only trigger singlestep exceptions when they advance the IP to the following instruction, not when they a) succeed, b) fail MSR validation or c) throw an exception. Add a separate call to skip_emulated_instruction that will later not be converted to the variant that checks the singlestep flag. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-08 15:31:04 +01:00
Kyle Huey	09ca3f2049	KVM: VMX: Reorder some skip_emulated_instruction calls The functions being moved ahead of skip_emulated_instruction here don't need updated IPs, and skipping the emulated instruction at the end will make it easier to return its value. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-08 15:31:04 +01:00
Kyle Huey	6a908b628c	KVM: x86: Add a return value to kvm_emulate_cpuid Once skipping the emulated instruction can potentially trigger an exit to userspace (via KVM_GUESTDBG_SINGLESTEP) kvm_emulate_cpuid will need to propagate a return value. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-08 15:31:03 +01:00
Konrad Rzeszutek Wilk	577f79e411	xen/pci: Bubble up error and fix description. The function is never called under PV guests, and only shows up when MSI (or MSI-X) cannot be allocated. Convert the message to include the error value. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>	2016-12-08 07:54:04 +01:00
Linus Torvalds	ea5a9eff96	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: a core dumping crash fix, a guess-unwinder regression fix, plus three build warning fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind: Fix guess-unwinder regression x86/build: Annotate die() with noreturn to fix build warning on clang x86/platform/olpc: Fix resume handler build warning x86/apic/uv: Silence a shift wrapping warning x86/coredump: Always use user_regs_struct for compat_elf_gregset_t	2016-12-07 11:39:27 -08:00
Peter Zijlstra	7c4788950b	x86/uaccess, sched/preempt: Verify access_ok() context I recently encountered wreckage because access_ok() was used where it should not be, add an explicit WARN when access_ok() is used wrongly. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-06 10:32:40 +01:00
Peter Zijlstra (Intel)	7f612a7f0b	perf/x86: Fix full width counter, counter overflow Lukasz reported that perf stat counters overflow handling is broken on KNL/SLM. Both these parts have full_width_write set, and that does indeed have a problem. In order to deal with counter wrap, we must sample the counter at at least half the counter period (see also the sampling theorem) such that we can unambiguously reconstruct the count. However commit: `069e0c3c40` ("perf/x86/intel: Support full width counting") sets the sampling interval to the full period, not half. Fixing that exposes another issue, in that we must not sign extend the delta value when we shift it right; the counter cannot have decremented after all. With both these issues fixed, counter overflow functions correctly again. Reported-by: Lukasz Odzioba <lukasz.odzioba@intel.com> Tested-by: Liang, Kan <kan.liang@intel.com> Tested-by: Odzioba, Lukasz <lukasz.odzioba@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: stable@vger.kernel.org Fixes: `069e0c3c40` ("perf/x86/intel: Support full width counting") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-06 09:44:28 +01:00
Piotr Luc	1dba23b12f	perf/x86/intel: Enable C-state residency events for Knights Mill The Knights Mill is enough close to Knights Landing so the path reuses C-state residency support of the latter. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20161201000853.18260-1-piotr.luc@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-06 09:44:27 +01:00
Josh Poimboeuf	b53f40db59	x86/suspend: fix false positive KASAN warning on suspend/resume Resuming from a suspend operation is showing a KASAN false positive warning: BUG: KASAN: stack-out-of-bounds in unwind_get_return_address+0x11d/0x130 at addr ffff8803867d7878 Read of size 8 by task pm-suspend/7774 page:ffffea000e19f5c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x2ffff0000000000() page dumped because: kasan: bad access detected CPU: 0 PID: 7774 Comm: pm-suspend Tainted: G B 4.9.0-rc7+ #8 Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F5 03/07/2016 Call Trace: dump_stack+0x63/0x82 kasan_report_error+0x4b4/0x4e0 ? acpi_hw_read_port+0xd0/0x1ea ? kfree_const+0x22/0x30 ? acpi_hw_validate_io_request+0x1a6/0x1a6 __asan_report_load8_noabort+0x61/0x70 ? unwind_get_return_address+0x11d/0x130 unwind_get_return_address+0x11d/0x130 ? unwind_next_frame+0x97/0xf0 __save_stack_trace+0x92/0x100 save_stack_trace+0x1b/0x20 save_stack+0x46/0xd0 ? save_stack_trace+0x1b/0x20 ? save_stack+0x46/0xd0 ? kasan_kmalloc+0xad/0xe0 ? kasan_slab_alloc+0x12/0x20 ? acpi_hw_read+0x2b6/0x3aa ? acpi_hw_validate_register+0x20b/0x20b ? acpi_hw_write_port+0x72/0xc7 ? acpi_hw_write+0x11f/0x15f ? acpi_hw_read_multiple+0x19f/0x19f ? memcpy+0x45/0x50 ? acpi_hw_write_port+0x72/0xc7 ? acpi_hw_write+0x11f/0x15f ? acpi_hw_read_multiple+0x19f/0x19f ? kasan_unpoison_shadow+0x36/0x50 kasan_kmalloc+0xad/0xe0 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc_trace+0xbc/0x1e0 ? acpi_get_sleep_type_data+0x9a/0x578 acpi_get_sleep_type_data+0x9a/0x578 acpi_hw_legacy_wake_prep+0x88/0x22c ? acpi_hw_legacy_sleep+0x3c7/0x3c7 ? acpi_write_bit_register+0x28d/0x2d3 ? acpi_read_bit_register+0x19b/0x19b acpi_hw_sleep_dispatch+0xb5/0xba acpi_leave_sleep_state_prep+0x17/0x19 acpi_suspend_enter+0x154/0x1e0 ? trace_suspend_resume+0xe8/0xe8 suspend_devices_and_enter+0xb09/0xdb0 ? printk+0xa8/0xd8 ? arch_suspend_enable_irqs+0x20/0x20 ? try_to_freeze_tasks+0x295/0x600 pm_suspend+0x6c9/0x780 ? finish_wait+0x1f0/0x1f0 ? suspend_devices_and_enter+0xdb0/0xdb0 state_store+0xa2/0x120 ? kobj_attr_show+0x60/0x60 kobj_attr_store+0x36/0x70 sysfs_kf_write+0x131/0x200 kernfs_fop_write+0x295/0x3f0 __vfs_write+0xef/0x760 ? handle_mm_fault+0x1346/0x35e0 ? do_iter_readv_writev+0x660/0x660 ? __pmd_alloc+0x310/0x310 ? do_lock_file_wait+0x1e0/0x1e0 ? apparmor_file_permission+0x18/0x20 ? security_file_permission+0x73/0x1c0 ? rw_verify_area+0xbd/0x2b0 vfs_write+0x149/0x4a0 SyS_write+0xd9/0x1c0 ? SyS_read+0x1c0/0x1c0 entry_SYSCALL_64_fastpath+0x1e/0xad Memory state around the buggy address: ffff8803867d7700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8803867d7780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8803867d7800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f4 ^ ffff8803867d7880: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 ffff8803867d7900: 00 00 00 f1 f1 f1 f1 04 f4 f4 f4 f3 f3 f3 f3 00 KASAN instrumentation poisons the stack when entering a function and unpoisons it when exiting the function. However, in the suspend path, some functions never return, so their stack never gets unpoisoned, resulting in stale KASAN shadow data which can cause later false positive warnings like the one above. Reported-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-12-06 02:22:44 +01:00
Dave Airlie	f03ee46be9	Linux 4.9-rc8 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYRIGyAAoJEHm+PkMAQRiG2ksH/jwMUT9j6glbwESxbn1YTqTM QcBT5AMc7D0wNuidQe0hWZMtG4RbC+4ZhxzZl2wPgA2gueJ+rBnyX7bgtA7ka8ka Fdc3u/Q1v38HPzf8iBnxcdCs40VgsoMLjFYCXrpOxuGDNKYzRd+Q8aI2TeGvzbyi X8+6oAWifBwo2oA06jfcuUncEWbyDDyK9aQksmfKOpjHdb26yELPEhsPOlds1g7E jYLnvUVnU2CoFaumta+rZQ0kzLdc4Ntu0wEao6WzJuQKsgoID+tS/6iudi8cUhDp YowGAVoOfr6rAJB0mwrDVfugpamaT3386XKyocdNsK0/jR60UIJ8x+WzvvSU+lY= =JTBj -----END PGP SIGNATURE----- Backmerge tag 'v4.9-rc8' into drm-next Linux 4.9-rc8 Daniel requested this so we could apply some follow on fixes cleanly to -next.	2016-12-05 17:11:48 +10:00
Ingo Molnar	1b95b1a06c	Merge branch 'locking/urgent' into locking/core, to pick up dependent fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-02 11:13:44 +01:00
Fenghua Yu	74fcdae1a7	x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled intel_rdt_sched_in() must be called with preemption disabled because the function accesses percpu variables (pqr_state and closid). If a task moves itself via move_myself() preemption is enabled, which violates the calling convention and can result in incorrect closid selection when the task gets preempted or migrated. Add the required protection and a comment about the calling convention. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Marcelo Tosatti" <mtosatti@redhat.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1480625714-54246-1-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-02 01:13:02 +01:00
Tomasz Nowicki	9f9a35a7b6	ACPI / APEI / ARM64: APEI initial support for ARM64 This patch provides APEI arch-specific bits for ARM64 Meanwhile, (1) Move HEST type (ACPI_HEST_TYPE_IA32_CORRECTED_CHECK) checking to a generic place. (2) Select HAVE_ACPI_APEI when EFI and ACPI is set on ARM64, because arch_apei_get_mem_attribute is using efi_mem_attributes() on ARM64. Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Tested-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Signed-off-by: Fu Wei <fu.wei@linaro.org> [ Fu Wei: improve && upstream ] Acked-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-12-02 00:24:34 +01:00
Thomas Gleixner	31f8a651fc	x86/tsc: Validate cpumask pointer before accessing it 0-day testing encountered a NULL pointer dereference in a cpumask access from tsc_store_and_check_tsc_adjust(). This happens when the function is called on the boot CPU and the topology masks are not yet available due to CPUMASK_OFFSTACK=y. Add a NULL pointer check for the mask pointer. If NULL it's safe to assume that the CPU is the boot CPU and the first one in the package. Fixes: `8b223bc7ab` ("x86/tsc: Store and check TSC ADJUST MSR") Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-01 14:40:52 +01:00
Thiago Jung Bauermann	ec2b9bfaac	kexec_file: Change kexec_add_buffer to take kexec_buf as argument. This is done to simplify the kexec_add_buffer argument list. Adapt all callers to set up a kexec_buf to pass to kexec_add_buffer. In addition, change the type of kexec_buf.buffer from char * to void . There is no particular reason for it to be a char , and the change allows us to get rid of 3 existing casts to char * in the code. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-11-30 23:14:59 +11:00
Thomas Gleixner	b836554386	x86/tsc: Fix broken CONFIG_X86_TSC=n build Add the missing return statement to the inline stub tsc_store_and_check_tsc_adjust() and add the other stubs to make a SMP=y,TSC=n build happy. While at it, remove the unused variable from the UP variant of tsc_store_and_check_tsc_adjust(). Fixes: commit ba75fb646931 ("x86/tsc: Sync test only for the first cpu in a package") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-30 09:44:52 +01:00
Ingo Molnar	0a21fc1214	sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable Right now CONFIG_SCHED_MC_PRIO has X86_INTEL_PSTATE as a dependency, which is not enabled by default and which hides the CONFIG_SCHED_MC_PRIO hardware-enabling feature. Select X86_INTEL_PSTATE instead, plus its dependency (CPU_FREQ), if the user enables CONFIG_SCHED_MC_PRIO=y. (Also align the CONFIG_SCHED_MC_PRIO Kconfig help text in standard style.) Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: bp@suse.de Cc: jolsa@redhat.com Cc: linux-acpi@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: rjw@rjwysocki.net Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-30 08:36:10 +01:00
Tim Chen	de966cf4a4	sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO Rename CONFIG_SCHED_ITMT for Intel Turbo Boost Max Technology 3.0 to CONFIG_SCHED_MC_PRIO. This makes the configuration extensible in future to other architectures that wish to similarly establish CPU core priorities support in the scheduler. The description in Kconfig is updated to reflect this change with added details for better clarity. The configuration is explicitly default-y, to enable the feature on CPUs that have this feature. It has no effect on non-TBM3 CPUs. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@suse.de Cc: jolsa@redhat.com Cc: linux-acpi@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/2b2ee29d93e3f162922d72d0165a1405864fbb23.1480444902.git.tim.c.chen@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-30 08:27:08 +01:00
Dave Airlie	35838b470a	Merge tag 'drm-intel-next-2016-11-21' of git://anongit.freedesktop.org/git/drm-intel into drm-next Final 4.10 updates: - fine-tune fb flushing and tracking (Chris Wilson) - refactor state check dumper code for more conciseness (Tvrtko) - roll out dev_priv all over the place (Tvrkto) - finally remove __i915__ magic macro (Tvrtko) - more gvt bugfixes (Zhenyu&team) - better opregion CADL handling (Jani) - refactor/clean up wm programming (Maarten) - gpu scheduler + priority boosting for flips as first user (Chris Wilson) - make fbc use more atomic (Paulo) - initial kvm-gvt framework, but not yet complete (Zhenyu&team) * tag 'drm-intel-next-2016-11-21' of git://anongit.freedesktop.org/git/drm-intel: (127 commits) drm/i915: Update DRIVER_DATE to 20161121 drm/i915: Skip final clflush if LLC is coherent drm/i915: Always flush the dirty CPU cache when pinning the scanout drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error drm/i915: Check that each request phase is completed before retiring drm/i915: i915_pages_create_for_stolen should return err ptr drm/i915: Enable support for nonblocking modeset drm/i915: Be more careful to drop the GT wakeref drm/i915: Move frontbuffer CS write tracking from ggtt vma to object drm/i915: Only dump dp_m2_n2 configuration when drrs is used drm/i915: don't leak global_timeline drm/i915: add i915_address_space_fini drm/i915: Add a few more sanity checks for stolen handling drm/i915: Waterproof verification of gen9 forcewake table ranges drm/i915: Introduce enableddisabled helper drm/i915: Only dump possible panel fitter config for the platform drm/i915: Only dump scaler config where supported drm/i915: Compact a few pipe config debug lines drm/i915: Don't log pipe config kernel pointer and duplicated pipe name drm/i915: Dump FDI config only where applicable ...	2016-11-30 14:21:35 +10:00
Thomas Gleixner	cc4db26899	x86/tsc: Try to adjust TSC if sync test fails If the first CPU of a package comes online, it is necessary to test whether the TSC is in sync with a CPU on some other package. When a deviation is observed (time going backwards between the two CPUs) the TSC is marked unstable, which is a problem on large machines as they have to fall back to the HPET clocksource, which is insanely slow. It has been attempted to compensate the TSC by adding the offset to the TSC and writing it back some time ago, but this never was merged because it did not turn out to be stable, especially not on older systems. Modern systems have become more stable in that regard and the TSC_ADJUST MSR allows us to compensate for the time deviation in a sane way. If it's available allow up to three synchronization runs and if a time warp is detected the starting CPU can compensate the time warp via the TSC_ADJUST MSR and retry. If the third run still shows a deviation or when random time warps are detected the test terminally fails. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134018.048237517@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:18 +01:00
Thomas Gleixner	76d3b85158	x86/tsc: Prepare warp test for TSC adjustment To allow TSC compensation cross nodes its necessary to know in which direction the TSC warp was observed. Return the maximum observed value on the calling CPU so the caller can determine the direction later. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134017.970859287@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:18 +01:00
Thomas Gleixner	4c5e3c6375	x86/tsc: Move sync cleanup to a safe place Cleaning up the stop marker on the control CPU is wrong when we want to add retry support. Move the cleanup to the starting CPU. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134017.892095627@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:18 +01:00
Thomas Gleixner	a36f513681	x86/tsc: Sync test only for the first cpu in a package If the TSC_ADJUST MSR is available all CPUs in a package are forced to the same value. So TSCs cannot be out of sync when the first CPU in the package was in sync. That allows to skip the sync test for all CPUs except the first starting CPU in a package. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134017.809901363@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:17 +01:00
Thomas Gleixner	1d0095feea	x86/tsc: Verify TSC_ADJUST from idle When entering idle, it's a good oportunity to verify that the TSC_ADJUST MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is detected, emit a warning and restore it to the previous value. This is especially important for machines, which mark the TSC reliable because there is no watchdog clocksource available (SoCs). This is not sufficient for HPC (NOHZ_FULL) situations where a CPU never goes idle, but adding a timer to do the check periodically is not an option either. On a machine, which has this issue, the check triggeres right during boot, so there is a decent chance that the sysadmin will notice. Rate limit the check to once per second and warn only once per cpu. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134017.732180441@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:16 +01:00
Thomas Gleixner	8b223bc7ab	x86/tsc: Store and check TSC ADJUST MSR The TSC_ADJUST MSR shows whether the TSC has been modified. This is helpful in a two aspects: 1) It allows to detect BIOS wreckage, where SMM code tries to 'hide' the cycles spent by storing the TSC value at SMM entry and restoring it at SMM exit. On affected machines the TSCs run slowly out of sync up to the point where the clocksource watchdog (if available) detects it. The TSC_ADJUST MSR allows to detect the TSC modification before that and eventually restore it. This is also important for SoCs which have no watchdog clocksource and therefore TSC wreckage cannot be detected and acted upon. 2) All threads in a package are required to have the same TSC_ADJUST value. Broken BIOSes break that and as a result the TSC synchronization check fails. The TSC_ADJUST MSR allows to detect the deviation when a CPU comes online. If detected set it to the value of an already online CPU in the same package. This also allows to reduce the number of sync tests because with that in place the test is only required for the first CPU in a package. In principle all CPUs in a system should have the same TSC_ADJUST value even across packages, but with physical CPU hotplug this assumption is not true because the TSC starts with power on, so physical hotplug has to do some trickery to bring the TSC into sync with already running packages, which requires to use an TSC_ADJUST value different from CPUs which got powered earlier. A final enhancement is the opportunity to compensate for unsynced TSCs accross nodes at boot time and make the TSC usable that way. It won't help for TSCs which run apart due to frequency skew between packages, but this gets detected by the clocksource watchdog later. The first step toward this is to store the TSC_ADJUST value of a starting CPU and compare it with the value of an already online CPU in the same package. If they differ, emit a warning and adjust it to the reference value. The !SMP version just stores the boot value for later verification. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134017.655323776@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:16 +01:00
Thomas Gleixner	bec8520dca	x86/tsc: Detect random warps If time warps can be observed then they should only ever be observed on one CPU. If they are observed on both CPUs then the system is completely hosed. Add a check for this condition and notify if it happens. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134017.574838461@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:15 +01:00
Thomas Gleixner	7b3d2f6e08	x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art() The art detection uses rdmsrl_safe() to detect the availablity of the TSC_ADJUST MSR. That's pointless because we have a feature bit for this. Use it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134017.483561692@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:15 +01:00
Chen Yu	ba58d1020a	timekeeping: Ignore the bogus sleep time if pm_trace is enabled Power management suspend/resume tracing (ab)uses the RTC to store suspend/resume information persistently. As a consequence the RTC value is clobbered when timekeeping is resumed and tries to inject the sleep time. Commit `a4f8f6667f` ("timekeeping: Cap array access in timekeeping_debug") plugged a out of bounds array access in the timekeeping debug code which was caused by the clobbered RTC value, but we still use the clobbered RTC value for sleep time injection into kernel timekeeping, which will result in random adjustments depending on the stored "hash" value. To prevent this keep track of the RTC clobbering and ignore the invalid RTC timestamp at resume. If the system resumed successfully clear the flag, which marks the RTC as unusable, warn the user about the RTC clobber and recommend to adjust the RTC with 'ntpdate' or 'rdate'. [jstultz: Fixed up pr_warn formating, and implemented suggestions from Ingo] [ tglx: Rewrote changelog ] Originally-from: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/1480372524-15181-3-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 18:02:58 +01:00
Herbert Xu	85671860ca	crypto: aesni - Convert to skcipher This patch converts aesni (including fpu) over to the skcipher interface. The LRW implementation has been removed as the generic LRW code can now be used directly on top of the accelerated ECB implementation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-11-28 21:23:20 +08:00
Herbert Xu	065ce32737	crypto: glue_helper - Add skcipher xts helpers This patch adds xts helpers that use the skcipher interface rather than blkcipher. This will be used by aesni_intel. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-11-28 21:23:20 +08:00
Fenghua Yu	0efc89be94	x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount When removing a sub directory/rdtgroup by rmdir or umount, closid in a task in the sub directory is set to default rdtgroup's closid which is 0. If the task is running on a CPU, the PQR_ASSOC MSR is only updated when the task runs through a context switch. Up to the context switch, the task runs with the wrong closid. Make the change immediately effective by invoking a smp function call on all CPUs which are running moved task. If one of the affected tasks was moved or scheduled out before the function call is executed on the CPU the only damage is the extra interruption of the CPU. [ tglx: Reworked it to avoid blindly interrupting all CPUs and extra loops ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1479511084-59727-2-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-28 11:07:50 +01:00
Fenghua Yu	2659f46da8	x86/intel_rdt: Fix setting of closid when adding CPUs to a group There was a cut & paste error when adding code to update the per-cpu closid when changing the bitmask of CPUs to an rdt group. The update erronously assigns the closid of the default group to the CPUs which are moved to a group instead of assigning the closid of their new group. Use the proper closid. Fixes: `f410770293` ("x86/intel_rdt: Update percpu closid immeditately on CPUs affected by change") Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1479511084-59727-1-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-28 11:07:50 +01:00
Ingo Molnar	a293b3954a	x86/sched: Use #include <linux/mutex.h> instead of #include <asm/mutex.h> asm/mutex.h is gone from the locking tree, which makes sched/core break the build. Use linux/mutex.h instead, which is the canonical method. Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: bp@suse.de Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-28 09:43:49 +01:00
Paul Bolle	9190e21780	x86/build: Remove three unneeded genhdr-y entries In x86's include/asm/Kbuild three entries are appended to the genhdr-y make variable: genhdr-y += unistd_32.h genhdr-y += unistd_64.h genhdr-y += unistd_x32.h The same entries are also appended to that variable in include/uapi/asm/Kbuild. So commit: `10b63956fc` ("UAPI: Plumb the UAPI Kbuilds into the user header installation and checking") ... removed these three entries from include/asm/Kbuild. But, apparently, some merge conflict resolution re-added them. The net effect is, in short, that the genhdr-y make variable contains these file names twice and, as a consequence, that the corresponding headers get installed twice. And so the build prints: INSTALL usr/include/asm/ (65 files) ... while in reality only 62 files are installed in that directory. Nothing breaks because of all that, but it's a good idea to finally remove these unneeded entries nevertheless. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1480077707-2837-1-git-send-email-pebolle@tiscali.nl Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-28 07:49:17 +01:00
Paul Bolle	06cbbac0f5	x86/build: Don't use $(LINUXINCLUDE) twice The make variable KBUILD_CFLAGS contains $(LINUXINCLUDE). But the build already picks up $(LINUXINCLUDE) from scripts/Makefile.lib. The net effect is that the (long) list of include directories is used twice. This is harmless but pointless. So stop using $(LINUXINCLUDE) twice. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1480077514-2586-1-git-send-email-pebolle@tiscali.nl Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-28 07:49:17 +01:00
Josh Poimboeuf	55f856e640	x86/unwind: Fix guess-unwinder regression My attempt at fixing some KASAN false positive warnings was rather brain dead, and it broke the guess unwinder. With frame pointers disabled, /proc/<pid>/stack is broken: # cat /proc/1/stack [<ffffffffffffffff>] 0xffffffffffffffff Restore the code flow to more closely resemble its previous state, while still using READ_ONCE_NOCHECK() macros to silence KASAN false positives. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `c2d75e03d6` ("x86/unwind: Prevent KASAN false positive warnings in guess unwinder") Link: http://lkml.kernel.org/r/b824f92c2c22eca5ec95ac56bd2a7c84cf0b9df9.1480309971.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-28 07:47:54 +01:00
Peter Foley	adee8705d2	x86/build: Annotate die() with noreturn to fix build warning on clang Fixes below warning with clang: In file included from ../arch/x86/tools/relocs_64.c:17: ../arch/x86/tools/relocs.c:977:6: warning: variable 'do_reloc' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] Signed-off-by: Peter Foley <pefoley2@pefoley.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161126222229.673-1-pefoley2@pefoley.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-28 07:47:22 +01:00
Borislav Petkov	20ab667771	x86/platform/olpc: Fix resume handler build warning Fix: arch/x86/platform/olpc/olpc-xo15-sci.c:199:12: warning: ‘xo15_sci_resume’ defined but not used [-Wunused-function] static int xo15_sci_resume(struct device *dev) ^ which I see in randconfig builds here. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161126142706.13602-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-28 07:46:03 +01:00
Borislav Petkov	6248f45674	x86/boot/64: Optimize fixmap page fixup Single-stepping through head_64.S made me look at the fixmap page PTEs fixup loop: So we're going through the whole level2_fixmap_pgt 4K page, looking at whether PAGE_PRESENT is set in those PTEs and add the delta between where we're compiled to run and where we actually end up running. However, if that delta is 0 (most cases) we go through all those 512 PTEs for no reason at all. Oh well, we add 0 but that's no reason to me. Skipping that useless fixup gives us a boot speedup of 0.004 seconds in my guest. Not a lot but considering how cheap it is, I'll take it. Here is the printk time difference: before: ... [ 0.000000] tsc: Marking TSC unstable due to TSCs unsynchronized [ 0.013590] Calibrating delay loop (skipped), value calculated using timer frequency.. 8027.17 BogoMIPS (lpj=16054348) [ 0.017094] pid_max: default: 32768 minimum: 301 ... after: ... [ 0.000000] tsc: Marking TSC unstable due to TSCs unsynchronized [ 0.009587] Calibrating delay loop (skipped), value calculated using timer frequency.. 8026.86 BogoMIPS (lpj=16053724) [ 0.013090] pid_max: default: 32768 minimum: 301 ... For the other two changes converting naked numbers to defines: # arch/x86/kernel/head_64.o: text data bss dec hex filename 1124 290864 4096 296084 48494 head_64.o.before 1124 290864 4096 296084 48494 head_64.o.after md5: 87086e202588939296f66e892414ffe2 head_64.o.before.asm 87086e202588939296f66e892414ffe2 head_64.o.after.asm Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161125111448.23623-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-28 07:45:17 +01:00
Linus Torvalds	fc13ca191e	KVM fixes for v4.9-rc7 Four fixes for bugs found by syzkaller on x86, all for stable. -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJYObr8AAoJEED/6hsPKofocbIH/j3p7QB73rDM2OCBhzTgGoOb hcMLXnYEBD5C48ym2QW+wTEWJNNBikKOknYDX8wD1fIsaf8QoMqjEOSyxLPlexWI mfTZnRAqSqYY9sPdlexpGAQV1uusCoIf2q9A+kW9Yy5q9ngzimiimRtFXgb/u6o5 mXZc7WcM8ZYSYdS+0Bz1lL6k1MGt1Yn207tQ3QNdWi4Pn6aWZp3+8C7rLjWu5zq8 LkMRsgedyxjULnyXedF+/IaXlC7qVO2LVwdxuHWsmeAPp/GmrNbAD+/4JKNk/Sgz DPcPOWB/cCcCbWVY/8k+gRm0mnknX4bqYnwHwju++gwiUmJXIg3vWKfCDUw2SN0= =MnV8 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Radim Krčmář: "Four fixes for bugs found by syzkaller on x86, all for stable" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: check for pic and ioapic presence before use KVM: x86: fix out-of-bounds accesses of rtc_eoi map KVM: x86: drop error recovery in em_jmp_far and em_ret_far KVM: x86: fix out-of-bounds access in lapic	2016-11-26 12:18:59 -08:00
Borislav Petkov	9b032d21f6	x86/boot/64: Use defines for page size ... instead of naked numbers like the rest of the asm does in this file. No code changed: # arch/x86/kernel/head_64.o: text data bss dec hex filename 1124 290864 4096 296084 48494 head_64.o.before 1124 290864 4096 296084 48494 head_64.o.after md5: 87086e202588939296f66e892414ffe2 head_64.o.before.asm 87086e202588939296f66e892414ffe2 head_64.o.after.asm Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161124210550.15025-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-25 07:11:29 +01:00
Tim Chen	d3d37d850d	x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU Some Intel cores in a package can be boosted to a higher turbo frequency with ITMT 3.0 technology. The scheduler can use the asymmetric packing feature to move tasks to the more capable cores. If ITMT is enabled, add SD_ASYM_PACKING flag to the thread and core sched domains to enable asymmetric packing. Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/9bbb885bedbef4eb50e197305eb16b160cff0831.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-24 20:44:20 +01:00
Tim Chen	f9793e3495	x86/sysctl: Add sysctl for ITMT scheduling feature Intel Turbo Boost Max Technology 3.0 (ITMT) feature allows some cores to be boosted to higher turbo frequency than others. Add /proc/sys/kernel/sched_itmt_enabled so operator can enable/disable scheduling of tasks that favor cores with higher turbo boost frequency potential. By default, system that is ITMT capable and single socket has this feature turned on. It is more likely to be lightly loaded and operates in Turbo range. When there is a change in the ITMT scheduling operation desired, a rebuild of the sched domain is initiated so the scheduler can set up sched domains with appropriate flag to enable/disable ITMT scheduling operations. Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/07cc62426a28bad57b01ab16bb903a9c84fa5421.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-24 20:44:19 +01:00
Tim Chen	5e76b2ab36	x86: Enable Intel Turbo Boost Max Technology 3.0 On platforms supporting Intel Turbo Boost Max Technology 3.0, the maximum turbo frequencies of some cores in a CPU package may be higher than for the other cores in the same package. In that case, better performance (and possibly lower energy consumption as well) can be achieved by making the scheduler prefer to run tasks on the CPUs with higher max turbo frequencies. To that end, set up a core priority metric to abstract the core preferences based on the maximum turbo frequency. In that metric, the cores with higher maximum turbo frequencies are higher-priority than the other cores in the same package and that causes the scheduler to favor them when making load-balancing decisions using the asymmertic packing approach. At the same time, the priority of SMT threads with a higher CPU number is reduced so as to avoid scheduling tasks on all of the threads that belong to a favored core before all of the other cores have been given a task to run. The priority metric will be initialized by the P-state driver with the help of the sched_set_itmt_core_prio() function. The P-state driver will also determine whether or not ITMT is supported by the platform and will call sched_set_itmt_support() to indicate that. Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/cd401ccdff88f88c8349314febdc25d51f7c48f7.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-24 20:44:19 +01:00
Tim Chen	7d25127cef	x86/topology: Define x86's arch_update_cpu_topology The scheduler calls arch_update_cpu_topology() to check whether the scheduler domains have to be rebuilt. So far x86 has no requirement for this, but the upcoming ITMT support makes this necessary. Request the rebuild when the x86 internal update flag is set. Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/bfbf5591276ec60b2af2da798adc1060df1e2a5f.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-24 20:44:19 +01:00
Radim Krčmář	df492896e6	KVM: x86: check for pic and ioapic presence before use Split irqchip allows pic and ioapic routes to be used without them being created, which results in NULL access. Check for NULL and avoid it. (The setup is too racy for a nicer solutions.) Found by syzkaller: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 3 PID: 11923 Comm: kworker/3:2 Not tainted 4.9.0-rc5+ #27 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events irqfd_inject task: ffff88006a06c7c0 task.stack: ffff880068638000 RIP: 0010:[...] [...] __lock_acquire+0xb35/0x3380 kernel/locking/lockdep.c:3221 RSP: 0000:ffff88006863ea20 EFLAGS: 00010006 RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: 0000000000000000 RDX: 0000000000000039 RSI: 0000000000000000 RDI: 1ffff1000d0c7d9e RBP: ffff88006863ef58 R08: 0000000000000001 R09: 0000000000000000 R10: 00000000000001c8 R11: 0000000000000000 R12: ffff88006a06c7c0 R13: 0000000000000001 R14: ffffffff8baab1a0 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004abdd0 CR3: 000000003e2f2000 CR4: 00000000000026e0 Stack: ffffffff894d0098 1ffff1000d0c7d56 ffff88006863ecd0 dffffc0000000000 ffff88006a06c7c0 0000000000000000 ffff88006863ecf8 0000000000000082 0000000000000000 ffffffff815dd7c1 ffffffff00000000 ffffffff00000000 Call Trace: [...] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3746 [...] __raw_spin_lock include/linux/spinlock_api_smp.h:144 [...] _raw_spin_lock+0x38/0x50 kernel/locking/spinlock.c:151 [...] spin_lock include/linux/spinlock.h:302 [...] kvm_ioapic_set_irq+0x4c/0x100 arch/x86/kvm/ioapic.c:379 [...] kvm_set_ioapic_irq+0x8f/0xc0 arch/x86/kvm/irq_comm.c:52 [...] kvm_set_irq+0x239/0x640 arch/x86/kvm/../../../virt/kvm/irqchip.c:101 [...] irqfd_inject+0xb4/0x150 arch/x86/kvm/../../../virt/kvm/eventfd.c:60 [...] process_one_work+0xb40/0x1ba0 kernel/workqueue.c:2096 [...] worker_thread+0x214/0x18a0 kernel/workqueue.c:2230 [...] kthread+0x328/0x3e0 kernel/kthread.c:209 [...] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433 Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: `49df6397ed` ("KVM: x86: Split the APIC from the rest of IRQCHIP.") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-24 18:39:28 +01:00
Radim Krčmář	81cdb259fb	KVM: x86: fix out-of-bounds accesses of rtc_eoi map KVM was using arrays of size KVM_MAX_VCPUS with vcpu_id, but ID can be bigger that the maximal number of VCPUs, resulting in out-of-bounds access. Found by syzkaller: BUG: KASAN: slab-out-of-bounds in __apic_accept_irq+0xb33/0xb50 at addr [...] Write of size 1 by task a.out/27101 CPU: 1 PID: 27101 Comm: a.out Not tainted 4.9.0-rc5+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [...] Call Trace: [...] __apic_accept_irq+0xb33/0xb50 arch/x86/kvm/lapic.c:905 [...] kvm_apic_set_irq+0x10e/0x180 arch/x86/kvm/lapic.c:495 [...] kvm_irq_delivery_to_apic+0x732/0xc10 arch/x86/kvm/irq_comm.c:86 [...] ioapic_service+0x41d/0x760 arch/x86/kvm/ioapic.c:360 [...] ioapic_set_irq+0x275/0x6c0 arch/x86/kvm/ioapic.c:222 [...] kvm_ioapic_inject_all arch/x86/kvm/ioapic.c:235 [...] kvm_set_ioapic+0x223/0x310 arch/x86/kvm/ioapic.c:670 [...] kvm_vm_ioctl_set_irqchip arch/x86/kvm/x86.c:3668 [...] kvm_arch_vm_ioctl+0x1a08/0x23c0 arch/x86/kvm/x86.c:3999 [...] kvm_vm_ioctl+0x1fa/0x1a70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3099 Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: `af1bae5497` ("KVM: x86: bump KVM_MAX_VCPU_ID to 1023") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-24 18:37:19 +01:00
Radim Krčmář	2117d5398c	KVM: x86: drop error recovery in em_jmp_far and em_ret_far em_jmp_far and em_ret_far assumed that setting IP can only fail in 64 bit mode, but syzkaller proved otherwise (and SDM agrees). Code segment was restored upon failure, but it was left uninitialized outside of long mode, which could lead to a leak of host kernel stack. We could have fixed that by always saving and restoring the CS, but we take a simpler approach and just break any guest that manages to fail as the error recovery is error-prone and modern CPUs don't need emulator for this. Found by syzkaller: WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480 Kernel panic - not syncing: panic_on_warn set ... CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [...] Call Trace: [...] __dump_stack lib/dump_stack.c:15 [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51 [...] panic+0x1b7/0x3a3 kernel/panic.c:179 [...] __warn+0x1c4/0x1e0 kernel/panic.c:542 [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585 [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217 [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227 [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294 [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545 [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116 [...] complete_emulated_io arch/x86/kvm/x86.c:6870 [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934 [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978 [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557 [...] vfs_ioctl fs/ioctl.c:43 [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679 [...] SYSC_ioctl fs/ioctl.c:694 [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685 [...] entry_SYSCALL_64_fastpath+0x1f/0xc2 Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: `d1442d85cc` ("KVM: x86: Handle errors when RIP is set during far jumps") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-24 18:36:54 +01:00
Radim Krčmář	444fdad88f	KVM: x86: fix out-of-bounds access in lapic Cluster xAPIC delivery incorrectly assumed that dest_id <= 0xff. With enabled KVM_X2APIC_API_USE_32BIT_IDS in KVM_CAP_X2APIC_API, a userspace can send an interrupt with dest_id that results in out-of-bounds access. Found by syzkaller: BUG: KASAN: slab-out-of-bounds in kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 at addr ffff88003d9ca750 Read of size 8 by task syz-executor/22923 CPU: 0 PID: 22923 Comm: syz-executor Not tainted 4.9.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [...] Call Trace: [...] __dump_stack lib/dump_stack.c:15 [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51 [...] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156 [...] print_address_description mm/kasan/report.c:194 [...] kasan_report_error mm/kasan/report.c:283 [...] kasan_report+0x231/0x500 mm/kasan/report.c:303 [...] __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:329 [...] kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 arch/x86/kvm/lapic.c:824 [...] kvm_irq_delivery_to_apic+0x132/0x9a0 arch/x86/kvm/irq_comm.c:72 [...] kvm_set_msi+0x111/0x160 arch/x86/kvm/irq_comm.c:157 [...] kvm_send_userspace_msi+0x201/0x280 arch/x86/kvm/../../../virt/kvm/irqchip.c:74 [...] kvm_vm_ioctl+0xba5/0x1670 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3015 [...] vfs_ioctl fs/ioctl.c:43 [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679 [...] SYSC_ioctl fs/ioctl.c:694 [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685 [...] entry_SYSCALL_64_fastpath+0x1f/0xc2 Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: `e45115b62f` ("KVM: x86: use physical LAPIC array for logical x2APIC") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-24 18:35:53 +01:00
Tom Lendacky	8370c3d08b	kvm: svm: Add kvm_fast_pio_in support Update the I/O interception support to add the kvm_fast_pio_in function to speed up the in instruction similar to the out instruction. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-24 18:32:45 +01:00
Tom Lendacky	147277540b	kvm: svm: Add support for additional SVM NPF error codes AMD hardware adds two additional bits to aid in nested page fault handling. Bit 32 - NPF occurred while translating the guest's final physical address Bit 33 - NPF occurred while translating the guest page tables The guest page tables fault indicator can be used as an aid for nested virtualization. Using V0 for the host, V1 for the first level guest and V2 for the second level guest, when both V1 and V2 are using nested paging there are currently a number of unnecessary instruction emulations. When V2 is launched shadow paging is used in V1 for the nested tables of V2. As a result, KVM marks these pages as RO in the host nested page tables. When V2 exits and we resume V1, these pages are still marked RO. Every nested walk for a guest page table is treated as a user-level write access and this causes a lot of NPFs because the V1 page tables are marked RO in the V0 nested tables. While executing V1, when these NPFs occur KVM sees a write to a read-only page, emulates the V1 instruction and unprotects the page (marking it RW). This patch looks for cases where we get a NPF due to a guest page table walk where the page was marked RO. It immediately unprotects the page and resumes the guest, leading to far fewer instruction emulations when nested virtualization is used. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-24 18:32:26 +01:00
Dan Carpenter	c4597fd756	x86/apic/uv: Silence a shift wrapping warning 'm_io' is stored in 6 bits so it's a number in the 0-63 range. Static analysis tools complain that 1 << 63 will wrap so I have changed it to 1ULL << m_io. This code is over three years old so presumably the bug doesn't happen very frequently in real life or someone would have complained by now. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Travis <travis@sgi.com> Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Fixes: `b15cc4a12b` ("x86, uv, uv3: Update x2apic Support for SGI UV3") Link: http://lkml.kernel.org/r/20161123221908.GA23997@mwanda Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-24 06:01:05 +01:00
Dmitry Safonov	7b2dd36828	x86/coredump: Always use user_regs_struct for compat_elf_gregset_t Commit: `90954e7b94` ("x86/coredump: Use pr_reg size, rather that TIF_IA32 flag") changed the coredumping code to construct the elf coredump file according to register set size - and that's good: if binary crashes with 32-bit code selector, generate 32-bit ELF core, otherwise - 64-bit core. That was made for restoring 32-bit applications on x86_64: we want 32-bit application after restore to generate 32-bit ELF dump on crash. All was quite good and recently I started reworking 32-bit applications dumping part of CRIU: now it has two parasites (32 and 64) for seizing compat/native tasks, after rework it'll have one parasite, working in 64-bit mode, to which 32-bit prologue long-jumps during infection. And while it has worked for my work machine, in VM with !CONFIG_X86_X32_ABI during reworking I faced that segfault in 32-bit binary, that has long-jumped to 64-bit mode results in dereference of garbage: 32-victim[19266]: segfault at f775ef65 ip 00000000f775ef65 sp 00000000f776aa50 error 14 BUG: unable to handle kernel paging request at ffffffffffffffff IP: [<ffffffff81332ce0>] strlen+0x0/0x20 [...] Call Trace: [] elf_core_dump+0x11a9/0x1480 [] do_coredump+0xa6b/0xe60 [] get_signal+0x1a8/0x5c0 [] do_signal+0x23/0x660 [] exit_to_usermode_loop+0x34/0x65 [] prepare_exit_to_usermode+0x2f/0x40 [] retint_user+0x8/0x10 That's because we have 64-bit registers set (with according total size) and we're writing it to elf_thread_core_info which has smaller size on !CONFIG_X86_X32_ABI. That lead to overwriting ELF notes part. Tested on 32-, 64-bit ELF crashes and on 32-bit binaries that have jumped with 64-bit code selector - all is readable with gdb. Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: `90954e7b94` ("x86/coredump: Use pr_reg size, rather that TIF_IA32 flag") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-24 06:01:05 +01:00
Linus Torvalds	ded9b5dd20	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Six fixes for bugs that were found via fuzzing, and a trivial hw-enablement patch for AMD Family-17h CPU PMUs" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Allow only a single PMU/box within an events group perf/x86/intel: Cure bogus unwind from PEBS entries perf/x86: Restore TASK_SIZE check on frame pointer perf/core: Fix address filter parser perf/x86: Add perf support for AMD family-17h processors perf/x86/uncore: Fix crash by removing bogus event_list[] handling for SNB client uncore IMC perf/core: Do not set cpuctx->cgrp for unscheduled cgroups	2016-11-23 08:09:21 -08:00
Tony Luck	3f5a7896a5	x86/mce: Include the PPIN in MCE records when available Intel Xeons from Ivy Bridge onwards support a processor identification number set in the factory. To the user this is a handy unique number to identify a particular CPU. Intel can decode this to the fab/production run to track errors. On systems that have it, include it in the machine check record. I'm told that this would be helpful for users that run large data centers with multi-socket servers to keep track of which CPUs are seeing errors. Boris: * Add some clarifying comments and spacing. * Mask out [63:2] in the disabled-but-not-locked case * Call the MSR variable "val" for more readability. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20161123114855.njguoaygp3qnbkia@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-23 16:51:52 +01:00
Ingo Molnar	ec84f00567	Merge branch 'linus' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-23 10:23:09 +01:00
Ingo Molnar	064e6a8ba6	Merge branch 'linus' into x86/fpu, to resolve conflicts Conflicts: arch/x86/kernel/fpu/core.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-23 07:18:09 +01:00
Sebastian Andrzej Siewior	c8b877a5e5	x86/pci/amd-bus: Convert to hotplug state machine Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. The smp_call_function_single() is dropped because the ONLINE callback is invoked on the target CPU since commit `1cf4f629d9` ("cpu/hotplug: Move online calls to hotplugged cpu"). smp_call_function_single() invokes the invoked function with interrupts disabled, but this calling convention is not preserved as the MSR is not modified by anything else than this code. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Cc: rt@linuxtronix.de Link: http://lkml.kernel.org/r/20161117183541.8588-21-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-22 23:34:43 +01:00
Sebastian Andrzej Siewior	89666c5047	x86/oprofile/nmi: Convert to hotplug state machine Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Robert Richter <rric@kernel.org> Cc: oprofile-list@lists.sf.net Cc: rt@linuxtronix.de Link: http://lkml.kernel.org/r/20161117183541.8588-20-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-22 23:34:43 +01:00
Anna-Maria Gleixner	08ed487c81	x86/oprofile/nmi: Remove superfluous smp_function_call_single() Since commit `1cf4f629d9` ("cpu/hotplug: Move online calls to hotplugged cpu") the CPU_ONLINE and CPU_DOWN_PREPARE notifiers are always run on the hot plugged CPU, and as of commit `3b9d6da67e` ("cpu/hotplug: Fix rollback during error-out in __cpu_disable()") the CPU_DOWN_FAILED notifier also runs on the hot plugged CPU. This patch converts the SMP functional calls into direct calls. smp_call_function_single() executes the function with interrupts disabled. This calling convention is preserved. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Robert Richter <rric@kernel.org> Cc: rt@linuxtronix.de Cc: oprofile-list@lists.sf.net Link: http://lkml.kernel.org/r/20161117183541.8588-19-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-22 23:34:42 +01:00
Sebastian Andrzej Siewior	8fba38c937	x86/msr: Convert to hotplug state machine Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Move the callbacks to online/offline as there is no point in having the files around before the cpu is online and until its completely gone. [ tglx: Move the callbacks to online/offline ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: rt@linuxtronix.de Link: http://lkml.kernel.org/r/20161117183541.8588-4-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-22 23:34:39 +01:00
Thomas Gleixner	ee92be9b0d	x86/cpuid: Move the hotplug callbacks to online No point to have this file around before the cpu is online and no point to have it around until the cpu is dead. Get rid of the explicit state. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de>	2016-11-22 23:34:39 +01:00
Sebastian Andrzej Siewior	8c07b494ab	x86/cpuid: Convert to hotplug state machine Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: rt@linuxtronix.de Link: http://lkml.kernel.org/r/20161117183541.8588-3-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-22 23:34:39 +01:00
Thomas Gleixner	33d97302eb	x86/mce/therm_throt: Move hotplug callbacks to online No point to have the sysfs files around before the cpu is online and no point to have them around until the cpu is dead. Get rid of the explicit state. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de>	2016-11-22 23:34:38 +01:00
Sebastian Andrzej Siewior	d6526e73db	x86/mce/therm_throt: Convert to hotplug state machine Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linuxtronix.de Cc: Borislav Petkov <bp@alien8.de> Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161117183541.8588-2-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-22 23:34:38 +01:00
Linus Torvalds	7cfc4317ea	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - two fixes to make (very) old Intel CPUs boot reliably - fix the intel-mid driver and rename it - two KASAN false positive fixes - an FPU fix - two sysfb fixes - two build fixes related to new toolchain versions" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/intel-mid: Rename platform_wdt to platform_mrfld_wdt x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well x86/platform/intel-mid: Register watchdog device after SCU x86/fpu: Fix invalid FPU ptrace state after execve() x86/boot: Fail the boot if !M486 and CPUID is missing x86/traps: Ignore high word of regs->cs in early_fixup_exception() x86/dumpstack: Prevent KASAN false positive warnings x86/unwind: Prevent KASAN false positive warnings in guess unwinder x86/boot: Avoid warning for zero-filling .bss x86/sysfb: Fix lfb_size calculation x86/sysfb: Add support for 64bit EFI lfb_base	2016-11-22 12:17:49 -08:00
Bandan Das	ae0f549951	kvm: x86: don't print warning messages for unimplemented msrs Change unimplemented msrs messages to use pr_debug. If CONFIG_DYNAMIC_DEBUG is set, then these messages can be enabled at run time or else -DDEBUG can be used at compile time to enable them. These messages will still be printed if ignore_msrs=1. Signed-off-by: Bandan Das <bsd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-22 18:29:10 +01:00
Jan Dakinevich	bcdde302b8	KVM: nVMX: invvpid handling improvements - Expose all invalidation types to the L1 - Reject invvpid instruction, if L1 passed zero vpid value to single context invalidations Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> Tested-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-22 17:26:42 +01:00
Jan Dakinevich	63f3ac4813	KVM: VMX: clean up declaration of VPID/EPT invalidation types - Remove VMX_EPT_EXTENT_INDIVIDUAL_ADDR, since there is no such type of EPT invalidation - Add missing VPID types names Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> Tested-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-22 17:26:15 +01:00
Jim Mattson	c7dd15b337	kvm: x86: CPUID.01H:EDX.APIC[bit 9] should mirror IA32_APIC_BASE[11] From the Intel SDM, volume 3, section 10.4.3, "Enabling or Disabling the Local APIC," When IA32_APIC_BASE[11] is 0, the processor is functionally equivalent to an IA-32 processor without an on-chip APIC. The CPUID feature flag for the APIC (see Section 10.4.2, "Presence of the Local APIC") is also set to 0. Signed-off-by: Jim Mattson <jmattson@google.com> [Changed subject tag from nVMX to x86.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-22 14:51:55 +01:00
Peter Zijlstra	3cded41794	x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted() Avoid the pointless function call to pv_lock_ops.vcpu_is_preempted() when a paravirt spinlock enabled kernel is ran on native hardware. Do this by patching out the CALL instruction with "XOR %RAX,%RAX" which has the same effect (0 return value). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:48:11 +01:00
Juergen Gross	de7689cf8f	x86/xen: Support the vCPU preemption check Support the vcpu_is_preempted() functionality under Xen. This will enhance lock performance on overcommitted hosts (more runnable vCPUs than physical CPUs in the system) as doing busy waits for preempted vCPUs will hurt system performance far worse than early yielding. A quick test (4 vCPUs on 1 physical CPU doing a parallel build job with "make -j 8") reduced system time by about 5% with this patch. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-11-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:48:09 +01:00
Pan Xinhui	1885aa7041	x86/kvm: Support the vCPU preemption check Support the vcpu_is_preempted() functionality under KVM. This will enhance lock performance on overcommitted hosts (more runnable vCPUs than physical CPUs in the system) as doing busy waits for preempted vCPUs will hurt system performance far worse than early yielding. struct kvm_steal_time::preempted indicates that if one vCPU is running or not after commit "x86, kvm/x86.c: support vCPU preempted check". unix benchmark result: host: kernel 4.8.1, i5-4570, 4 cpus guest: kernel 4.8.1, 8 vcpus test-case after-patch before-patch Execl Throughput \| 18307.9 lps \| 11701.6 lps File Copy 1024 bufsize 2000 maxblocks \| 1352407.3 KBps \| 790418.9 KBps File Copy 256 bufsize 500 maxblocks \| 367555.6 KBps \| 222867.7 KBps File Copy 4096 bufsize 8000 maxblocks \| 3675649.7 KBps \| 1780614.4 KBps Pipe Throughput \| 11872208.7 lps \| 11855628.9 lps Pipe-based Context Switching \| 1495126.5 lps \| 1490533.9 lps Process Creation \| 29881.2 lps \| 28572.8 lps Shell Scripts (1 concurrent) \| 23224.3 lpm \| 22607.4 lpm Shell Scripts (8 concurrent) \| 3531.4 lpm \| 3211.9 lpm System Call Overhead \| 10385653.0 lps \| 10419979.0 lps Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-10-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:48:08 +01:00
Pan Xinhui	0b9f6c4615	x86/kvm: Support the vCPU preemption check Support the vcpu_is_preempted() functionality under KVM. This will enhance lock performance on overcommitted hosts (more runnable vCPUs than physical CPUs in the system) as doing busy waits for preempted vCPUs will hurt system performance far worse than early yielding. Use struct kvm_steal_time::preempted to indicate that if a vCPU is running or not. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-9-git-send-email-xinhui.pan@linux.vnet.ibm.com [ Typo fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:48:08 +01:00
Pan Xinhui	446f3dc8cc	locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests Optimize spinlock and mutex busy-loops by providing a vcpu_is_preempted(cpu) function on KVM and Xen platforms. Extend the pv_lock_ops interface accordingly and implement the callbacks on KVM and Xen. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ Translated to English. ] Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-7-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:48:07 +01:00
Ingo Molnar	02cb689b2c	Merge branch 'linus' into locking/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:37:38 +01:00
Peter Zijlstra	033ac60c7f	perf/x86/intel/uncore: Allow only a single PMU/box within an events group Group validation expects all events to be of the same PMU; however is_uncore_pmu() is too wide, it matches _all_ uncore events, even across PMUs. This triggers failure when we group different events from different uncore PMUs, like: perf stat -vv -e '{uncore_cbox_0/config=0x0334/,uncore_qpi_0/event=1/}' -a sleep 1 Fix is_uncore_pmu() by only matching events to the box at hand. Note that generic code; ran after this step; will disallow this mixture of PMU events. Reported-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20161118125354.GQ3117@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:36:59 +01:00
Peter Zijlstra	b8000586c9	perf/x86/intel: Cure bogus unwind from PEBS entries Vince Weaver reported that perf_fuzzer + KASAN detects that PEBS event unwinds sometimes do 'weird' things. In particular, we seemed to be ending up unwinding from random places on the NMI stack. While it was somewhat expected that the event record BP,SP would not match the interrupt BP,SP in that the interrupt is strictly later than the record event, it was overlooked that it could be on an already overwritten stack. Therefore, don't copy the recorded BP,SP over the interrupted BP,SP when we need stack unwinds. Note that its still possible the unwind doesn't full match the actual event, as its entirely possible to have done an (I)RET between record and interrupt, but on average it should still point in the general direction of where the event came from. Also, it's the best we can do, considering. The particular scenario that triggered the bogus NMI stack unwind was a PEBS event with very short period, upon enabling the event at the tail of the PMI handler (FREEZE_ON_PMI is not used), it instantly triggers a record (while still on the NMI stack) which in turn triggers the next PMI. This then causes back-to-back NMIs and we'll try and unwind the stack-frame from the last NMI, which obviously is now overwritten by our own. Analyzed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davej@codemonkey.org.uk <davej@codemonkey.org.uk> Cc: dvyukov@google.com <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: `ca037701a0` ("perf, x86: Add PEBS infrastructure") Link: http://lkml.kernel.org/r/20161117171731.GV3157@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:36:58 +01:00
Johannes Weiner	ae31fe51a3	perf/x86: Restore TASK_SIZE check on frame pointer The following commit: `75925e1ad7` ("perf/x86: Optimize stack walk user accesses") ... switched from copy_from_user_nmi() to __copy_from_user_nmi() with a manual access_ok() check. Unfortunately, copy_from_user_nmi() does an explicit check against TASK_SIZE, whereas the access_ok() uses whatever the current address limit of the task is. We are getting NMIs when __probe_kernel_read() has switched to KERNEL_DS, and then see vmalloc faults when we access what looks like pointers into vmalloc space: [] WARNING: CPU: 3 PID: 3685731 at arch/x86/mm/fault.c:435 vmalloc_fault+0x289/0x290 [] CPU: 3 PID: 3685731 Comm: sh Tainted: G W 4.6.0-5_fbk1_223_gdbf0f40 #1 [] Call Trace: [] <NMI> [<ffffffff814717d1>] dump_stack+0x4d/0x6c [] [<ffffffff81076e43>] __warn+0xd3/0xf0 [] [<ffffffff81076f2d>] warn_slowpath_null+0x1d/0x20 [] [<ffffffff8104a899>] vmalloc_fault+0x289/0x290 [] [<ffffffff8104b5a0>] __do_page_fault+0x330/0x490 [] [<ffffffff8104b70c>] do_page_fault+0xc/0x10 [] [<ffffffff81794e82>] page_fault+0x22/0x30 [] [<ffffffff81006280>] ? perf_callchain_user+0x100/0x2a0 [] [<ffffffff8115124f>] get_perf_callchain+0x17f/0x190 [] [<ffffffff811512c7>] perf_callchain+0x67/0x80 [] [<ffffffff8114e750>] perf_prepare_sample+0x2a0/0x370 [] [<ffffffff8114e840>] perf_event_output+0x20/0x60 [] [<ffffffff8114aee7>] ? perf_event_update_userpage+0xc7/0x130 [] [<ffffffff8114ea01>] __perf_event_overflow+0x181/0x1d0 [] [<ffffffff8114f484>] perf_event_overflow+0x14/0x20 [] [<ffffffff8100a6e3>] intel_pmu_handle_irq+0x1d3/0x490 [] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10 [] [<ffffffff81197191>] ? vunmap_page_range+0x1a1/0x2f0 [] [<ffffffff811972f1>] ? unmap_kernel_range_noflush+0x11/0x20 [] [<ffffffff814f2056>] ? ghes_copy_tofrom_phys+0x116/0x1f0 [] [<ffffffff81040d1d>] ? x2apic_send_IPI_self+0x1d/0x20 [] [<ffffffff8100411d>] perf_event_nmi_handler+0x2d/0x50 [] [<ffffffff8101ea31>] nmi_handle+0x61/0x110 [] [<ffffffff8101ef94>] default_do_nmi+0x44/0x110 [] [<ffffffff8101f13b>] do_nmi+0xdb/0x150 [] [<ffffffff81795187>] end_repeat_nmi+0x1a/0x1e [] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10 [] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10 [] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10 [] <<EOE>> <IRQ> [<ffffffff8115d05e>] ? __probe_kernel_read+0x3e/0xa0 Fix this by moving the valid_user_frame() check to before the uaccess that loads the return address and the pointer to the next frame. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Fixes: `75925e1ad7` ("perf/x86: Optimize stack walk user accesses") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:36:58 +01:00
Yazen Ghannam	f5382de9d4	x86/mce/AMD: Add system physical address translation for AMD Fam17h The Unified Memory Controllers (UMCs) on Fam17h log a normalized address in their MCA_ADDR registers. We need to convert that normalized address to a system physical address in order to support a few facilities: 1) To offline poisoned pages in DRAM proactively in the deferred error handler. 2) To print sysaddr and page info for DRAM ECC errors in EDAC. [ Boris: fixes/cleanups ontop: * hi_addr_offset = 0 - no need for that branch. Stick it all under the HiAddrOffsetEn case. It confines hi_addr_offset's declaration too. * Move variables to the innermost scope they're used at so that we save on stack and not blow it up immediately on function entry. * Do not modify sys_addr prematurely - we want to not exit early and have modified sys_addr some, which callers get to see. We either convert to a sys_addr or we don't do anything. And we signal that with the retval of the function. * Rename label out -> out_err - because it is the error path. * No need to pr_err of the conversion failed case: imagine a sparsely-populated machine with UMCs which don't have DIMMs. Callers should look at the retval instead and issue a printk only when really necessary. No need for useless info in dmesg. * s/temp_reg/tmp/ and other variable names shortening => shorter code. * Use BIT() everywhere. * Make error messages more informative. * Small build fix for the !CONFIG_X86_MCE_AMD case. * ... and more minor cleanups. ] Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20161122111133.mjzpvzhf7o7yl2oa@pd.tnic [ Typo fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:30:16 +01:00
Josh Poimboeuf	3d02a9c48d	x86/dumpstack: Make stack name tags more comprehensible NMI stack dumps are bracketed by the following tags: <NMI> ... <EOE> The ending tag is kind of confusing if you don't already know what "EOE" means (end of exception). The same ending tag is also used to mark the end of all other exceptions' stacks. For example: <#DF> ... <EOE> And similarly, "EOI" is used as the ending tag for interrupts: <IRQ> ... <EOI> Change the tags to be more comprehensible by making them symmetrical and more XML-esque: <NMI> ... </NMI> <#DF> ... </#DF> <IRQ> ... </IRQ> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/180196e3754572540b595bc56b947d43658979a7.1479491159.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-21 13:00:42 +01:00
Andy Shevchenko	e5dce28688	x86/platform/intel-mid: Rename platform_wdt to platform_mrfld_wdt Rename the watchdog platform library file to explicitly show that is used only on Intel Merrifield platforms. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161118172723.179761-1-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-21 11:07:11 +01:00
H.J. Lu	a980ce352f	x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well Since the bootloader may load the compressed x86 kernel at any address, it should always be built as PIE, not just when CONFIG_RELOCATABLE=y. Otherwise, linker in binutils 2.27 will optimize GOT load into the absolute address when building the compressed x86 kernel as a non-PIE executable. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org [ Small wording changes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-21 11:05:28 +01:00
Borislav Petkov	254fe9c7a4	x86/MCE/AMD: Fix thinko about thresholding_en So adding thresholding_en et al was a good thing for removing the per-CPU thresholding callback, i.e., threshold_cpu_callback. But, in order for it to work and especially that test in mce_threshold_create_device() so that all thresholding banks get properly created and not the whole thing to fail with a NULL ptr dereference at mce_cpu_pre_down() when we offline the CPUs, we need to set the thresholding_en flag before we start creating the devices. Yap, it failed because thresholding_en wasn't set at the time we were creating the banks so we didn't create any and then at mce_cpu_pre_down() -> mce_threshold_remove_device() time, we would blow up. And the fix is actually easy: we have thresholding on the system when we have managed to set the thresholding vector to amd_threshold_interrupt() earlier in mce_amd_feature_init() while we were picking apart the thresholding banks and what is set and what not. So let's do that. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Fixes: `4d7b02d58c` ("x86/mcheck: Split threshold_cpu_callback into two callbacks") Link: http://lkml.kernel.org/r/20161119103402.5227-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-21 11:02:12 +01:00
Andy Shevchenko	8c5c86fb6a	x86/platform/intel-mid: Register watchdog device after SCU Watchdog device in Intel Tangier relies on SCU to be present. It uses the SCU IPC channel to send commands and receive responses. If watchdog driver is initialized quite before SCU and a command has been sent the result is always an error like the following: intel_mid_wdt: Error stopping watchdog: 0xffffffed Register watchdog device whne SCU is ready to avoid described issue. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161118165224.175514-1-andriy.shevchenko@linux.intel.com [ Small cleanups. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-21 10:59:14 +01:00
Yu-cheng Yu	b22cbe404a	x86/fpu: Fix invalid FPU ptrace state after execve() Robert O'Callahan reported that after an execve PTRACE_GETREGSET NT_X86_XSTATE continues to return the pre-exec register values until the exec'ed task modifies FPU state. The test code is at: https://bugzilla.redhat.com/attachment.cgi?id=1164286. What is happening is fpu__clear() does not properly clear fpstate. Fix it by doing just that. Reported-by: Robert O'Callahan <robert@ocallahan.org> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: David Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1479402695-6553-1-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-21 10:38:35 +01:00
Andy Lutomirski	ed68d7e9b9	x86/boot: Fail the boot if !M486 and CPUID is missing Linux will have all kinds of sporadic problems on systems that don't have the CPUID instruction unless CONFIG_M486=y. In particular, sync_core() will explode. I believe that these kernels had a better chance of working before commit `05fb3c199b` ("x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID"). That commit inadvertently fixed a serious bug: we used to fail to detect the FPU if CPUID wasn't present. Because we also used to forget to set X86_FEATURE_ALWAYS, we end up with no cpu feature bits set at all. This meant that alternative patching didn't do anything and, if paravirt was disabled, we could plausibly finish the entire boot process without calling sync_core(). Rather than trying to work around these issues, just have the kernel fail loudly if it's running on a CPUID-less 486, doesn't have CPUID, and doesn't have CONFIG_M486 set. Reported-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/70eac6639f23df8be5fe03fa1984aedd5d40077a.1479598603.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-21 09:04:32 +01:00
Andy Lutomirski	fc0e81b2be	x86/traps: Ignore high word of regs->cs in early_fixup_exception() On the 80486 DX, it seems that some exceptions may leave garbage in the high bits of CS. This causes sporadic failures in which early_fixup_exception() refuses to fix up an exception. As far as I can tell, this has been buggy for a long time, but the problem seems to have been exacerbated by commits: `1e02ce4ccc` ("x86: Store a per-cpu shadow copy of CR4") `e1bfc11c5a` ("x86/init: Fix cr4_init_shadow() on CR4-less machines") This appears to have broken for as long as we've had early exception handling. [ Note to stable maintainers: This patch is needed all the way back to 3.4, but it will only apply to 4.6 and up, as it depends on commit: `0e861fbb5b` ("x86/head: Move early exception panic code into early_fixup_exception()") If you want to backport to kernels before 4.6, please don't backport the prerequisites (there was a big chain of them that rewrote a lot of the early exception machinery); instead, ask me and I can send you a one-liner that will apply. ] Reported-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: `4c5023a3fa` ("x86-32: Handle exception table entries during early boot") Link: http://lkml.kernel.org/r/cb32c69920e58a1a58e7b5cad975038a69c0ce7d.1479609510.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-21 08:06:54 +01:00
Linus Torvalds	dce9ce3615	KVM fixes for v4.9-rc6 ARM: - Fix handling of the 32bit cycle counter - Fix cycle counter filtering x86: - Fix a race leading to double unregistering of user notifiers - Amend oversight in kvm_arch_set_irq that turned Hyper-V code dead - Use SRCU around kvm_lapic_set_vapic_addr - Avoid recursive flushing of asynchronous page faults - Do not rely on deferred update in KVM_GET_CLOCK, which fixes #GP - Let userspace know that KVM_GET_CLOCK is useful with master clock; 4.9 changed the return value to better match the guest clock, but didn't provide means to let guests take advantage of it -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJYMKbdAAoJEED/6hsPKofoPcEIAJF7hsuO3B2dMfUTz1EK+4IH B7JXr9mlAAEG61y82EY06Es+3gt69XBiE5iKBpxlL6jIJJiUOd+oOdygV0hv4D0K G6A03DsCWX16yJKjS7oGq4WOAiDGOpk7SU5YYlFZGqCzhaqScY2ecQFKEUYayJtt nXG+i22eFKccrD8wlkm3ZYEjl1Hif7bUmHfxL/CBec1cDNxOys1dB24VsZl90n89 7pMUtzOTskUXjbNX+cKmFtR18/XUdlucnn0w9AApf3M8GnmUxIjIaeFSLbzuNz84 U2o3LdxrYysSKSsc7VleHtWVfCbPbC62vpUI51XdNw0u7BHlKkVdvBfJEUmSpkw= =Crjd -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Radim Krčmář: "ARM: - Fix handling of the 32bit cycle counter - Fix cycle counter filtering x86: - Fix a race leading to double unregistering of user notifiers - Amend oversight in kvm_arch_set_irq that turned Hyper-V code dead - Use SRCU around kvm_lapic_set_vapic_addr - Avoid recursive flushing of asynchronous page faults - Do not rely on deferred update in KVM_GET_CLOCK, which fixes #GP - Let userspace know that KVM_GET_CLOCK is useful with master clock; 4.9 changed the return value to better match the guest clock, but didn't provide means to let guests take advantage of it" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: merge kvm_arch_set_irq and kvm_arch_set_irq_inatomic KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr KVM: async_pf: avoid recursive flushing of work items kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use KVM: Disable irq while unregistering user notifier KVM: x86: do not go through vcpu in __get_kvmclock_ns KVM: arm64: Fix the issues when guest PMCCFILTR is configured arm64: KVM: pmu: Fix AArch32 cycle counter access	2016-11-19 13:31:40 -08:00
Paolo Bonzini	a2b07739ff	kvm: x86: merge kvm_arch_set_irq and kvm_arch_set_irq_inatomic kvm_arch_set_irq is unused since commit `b97e6de9c9`. Merge its functionality with kvm_arch_set_irq_inatomic. Reported-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-19 19:04:19 +01:00
Paolo Bonzini	7301d6abae	KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr Reported by syzkaller: [ INFO: suspicious RCU usage. ] 4.9.0-rc4+ #47 Not tainted ------------------------------- ./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage! stack backtrace: CPU: 1 PID: 6679 Comm: syz-executor Not tainted 4.9.0-rc4+ #47 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff880039e2f6d0 ffffffff81c2e46b ffff88003e3a5b40 0000000000000000 0000000000000001 ffffffff83215600 ffff880039e2f700 ffffffff81334ea9 ffffc9000730b000 0000000000000004 ffff88003c4f8420 ffff88003d3f8000 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff81c2e46b>] dump_stack+0xb3/0x118 lib/dump_stack.c:51 [<ffffffff81334ea9>] lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4445 [< inline >] __kvm_memslots include/linux/kvm_host.h:534 [< inline >] kvm_memslots include/linux/kvm_host.h:541 [<ffffffff8105d6ae>] kvm_gfn_to_hva_cache_init+0xa1e/0xce0 virt/kvm/kvm_main.c:1941 [<ffffffff8112685d>] kvm_lapic_set_vapic_addr+0xed/0x140 arch/x86/kvm/lapic.c:2217 Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: `fda4e2e855` Cc: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-19 19:04:18 +01:00
Paolo Bonzini	e3fd9a93a1	kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use Userspace can read the exact value of kvmclock by reading the TSC and fetching the timekeeping parameters out of guest memory. This however is brittle and not necessary anymore with KVM 4.11. Provide a mechanism that lets userspace know if the new KVM_GET_CLOCK semantics are in effect, and---since we are at it---if the clock is stable across all VCPUs. Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-19 19:04:16 +01:00
Ignacio Alvarado	1650b4ebc9	KVM: Disable irq while unregistering user notifier Function user_notifier_unregister should be called only once for each registered user notifier. Function kvm_arch_hardware_disable can be executed from an IPI context which could cause a race condition with a VCPU returning to user mode and attempting to unregister the notifier. Signed-off-by: Ignacio Alvarado <ikalvarado@google.com> Cc: stable@vger.kernel.org Fixes: `18863bdd60` ("KVM: x86 shared msr infrastructure") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-19 19:04:04 +01:00
Paolo Bonzini	8b95344064	KVM: x86: do not go through vcpu in __get_kvmclock_ns Going through the first VCPU is wrong if you follow a KVM_SET_CLOCK with a KVM_GET_CLOCK immediately after, without letting the VCPU run and call kvm_guest_time_update. To fix this, compute the kvmclock value ourselves, using the master clock (tsc, nsec) pair as the base and the host CPU frequency as the scale. Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-19 18:03:03 +01:00
Linus Torvalds	04e36857d6	Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild fixes from Michal Marek: "Here are some regression fixes for kbuild: - modversion support for exported asm symbols (Nick Piggin). The affected architectures need separate patches adding asm-prototypes.h. - fix rebuilds of lib-ksyms.o (Nick Piggin) - -fno-PIE builds (Sebastian Siewior and Borislav Petkov). This is not a kernel regression, but one of the Debian gcc package. Nevertheless, it's quite annoying, so I think it should go into mainline and stable now" * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: kbuild: Steal gcc's pie from the very beginning kbuild: be more careful about matching preprocessed asm ___EXPORT_SYMBOL x86/kexec: add -fno-PIE scripts/has-stack-protector: add -fno-PIE kbuild: add -fno-PIE kbuild: modversions for EXPORT_SYMBOL() for asm kbuild: prevent lib-ksyms.o rebuilds	2016-11-18 16:45:21 -08:00
Jonathan Corbet	917fef6f7e	Linux 4.9-rc4 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYHmoCAAoJEHm+PkMAQRiG7RMIAI2i7Y5hpL5yCxK5AFaL4u/G KxXfp1B1UanUTgjOmd7zGqtDYcFX9t7GTTUFixQ7/9Opr4PD9qbnatoDGSc3xjbT msDgA1B78F1/Q3kHWfeGq32MihQ4mj5NwUCo+igUcUvvWG7mHgzErj/Nh5RoobQX p/izdpTbrw3GX6xXB8olbG7XWHaVye/+TT3q6+gmgm8I/QEujcLeGoycE0zlhPN8 FG/JX76At/+ZM2Py7Oxo3k+oKL9CHrtOQYDp/wN0uslV5eYvvkZz0/M1HMOGZt+c gZU5jzM17K7C4Nzo06WAuBU9wUBGc25m+cPicLlOmljnzfU+f50SKaDjZq3p7QI= =2KUF -----END PGP SIGNATURE----- Merge tag 'v4.9-rc4' into sound Bring in -rc4 patches so I can successfully merge the sound doc changes.	2016-11-18 16:13:41 -07:00
Len Brown	7a3e686e1b	x86/idle: Remove enter_idle(), exit_idle() Upon removal of the is_idle flag, these routines became NOPs. Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/822f2c22cc5890f7b8ea0eeec60277eb44505b4e.1479449716.git.len.brown@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-18 12:07:57 +01:00
Len Brown	9694be731d	x86: Remove x86_test_and_clear_bit_percpu() Upon removal of the "is_idle" flag, x86_test_and_clear_bit_percpu() is no longer used. Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/b334ae6819507e3dfc0a4b33ed974714d067eb4a.1479449716.git.len.brown@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-18 12:07:57 +01:00
Len Brown	f08b5fe2d4	x86/idle: Remove is_idle flag Upon removal of the idle_notifier, all accesses to the "is_idle" flag serve no purpose. Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/e4a24197cf9c227fcd1ca2df09999eaec9052f49.1479449716.git.len.brown@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-18 12:07:57 +01:00
Len Brown	8e7a7ee9dd	x86/idle: Remove idle_notifier Upon removal of the i7300_idle driver, the idle_notifer is unused. Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/f15385a82ec4bf51f4f06777193d83f03b28cfdd.1479449716.git.len.brown@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-18 12:07:56 +01:00
Thomas Gleixner	984fecebda	x86/tsc: Finalize the split of the TSC_RELIABLE flag All places which used the TSC_RELIABLE to skip the delayed calibration have been converted to use the TSC_KNOWN_FREQ flag. Make the immeditate clocksource registration, which skips the long term calibration, solely depend on TSC_KNOWN_FREQ. The TSC_RELIABLE now merily removes the requirement for a watchdog clocksource. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bin Gao <bin.gao@intel.com> Cc: Peter Zijlstra <peterz@infradead.org>	2016-11-18 10:58:31 +01:00

... 2 3 4 5 6 ...

25985 Commits