and a few PPC bug fixes too.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJWBSn7AAoJEL/70l94x66Dxd4H/RT6kWWj9x4grEYUkcJUDyK2
AXm7XcKQm04auwAic8Otr+ts/Qix/50kWmBe/TU0QLgqb8rj5Dj3yGFK6Z1y6mAz
KvaxqMJd4tZGTqN0DDvC2ItEdzjfAdeJZo/FHXqPHVspG0G14T7STLna02LTBBEJ
tNzY9qor8nFhg2fT2szqKaudUNgTqkCTpo57o2BrHE96SHG+m0WdpQCV1F5hPVpg
Te0Pb7qX9xng5n3sQ7IV/t3QYbrza1ACwNQS9XJa0Yu6iEz7JdmVmzHQASK9ynn6
hUHhsNYGx4IsPjPtfJk2GroNaRDZL+VMzw07tfcOvPx8xkS9hS63pwzmSBqfLrM=
=Ywqn
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"AMD fixes for bugs introduced in the 4.2 merge window, and a few PPC
bug fixes too"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: disable halt_poll_ns as default for s390x
KVM: x86: fix off-by-one in reserved bits check
KVM: x86: use correct page table format to check nested page table reserved bits
KVM: svm: do not call kvm_set_cr0 from init_vmcb
KVM: x86: trap AMD MSRs for the TSeg base and mask
KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()
KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit
KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs
kvm: svm: reset mmu on VCPU reset
We observed some performance degradation on s390x with dynamic
halt polling. Until we can provide a proper fix, let's enable
halt_poll_ns as default only for supported architectures.
Architectures are now free to set their own halt_poll_ns
default value.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In not-instrumented code KASAN replaces instrumented memset/memcpy/memmove
with not-instrumented analogues __memset/__memcpy/__memove.
However, on x86 the EFI stub is not linked with the kernel. It uses
not-instrumented mem*() functions from arch/x86/boot/compressed/string.c
So we don't replace them with __mem*() variants in EFI stub.
On ARM64 the EFI stub is linked with the kernel, so we should replace
mem*() functions with __mem*(), because the EFI stub runs before KASAN
sets up early shadow.
So let's move these #undef mem* into arch's asm/efi.h which is also
included by the EFI stub.
Also, this will fix the warning in 32-bit build reported by kbuild test
robot:
efi-stub-helper.c:599:2: warning: implicit declaration of function 'memcpy'
[akpm@linux-foundation.org: use 80 cols in comment]
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reported-by: Fengguang Wu <fengguang.wu@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These have roughly the same purpose as the SMRR, which we do not need
to implement in KVM. However, Linux accesses MSR_K8_TSEG_ADDR at
boot, which causes problems when running a Xen dom0 under KVM.
Just return 0, meaning that processor protection of SMRAM is not
in effect.
Reported-by: M A Young <m.a.young@durham.ac.uk>
Cc: stable@vger.kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJV+/ucAAoJEL/70l94x66DV8YH/1KDym/1GJ+/Br/YkHZnM53l
3Q0PwSLu9cNcIL9lUuDLwGTaVj+y8ud1Hjr/uzvKwivktmUYVZhkdtnZmnanvGOM
qKB9K3nFXCPx8uqy8Dn7fOwEKcg9FmDOTTkWy13HDnXO+V4crSVVt+rPw+6FUMld
NV5tYdw9Lu7y3XrveDebPWaPtyDL7OAagzmeK47eMffxG7X9Hf1H2aT7HueRi7x/
SkLIe3gmiOWmHVJDPE9TOmFYIj19gywDFysKes1gdVJLVUIXiELMT7SrvAYnToVB
zISIEj7Zx4SINPxpf2dUn8REm7NsmJY+PffLIl/Nv+ozGggFQGFH0SMZ08p0bxw=
=tfmn
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Mostly stable material, a lot of ARM fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
sched: access local runqueue directly in single_task_running
arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'
arm64: KVM: Remove all traces of the ThumbEE registers
arm: KVM: Disable virtual timer even if the guest is not using it
arm64: KVM: Disable virtual timer even if the guest is not using it
arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources
KVM: s390: Replace incorrect atomic_or with atomic_andnot
arm: KVM: Fix incorrect device to IPA mapping
arm64: KVM: Fix user access for debug registers
KVM: vmx: fix VPID is 0000H in non-root operation
KVM: add halt_attempted_poll to VCPU stats
kvm: fix zero length mmio searching
kvm: fix double free for fast mmio eventfd
kvm: factor out core eventfd assign/deassign logic
kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd
KVM: make the declaration of functions within 80 characters
KVM: arm64: add workaround for Cortex-A57 erratum #852523
KVM: fix polling for guest halt continued even if disable it
arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores
arm64: KVM: set {v,}TCR_EL2 RES1 bits
...
Pull x86 fixes from Ingo Molnar:
- misc fixes all around the map
- block non-root vm86(old) if mmap_min_addr != 0
- two small debuggability improvements
- removal of obsolete paravirt op
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform: Fix Geode LX timekeeping in the generic x86 build
x86/apic: Serialize LVTT and TSC_DEADLINE writes
x86/ioapic: Force affinity setting in setup_ioapic_dest()
x86/paravirt: Remove the unused pv_time_ops::get_tsc_khz method
x86/ldt: Fix small LDT allocation for Xen
x86/vm86: Fix the misleading CONFIG_VM86 Kconfig help text
x86/cpu: Print family/model/stepping in hex
x86/vm86: Block non-root vm86(old) if mmap_min_addr != 0
x86/alternatives: Make optimize_nops() interrupt safe and synced
x86/mm/srat: Print non-volatile flag in SRAT
x86/cpufeatures: Enable cpuid for Intel SHA extensions
Pull locking fixes from Ingo Molnar:
"Spinlock performance regression fix, plus documentation fixes"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/static_keys: Fix up the static keys documentation
locking/qspinlock/x86: Only emit the test-and-set fallback when building guest support
locking/qspinlock/x86: Fix performance regression under unaccelerated VMs
locking/static_keys: Fix a silly typo
This new statistic can help diagnosing VCPUs that, for any reason,
trigger bad behavior of halt_poll_ns autotuning.
For example, say halt_poll_ns = 480000, and wakeups are spaced exactly
like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
10+20+40+80+160+320+480 = 1110 microseconds out of every
479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
is consuming about 30% more CPU than it would use without
polling. This would show as an abnormally high number of
attempted polling compared to the successful polls.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com<
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Only emit the test-and-set fallback for Hypervisors lacking
PARAVIRT_SPINLOCKS support when building for guests.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # 4.2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave ran into horrible performance on a VM without PARAVIRT_SPINLOCKS
set and Linus noted that the test-and-set implementation was retarded.
One should spin on the variable with a load, not a RMW.
While there, remove 'queued' from the name, as the lock isn't queued
at all, but a simple test-and-set.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: stable@vger.kernel.org # v4.2+
Link: http://lkml.kernel.org/r/20150904152523.GR18673@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge third patch-bomb from Andrew Morton:
- even more of the rest of MM
- lib/ updates
- checkpatch updates
- small changes to a few scruffy filesystems
- kmod fixes/cleanups
- kexec updates
- a dma-mapping cleanup series from hch
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits)
dma-mapping: consolidate dma_set_mask
dma-mapping: consolidate dma_supported
dma-mapping: cosolidate dma_mapping_error
dma-mapping: consolidate dma_{alloc,free}_noncoherent
dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}
mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd()
mm: make sure all file VMAs have ->vm_ops set
mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff()
mm: mark most vm_operations_struct const
namei: fix warning while make xmldocs caused by namei.c
ipc: convert invalid scenarios to use WARN_ON
zlib_deflate/deftree: remove bi_reverse()
lib/decompress_unlzma: Do a NULL check for pointer
lib/decompressors: use real out buf size for gunzip with kernel
fs/affs: make root lookup from blkdev logical size
sysctl: fix int -> unsigned long assignments in INT_MIN case
kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo
kexec: align crash_notes allocation to make it be inside one physical page
kexec: remove unnecessary test in kimage_alloc_crash_control_pages()
kexec: split kexec_load syscall from kexec core code
...
- Use the correct GFN/BFN terms more consistently.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJV8VRMAAoJEFxbo/MsZsTRiGQH/i/jrAJUJfrFC2PINaA2gDwe
O0dlrkCiSgAYChGmxxxXZQSPM5Po5+EbT/dLjZ/uvSooeorM9RYY/mFo7ut/qLep
4pyQUuwGtebWGBZTrj9sygUVXVhgJnyoZxskNUbhj9zvP7hb9++IiI78mzne6cpj
lCh/7Z2dgpfRcKlNRu+qpzP79Uc7OqIfDK+IZLrQKlXa7IQDJTQYoRjbKpfCtmMV
BEG3kN9ESx5tLzYiAfxvaxVXl9WQFEoktqe9V8IgOQlVRLgJ2DQWS6vmraGrokWM
3HDOCHtRCXlPhu1Vnrp0R9OgqWbz8FJnmVAndXT8r3Nsjjmd0aLwhJx7YAReO/4=
=JDia
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen terminology fixes from David Vrabel:
"Use the correct GFN/BFN terms more consistently"
* tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfn
xen/privcmd: Further s/MFN/GFN/ clean-up
hvc/xen: Further s/MFN/GFN clean-up
video/xen-fbfront: Further s/MFN/GFN clean-up
xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfn
xen: Use correctly the Xen memory terminologies
arm/xen: implement correctly pfn_to_mfn
xen: Make clear that swiotlb and biomerge are dealing with DMA address
Almost everyone implements dma_set_mask the same way, although some time
that's hidden in ->set_dma_mask methods.
This patch consolidates those into a common implementation that either
calls ->set_dma_mask if present or otherwise uses the default
implementation. Some architectures used to only call ->set_dma_mask
after the initial checks, and those instance have been fixed to do the
full work. h8300 implemented dma_set_mask bogusly as a no-ops and has
been fixed.
Unfortunately some architectures overload unrelated semantics like changing
the dma_ops into it so we still need to allow for an architecture override
for now.
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most architectures just call into ->dma_supported, but some also return 1
if the method is not present, or 0 if no dma ops are present (although
that should never happeb). Consolidate this more broad version into
common code.
Also fix h8300 which inorrectly always returned 0, which would have been
a problem if it's dma_set_mask implementation wasn't a similarly buggy
noop.
As a few architectures have much more elaborate implementations, we
still allow for arch overrides.
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently there are three valid implementations of dma_mapping_error:
(1) call ->mapping_error
(2) check for a hardcoded error code
(3) always return 0
This patch provides a common implementation that calls ->mapping_error
if present, then checks for DMA_ERROR_CODE if defined or otherwise
returns 0.
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most architectures do not support non-coherent allocations and either
define dma_{alloc,free}_noncoherent to their coherent versions or stub
them out.
Openrisc uses dma_{alloc,free}_attrs to implement them, and only Mips
implements them directly.
This patch moves the Openrisc version to common code, and handles the
DMA_ATTR_NON_CONSISTENT case in the mips dma_map_ops instance.
Note that actual non-coherent allocations require a dma_cache_sync
implementation, so if non-coherent allocations didn't work on
an architecture before this patch they still won't work after it.
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since 2009 we have a nice asm-generic header implementing lots of DMA API
functions for architectures using struct dma_map_ops, but unfortunately
it's still missing a lot of APIs that all architectures still have to
duplicate.
This series consolidates the remaining functions, although we still need
arch opt outs for two of them as a few architectures have very
non-standard implementations.
This patch (of 5):
The coherent DMA allocator works the same over all architectures supporting
dma_map operations.
This patch consolidates them and converges the minor differences:
- the debug_dma helpers are now called from all architectures, including
those that were previously missing them
- dma_alloc_from_coherent and dma_release_from_coherent are now always
called from the generic alloc/free routines instead of the ops
dma-mapping-common.h always includes dma-coherent.h to get the defintions
for them, or the stubs if the architecture doesn't support this feature
- checks for ->alloc / ->free presence are removed. There is only one
magic instead of dma_map_ops without them (mic_dma_ops) and that one
is x86 only anyway.
Besides that only x86 needs special treatment to replace a default devices
if none is passed and tweak the gfp_flags. An optional arch hook is provided
for that.
[linux@roeck-us.net: fix build]
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are two kexec load syscalls, kexec_load another and kexec_file_load.
kexec_file_load has been splited as kernel/kexec_file.c. In this patch I
split kexec_load syscall code to kernel/kexec.c.
And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.
The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use
kexec_load syscall can bypass the checking.
Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.
Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig. Also updated general kernel code with to
kexec_load syscall.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map. This facility is used by the pmem driver to
enable pfn_to_page() operations on the page frames returned by DAX
('direct_access' in 'struct block_device_operations'). For now, the
'memmap' allocation for these "device" pages comes from "System
RAM". Support for allocating the memmap from device memory will
arrive in a later kernel.
2/ Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3. Completion of
the conversion is targeted for v4.4.
3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
5/ Miscellaneous updates and fixes to libnvdimm including support
for issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
slsw6DkrWT60CRE42nbK
=o57/
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This update has successfully completed a 0day-kbuild run and has
appeared in a linux-next release. The changes outside of the typical
drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
the introduction of ZONE_DEVICE + devm_memremap_pages().
Summary:
- Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map.
This facility is used by the pmem driver to enable pfn_to_page()
operations on the page frames returned by DAX ('direct_access' in
'struct block_device_operations').
For now, the 'memmap' allocation for these "device" pages comes
from "System RAM". Support for allocating the memmap from device
memory will arrive in a later kernel.
- Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3.
Completion of the conversion is targeted for v4.4.
- Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
- Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
- Miscellaneous updates and fixes to libnvdimm including support for
issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes"
* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
libnvdimm, pmem: direct map legacy pmem by default
libnvdimm, pmem: 'struct page' for pmem
libnvdimm, pfn: 'struct page' provider infrastructure
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
add devm_memremap_pages
mm: ZONE_DEVICE for "device memory"
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
dax: drop size parameter to ->direct_access()
nd_blk: change aperture mapping from WC to WB
nvdimm: change to use generic kvfree()
pmem, dax: have direct_access use __pmem annotation
dax: update I/O path to do proper PMEM flushing
pmem: add copy_from_iter_pmem() and clear_pmem()
pmem, x86: clean up conditional pmem includes
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: move x86 PMEM API to new pmem.h header
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
pmem: switch to devm_ allocations
devres: add devm_memremap
libnvdimm, btt: write and validate parent_uuid
...
The changes with more meat are:
o Allowing the trace event filters to filter on CPU number and process ids
o Two new markers for trace output latency were added
(10 and 100 msec latencies)
o Have tracing_thresh filter function profiling time
I also worked on modifying the ring buffer code for some future
work, and moved the adding of the timestamp around. One of my changes
caused a regression, and since other changes were built on top of it
and already tested, I had to operate a revert of that change. Instead
of rebasing, this change set has the code that caused a regression
as well as the code to revert that change without touching the other
changes that were made on top of it.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJV6aZEAAoJEEjnJuOKh9ldrR4H/A1RcQf1prLLoUibPP4w3lat
dmQcdpS1NY+cqyiKuKPAOkFDGQL7qWzRqZ8whcPSJIsHq57ufqNSLf+0bbQYPzg9
g3CgGL7OApmGi5ulj0sNxhadvc9TFm/SAN0nVJlNuUWdm8e1UWHLsrJZaMfopu2r
RDEtkOhg619mhDL4rktNdS6rk0B92Fhu2o2PwLZPVlUl1NNEt4WJU+ejitXUVO1A
Nb70/rTGGJKtyHbW+74on4LnEN5Uu0Viu6rMwGfYyIgRmC2otdBDvE4xfKMiTUKr
SzBjzrhIoMIRn4Vl0vElfulkpYaw7pcC2BdpZ4d9VpIOiLSlZs0x/TgCtpFEv5M=
=baZ3
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing update from Steven Rostedt:
"Mostly this is just clean ups and micro optimizations.
The changes with more meat are:
- Allowing the trace event filters to filter on CPU number and
process ids
- Two new markers for trace output latency were added (10 and 100
msec latencies)
- Have tracing_thresh filter function profiling time
I also worked on modifying the ring buffer code for some future work,
and moved the adding of the timestamp around. One of my changes
caused a regression, and since other changes were built on top of it
and already tested, I had to operate a revert of that change. Instead
of rebasing, this change set has the code that caused a regression as
well as the code to revert that change without touching the other
changes that were made on top of it"
* tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated"
tracing: Don't make assumptions about length of string on task rename
tracing: Allow triggers to filter for CPU ids and process names
ftrace: Format MCOUNT_ADDR address as type unsigned long
tracing: Introduce two additional marks for delay
ftrace: Fix function_graph duration spacing with 7-digits
ftrace: add tracing_thresh to function profile
tracing: Clean up stack tracing and fix fentry updates
ring-buffer: Reorganize function locations
ring-buffer: Make sure event has enough room for extend and padding
ring-buffer: Get timestamp after event is allocated
ring-buffer: Move the adding of the extended timestamp out of line
ring-buffer: Add event descriptor to simplify passing data
ftrace: correct the counter increment for trace_buffer data
tracing: Fix for non-continuous cpu ids
tracing: Prefer kcalloc over kzalloc with multiply
- Convert xen-blkfront to the multiqueue API
- [arm] Support binding event channels to different VCPUs.
- [x86] Support > 512 GiB in a PV guests (off by default as such a
guest cannot be migrated with the current toolstack).
- [x86] PMU support for PV dom0 (limited support for using perf with
Xen and other guests).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJV7wIdAAoJEFxbo/MsZsTR0hEH/04HTKLKGnSJpZ5WbMPxqZxE
UqGlvhvVWNAmFocZmbPcEi9T1qtcFrX5pM55JQr6UmAp3ovYsT2q1Q1kKaOaawks
pSfc/YEH3oQW5VUQ9Lm9Ru5Z8Btox0WrzRREO92OF36UOgUOBOLkGsUfOwDinNIM
lSk2djbYwDYAsoeC3PHB32wwMI//Lz6B/9ZVXcyL6ULynt1ULdspETjGnptRPZa7
JTB5L4/soioKOn18HDwwOhKmvaFUPQv9Odnv7dc85XwZreajhM/KMu3qFbMDaF/d
WVB1NMeCBdQYgjOrUjrmpyr5uTMySiQEG54cplrEKinfeZgKlEyjKvjcAfJfiac=
=Ktjl
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from David Vrabel:
"Xen features and fixes for 4.3:
- Convert xen-blkfront to the multiqueue API
- [arm] Support binding event channels to different VCPUs.
- [x86] Support > 512 GiB in a PV guests (off by default as such a
guest cannot be migrated with the current toolstack).
- [x86] PMU support for PV dom0 (limited support for using perf with
Xen and other guests)"
* tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (33 commits)
xen: switch extra memory accounting to use pfns
xen: limit memory to architectural maximum
xen: avoid another early crash of memory limited dom0
xen: avoid early crash of memory limited dom0
arm/xen: Remove helpers which are PV specific
xen/x86: Don't try to set PCE bit in CR4
xen/PMU: PMU emulation code
xen/PMU: Intercept PMU-related MSR and APIC accesses
xen/PMU: Describe vendor-specific PMU registers
xen/PMU: Initialization code for Xen PMU
xen/PMU: Sysfs interface for setting Xen PMU mode
xen: xensyms support
xen: remove no longer needed p2m.h
xen: allow more than 512 GB of RAM for 64 bit pv-domains
xen: move p2m list if conflicting with e820 map
xen: add explicit memblock_reserve() calls for special pages
mm: provide early_memremap_ro to establish read-only mapping
xen: check for initrd conflicting with e820 map
xen: check pre-allocated page tables for conflict with memory map
xen: check for kernel memory conflicting with memory layout
...
Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN
is meant, I suspect this is because the first support for Xen was for
PV. This resulted in some misimplementation of helpers on ARM and
confused developers about the expected behavior.
For instance, with pfn_to_mfn, we expect to get an MFN based on the name.
Although, if we look at the implementation on x86, it's returning a GFN.
For clarity and avoid new confusion, replace any reference to mfn with
gfn in any helpers used by PV drivers. The x86 code will still keep some
reference of pfn_to_mfn which may be used by all kind of guests
No changes as been made in the hypercall field, even
though they may be invalid, in order to keep the same as the defintion
in xen repo.
Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a
name to close to the KVM function gfn_to_page.
Take also the opportunity to simplify simple construction such
as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex clean up
will come in follow-up patches.
[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
The swiotlb is required when programming a DMA address on ARM when a
device is not protected by an IOMMU.
In this case, the DMA address should always be equal to the machine address.
For DOM0 memory, Xen ensure it by have an identity mapping between the
guest address and host address. However, when mapping a foreign grant
reference, the 1:1 model doesn't work.
For ARM guest, most of the callers of pfn_to_mfn expects to get a GFN
(Guest Frame Number), i.e a PFN (Page Frame Number) from the Linux point
of view given that all ARM guest are auto-translated.
Even though the name pfn_to_mfn is misleading, we need to ensure that
those caller get a GFN and not by mistake a MFN. In pratical, I haven't
seen error related to this but we should fix it for the sake of
correctness.
In order to fix the implementation of pfn_to_mfn on ARM in a follow-up
patch, we have to introduce new helpers to return the DMA from a PFN and
the invert.
On x86, the new helpers will be an alias of pfn_to_mfn and mfn_to_pfn.
The helpers will be used in swiotlb and xen_biovec_phys_mergeable.
This is necessary in the latter because we have to ensure that the
biovec code will not try to merge a biovec using foreign page and
another using Linux memory.
Lastly, the helper mfn_to_local_pfn has been renamed to bfn_to_local_pfn
given that the only usage was in swiotlb.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
An IPI is sent to flush remote TLBs when a page is unmapped that was
potentially accesssed by other CPUs. There are many circumstances where
this happens but the obvious one is kswapd reclaiming pages belonging to a
running process as kswapd and the task are likely running on separate
CPUs.
On small machines, this is not a significant problem but as machine gets
larger with more cores and more memory, the cost of these IPIs can be
high. This patch uses a simple structure that tracks CPUs that
potentially have TLB entries for pages being unmapped. When the unmapping
is complete, the full TLB is flushed on the assumption that a refill cost
is lower than flushing individual entries.
Architectures wishing to do this must give the following guarantee.
If a clean page is unmapped and not immediately flushed, the
architecture must guarantee that a write to that linear address
from a CPU with a cached TLB entry will trap a page fault.
This is essentially what the kernel already depends on but the window is
much larger with this patch applied and is worth highlighting. The
architecture should consider whether the cost of the full TLB flush is
higher than sending an IPI to flush each individual entry. An additional
architecture helper called flush_tlb_local is required. It's a trivial
wrapper with some accounting in the x86 case.
The impact of this patch depends on the workload as measuring any benefit
requires both mapped pages co-located on the LRU and memory pressure. The
case with the biggest impact is multiple processes reading mapped pages
taken from the vm-scalability test suite. The test case uses NR_CPU
readers of mapped files that consume 10*RAM.
Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs
4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
Ops lru-file-mmap-read-elapsed 159.62 ( 0.00%) 120.68 ( 24.40%)
Ops lru-file-mmap-read-time_range 30.59 ( 0.00%) 2.80 ( 90.85%)
Ops lru-file-mmap-read-time_stddv 6.70 ( 0.00%) 0.64 ( 90.38%)
4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
User 581.00 611.43
System 5804.93 4111.76
Elapsed 161.03 122.12
This is showing that the readers completed 24.40% faster with 29% less
system CPU time. From vmstats, it is known that the vanilla kernel was
interrupted roughly 900K times per second during the steady phase of the
test and the patched kernel was interrupts 180K times per second.
The impact is lower on a single socket machine.
4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
Ops lru-file-mmap-read-elapsed 25.33 ( 0.00%) 20.38 ( 19.54%)
Ops lru-file-mmap-read-time_range 0.91 ( 0.00%) 1.44 (-58.24%)
Ops lru-file-mmap-read-time_stddv 0.28 ( 0.00%) 0.47 (-65.34%)
4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
User 58.09 57.64
System 111.82 76.56
Elapsed 27.29 22.55
It's still a noticeable improvement with vmstat showing interrupts went
from roughly 500K per second to 45K per second.
The patch will have no impact on workloads with no memory pressure or have
relatively few mapped pages. It will have an unpredictable impact on the
workload running on the CPU being flushed as it'll depend on how many TLB
entries need to be refilled and how long that takes. Worst case, the TLB
will be completely cleared of active entries when the target PFNs were not
resident at all.
[sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull locking and atomic updates from Ingo Molnar:
"Main changes in this cycle are:
- Extend atomic primitives with coherent logic op primitives
(atomic_{or,and,xor}()) and deprecate the old partial APIs
(atomic_{set,clear}_mask())
The old ops were incoherent with incompatible signatures across
architectures and with incomplete support. Now every architecture
supports the primitives consistently (by Peter Zijlstra)
- Generic support for 'relaxed atomics':
- _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return()
- atomic_read_acquire()
- atomic_set_release()
This came out of porting qwrlock code to arm64 (by Will Deacon)
- Clean up the fragile static_key APIs that were causing repeat bugs,
by introducing a new one:
DEFINE_STATIC_KEY_TRUE(name);
DEFINE_STATIC_KEY_FALSE(name);
which define a key of different types with an initial true/false
value.
Then allow:
static_branch_likely()
static_branch_unlikely()
to take a key of either type and emit the right instruction for the
case. To be able to know the 'type' of the static key we encode it
in the jump entry (by Peter Zijlstra)
- Static key self-tests (by Jason Baron)
- qrwlock optimizations (by Waiman Long)
- small futex enhancements (by Davidlohr Bueso)
- ... and misc other changes"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
jump_label/x86: Work around asm build bug on older/backported GCCs
locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations
locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h
locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics
locking/qrwlock: Implement queue_write_unlock() using smp_store_release()
locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition
locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'
locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication
locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations
locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic
locking/static_keys: Make verify_keys() static
jump label, locking/static_keys: Update docs
locking/static_keys: Provide a selftest
jump_label: Provide a self-test
s390/uaccess, locking/static_keys: employ static_branch_likely()
x86, tsc, locking/static_keys: Employ static_branch_likely()
locking/static_keys: Add selftest
locking/static_keys: Add a new static_key interface
locking/static_keys: Rework update logic
locking/static_keys: Add static_key_{en,dis}able() helpers
...
- ACPICA update to upstream revision 20150818 including method
tracing extensions to allow more in-depth AML debugging in the
kernel and a number of assorted fixes and cleanups (Bob Moore,
Lv Zheng, Markus Elfring).
- ACPI sysfs code updates and a documentation update related to
AML method tracing (Lv Zheng).
- ACPI EC driver fix related to serialized evaluations of _Qxx
methods and ACPI tools updates allowing the EC userspace tool
to be built from the kernel source (Lv Zheng).
- ACPI processor driver updates preparing it for future
introduction of CPPC support and ACPI PCC mailbox driver
updates (Ashwin Chaugule).
- ACPI interrupts enumeration fix for a regression related
to the handling of IRQ attribute conflicts between MADT
and the ACPI namespace (Jiang Liu).
- Fixes related to ACPI device PM (Mika Westerberg, Srinidhi Kasagar).
- ACPI device registration code reorganization to separate the
sysfs-related code and bus type operations from the rest (Rafael
J Wysocki).
- Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause,
Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss).
- ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups
(Pan Xinhui, Rafael J Wysocki).
- cpufreq core cleanups on top of the previous changes allowing it
to preseve its sysfs directories over system suspend/resume (Viresh
Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior).
- cpufreq fixes and cleanups related to governors (Viresh Kumar).
- cpufreq updates (core and the cpufreq-dt driver) related to the
turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz).
- New DT bindings for Operating Performance Points (OPP), support
for them in the OPP framework and in the cpufreq-dt driver plus
related OPP framework fixes and cleanups (Viresh Kumar).
- cpufreq powernv driver updates (Shilpasri G Bhat).
- New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen).
- Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups
and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean).
- intel_pstate driver updates including Skylake-S support, support
for enabling HW P-states per CPU and an additional vendor bypass
list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao).
- cpuidle core fixes related to the handling of coupled idle states
(Xunlei Pang).
- intel_idle driver updates including Skylake Client support and
support for freeze-mode-specific idle states (Len Brown).
- Driver core updates related to power management (Andy Shevchenko,
Rafael J Wysocki).
- Generic power domains framework fixes and cleanups (Jon Hunter,
Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson).
- Device PM QoS framework update to allow the latency tolerance
setting to be exposed to user space via sysfs (Mika Westerberg).
- devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect
exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas).
- System sleep support updates (Alan Stern, Len Brown, SungEun Kim).
- rockchip-io AVS support updates (Heiko Stuebner).
- PM core clocks support fixup (Colin Ian King).
- Power capping RAPL driver update including support for Skylake H/S
and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi).
- Generic device properties framework fixes related to the handling
of static (driver-provided) property sets (Andy Shevchenko).
- turbostat and cpupower updates (Len Brown, Shilpasri G Bhat,
Shreyas B Prabhu).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJV5hhGAAoJEILEb/54YlRxs+EQAK51iFk48+IbpHYaZZ50Yo4m
ZZc2zBcbwRcBlU9vKERrhG+jieSl8J/JJNxT8vBjKqyvNw038mCjewQh02ol0HuC
R7nlDiVJkmZ50sLO4xwE/1UBZr/XqbddwCUnYzvFMkMTA0ePzFtf8BrJ1FXpT8S/
fkwSXQty6hvJDwxkfrbMSaA730wMju9lahx8D6MlmUAedWYZOJDMQKB4WKa/St5X
9uckBPHUBB2KiKlXxdbFPwKLNxHvLROq5SpDLc6cM/7XZB+QfNFy85CUjCUtYo1O
1W8k0qnztvZ6UEv27qz5dejGyAGOarMWGGNsmL9evoeGeHRpQL+dom7HcTnbAfUZ
walyhYSm/zKkdy7Vl3xWUUQkMG48+PviMI6K0YhHXb3Rm5wlR/yBNZTwNIty9SX/
fKCHEa8QynWwLxgm53c3xRkiitJxMsHNK03moLD9zQMjshTyTNvpNbZoahyKQzk6
H+9M1DBRHhkkREDWSwGutukxfEMtWe2vcZcyERrFiY7l5k1j58DwDBMPqjPhRv6q
P/1NlCzr0XYf83Y86J18LbDuPGDhTjjIEn6CqbtI2mmWqTg3+rF7zvS2ux+FzMnA
gisv8l6GT9JiWhxKFqqL/rrVpwtyHebWLYE/RpNUW6fEzLziRNj1qyYO9dqI/GGi
I3rfxlXoc/5xJWCgNB8f
=fTgI
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"From the number of commits perspective, the biggest items are ACPICA
and cpufreq changes with the latter taking the lead (over 50 commits).
On the cpufreq front, there are many cleanups and minor fixes in the
core and governors, driver updates etc. We also have a new cpufreq
driver for Mediatek MT8173 chips.
ACPICA mostly updates its debug infrastructure and adds a number of
fixes and cleanups for a good measure.
The Operating Performance Points (OPP) framework is updated with new
DT bindings and support for them among other things.
We have a few updates of the generic power domains framework and a
reorganization of the ACPI device enumeration code and bus type
operations.
And a lot of fixes and cleanups all over.
Included is one branch from the MFD tree as it contains some
PM-related driver core and ACPI PM changes a few other commits are
based on.
Specifics:
- ACPICA update to upstream revision 20150818 including method
tracing extensions to allow more in-depth AML debugging in the
kernel and a number of assorted fixes and cleanups (Bob Moore, Lv
Zheng, Markus Elfring).
- ACPI sysfs code updates and a documentation update related to AML
method tracing (Lv Zheng).
- ACPI EC driver fix related to serialized evaluations of _Qxx
methods and ACPI tools updates allowing the EC userspace tool to be
built from the kernel source (Lv Zheng).
- ACPI processor driver updates preparing it for future introduction
of CPPC support and ACPI PCC mailbox driver updates (Ashwin
Chaugule).
- ACPI interrupts enumeration fix for a regression related to the
handling of IRQ attribute conflicts between MADT and the ACPI
namespace (Jiang Liu).
- Fixes related to ACPI device PM (Mika Westerberg, Srinidhi
Kasagar).
- ACPI device registration code reorganization to separate the
sysfs-related code and bus type operations from the rest (Rafael J
Wysocki).
- Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause,
Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss).
- ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups (Pan
Xinhui, Rafael J Wysocki).
- cpufreq core cleanups on top of the previous changes allowing it to
preseve its sysfs directories over system suspend/resume (Viresh
Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior).
- cpufreq fixes and cleanups related to governors (Viresh Kumar).
- cpufreq updates (core and the cpufreq-dt driver) related to the
turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz).
- New DT bindings for Operating Performance Points (OPP), support for
them in the OPP framework and in the cpufreq-dt driver plus related
OPP framework fixes and cleanups (Viresh Kumar).
- cpufreq powernv driver updates (Shilpasri G Bhat).
- New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen).
- Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups
and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean).
- intel_pstate driver updates including Skylake-S support, support
for enabling HW P-states per CPU and an additional vendor bypass
list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao).
- cpuidle core fixes related to the handling of coupled idle states
(Xunlei Pang).
- intel_idle driver updates including Skylake Client support and
support for freeze-mode-specific idle states (Len Brown).
- Driver core updates related to power management (Andy Shevchenko,
Rafael J Wysocki).
- Generic power domains framework fixes and cleanups (Jon Hunter,
Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson).
- Device PM QoS framework update to allow the latency tolerance
setting to be exposed to user space via sysfs (Mika Westerberg).
- devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect
exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas).
- System sleep support updates (Alan Stern, Len Brown, SungEun Kim).
- rockchip-io AVS support updates (Heiko Stuebner).
- PM core clocks support fixup (Colin Ian King).
- Power capping RAPL driver update including support for Skylake H/S
and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi).
- Generic device properties framework fixes related to the handling
of static (driver-provided) property sets (Andy Shevchenko).
- turbostat and cpupower updates (Len Brown, Shilpasri G Bhat,
Shreyas B Prabhu)"
* tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (180 commits)
cpufreq: speedstep-lib: Use monotonic clock
cpufreq: powernv: Increase the verbosity of OCC console messages
cpufreq: sfi: use kmemdup rather than duplicating its implementation
cpufreq: drop !cpufreq_driver check from cpufreq_parse_governor()
cpufreq: rename cpufreq_real_policy as cpufreq_user_policy
cpufreq: remove redundant 'policy' field from user_policy
cpufreq: remove redundant 'governor' field from user_policy
cpufreq: update user_policy.* on success
cpufreq: use memcpy() to copy policy
cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event
cpufreq: mediatek: Add MT8173 cpufreq driver
dt-bindings: mediatek: Add MT8173 CPU DVFS clock bindings
PM / Domains: Fix typo in description of genpd_dev_pm_detach()
PM / Domains: Remove unusable governor dummies
PM / Domains: Make pm_genpd_init() available to modules
PM / domains: Align column headers and data in pm_genpd_summary output
powercap / RAPL: disable the 2nd power limit properly
tools: cpupower: Fix error when running cpupower monitor
PM / OPP: Drop unlikely before IS_ERR(_OR_NULL)
PM / OPP: Fix static checker warning (broken 64bit big endian systems)
...
Pull x86 apic updates from Thomas Gleixner:
"This udpate contains:
- rework the irq vector array to store a pointer to the irq
descriptor instead of the irq number to avoid a lookup of the irq
descriptor in the irq entry path
- lguest interrupt handling cleanups
- conversion of the local apic timer to the new clockevent callbacks
- preparatory changes for the irq argument removal of interrupt flow
handlers"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Do not dereference irq descriptor before checking it
tools/lguest: Clean up include dir
tools/lguest: Fix redefinition of struct virtio_pci_cfg_cap
x86/irq: Store irq descriptor in vector array
genirq: Provide irq_desc_has_action
x86/irq: Get rid of an indentation level
x86/irq: Rename VECTOR_UNDEFINED to VECTOR_UNUSED
x86/irq: Replace numeric constant
x86/irq: Protect smp_cleanup_move
x86/lguest: Do not setup unused irq vectors
x86/lguest: Clean up lguest_setup_irq
x86/apic: Drop local_irq_save/restore in timer callbacks
x86/apic: Migrate apic timer to new set_state interface
x86/irq: Use access helper irq_data_get_affinity_mask()
x86/irq: Use accessor irq_data_get_irq_handler_data()
x86/irq: Use accessor irq_data_get_node()
Pull x86 core platform updates from Ingo Molnar:
"The main changes are:
- Intel Atom platform updates. (Andy Shevchenko)
- modularity fixlets. (Paul Gortmaker)
- x86 platform clockevents driver updates for lguest, uv and Xen.
(Viresh Kumar)
- Microsoft Hyper-V TSC fixlet. (Vitaly Kuznetsov)"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform: Make atom/pmc_atom.c explicitly non-modular
x86/hyperv: Mark the Hyper-V TSC as unstable
x86/xen/time: Migrate to new set-state interface
x86/uv/time: Migrate to new set-state interface
x86/lguest/timer: Migrate to new set-state interface
x86/pci/intel_mid_pci: Use proper constants for irq polarity
x86/pci/intel_mid_pci: Make intel_mid_pci_ops static
x86/pci/intel_mid_pci: Propagate actual return code
x86/pci/intel_mid_pci: Work around for IRQ0 assignment
x86/platform/iosf_mbi: Add Intel Tangier PCI id
x86/platform/iosf_mbi: Source cleanup
x86/platform/iosf_mbi: Remove NULL pointer checks for pci_dev_put()
x86/platform/iosf_mbi: Check return value of debugfs_create properly
x86/platform/iosf_mbi: Move to dedicated folder
x86/platform/intel/pmc_atom: Move the PMC-Atom code to arch/x86/platform/atom
x86/platform/intel/pmc_atom: Add Cherrytrail PMC interface
x86/platform/intel/pmc_atom: Supply register mappings via PMC object
x86/platform/intel/pmc_atom: Print index of device in loop
x86/platform/intel/pmc_atom: Export accessors to PMC registers
Pull x86 mm updates from Ingo Molnar:
"The dominant change in this cycle was the continued work to isolate
kernel drivers from MTRR legacies: this tree gets rid of all kernel
internal driver interfaces to MTRRs (mostly by rewriting it to proper
PAT interfaces), the only access left is the /proc/mtrr ABI.
This work was done by Luis R Rodriguez.
There's also some related PCI interface additions for which I've
Cc:-ed Bjorn"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
x86/mm/mtrr: Remove kernel internal MTRR interfaces: unexport mtrr_add() and mtrr_del()
s390/io: Add pci_iomap_wc() and pci_iomap_wc_range()
drivers/dma/iop-adma: Use dma_alloc_writecombine() kernel-style
drivers/video/fbdev/vt8623fb: Use arch_phys_wc_add() and pci_iomap_wc()
drivers/video/fbdev/s3fb: Use arch_phys_wc_add() and pci_iomap_wc()
drivers/video/fbdev/arkfb.c: Use arch_phys_wc_add() and pci_iomap_wc()
PCI: Add pci_iomap_wc() variants
drivers/video/fbdev/gxt4500: Use pci_ioremap_wc_bar() to map framebuffer
drivers/video/fbdev/kyrofb: Use arch_phys_wc_add() and pci_ioremap_wc_bar()
drivers/video/fbdev/i740fb: Use arch_phys_wc_add() and pci_ioremap_wc_bar()
PCI: Add pci_ioremap_wc_bar()
x86/mm: Make kernel/check.c explicitly non-modular
x86/mm/pat: Make mm/pageattr[-test].c explicitly non-modular
x86/mm/pat: Add comments to cachemode translation tables
arch/*/io.h: Add ioremap_uc() to all architectures
drivers/video/fbdev/atyfb: Use arch_phys_wc_add() and ioremap_wc()
drivers/video/fbdev/atyfb: Replace MTRR UC hole with strong UC
drivers/video/fbdev/atyfb: Clarify ioremap() base and length used
drivers/video/fbdev/atyfb: Carve out framebuffer length fudging into a helper
x86/mm, asm-generic: Add IOMMU ioremap_uc() variant default
...
Pull x86 cpu updates from Ingo Molnar:
"Two changes: a suspend/resume quirk and a new CPUID bit definition"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpufeature: Add feature bit for Intel's Silicon Debug CPUID bit
x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume
Pull x86 boot updates from Ingo Molnar:
"The main x86 bootup related changes in this cycle were:
- more boot time optimizations. (Len Brown)
- implement hex output to allow the debugging of early bootup
parameters. (Kees Cook)
- remove obsolete MCA leftovers. (Paolo Pisati)"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smpboot: Remove APIC.wait_for_init_deassert and atomic init_deasserted
x86/smpboot: Remove SIPI delays from cpu_up()
x86/smpboot: Remove udelay(100) when polling cpu_callin_map
x86/smpboot: Remove udelay(100) when polling cpu_initialized_map
x86/boot: Obsolete the MCA sys_desc_table
x86/boot: Add hex output for debugging
Pull x86 asm changes from Ingo Molnar:
"The biggest changes in this cycle were:
- Revamp, simplify (and in some cases fix) Time Stamp Counter (TSC)
primitives. (Andy Lutomirski)
- Add new, comprehensible entry and exit handlers written in C.
(Andy Lutomirski)
- vm86 mode cleanups and fixes. (Brian Gerst)
- 32-bit compat code cleanups. (Brian Gerst)
The amount of simplification in low level assembly code is already
palpable:
arch/x86/entry/entry_32.S | 130 +----
arch/x86/entry/entry_64.S | 197 ++-----
but more simplifications are planned.
There's also the usual laudry mix of low level changes - see the
changelog for details"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (83 commits)
x86/asm: Drop repeated macro of X86_EFLAGS_AC definition
x86/asm/msr: Make wrmsrl() a function
x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer
x86/asm: Add MONITORX/MWAITX instruction support
x86/traps: Weaken context tracking entry assertions
x86/asm/tsc: Add rdtscll() merge helper
selftests/x86: Add syscall_nt selftest
selftests/x86: Disable sigreturn_64
x86/vdso: Emit a GNU hash
x86/entry: Remove do_notify_resume(), syscall_trace_leave(), and their TIF masks
x86/entry/32: Migrate to C exit path
x86/entry/32: Remove 32-bit syscall audit optimizations
x86/vm86: Rename vm86->v86flags and v86mask
x86/vm86: Rename vm86->vm86_info to user_vm86
x86/vm86: Clean up vm86.h includes
x86/vm86: Move the vm86 IRQ definitions to vm86.h
x86/vm86: Use the normal pt_regs area for vm86
x86/vm86: Eliminate 'struct kernel_vm86_struct'
x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86'
x86/vm86: Move vm86 fields out of 'thread_struct'
...
* pm-tools:
tools: cpupower: Fix error when running cpupower monitor
tools/power turbostat: fix typo on DRAM column in Joules-mode
cpupower: Do not change the frequency of offline cpu
tools/power turbostat: fix parameter passing for forked command
tools/power turbostat: dump CONFIG_TDP
tools/power turbostat: cpu0 is no longer hard-coded, so update output
tools/power turbostat: update turbostat(8)
* powercap:
powercap / RAPL: disable the 2nd power limit properly
powercap / RAPL: Add support for Broadwell-H
powercap / RAPL: Add support for Skylake H/S
Pull scheduler updates from Ingo Molnar:
"The biggest change in this cycle is the rewrite of the main SMP load
balancing metric: the CPU load/utilization. The main goal was to make
the metric more precise and more representative - see the changelog of
this commit for the gory details:
9d89c257df ("sched/fair: Rewrite runnable load and utilization average tracking")
It is done in a way that significantly reduces complexity of the code:
5 files changed, 249 insertions(+), 494 deletions(-)
and the performance testing results are encouraging. Nevertheless we
need to keep an eye on potential regressions, since this potentially
affects every SMP workload in existence.
This work comes from Yuyang Du.
Other changes:
- SCHED_DL updates. (Andrea Parri)
- Simplify architecture callbacks by removing finish_arch_switch().
(Peter Zijlstra et al)
- cputime accounting: guarantee stime + utime == rtime. (Peter
Zijlstra)
- optimize idle CPU wakeups some more - inspired by Facebook server
loads. (Mike Galbraith)
- stop_machine fixes and updates. (Oleg Nesterov)
- Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra)
- sched/numa tweaks. (Srikar Dronamraju)
- misc fixes and small cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
sched/deadline: Fix comment in enqueue_task_dl()
sched/deadline: Fix comment in push_dl_tasks()
sched: Change the sched_class::set_cpus_allowed() calling context
sched: Make sched_class::set_cpus_allowed() unconditional
sched: Fix a race between __kthread_bind() and sched_setaffinity()
sched: Ensure a task has a non-normalized vruntime when returning back to CFS
sched/numa: Fix NUMA_DIRECT topology identification
tile: Reorganize _switch_to()
sched, sparc32: Update scheduler comments in copy_thread()
sched: Remove finish_arch_switch()
sched, tile: Remove finish_arch_switch
sched, sh: Fold finish_arch_switch() into switch_to()
sched, score: Remove finish_arch_switch()
sched, avr32: Remove finish_arch_switch()
sched, MIPS: Get rid of finish_arch_switch()
sched, arm: Remove finish_arch_switch()
sched/fair: Clean up load average references
sched/fair: Provide runnable_load_avg back to cfs_rq
sched/fair: Remove task and group entity load when they are dead
sched/fair: Init cfs_rq's sched_entity load average
...
Pull RAS updates from Ingo Molnar:
"MCE handling updates, but also some generic drivers/edac/ changes to
better organize the Kconfig space"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ras: Move AMD MCE injector to arch/x86/ras/
x86/mce: Add a wrapper around mce_log() for injection
x86/mce: Rename rcu_dereference_check_mce() to mce_log_get_idx_check()
RAS: Add a menuconfig option with descriptive text
x86/mce: Reenable CMCI banks when swiching back to interrupt mode
x86/mce: Clear Local MCE opt-in before kexec
x86/mce: Remove unused function declarations
x86/mce: Kill drain_mcelog_buffer()
x86/mce: Avoid potential deadlock due to printk() in MCE context
x86/mce: Remove the MCE ring for Action Optional errors
x86/mce: Don't use percpu workqueues
x86/mce: Provide a lockless memory pool to save error records
x86/mce: Reuse one of the u16 padding fields in 'struct mce'
Pull perf updates from Ingo Molnar:
"Main perf kernel side changes:
- uprobes updates/fixes. (Oleg Nesterov)
- Add PERF_RECORD_SWITCH to indicate context switches and use it in
tooling. (Adrian Hunter)
- Support BPF programs attached to uprobes and first steps for BPF
tooling support. (Wang Nan)
- x86 generic x86 MSR-to-perf PMU driver. (Andy Lutomirski)
- x86 Intel PT, LBR and BTS updates. (Alexander Shishkin)
- x86 Intel Skylake support. (Andi Kleen)
- x86 Intel Knights Landing (KNL) RAPL support. (Dasaratharaman
Chandramouli)
- x86 Intel Broadwell-DE uncore support. (Kan Liang)
- x86 hw breakpoints robustization (Andy Lutomirski)
Main perf tooling side changes:
- Support Intel PT in several tools, enabling the use of the
processor trace feature introduced in Intel Broadwell processors:
(Adrian Hunter)
# dmesg | grep Performance
# [0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell events, full-width counters, Intel PMU driver.
# perf record -e intel_pt//u -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.216 MB perf.data ]
# perf script # then navigate in the tool output to some area, like this one:
184 1030 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba661440 dl_main (/usr/lib64/ld-2.17.so)
185 1457 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba669f10 _dl_new_object (/usr/lib64/ld-2.17.so)
186 9f37 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba677b90 strlen (/usr/lib64/ld-2.17.so)
187 7ba3 strlen (/usr/lib64/ld-2.17.so) => 7f21ba677c75 strlen (/usr/lib64/ld-2.17.so)
188 7c78 strlen (/usr/lib64/ld-2.17.so) => 7f21ba669f3c _dl_new_object (/usr/lib64/ld-2.17.so)
189 9f8a _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba65fab0 calloc@plt (/usr/lib64/ld-2.17.so)
190 fab0 calloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e70 calloc (/usr/lib64/ld-2.17.so)
191 5e87 calloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa90 malloc@plt (/usr/lib64/ld-2.17.so)
192 fa90 malloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e60 malloc (/usr/lib64/ld-2.17.so)
193 5e68 malloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so)
194 fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so) => 7f21ba675d50 __libc_memalign (/usr/lib64/ld-2.17.so)
195 5d63 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e20 __libc_memalign (/usr/lib64/ld-2.17.so)
196 5e40 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675d73 __libc_memalign (/usr/lib64/ld-2.17.so)
197 5d97 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e18 __libc_memalign (/usr/lib64/ld-2.17.so)
198 5e1e __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675df9 __libc_memalign (/usr/lib64/ld-2.17.so)
199 5e10 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba669f8f _dl_new_object (/usr/lib64/ld-2.17.so)
200 9fc2 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba678e70 memcpy (/usr/lib64/ld-2.17.so)
201 8e8c memcpy (/usr/lib64/ld-2.17.so) => 7f21ba678ea0 memcpy (/usr/lib64/ld-2.17.so)
- Add support for using several Intel PT features (CYC, MTC packets),
the relevant documentation was updated in:
tools/perf/Documentation/intel-pt.txt
briefly describing those packets, its purposes, how to configure
them in the event config terms and relevant external documentation
for further reading. (Adrian Hunter)
- Introduce support for probing at an absolute address, for user and
kernel 'perf probe's, useful when one have the symbol maps on a
developer machine but not on an embedded system. (Wang Nan)
- Add Intel BTS support, with a call-graph script to show it and PT
in use in a GUI using 'perf script' python scripting with
postgresql and Qt. (Adrian Hunter)
- Allow selecting the type of callchains per event, including
disabling callchains in all but one entry in an event list, to save
space, and also to ask for the callchains collected in one event to
be used in other events. (Kan Liang)
- Beautify more syscall arguments in 'perf trace': (Arnaldo Carvalho
de Melo)
* A bunch more translate file/pathnames from pointers to strings.
* Convert numbers to strings for the 'keyctl' syscall 'option'
arg.
* Add missing 'clockid' entries.
- Introduce 'srcfile' sort key: (Andi Kleen)
# perf record -F 10000 usleep 1
# perf report --stdio --dsos '[kernel.vmlinux]' -s srcfile
<SNIP>
# Overhead Source File
26.49% copy_page_64.S
5.49% signal.c
0.51% msr.h
#
It can be combined with other fields, for instance, experiment with
'-s srcfile,symbol'.
There are some oddities in some distros and with some specific
DSOs, being investigated, so your mileage may vary.
- Support per-event 'freq' term: (Namhyung Kim)
$ perf record -e 'cpu/instructions,freq=1234/',cycles -c 1000 sleep 1
$ perf evlist -F
cpu/instructions,freq=1234/: sample_freq=1234
cycles: sample_period=1000
$
- Deref sys_enter pointer args with contents from probe:vfs_getname,
showing pathnames instead of pointers in many syscalls in 'perf
trace'. (Arnaldo Carvalho de Melo)
- Stop collecting /proc/kallsyms in perf.data files, saving about
4.5MB on a typical x86-64 system, use the the symbol resolution
routines used in all the other tools (report, top, etc) now that we
can ask libtraceevent to use perf's symbol resolution code.
(Arnaldo Carvalho de Melo)
- Allow filtering out of perf's PID via 'perf record --exclude-perf'.
(Wang Nan)
- 'perf trace' now supports syscall groups, like strace, i.e:
$ trace -e file touch file
Will expand 'file' into multiple, file related, syscalls. More
work needed to add extra groups for other syscall groups, and also
to complement what was added for the 'file' group, included as a
proof of concept. (Arnaldo Carvalho de Melo)
- Add lock_pi stresser to 'perf bench futex', to test the kernel code
related to FUTEX_(UN)LOCK_PI. (Davidlohr Bueso)
- Let user have timestamps with per-thread recording in 'perf record'
(Adrian Hunter)
- ... and tons of other changes, see the shortlog and the Git log for
details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (240 commits)
perf evlist: Add backpointer for perf_env to evlist
perf tools: Rename perf_session_env to perf_env
perf tools: Do not change lib/api/fs/debugfs directly
perf tools: Add tracing_path and remove unneeded functions
perf buildid: Introduce sysfs/filename__sprintf_build_id
perf evsel: Add a backpointer to the evlist a evsel is in
perf trace: Add header with copyright and background info
perf scripts python: Add new compaction-times script
perf stat: Get correct cpu id for print_aggr
tools lib traceeveent: Allow for negative numbers in print format
perf script: Add --[no-]-demangle/--[no-]-demangle-kernel
tracing/uprobes: Do not print '0x (null)' when offset is 0
perf probe: Support probing at absolute address
perf probe: Fix error reported when offset without function
perf probe: Fix list result when address is zero
perf probe: Fix list result when symbol can't be found
tools build: Allow duplicate objects in the object list
perf tools: Remove export.h from MANIFEST
perf probe: Prevent segfault when reading probe point with absolute address
perf tools: Update Intel PT documentation
...
Pull x86/kasan changes from Ingo Molnar:
"These are two KASAN changes that factor out (and generalize) x86
specific KASAN code from x86 to mm"
* 'mm-kasan-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kasan, mm: Introduce generic kasan_populate_zero_shadow()
x86/kasan: Define KASAN_SHADOW_OFFSET per architecture
Pull inlining tuning from Ingo Molnar:
"A handful of inlining optimizations inspired by x86 work but
applicable in general"
* 'core-types-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
jiffies: Force inlining of {m,u}msecs_to_jiffies()
x86/hweight: Force inlining of __arch_hweight{32,64}()
linux/bitmap: Force inlining of bitmap weight functions
Here's the "big" char/misc driver update for 4.3-rc1.
Not much really interesting here, just a number of little changes all
over the place, and some nice consolidation of the nvmem drivers to a
common framework. As usual, the mei drivers stand out as the largest
"churn" to handle new devices and features in their hardware.
All have been in linux-next for a while with no issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlXV844ACgkQMUfUDdst+ymYfQCgmDKjq3fsVHCxNZPxnukFYzvb
xZkAnRb8fuub5gVQFP29A+rhyiuWD13v
=Bq9K
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver patches from Greg KH:
"Here's the "big" char/misc driver update for 4.3-rc1.
Not much really interesting here, just a number of little changes all
over the place, and some nice consolidation of the nvmem drivers to a
common framework. As usual, the mei drivers stand out as the largest
"churn" to handle new devices and features in their hardware.
All have been in linux-next for a while with no issues"
* tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits)
auxdisplay: ks0108: initialize local parport variable
extcon: palmas: Fix build break due to devm_gpiod_get_optional API change
extcon: palmas: Support GPIO based USB ID detection
extcon: Fix signedness bugs about break error handling
extcon: Drop owner assignment from i2c_driver
extcon: arizona: Simplify pdata symantics for micd_dbtime
extcon: arizona: Declare 3-pole jack if we detect open circuit on mic
extcon: Add exception handling to prevent the NULL pointer access
extcon: arizona: Ensure variables are set for headphone detection
extcon: arizona: Use gpiod inteface to handle micd_pol_gpio gpio
extcon: arizona: Add basic microphone detection DT/ACPI bindings
extcon: arizona: Update to use the new device properties API
extcon: palmas: Remove the mutually_exclusive array
extcon: Remove optional print_state() function pointer of struct extcon_dev
extcon: Remove duplicate header file in extcon.h
extcon: max77843: Clear IRQ bits state before request IRQ
toshiba laptop: replace ioremap_cache with ioremap
misc: eeprom: max6875: clean up max6875_read()
misc: eeprom: clean up eeprom_read()
misc: eeprom: 93xx46: clean up eeprom_93xx46_bin_read/write
...
s390: timekeeping changes, cleanups and fixes
x86: support for Hyper-V MSRs to report crashes, and a bunch of cleanups.
One interesting feature that was planned for 4.3 (emulating the local
APIC in kernel while keeping the IOAPIC and 8254 in userspace) had to
be delayed because Intel complained about my reading of the manual.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJVznW4AAoJEL/70l94x66Dt+gH/3vydhh6kv+mKhnR+kADaGfM
gaunw0CUpJLU6gkOkYOm5M32WGhsT9Hd3WtRTJO6PhSo7cQ88hMx24u4XAffoewo
Os5tDwAaHeV2enVSTri6xX8e2F2mgPDghGcYJPUBwnmMjRzZ8tj2VHUcbxqVT6Pb
pX3V8ZxOZ81+ACZU2tdNRzLUd2H1v4d74gtVS7ove1Vb0CvPOBdHf1KQuUCUa2Pi
73fvnaEuSaFYtSWZIP1PYxLnsQHpApH3Kco/5kHeqUPpYaGa/g2bnfncHRw20Svr
gb3opwbfyiq91xfGbRVR3+E63Cw4G6aTl5MDNv9UFJ+xFKuj8WJ72xXXTSwzUi4=
=HgT+
-----END PGP SIGNATURE-----
Merge tag 'kvm-4.3-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"A very small release for x86 and s390 KVM.
- s390: timekeeping changes, cleanups and fixes
- x86: support for Hyper-V MSRs to report crashes, and a bunch of
cleanups.
One interesting feature that was planned for 4.3 (emulating the local
APIC in kernel while keeping the IOAPIC and 8254 in userspace) had to
be delayed because Intel complained about my reading of the manual"
* tag 'kvm-4.3-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
x86/kvm: Rename VMX's segment access rights defines
KVM: x86/vPMU: Fix unnecessary signed extension for AMD PERFCTRn
kvm: x86: Fix error handling in the function kvm_lapic_sync_from_vapic
KVM: s390: Fix assumption that kvm_set_irq_routing is always run successfully
KVM: VMX: drop ept misconfig check
KVM: MMU: fully check zero bits for sptes
KVM: MMU: introduce is_shadow_zero_bits_set()
KVM: MMU: introduce the framework to check zero bits on sptes
KVM: MMU: split reset_rsvds_bits_mask_ept
KVM: MMU: split reset_rsvds_bits_mask
KVM: MMU: introduce rsvd_bits_validate
KVM: MMU: move FNAME(is_rsvd_bits_set) to mmu.c
KVM: MMU: fix validation of mmio page fault
KVM: MTRR: Use default type for non-MTRR-covered gfn before WARN_ON
KVM: s390: host STP toleration for VMs
KVM: x86: clean/fix memory barriers in irqchip_in_kernel
KVM: document memory barriers for kvm->vcpus/kvm->online_vcpus
KVM: x86: remove unnecessary memory barriers for shared MSRs
KVM: move code related to KVM_SET_BOOT_CPU_ID to x86
KVM: s390: log capability enablement and vm attribute changes
...
We just need one macro of X86_EFLAGS_AC_BIT and X86_EFLAGS_AC.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1440669844-21535-1-git-send-email-ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Given that a write-back (WB) mapping plus non-temporal stores is
expected to be the most efficient way to access PMEM, update the
definition of ARCH_HAS_PMEM_API to imply arch support for
WB-mapped-PMEM. This is needed as a pre-requisite for adding PMEM to
the direct map and mapping it with struct page.
The above clarification for X86_64 means that memcpy_to_pmem() is
permitted to use the non-temporal arch_memcpy_to_pmem() rather than
needlessly fall back to default_memcpy_to_pmem() when the pcommit
instruction is not available. When arch_memcpy_to_pmem() is not
guaranteed to flush writes out of cache, i.e. on older X86_32
implementations where non-temporal stores may just dirty cache,
ARCH_HAS_PMEM_API is simply disabled.
The default fall back for persistent memory handling remains. Namely,
map it with the WT (write-through) cache-type and hope for the best.
arch_has_pmem_api() is updated to only indicate whether the arch
provides the proper helpers to meet the minimum "writes are visible
outside the cache hierarchy after memcpy_to_pmem() + wmb_pmem()". Code
that cares whether wmb_pmem() actually flushes writes to pmem must now
call arch_has_wmb_pmem() directly.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
[hch: set ARCH_HAS_PMEM_API=n on x86_32]
Reviewed-by: Christoph Hellwig <hch@lst.de>
[toshi: x86_32 compile fixes]
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This should result in a pretty sizeable performance gain for reads. For
rough comparison I did some simple read testing using PMEM to compare
reads of write combining (WC) mappings vs write-back (WB). This was
done on a random lab machine.
PMEM reads from a write combining mapping:
# dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s
PMEM reads from a write-back mapping:
# dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s
To be able to safely support a write-back aperture I needed to add
support for the "read flush" _DSM flag, as outlined in the DSM spec:
http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
This flag tells the ND BLK driver that it needs to flush the cache lines
associated with the aperture after the aperture is moved but before any
new data is read. This ensures that any stale cache lines from the
previous contents of the aperture will be discarded from the processor
cache, and the new data will be read properly from the DIMM. We know
that the cache lines are clean and will be discarded without any
writeback because either a) the previous aperture operation was a read,
and we never modified the contents of the aperture, or b) the previous
aperture operation was a write and we must have written back the dirtied
contents of the aperture to the DIMM before the I/O was completed.
In order to add support for the "read flush" flag I needed to add a
generic routine to invalidate cache lines, mmio_flush_range(). This is
protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently
only supported on x86.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>