Commit 88e957d6e4 ("xen: introduce xen_vcpu_id mapping") broke SMP
ARM guests on Xen. When FIFO-based event channels are in use (this is
the default), evtchn_fifo_alloc_control_block() is called on
CPU_UP_PREPARE event and this happens before we set up xen_vcpu_id
mapping in xen_starting_cpu. Temporary fix the issue by setting direct
Linux CPU id <-> Xen vCPU id mapping for all possible CPUs at boot. We
don't currently support kexec/kdump on Xen/ARM so these ids always
match.
In future, we have several ways to solve the issue, e.g.:
- Eliminate all hypercalls from CPU_UP_PREPARE, do them from the
starting CPU. This can probably be done for both x86 and ARM and, if
done, will allow us to get Xen's idea of vCPU id from CPUID/MPIDR on
the starting CPU directly, no messing with ACPI/device tree
required.
- Save vCPU id information from ACPI/device tree on ARM and use it to
initialize xen_vcpu_id mapping. This is the same trick we currently
do on x86.
Reported-by: Julien Grall <julien.grall@arm.com>
Tested-by: Wei Chen <Wei.Chen@arm.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
The PE for root bus (root PE) can be removed because of PCI hot
remove in EEH recovery path for fenced PHB error. We need update
@phb->root_pe_populated accordingly so that the root PE can be
populated again in forthcoming PCI hot add path. Also, the PE
shouldn't be destroyed as it's global and reserved resource.
Fixes: c5f7700bbd ("powerpc/powernv: Dynamically release PE")
Reported-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
really ugly, but apparently avr32 compilers turns access_ok() into
something so bad that they want it in assembler. Left that way,
zeroing added in inline wrapper.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It could be done in exception-handling bits in __get_user_b() et.al.,
but the surgery involved would take more knowledge of sh64 details
than I have or _want_ to have.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* should zero on any failure
* __get_user() should use __copy_from_user(), not copy_from_user()
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
should clear on access_ok() failures. Also remove the useless
range truncation logics.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... that should zero on faults. Also remove the <censored> helpful
logics wrt range truncation copied from ppc32. Where it had ever
been needed only in case of copy_from_user() *and* had not been merged
into the mainline until a month after the need had disappeared.
A decade before openrisc went into mainline, I might add...
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a) should not leave crap on fault
b) should _not_ require access_ok() in any cases.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It's -EFAULT, not -1 (and contrary to the comment in there,
__strnlen_user() can return 0 - on faults).
Cc: stable@vger.kernel.org
Acked-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It should check access_ok(). Otherwise a bunch of places turn into
trivially exploitable rootholes.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* copy_from_user() on access_ok() failure ought to zero the destination
* none of those primitives should skip the access_ok() check in case of
small constant size.
Cc: stable@vger.kernel.org
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull x86 fixes from Ingo Molnar:
"Three fixes:
- AMD microcode loading fix with randomization
- an lguest tooling fix
- and an APIC enumeration boundary condition fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Fix num_processors value in case of failure
tools/lguest: Don't bork the terminal in case of wrong args
x86/microcode/AMD: Fix load of builtin microcode with randomized memory
Pull perf fixes from Ingo Molnar:
"This contains:
- a set of fixes found by directed-random perf fuzzing efforts by
Vince Weaver, Alexander Shishkin and Peter Zijlstra
- a cqm driver crash fix
- an AMD uncore driver use after free fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix PEBSv3 record drain
perf/x86/intel/bts: Kill a silly warning
perf/x86/intel/bts: Fix BTS PMI detection
perf/x86/intel/bts: Fix confused ordering of PMU callbacks
perf/core: Fix aux_mmap_count vs aux_refcount order
perf/core: Fix a race between mmap_close() and set_output() of AUX events
perf/x86/amd/uncore: Prevent use after free
perf/x86/intel/cqm: Check cqm/mbm enabled state in event init
perf/core: Remove WARN from perf_event_read()
Pull EFI fixes from Ingo Molnar:
"This contains a Xen fix, an arm64 fix and a race condition /
robustization set of fixes related to ExitBootServices() usage and
boundary conditions"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Use efi_exit_boot_services()
efi/libstub: Use efi_exit_boot_services() in FDT
efi/libstub: Introduce ExitBootServices helper
efi/libstub: Allocate headspace in efi_get_memory_map()
efi: Fix handling error value in fdt_find_uefi_params
efi: Make for_each_efi_memory_desc_in_map() cope with running on Xen
Commit f70ddc07b6 ("MIPS: c-r4k: Avoid small flush_icache_range SMP
calls") adds checks to force use of hit-type cache ops for small icache
flushes where they are globalised & index-type cache ops aren't, in
order to avoid the overhead of IPIs in those cases. However it
calculated the size of the region being flushed incorrectly, subtracting
the end address from the start address rather than the reverse. This
would have led to an overflow with size wrapping round to some large
value, and likely to the special case for avoiding IPIs not actually
being hit.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: James Hogan <james.hogan@imgtec.com>
Fixes: f70ddc07b6 ("MIPS: c-r4k: Avoid small flush_icache_range SMP calls")
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/14211/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
If the paravirt machine is compiles without CONFIG_SMP, the following
linker error occurs
arch/mips/kernel/head.o: In function `kernel_entry':
(.ref.text+0x10): undefined reference to `smp_bootstrap'
due to the kernel entry macro always including SMP startup code.
Wrap this code in CONFIG_SMP to fix the error.
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # 3.16+
Patchwork: https://patchwork.linux-mips.org/patch/14212/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Commit c1a0e9bc88 ("MIPS: Allow compact branch policy to be changed")
added Kconfig entries allowing for the compact branch policy used by the
compiler for MIPSr6 kernels to be specified. This can be useful for
debugging, particularly in systems where compact branches have recently
been introduced.
Unfortunately mainline gcc 5.x supports MIPSr6 but not the
-mcompact-branches compiler flag, leading to MIPSr6 kernels failing to
build with gcc 5.x with errors such as:
mipsel-linux-gnu-gcc: error: unrecognized command line option '-mcompact-branches=optimal'
make[2]: *** [kernel/bounds.s] Error 1
Fixing this by hiding the Kconfig entry behind another seems to be more
hassle than it's worth, as MIPSr6 & compact branches have been around
for a while now and if policy does need to be set for debug it can be
done easily enough with KCFLAGS. Therefore remove the compact branch
policy Kconfig entries & their handling in the Makefile.
This reverts commit c1a0e9bc88 ("MIPS: Allow compact branch policy to
be changed").
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: c1a0e9bc88 ("MIPS: Allow compact branch policy to be changed")
Cc: stable <stable@vger.kernel.org> # v4.4+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14241/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The alignment of MIPS MAAR region addresses isn't quite right.
- It rounds an already 64 KiB aligned start address up to the next
64 KiB boundary, e.g. 0x80000000 is rounded up to 0x80010000.
- It assumes the end address is already on a 64 KiB boundary and doesn't
round it down. Should that not be the case it will hit the second
BUG_ON() in write_maar_pair().
Both cases are addressed by rounding up and down to 64 KiB boundaries in
the more traditional way of adding 0xffff (for rounding up) and masking
off the low 16 bits.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13858/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Memory regions added with add_memory_region() at the top of the physical
address space will have their end address overflow to 0. This causes
them to be rejected as invalid, and would cause various other issues
later on.
This causes issues on Malta and Boston platforms when wanting to use all
2GB of RAM on a 32-bit kernel, either via highmem (using physical
addresses 0x90000000..0xFFFFFFFF), or with the Malta Enhanced Virtual
Addressing (EVA) layout which exposes the whole 0x80000000..0xFFFFFFFF
physical address range to kernel mode at 0x00000000..0x7FFFFFFF.
Due to the abundance of these non-overflow assumptions and the fact that
memblock already avoids the arithmetic overflow by limiting the size of
new memory regions without the arch code knowing it (in particular
mem_init_free_highmem() will trigger a page dump due to nonzero mapcount
on the last page), it is simpler and safer to just limit the size of the
region in a similar way to memblock but at the arch level to allow most
of the RAM to be used without arithmetic overflows.
Therefore we detect this case specifically and reduce the size of the
region slightly to avoid the arithmetic overflows and cause the last
page to be ignored.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13857/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
When a uprobe-replacement breakpoint instruction is handled, a notifier
is called with DIE_UPROBE argument, but a corresponding exception notify
handler for MIPS attempts to handle DIE_BREAK instead. As a result
the breakpoint instruction isn't handled by the uprobe code and the probed
application terminates with SIGTRAP.
Fix this by changing arch_uprobe_exception_notify code to handle
DIE_UPROBE as a pre-singlestep condition instead of DIE_BREAK.
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13884/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
clk_register_fixed_factor returns an ERR_PTR in case of an error and
should have an IS_ERR check instead of a null check.
The Coccinelle semantic patch used to find this issue is as follows:
@@
expression e;
statement S;
@@
*e = clk_register_fixed_factor(...);
if (!e) S
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Cc: julia.lawall@lip6.fr
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13894/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The AES-CTR glue code avoids calling into the blkcipher API for the
tail portion of the walk, by comparing the remainder of walk.nbytes
modulo AES_BLOCK_SIZE with the residual nbytes, and jumping straight
into the tail processing block if they are equal. This tail processing
block checks whether nbytes != 0, and does nothing otherwise.
However, in case of an allocation failure in the blkcipher layer, we
may enter this code with walk.nbytes == 0, while nbytes > 0. In this
case, we should not dereference the source and destination pointers,
since they may be NULL. So instead of checking for nbytes != 0, check
for (walk.nbytes % AES_BLOCK_SIZE) != 0, which implies the former in
non-error conditions.
Fixes: 49788fe2a1 ("arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions")
Cc: stable@vger.kernel.org
Reported-by: xiakaixu <xiakaixu@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AES-CTR glue code avoids calling into the blkcipher API for the
tail portion of the walk, by comparing the remainder of walk.nbytes
modulo AES_BLOCK_SIZE with the residual nbytes, and jumping straight
into the tail processing block if they are equal. This tail processing
block checks whether nbytes != 0, and does nothing otherwise.
However, in case of an allocation failure in the blkcipher layer, we
may enter this code with walk.nbytes == 0, while nbytes > 0. In this
case, we should not dereference the source and destination pointers,
since they may be NULL. So instead of checking for nbytes != 0, check
for (walk.nbytes % AES_BLOCK_SIZE) != 0, which implies the former in
non-error conditions.
Fixes: 86464859cc ("crypto: arm - AES in ECB/CBC/CTR/XTS modes using ARMv8 Crypto Extensions")
Cc: stable@vger.kernel.org
Reported-by: xiakaixu <xiakaixu@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add the required PCMCIA clock for the SA1111 "1800" device. This clock
is used to compute timing information for the PCMCIA interface in the
SoC device, rather than the SA1111. Hence, the provision of this clock
is a convenience for the driver and does not reflect the hardware, so
this must not be copied into DT.
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Accidentally booting Collie on Assabet reveals that the locomo driver
incorrectly overwrites gpio-sa1100's chip data for its parent interrupt,
leading to oops in sa1100_gpio_unmask() and sa1100_update_edge_regs()
when "gpio: sa1100: convert to use IO accessors" is applied. Fix locomo
to use the handler data rather than chip data for its parent interrupt.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
The cachepolicy variable gets initialized using a masked pmd
value. So far, the pmd has been masked with flags valid for the
2-page table format, but the 3-page table format requires a
different mask. On LPAE, this lead to a wrong assumption of what
initial cache policy has been used. Later a check forces the
cache policy to writealloc and prints the following warning:
Forcing write-allocate cache policy for SMP
This patch introduces a new definition PMD_SECT_CACHE_MASK for
both page table formats which masks in all cache flags in both
cases.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
SA1111 PCMCIA was broken when PCMCIA switched to using dev_pm_ops for
the PCMCIA socket class. PCMCIA used to handle suspend/resume via the
socket hosting device, which happened at normal device suspend/resume
time.
However, the referenced commit changed this: much of the resume now
happens much earlier, in the noirq resume handler of dev_pm_ops.
However, on SA1111, the PCMCIA device is not accessible as the SA1111
has not been resumed at _noirq time. It's slightly worse than that,
because the SA1111 has already been put to sleep at _noirq time, so
suspend doesn't work properly.
Fix this by converting the core SA1111 code to use dev_pm_ops as well,
and performing its own suspend/resume at noirq time.
This fixes these errors in the kernel log:
pcmcia_socket pcmcia_socket0: time out after reset
pcmcia_socket pcmcia_socket1: time out after reset
and the resulting lack of PCMCIA cards after a S2RAM cycle.
Fixes: d7646f7632 ("pcmcia: use dev_pm_ops for class pcmcia_socket_class")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
The polarity of the high IRQs was being calculated using
SA1111_IRQMASK_HI(), but this assumes a Linux interrupt number, not a
hardware interrupt number. Hence, the resulting mask was incorrect.
Fix this.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Ensure that we propagate the platform_get_irq() error code out of the
probe function. This allows probe deferrals to work correctly should
platform_get_irq() not be able to resolve the interrupt in a DT
environment at probe time.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
The number of CPU feature keys is meant to map 1:1 to the number of CPU
feature flags defined in cputable.h, and the latter must fit in an
unsigned long.
In commit 4db7327194 ("powerpc: Add option to use jump label for
cpu_has_feature()"), I incorrectly defined NUM_CPU_FTR_KEYS to 64.
There should be no real adverse consequences of this bug, other than us
allocating too many keys.
Fix it by using BITS_PER_LONG.
Fixes: 4db7327194 ("powerpc: Add option to use jump label for cpu_has_feature()")
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
pnv_wakeup_tb_loss() currently expects cr4 to be "eq" if the CPU is
waking up from a complete hypervisor state loss. Hence, it currently
restores the SPR contents only if cr4 is "eq".
However, after commit bcef83a00d ("powerpc/powernv: Add platform
support for stop instruction"), on ISA v3.0 CPUs, the function
pnv_restore_hyp_resource() sets cr4 to contain the result of the
comparison between the state the CPU has woken up from and the first
deep stop state before calling pnv_wakeup_tb_loss().
Thus if the CPU woke up from a state that is deeper than the first
deep stop state, cr4 will have "gt" set and hence, pnv_wakeup_tb_loss()
will fail to restore the SPRs on waking up from such a state.
Fix the code in pnv_wakeup_tb_loss() to restore the SPR states when cr4
is "eq" or "gt".
Fixes: bcef83a00d ("powerpc/powernv: Add platform support for stop instruction")
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Shreyas B. Prabhu <shreyasbp@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull libnvdimm fixes from Dan Williams:
"nvdimm fixes for v4.8, two of them are tagged for -stable:
- Fix devm_memremap_pages() to use track_pfn_insert(). Otherwise,
DAX pmd mappings end up with an uncached pgprot, and unusable
performance for the device-dax interface. The device-dax interface
appeared in 4.7 so this is tagged for -stable.
- Fix a couple VM_BUG_ON() checks in the show_smaps() path to
understand DAX pmd entries. This fix is tagged for -stable.
- Fix a mis-merge of the nfit machine-check handler to flip the
polarity of an if() to match the final version of the patch that
Vishal sent for 4.8-rc1. Without this the nfit machine check
handler never detects / inserts new 'badblocks' entries which
applications use to identify lost portions of files.
- For test purposes, fix the nvdimm_clear_poison() path to operate on
legacy / simulated nvdimm memory ranges. Without this fix a test
can set badblocks, but never clear them on these ranges.
- Fix the range checking done by dax_dev_pmd_fault(). This is not
tagged for -stable since this problem is mitigated by specifying
aligned resources at device-dax setup time.
These patches have appeared in a next release over the past week. The
recent rebase you can see in the timestamps was to drop an invalid fix
as identified by the updated device-dax unit tests [1]. The -mm
touches have an ack from Andrew"
[1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm: allow legacy (e820) pmem region to clear bad blocks
nfit, mce: Fix SPA matching logic in MCE handler
mm: fix cache mode of dax pmd mappings
mm: fix show_smap() for zone_device-pmd ranges
dax: fix mapping size check
Alexander hit the WARN_ON_ONCE(!event) on his Skylake while running
the perf fuzzer.
This means the PEBSv3 record included a status bit for an inactive
event, something that _should_ not happen.
Move the code that filters the status bits against our known PEBS
events up a spot to guarantee we only deal with events we know about.
Further add "continue" statements to the WARN_ON_ONCE()s such that
we'll not die nor generate silly events in case we ever do hit them
again.
Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: stable@vger.kernel.org
Fixes: a3d86542de ("perf/x86/intel/pebs: Add PEBSv3 decoding")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
At the moment, intel_bts will WARN() out if there is more than one
event writing to the same ring buffer, via SET_OUTPUT, and will only
send data from one event to a buffer.
There is no reason to have this warning in, so kill it.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-6-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since BTS doesn't have a dedicated PMI status bit, the driver needs to
take extra care to check for the condition that triggers it to avoid
spurious NMI warnings.
Regardless of the local BTS context state, the only way of knowing that
the NMI is ours is to compare the write pointer against the interrupt
threshold.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-5-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The intel_bts driver is using a CPU-local 'started' variable to order
callbacks and PMIs and make sure that AUX transactions don't get messed
up. However, the ordering rules in regard to this variable is a complete
mess, which recently resulted in perf_fuzzer-triggered warnings and
panics.
The general ordering rule that is patch is enforcing is that this
cpu-local variable be set only when the cpu-local AUX transaction is
active; consequently, this variable is to be checked before the AUX
related bits can be touched.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-4-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as
uncacheable rendering them impractical for application usage. DAX-pte
mappings are cached and the goal of establishing DAX-pmd mappings is to
attain more performance, not dramatically less (3 orders of magnitude).
track_pfn_insert() relies on a previous call to reserve_memtype() to
establish the expected page_cache_mode for the range. While memremap()
arranges for reserve_memtype() to be called, devm_memremap_pages() does
not. So, teach track_pfn_insert() and untrack_pfn() how to handle
tracking without a vma, and arrange for devm_memremap_pages() to
establish the write-back-cache reservation in the memtype tree.
Cc: <stable@vger.kernel.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The resent conversion of the cpu hotplug support in the uncore driver
introduced a regression due to the way the callbacks are invoked at
initialization time.
The old code called the prepare/starting/online function on each online cpu
as a block. The new code registers the hotplug callbacks in the core for
each state. The core invokes the callbacks at each registration on all
online cpus.
The code implicitely relied on the prepare/starting/online callbacks being
called as combo on a particular cpu, which was not obvious and completely
undocumented.
The resulting subtle wreckage happens due to the way how the uncore code
manages shared data structures for cpus which share an uncore resource in
hardware. The sharing is determined in the cpu starting callback, but the
prepare callback allocates per cpu data for the upcoming cpu because
potential sharing is unknown at this point. If the starting callback finds
a online cpu which shares the hardware resource it takes a refcount on the
percpu data of that cpu and puts the own data structure into a
'free_at_online' pointer of that shared data structure. The online callback
frees that.
With the old model this worked because in a starting callback only one non
unused structure (the one of the starting cpu) was available. The new code
allocates the data structures for all cpus when the prepare callback is
registered.
Now the starting function iterates through all online cpus and looks for a
data structure (skipping its own) which has a matching hardware id. The id
member of the data structure is initialized to 0, but the hardware id can
be 0 as well. The resulting wreckage is:
CPU0 finds a matching id on CPU1, takes a refcount on CPU1 data and puts
its own data structure into CPU1s data structure to be freed.
CPU1 skips CPU0 because the data structure is its allegedly unsued own.
It finds a matching id on CPU2, takes a refcount on CPU1 data and puts
its own data structure into CPU2s data structure to be freed.
....
Now the online callbacks are invoked.
CPU0 has a pointer to CPU1s data and frees the original CPU0 data. So
far so good.
CPU1 has a pointer to CPU2s data and frees the original CPU1 data, which
is still referenced by CPU0 ---> Booom
So there are two issues to be solved here:
1) The id field must be initialized at allocation time to a value which
cannot be a valid hardware id, i.e. -1
This prevents the above scenario, but now CPU1 and CPU2 both stick their
own data structure into the free_at_online pointer of CPU0. So we leak
CPU1s data structure.
2) Fix the memory leak described in #1
Instead of having a single pointer, use a hlist to enqueue the
superflous data structures which are then freed by the first cpu
invoking the online callback.
Ideally we should know the sharing _before_ invoking the prepare callback,
but that's way beyond the scope of this bug fix.
[ tglx: Rewrote changelog ]
Fixes: 96b2bd3866 ("perf/x86/amd/uncore: Convert to hotplug state machine")
Reported-and-tested-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20160909160822.lowgmkdwms2dheyv@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
generic definition to smp_wmb() is not sufficient
- avoid a recursive loop with the graph tracer by using using
preempt_(enable|disable)_notrace in _percpu_(read|write)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX0vFPAAoJEGvWsS0AyF7x1jsP/0PuYxMPfkBBrL6dKfcwsBXR
9eKhIui8gwkjtpvpeDMSia8+5V+UvFYhB/oaPS5tbSaP1DzS/1Uhmr8Nd17fEBZH
nJZfodCRISSy+kOIiEIMAb00pXrkJpgn4Fy6Yf5jgKjrUvLYkLaeNNR9WQ1Nrwbl
tOmK+974fTtRN+46Ht3vS4hX6kqSDr7Ze8ezFLZc5Nt1tS3R7D0gfVnE3DBQ+x9y
gg+01WvGa2K/Iz0ur7zZo5eLVV7L+JfFvkBQCNxfFMkSt8SzOR+i6bPzqTGjy9Ge
rW1l4GOCXw/Di2PfPvQ0+BpJx/HY4vjmEdItdKm1ZOySPqJkl08wOF2NFse2ZYka
cUC+vlZE7zDjoHVOVYgjYTJtXvBZnwakIZ9Oxhsjw3h93hGPOXrJF5y6uYLiMCJa
5yASzwX617hFBxO8VrIeqjVw7z2BLHLMiPE6uhiQ8TJiiwHXMxJzaVnBaGL0xidA
qPzn7DhAHUsEb3bxYixu5uA/uyvJ7l33iRjOSObTUWxtBxMEvgajnxh1rkMl7lfY
izCvwjtIU/1mAae3MA4qddzrxuPwp3GtmXC3Tb/rxzD4fgvkvqrGHrXyufeLYUWC
z40kuGyp5ZMxTArPbno6TurseXeIWJQOW9/ya28FLzNbHC7RenIHBl7bSyoDYnFu
RGCmr4g2q3hKG5pjJ+1P
=QLnH
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- smp_mb__before_spinlock() changed to smp_mb() on arm64 since the
generic definition to smp_wmb() is not sufficient
- avoid a recursive loop with the graph tracer by using using
preempt_(enable|disable)_notrace in _percpu_(read|write)
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: use preempt_disable_notrace in _percpu_read/write
arm64: spinlocks: implement smp_mb__before_spinlock() as smp_mb()
With the introduction of critical-clock support in v4.8, our developers'
default configuration is to run with 'clk_ignore_unused' removed. This
patch-set ensures they can achieve successful boot when a) booting from
an SD Card and when b) booting using USB->Eth adaptors for NFS booting.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX0XH9AAoJEMrHeC97M/+mKo4QAJLsI+dcHn5vAhX8wqL0r2HJ
n2T0g8zCUZ+WkDybdgicRTfvhC59GZJ7x6T6i9I9iTNOzAifXGFnagdxzf9crUJ4
bMKP6RKNVNFWzaKE1Xt/gMGCGneLIv63ALhxNxcC8eDT49ws0qXqfP21NTXH1AVi
lFf238pEHpOV324lnziN879cpz/T8ZlYa61KCx+d5Od+xkzm5XBzLH8ZxzOAAKFY
WjgER1j3BZwDZhbsT03Z6F3TiiLrDwBKiARe7ccHnIwfdt0Mk4kjSznMZw4ZIH4d
2V/Rj0Id/MqBrODiw2EXyjA6YPCqJfr3y4wh+15AI+XUCfWEEWsrRfJvBxabFduJ
1pK3WjvEqZXCUE1ow6fUZ9bsVPNiGxSGKcdhI10CgEAuOBgbzBqQyTIty4hCGdn3
4k6Jpk1tXeOZC4Uzq9B28eL1viipyo0uOJdKNHEzXaah9QnwGQgD0spIhvKV4/TY
C3Ov+3eON0sUET4f/XCziNGi+1gF4YUOw0V6iX3rHT1UtDtRvGDznjS125wew1Xp
OXFINmYU3J6o8InF+SvPwfTbeZZBXlqJ/Q5EPFDHGe5q/cYF8wDvU4yTWeNzqjNj
ZOCcKakLZtiq4HfsbeUKX8m64hi+bFmsTIfTj99EgyB/PeCnoWnmHLM/cf94nDTf
Ltjajd2G8NPvzfeyp/Ys
=lGFr
-----END PGP SIGNATURE-----
Merge tag 'sti-dt-fixes-for-v4.8-rcs' of git://git.kernel.org/pub/scm/linux/kernel/git/pchotard/sti into fixes
Pull "Handle STiH410 interconnect clock required for EHCI/OHCI and SDHCI" from Patrice Chotard:
With the introduction of critical-clock support in v4.8, our developers'
default configuration is to run with 'clk_ignore_unused' removed. This
patch-set ensures they can achieve successful boot when a) booting from
an SD Card and when b) booting using USB->Eth adaptors for NFS booting.
* tag 'sti-dt-fixes-for-v4.8-rcs' of git://git.kernel.org/pub/scm/linux/kernel/git/pchotard/sti:
ARM: dts: STiH407-family: Provide interconnect clock for consumption in ST SDHCI
ARM: dts: STiH410: Handle interconnect clock required by EHCI/OHCI (USB)
The ../../../arm... style cross-references added by commit 9d56c22a78
("ARM: bcm2835: Add devicetree for the Raspberry Pi 3.") do not work in the
context of the split device-tree repository[0] (where the directory
structure differs). As with commit 8ee57b8182 ("ARM64: dts: vexpress: Use
a symlink to vexpress-v2m-rs1.dtsi from arch=arm") use symlinks instead.
[0] https://git.kernel.org/cgit/linux/kernel/git/devicetree/devicetree-rebasing.git/
Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Lee Jones <lee@kernel.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-rpi-kernel@lists.infradead.org
Cc: arm@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This file is included from DTS files under arch/arm64 too (via
broadcom/bcm2837-rpi-3-b.dts and broadcom/bcm2837.dtsi). There is a desire
not to have skeleton.dtsi for ARM64. See commit 3ebee5a2e1 ("arm64: dts:
kill skeleton.dtsi") for rationale for its removal.
As well as the addition of #*-cells also requires adding the device_type to
the rpi memory node explicitly.
Note that this change results in the removal of an empty /aliases node from
bcm2835-rpi-a.dtb and bcm2835-rpi-a-plus.dtb. I have no hardware to check
if this is a problem or not.
It also results in some reordering of the nodes in the DTBs (the /aliases
and /memory nodes come later). This isn't supposed to matter but, again,
I've no hardware to check if it is true in this particular case.
Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Lee Jones <lee@kernel.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-rpi-kernel@lists.infradead.org
Cc: arm@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes an idmap issue on 32-bit KVM on ARM, and fixes a memory unmapping
issue that we've had forever.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJX0pHmAAoJEEtpOizt6ddy1LcIAI0bolSI1veypr1jCE3rTqPU
3y0jnsOUDVYFWAG6g+fy8navkzwRxp9bgR4B9JZ35P/A0Te2vynajikf24ts/n2F
oKQ4p1gq8AninXLYCJNcnlJ2yaB8beHzdPzp++IyayI1gc9CGfSlHdltg5CID/lT
f5hMgrPv1n5tgD2wWZkqLTr84DQfEMopWg7ICgH3BgPwyS4ogs1/JJF4M+MN6nUP
lcAvukxnH62246dXsNOpjtubUmruK3ie16mqxCnRiT8uYnfS4JVsxLgHUqR6U/es
/0aASc0oRhzEafGFiVC6TtrZvkSirbBw363WnswFyfafXrF+4P6Vhd9KnqZ11MI=
=boFc
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-fixes-for-v4.8-round2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/ARM Fixes for v4.8, round 2
Fixes an idmap issue on 32-bit KVM on ARM, and fixes a memory unmapping
issue that we've had forever.
- Don't alias user region to other regions below PAGE_OFFSET from Paul Mackerras
- Fix again csum_partial_copy_generic() on 32-bit from Christophe Leroy
- Fix corrupted PE allocation bitmap on releasing PE from Gavin Shan
Fixes for code merged this cycle:
- Fix crash on releasing compound PE from Gavin Shan
- Fix processor numbers in OPAL ICP from Benjamin Herrenschmidt
- Fix little endian build with CONFIG_KEXEC=n from Thiago Jung Bauermann
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX0qKlAAoJEFHr6jzI4aWAu30QAK88plLxJ2z5Lyf7axdHf7P0
NfM6xQLyuUQ1xP3ksUB4/4eps3jINJ0bZRd4mMJ3fO/YgNsBACayB9GeE478tMbp
1KmK94qpLk1u4BUxXNtsWJLEuzVqAPr2cGh6jddmkPXGCUx1MFatEVNVJupX+Vt9
sJsmhLatUucZEQI6r4sK5wDOdLYIQgcgTIWW5qHH7jyJDKLGyJbNPtmQhbMWU0a5
zBwD+paecJSGTJEVjd3UwBic+oXt8chwiZkaHLu4Rh6JQ0yVRL4If4EYCodHIpDR
H7b0P9De9W6a+IWLjVDMhYKq9rBjjgZwcjMplkO7gBE2P+v/NGzbfORJtNXeOgKE
/RSWufpTbpiGyUzP1Lr/j0O59ZoijRGBK8zuha5FtsTlhl909ifc6KuHO5aqHY9r
I5o7ws+hSBM1u9cf0Bl011P4uToYzy1auMsZsjDW2SdDEFtJ+WK+0I2vp+M9Jv73
/F48n/EWUuul5oS2Uar+V2AUADpnYPRi50OR1zVJxdJSM8bZFue4brBFfx1bI/2/
jmK87hxNwJtYT45KiuEXr2FWMiB1iNHHxL/OEwWbitf2MfRjq8+LHbdt9FxOSj3/
+8cw3f1zyEjNsvH380HhkUBZknKmD7z8V5Ko5Dx5h8cuRlL+QEW2GnW+1NN7VMoQ
T7QTHRR4ziHSKdzAIlTe
=q0Jo
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes marked for stable:
- Don't alias user region to other regions below PAGE_OFFSET from
Paul Mackerras
- Fix again csum_partial_copy_generic() on 32-bit from Christophe
Leroy
- Fix corrupted PE allocation bitmap on releasing PE from Gavin Shan
Fixes for code merged this cycle:
- Fix crash on releasing compound PE from Gavin Shan
- Fix processor numbers in OPAL ICP from Benjamin Herrenschmidt
- Fix little endian build with CONFIG_KEXEC=n from Thiago Jung
Bauermann"
* tag 'powerpc-4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
powerpc/32: Fix again csum_partial_copy_generic()
powerpc/powernv: Fix corrupted PE allocation bitmap on releasing PE
powerpc/powernv: Fix crash on releasing compound PE
powerpc/xics/opal: Fix processor numbers in OPAL ICP
powerpc/pseries: Fix little endian build with CONFIG_KEXEC=n
Pull ARM fixes from Russell King:
"A few ARM fixes:
- Robin Murphy noticed that the non-secure privileged entry was
relying on undefined behaviour, which needed to be fixed.
- Vladimir Murzin noticed that prov-v7 fails to build for MMUless
configurations because a required header file wasn't included.
- A bunch of fixes for StrongARM regressions found while testing
4.8-rc on such platforms"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: sa1100: clear reset status prior to reboot
ARM: 8600/1: Enforce some NS-SVC initialisation
ARM: 8599/1: mm: pull asm/memory.h explicitly
ARM: sa1100: register clocks early
ARM: sa1100: fix 3.6864MHz clock
When debug preempt or preempt tracer is enabled, preempt_count_add/sub()
can be traced by function and function graph tracing, and
preempt_disable/enable() would call preempt_count_add/sub(), so in Ftrace
subsystem we should use preempt_disable/enable_notrace instead.
In the commit 345ddcc882 ("ftrace: Have set_ftrace_pid use the bitmap
like events do") the function this_cpu_read() was added to
trace_graph_entry(), and if this_cpu_read() calls preempt_disable(), graph
tracer will go into a recursive loop, even if the tracing_on is
disabled.
So this patch change to use preempt_enable/disable_notrace instead in
this_cpu_read().
Since Yonghui Yang helped a lot to find the root cause of this problem,
so also add his SOB.
Signed-off-by: Yonghui Yang <mark.yang@spreadtrum.com>
Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
smp_mb__before_spinlock() is intended to upgrade a spin_lock() operation
to a full barrier, such that prior stores are ordered with respect to
loads and stores occuring inside the critical section.
Unfortunately, the core code defines the barrier as smp_wmb(), which
is insufficient to provide the required ordering guarantees when used in
conjunction with our load-acquire-based spinlock implementation.
This patch overrides the arm64 definition of smp_mb__before_spinlock()
to map to a full smp_mb().
Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
On arm/arm64, we depend on the kvm_unmap_hva* callbacks (via
mmu_notifiers::invalidate_*) to unmap the stage2 pagetables when
the userspace buffer gets unmapped. However, when the Hypervisor
process exits without explicit unmap of the guest buffers, the only
notifier we get is kvm_arch_flush_shadow_all() (via mmu_notifier::release
) which does nothing on arm. Later this causes us to access pages that
were already released [via exit_mmap() -> unmap_vmas()] when we actually
get to unmap the stage2 pagetable [via kvm_arch_destroy_vm() ->
kvm_free_stage2_pgd()]. This triggers crashes with CONFIG_DEBUG_PAGEALLOC,
which unmaps any free'd pages from the linear map.
[ 757.644120] Unable to handle kernel paging request at virtual address
ffff800661e00000
[ 757.652046] pgd = ffff20000b1a2000
[ 757.655471] [ffff800661e00000] *pgd=00000047fffe3003, *pud=00000047fcd8c003,
*pmd=00000047fcc7c003, *pte=00e8004661e00712
[ 757.666492] Internal error: Oops: 96000147 [#3] PREEMPT SMP
[ 757.672041] Modules linked in:
[ 757.675100] CPU: 7 PID: 3630 Comm: qemu-system-aar Tainted: G D
4.8.0-rc1 #3
[ 757.683240] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board,
BIOS 3.06.15 Aug 19 2016
[ 757.692938] task: ffff80069cdd3580 task.stack: ffff8006adb7c000
[ 757.698840] PC is at __flush_dcache_area+0x1c/0x40
[ 757.703613] LR is at kvm_flush_dcache_pmd+0x60/0x70
[ 757.708469] pc : [<ffff20000809dbdc>] lr : [<ffff2000080b4a70>] pstate: 20000145
...
[ 758.357249] [<ffff20000809dbdc>] __flush_dcache_area+0x1c/0x40
[ 758.363059] [<ffff2000080b6748>] unmap_stage2_range+0x458/0x5f0
[ 758.368954] [<ffff2000080b708c>] kvm_free_stage2_pgd+0x34/0x60
[ 758.374761] [<ffff2000080b2280>] kvm_arch_destroy_vm+0x20/0x68
[ 758.380570] [<ffff2000080aa330>] kvm_put_kvm+0x210/0x358
[ 758.385860] [<ffff2000080aa524>] kvm_vm_release+0x2c/0x40
[ 758.391239] [<ffff2000082ad234>] __fput+0x114/0x2e8
[ 758.396096] [<ffff2000082ad46c>] ____fput+0xc/0x18
[ 758.400869] [<ffff200008104658>] task_work_run+0x108/0x138
[ 758.406332] [<ffff2000080dc8ec>] do_exit+0x48c/0x10e8
[ 758.411363] [<ffff2000080dd5fc>] do_group_exit+0x6c/0x130
[ 758.416739] [<ffff2000080ed924>] get_signal+0x284/0xa18
[ 758.421943] [<ffff20000808a098>] do_signal+0x158/0x860
[ 758.427060] [<ffff20000808aad4>] do_notify_resume+0x6c/0x88
[ 758.432608] [<ffff200008083624>] work_pending+0x10/0x14
[ 758.437812] Code: 9ac32042 8b010001 d1000443 8a230000 (d50b7e20)
This patch fixes the issue by moving the kvm_free_stage2_pgd() to
kvm_arch_flush_shadow_all().
Cc: <stable@vger.kernel.org> # 3.9+
Tested-by: Itaru Kitayama <itaru.kitayama@riken.jp>
Reported-by: Itaru Kitayama <itaru.kitayama@riken.jp>
Reported-by: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
This is a slightly larger batch of fixes that we've been sitting on a few
-rcs. Most of them are simple oneliners, but there are two sets that are
slightly larger and worth pointing out:
- A set of patches to OMAP to deal with hwmod for RTC on am33xx (beaglebone
SoC, among others). It's the only clock that ever has a valid offset of 0,
so a new flag needed introduction once this problem was discovered.
- A collection of CCI fixes for performance counters discovered once people
started using it on X-Gene CPUs.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX0OnNAAoJEIwa5zzehBx3C80P/3Adca+87uQPI57AWJL7TIU3
i9YAf1povBg/u5PHn2XiNj2gBeYtDL6vKTbca9zDX/ezSPaGFAp1nPsUG3m4yKLG
0FjkNRan+FvFoi195TxANrBeYE39/ZNWJCsPn01+2Q2EsYnlCn9wsFglxTQGObsN
MI2rgYVxabdJC8TO5WuGUdJEa9TynU65fdqdD+KR+DfA8yOyu9wZEmE1JuHprUCQ
H2MYYHHFZD5Y41/Y/SvdDVprCOd8lq8Syt7Vk1O6pQKQSGgojgGn36YJeBIapnOy
4y1/FFqoRe5FwSbt934q/4P/QpHwnGvIFi0edS+Reb6/3gUWRchQ8Dt1I1C/2QoJ
J+7KQKR4SBSiu6LnsP8gNOAqulpyGzwE2aHBVD1mpbPIynA7G5MfI1pEJw4nVvFQ
rAlBmuVzjaysuhKAdM32fd34dVwQljDiqh4qWtk0hbgjyPpdxT7QjIwSCR6vfbzo
toOKQ2eiIqF+syOAnXbpPSmtaURavxvBim4DEXQfVHUbFHVErZ5zHmwGWjDc5vLT
WixQ62hilHc08pcalY31RF3k21GwBXBVDf6clQ/9TDCErlCAlpjyN0/4ZrRpcYwz
NHpFdbBNLzW0VJ0gHtwb+Xz4qgcUneTxXDUTaU46MYRquML/uqXMFrkJUQEOq1h5
nPGe55ItupK8REaMCJv/
=8pPd
-----END PGP SIGNATURE-----
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"This is a slightly larger batch of fixes that we've been sitting on a
few -rcs. Most of them are simple oneliners, but there are two sets
that are slightly larger and worth pointing out:
- A set of patches to OMAP to deal with hwmod for RTC on am33xx
(beaglebone SoC, among others). It's the only clock that ever has
a valid offset of 0, so a new flag needed introduction once this
problem was discovered.
- A collection of CCI fixes for performance counters discovered once
people started using it on X-Gene CPUs"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (37 commits)
arm-cci: pmu: Fix typo in event name
Revert "ARM: tegra: fix erroneous address in dts"
ARM: dts: imx6qdl: Fix SPDIF regression
ARM: imx6: add missing BM_CLPCR_BYPASS_PMIC_READY setting for imx6sx
ARM: dts: imx7d-sdb: fix ti,x-plate-ohms property name
ARM: dts: kirkwood: Fix PCIe label on OpenRD
ARM: kirkwood: ib62x0: fix size of u-boot environment partition
bus: arm-ccn: make event groups reliable
bus: arm-ccn: fix hrtimer registration
bus: arm-ccn: fix PMU interrupt flags
ARM: tegra: Correct polarity for Tegra114 PMIC interrupt
MAINTAINERS: add tree entry for ARM/UniPhier architecture
ARM: sun5i: Fix typo in trip point temperature
MAINTAINERS: Switch to kernel.org account for Krzysztof Kozlowski
ARM: imx6ul: populates platform device at .init_machine
bus: arm-ccn: Add missing event attribute exclusions for host/guest
bus: arm-ccn: Correct required arguments for XP PMU events
bus: arm-ccn: Fix XP watchpoint settings bitmask
bus: arm-ccn: Do not attempt to configure XPs for cycle counter
bus: arm-ccn: Fix PMU handling of MN
...
When booting a kvm guest on AMD with the latest kernel the following
messages are displayed in the boot log:
tsc: Unable to calibrate against PIT
tsc: HPET/PMTIMER calibration failed
aa297292d7 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID")
introduced a change to account for a difference in cpu and tsc frequencies for
Intel SKL processors. Before this change the native tsc set
x86_platform.calibrate_tsc to native_calibrate_tsc() which is a hardware
calibration of the tsc, and in tsc_init() executed
tsc_khz = x86_platform.calibrate_tsc();
cpu_khz = tsc_khz;
The kvm code changed x86_platform.calibrate_tsc to kvm_get_tsc_khz() and
executed the same tsc_init() function. This meant that KVM guests did not
execute the native hardware calibration function.
After aa297292d7, there are separate native calibrations for cpu_khz and
tsc_khz. The code sets x86_platform.calibrate_tsc to native_calibrate_tsc()
which is now an Intel specific calibration function, and
x86_platform.calibrate_cpu to native_calibrate_cpu() which is the "old"
native_calibrate_tsc() function (ie, the native hardware calibration
function).
tsc_init() now does
cpu_khz = x86_platform.calibrate_cpu();
tsc_khz = x86_platform.calibrate_tsc();
if (tsc_khz == 0)
tsc_khz = cpu_khz;
else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
cpu_khz = tsc_khz;
The kvm code should not call the hardware initialization in
native_calibrate_cpu(), as it isn't applicable for kvm and it didn't do that
prior to aa297292d7.
This patch resolves this issue by setting x86_platform.calibrate_cpu to
kvm_get_tsc_khz().
v2: I had originally set x86_platform.calibrate_cpu to
cpu_khz_from_cpuid(), however, pbonzini pointed out that the CPUID leaf
in that function is not available in KVM. I have changed the function
pointer to kvm_get_tsc_khz().
Fixes: aa297292d7 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Len Brown <len.brown@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The STiH4{07,10} platform contains some interconnect clocks which are used
by various IPs. If this clock isn't handled correctly by ST's EHCI/OHCI
drivers, their hub won't be found, the following error be shown and the
result will be non-working USB:
[ 97.221963] hub 2-1:1.0: hub_ext_port_status failed (err = -110)
Cc: stable@vger.kernel.org
Tested-by: Peter Griffin <peter.griffin@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Patrice Chotard <patrice.chotard@st.com>
If the topology package map check of the APIC ID and the CPU is a failure,
we don't generate the processor info for that APIC ID yet we increase
disabled_cpus by one - which is buggy.
Only increase num_processors once we are sure we don't fail.
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1473214893-16481-1-git-send-email-douly.fnst@cn.fujitsu.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A single patch fixing a typo in the temperature trip points in the A13
DTSI.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIbBAABAgAGBQJX0GviAAoJEBx+YmzsjxAgpV0P9Rsn4KpUj7UpTdaJEJ9BILaS
ox1+cxia9kYI3xWA2aYK2vUW+eqvzbngDzrX7dr2RvEKA8H4SJttasPmYxFmpUun
GeW5/YKDZDWrvPr7ZA+1ooqF6zamqX5iXvpW0JaTm/LWkVRFTbt4Strm6u0VMcAK
Ys2Qa1zzzthuNe4x3aFinAfg05S4igIv+oQQSQna/hH8MPo39cLeai5KDN0F+KFU
jZi0JFUhdZcwlFOTfnIg0Ma2PPaVmLszDlhlUHlveGb/UhgATegDmB1Uq1Sj8yrl
c42207sO+r3Xwqyq4Tbe5oYGmDJQu8HREaZfa3kXijhOr6r0reE6ftiJs8WzSzN3
7HqoUpCCNfJfz697WlClPvYd0gHi4GxAxIQMQs43sQlo9UiLIF/m1gyaY/FvBqIx
zurUKStlR1MLiKsG0ATbbgORBvZFToDjf6AIGiplC4Ph6VZmiPgTCmFnelZYIqoL
1gPow2uNRGeT0m7XZHbi9g02Q5IPY77Ubpe5SGuXAjM9UukWHe9eS5vVzsvLtS2G
uCi0iCTisuZiFnNgt0NzsSYpRIIHgb4r1TiZKiwwXoMG2bw1pDof0LBe1Qrk2cuE
k+i6/oNj2XwRdfR8CGSZZzC/Hqh90AL1wHd+fkuXqqYekkqmwTjCe442Bt7mkZFT
4vkNrHixc+ALBEAMmRc=
=4ztu
-----END PGP SIGNATURE-----
Merge tag 'sunxi-fixes-for-4.8' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into fixes
Allwinner fixes for 4.8
A single patch fixing a typo in the temperature trip points in the A13
DTSI.
* tag 'sunxi-fixes-for-4.8' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux:
ARM: sun5i: Fix typo in trip point temperature
Signed-off-by: Olof Johansson <olof@lixom.net>
- Fix misspelled "ti,x-plate-ohms" property name of touchscreen
controller for imx7d-sdb DTS.
- Add missing BM_CLPCR_BYPASS_PMIC_READY setting for i.MX6SX to get
suspend/resume work properly.
- Fix SPDIF regression on imx6qdl which caused by a clock update on
spdif device node.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXzN18AAoJEFBXWFqHsHzODGYH/Akf3taKILH/8awa78R8CdNA
hPmV1ga/t0QVTe6E/EYRyQv3D9qGEqMfluItcG+gLlhKPfqrE7iOmqdxwg6PtZSk
oqM+gPDP//DGBh3yjZRJ1jK+68i0Nf7weh59iLqEW6WkWWxBWTaPNUBYm7MXJa9f
AUYCWDNf0MUdoxIXy/sUJKZTHOozSPmJf9tp92qKsW4+EX28t65YqlGZeWyztJ5i
nmsnFHzfM3mY3qpA+RH1QerC8sAqqUCXMwfB6AO83hLUvcaFwLt3O6UgiOxhDJbZ
L9q4E5IBOvYK+zVn/GT+FBWMFE1q0WeF0GWp3oez2B6i7n21g6st9wmCDwDU3JE=
=zotz
-----END PGP SIGNATURE-----
Merge tag 'imx-fixes-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes
i.MX fixes for 4.8, 2nd round:
- Fix misspelled "ti,x-plate-ohms" property name of touchscreen
controller for imx7d-sdb DTS.
- Add missing BM_CLPCR_BYPASS_PMIC_READY setting for i.MX6SX to get
suspend/resume work properly.
- Fix SPDIF regression on imx6qdl which caused by a clock update on
spdif device node.
* tag 'imx-fixes-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx6qdl: Fix SPDIF regression
ARM: imx6: add missing BM_CLPCR_BYPASS_PMIC_READY setting for imx6sx
ARM: dts: imx7d-sdb: fix ti,x-plate-ohms property name
Signed-off-by: Olof Johansson <olof@lixom.net>
This reverts commit b5c86b7496.
This is no longer needed due to other changes going into 4.8 to rename
the unit addresses on a large number of device nodes. So it was picked up
for v4.8-rc1 in error.
Reported-by: Ralf Ramsauer <ralf@ramses-pyramidenbau.de>
Signed-off-by: Olof Johansson <olof@lixom.net>
In commit c60ac5693c ("powerpc: Update kernel VSID range", 2013-03-13)
we lost a check on the region number (the top four bits of the effective
address) for addresses below PAGE_OFFSET. That commit replaced a check
that the top 18 bits were all zero with a check that bits 46 - 59 were
zero (performed for all addresses, not just user addresses).
This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx
and we will insert a valid SLB entry for it. The VSID used will be the
same as if the top 4 bits were 0, but the page size will be some random
value obtained by indexing beyond the end of the mm_ctx_high_slices_psize
array in the paca. If that page size is the same as would be used for
region 0, then userspace just has an alias of the region 0 space. If the
page size is different, then no HPTE will be found for the access, and
the process will get a SIGSEGV (since hash_page_mm() will refuse to create
a HPTE for the bogus address).
The access beyond the end of the mm_ctx_high_slices_psize can be at most
5.5MB past the array, and so will be in RAM somewhere. Since the access
is a load performed in real mode, it won't fault or crash the kernel.
At most this bug could perhaps leak a little bit of information about
blocks of 32 bytes of memory located at offsets of i * 512kB past the
paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11.
Fixes: c60ac5693c ("powerpc: Update kernel VSID range")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 7aef413656 ("powerpc32: rewrite csum_partial_copy_generic()
based on copy_tofrom_user()") introduced a bug when destination address
is odd and len is lower than cacheline size.
In that case the resulting csum value doesn't have to be rotated one
byte because the cache-aligned copy part is skipped so no alignment
is performed.
Fixes: 7aef413656 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()")
Cc: stable@vger.kernel.org # v4.6+
Reported-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In pnv_ioda_free_pe(), the PE object (including the associated PE
number) is cleared before resetting the corresponding bit in the
PE allocation bitmap. It means PE#0 is always released to the bitmap
wrongly.
This fixes above issue by caching the PE number before the PE object
is cleared.
Fixes: 1e9167726c ("powerpc/powernv: Use PE instead of number during setup and release"
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Kees Cook <kees@outflux.net>
iQIcBAABCgAGBQJX0D/IAAoJEIly9N/cbcAmORYQAIzhZ+tFuXkOrEJx0efQx8ik
8ix0wBrh0KTDQkMWQxUnDyIr+7vNcH0oOkH3kwgYaShAZif4oq6TGkdvgJJQr+ca
ZsBcEdasYvglcr/7hEF4cH28h3YqN7HwW+Mx8IqrBQqqjly7V2jamBBd/YXOcDK9
gqXooH1He9QVtVb0uC/AVFVY+st37iuqrreLFKWrX52FaUQ+7f0svN4LWC9b1UEq
UQyi+HMMRbovum9+2WDKPpl8oEZvXE7CiWexwl2G9qBCpm/AhragArl9BhrA31eE
0PPuVBTtypPNvKVieboXUhTg5eet+nFXtlea+/CLWfbqxpecsNiqMgSIg+aWOnz4
Y6Te5P7oSwhto0WzXvilxjvShvjHgTo98PY9Qj0bBb7ECCbI09GfoDf3SPYayTwC
9zPC04mJihr+6h8QdmYgfuHzTVjMxb6YOmAorz805cJ0vHtdPIP4Gkei4R1tNUkG
TNQNXKef/UJbImL17pFu70Ru6/J58OgDuMuAAuswsFt7KELw4DZhmJL+KnYdPG1G
gHirQd1HtQlsBoVvQAwmQjBp+TqFoY3oiv4Y2oq7IEk/ZCs3XJaEYKm90CoZlm4a
sMtA0j+9rWqNirzuyWUeJfkd48yjcbOzSimzSCTsNVVsbyeA1yyapAdHSTWWm554
zEubN4LCJoGiH/FWnX6U
=UgsA
-----END PGP SIGNATURE-----
Merge tag 'seccomp-v4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fixes from Kees Cook:
"Fix UM seccomp vs ptrace, after reordering landed"
* tag 'seccomp-v4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: Remove 2-phase API documentation
um/ptrace: Fix the syscall number update after a ptrace
um/ptrace: Fix the syscall_trace_leave call
Update the syscall number after each PTRACE_SETREGS on ORIG_*AX.
This is needed to get the potentially altered syscall number in the
seccomp filters after RET_TRACE.
This fix four seccomp_bpf tests:
> [ RUN ] TRACE_syscall.skip_after_RET_TRACE
> seccomp_bpf.c:1560:TRACE_syscall.skip_after_RET_TRACE:Expected -1 (18446744073709551615) == syscall(39) (26)
> seccomp_bpf.c:1561:TRACE_syscall.skip_after_RET_TRACE:Expected 1 (1) == (*__errno_location ()) (22)
> [ FAIL ] TRACE_syscall.skip_after_RET_TRACE
> [ RUN ] TRACE_syscall.kill_after_RET_TRACE
> TRACE_syscall.kill_after_RET_TRACE: Test exited normally instead of by signal (code: 1)
> [ FAIL ] TRACE_syscall.kill_after_RET_TRACE
> [ RUN ] TRACE_syscall.skip_after_ptrace
> seccomp_bpf.c:1622:TRACE_syscall.skip_after_ptrace:Expected -1 (18446744073709551615) == syscall(39) (26)
> seccomp_bpf.c:1623:TRACE_syscall.skip_after_ptrace:Expected 1 (1) == (*__errno_location ()) (22)
> [ FAIL ] TRACE_syscall.skip_after_ptrace
> [ RUN ] TRACE_syscall.kill_after_ptrace
> TRACE_syscall.kill_after_ptrace: Test exited normally instead of by signal (code: 1)
> [ FAIL ] TRACE_syscall.kill_after_ptrace
Fixes: 26703c636c ("um/ptrace: run seccomp after ptrace")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: James Morris <jmorris@namei.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Keep the same semantic as before the commit 26703c636c1f: deallocate
audit context and fake a proper syscall exit.
This fix a kernel panic triggered by the seccomp_bpf test:
> [ RUN ] global.ERRNO_valid
> BUG: failure at kernel/auditsc.c:1504/__audit_syscall_entry()!
> Kernel panic - not syncing: BUG!
Fixes: 26703c636c ("um/ptrace: run seccomp after ptrace")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: James Morris <jmorris@namei.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Instead of having each caller of check_object_size() need to remember to
check for a const size parameter, move the check into check_object_size()
itself. This actually matches the original implementation in PaX, though
this commit cleans up the now-redundant builtin_const() calls in the
various architectures.
Signed-off-by: Kees Cook <keescook@chromium.org>
As already done with __copy_*_user(), mark copy_*_user() as __always_inline.
Without this, the checks for things like __builtin_const_p() won't work
consistently in either hardened usercopy nor the recent adjustments for
detecting usercopy overflows at compile time.
The change in kernel text size is detectable, but very small:
text data bss dec hex filename
12118735 5768608 14229504 32116847 1ea106f vmlinux.before
12120207 5768608 14229504 32118319 1ea162f vmlinux.after
Signed-off-by: Kees Cook <keescook@chromium.org>
We're trying hard to detect when the HYP idmap overlaps with the
HYP va, as it makes the teardown of a cpu dangerous. But there is
one case where an overlap is completely safe, which is when the
whole of the kernel is idmap'ed, which is likely to happen on 32bit
when RAM is at 0x8000000 and we're using a 2G/2G VA split.
In that case, we can proceed safely.
Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Yanqiu Zhang reported kernel panic when using mbm event
on system where CQM is detected but without mbm event
support, like with perf:
# perf stat -e 'intel_cqm/event=3/' -a
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffff8100d64c>] update_sample+0xbc/0xe0
...
<IRQ>
[<ffffffff8100d688>] __intel_mbm_event_init+0x18/0x20
[<ffffffff81113d6b>] flush_smp_call_function_queue+0x7b/0x160
[<ffffffff81114853>] generic_smp_call_function_single_interrupt+0x13/0x60
[<ffffffff81052017>] smp_call_function_interrupt+0x27/0x40
[<ffffffff816fb06c>] call_function_interrupt+0x8c/0xa0
...
The reason is that we currently allow to init mbm event
even if mbm support is not detected. Adding checks for
both cqm and mbm events and support into cqm's event_init.
Fixes: 33c3cc7acf ("perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and init")
Reported-by: Yanqiu Zhang <yanqzhan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1473089407-21857-1-git-send-email-jolsa@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The compound PE is created to accommodate the devices attached to
one specific PCI bus that consume multiple M64 segments. The compound
PE is made up of one master PE and possibly multiple slave PEs. The
slave PEs should be destroyed when releasing the master PE. A kernel
crash happens when derferencing @pe->pdev on releasing the slave PE
in pnv_ioda_deconfigure_pe().
# echo 0 > /sys/bus/pci/slots/C7/power
iommu: Removing device 0000:01:00.1 from group 0
iommu: Removing device 0000:01:00.0 from group 0
Unable to handle kernel paging request for data at address 0x00000010
Faulting instruction address: 0xc00000000005d898
cpu 0x1: Vector: 300 (Data Access) at [c000000fe8217620]
pc: c00000000005d898: pnv_ioda_release_pe+0x288/0x610
lr: c00000000005dbdc: pnv_ioda_release_pe+0x5cc/0x610
sp: c000000fe82178a0
msr: 9000000000009033
dar: 10
dsisr: 40000000
current = 0xc000000fe815ab80
paca = 0xc00000000ff00400 softe: 0 irq_happened: 0x01
pid = 2709, comm = sh
Linux version 4.8.0-rc5-gavin-00006-g745efdb (gwshan@gwshan) \
(gcc version 4.9.3 (Buildroot 2016.02-rc2-00093-g5ea3bce) ) #586 SMP \
Tue Sep 6 13:37:29 AEST 2016
enter ? for help
[c000000fe8217940] c00000000005d684 pnv_ioda_release_pe+0x74/0x610
[c000000fe82179e0] c000000000034460 pcibios_release_device+0x50/0x70
[c000000fe8217a10] c0000000004aba80 pci_release_dev+0x50/0xa0
[c000000fe8217a40] c000000000704898 device_release+0x58/0xf0
[c000000fe8217ac0] c000000000470510 kobject_release+0x80/0xf0
[c000000fe8217b00] c000000000704dd4 put_device+0x24/0x40
[c000000fe8217b20] c0000000004af94c pci_remove_bus_device+0x12c/0x150
[c000000fe8217b60] c000000000034244 pci_hp_remove_devices+0x94/0xd0
[c000000fe8217ba0] c0000000004ca444 pnv_php_disable_slot+0x64/0xb0
[c000000fe8217bd0] c0000000004c88c0 power_write_file+0xa0/0x190
[c000000fe8217c50] c0000000004c248c pci_slot_attr_store+0x3c/0x60
[c000000fe8217c70] c0000000002d6494 sysfs_kf_write+0x94/0xc0
[c000000fe8217cb0] c0000000002d50f0 kernfs_fop_write+0x180/0x260
[c000000fe8217d00] c0000000002334a0 __vfs_write+0x40/0x190
[c000000fe8217d90] c000000000234738 vfs_write+0xc8/0x240
[c000000fe8217de0] c000000000236250 SyS_write+0x60/0x110
[c000000fe8217e30] c000000000009524 system_call+0x38/0x108
It fixes the kernel crash by bypassing releasing resources (DMA,
IO and memory segments, PELTM) because there are no resources assigned
to the slave PE.
Fixes: c5f7700bbd ("powerpc/powernv: Dynamically release PE")
Reported-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When using the OPAL ICP backend we incorrectly pass Linux CPU numbers
rather than HW CPU numbers to OPAL.
Fixes: d74361881f ("powerpc/xics: Add ICP OPAL backend")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On ppc64le, builds with CONFIG_KEXEC=n fail with:
arch/powerpc/platforms/pseries/setup.c: In function ‘pseries_big_endian_exceptions’:
arch/powerpc/platforms/pseries/setup.c:403:13: error: implicit declaration of function ‘kdump_in_progress’
if (rc && !kdump_in_progress())
This is because pseries/setup.c includes <linux/kexec.h>, but
kdump_in_progress() is defined in <asm/kexec.h>. This is a problem
because the former only includes the latter if CONFIG_KEXEC_CORE=y.
Fix it by including <asm/kexec.h> directly, as is done in powernv/setup.c.
Fixes: d3cbff1b5a ("powerpc: Put exception configuration in a common place")
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
TSC_OFFSET will be adjusted if discovers TSC backward during vCPU load.
The preemption timer, which relies on the guest tsc to reprogram its
preemption timer value, is also reprogrammed if vCPU is scheded in to
a different pCPU. However, the current implementation reprogram preemption
timer before TSC_OFFSET is adjusted to the right value, resulting in the
preemption timer firing prematurely.
This patch fix it by adjusting TSC_OFFSET before reprogramming preemption
timer if TSC backward.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krċmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We store the address of riccbd at the wrong location, overwriting
gvrd. This means that our nested guest will not be able to use runtime
instrumentation. Also, a memory leak, if our KVM guest actually sets gvrd.
Not noticed until now, as KVM guests never make use of gvrd and runtime
instrumentation wasn't completely tested yet.
Reported-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
The eboot code directly calls ExitBootServices. This is inadvisable as the
UEFI spec details a complex set of errors, race conditions, and API
interactions that the caller of ExitBootServices must get correct. The
eboot code attempts allocations after calling ExitBootSerives which is
not permitted per the spec. Call the efi_exit_boot_services() helper
intead, which handles the allocation scenario properly.
Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
efi_get_memory_map() allocates a buffer to store the memory map that it
retrieves. This buffer may need to be reused by the client after
ExitBootServices() is called, at which point allocations are not longer
permitted. To support this usecase, provide the allocated buffer size back
to the client, and allocate some additional headroom to account for any
reasonable growth in the map that is likely to happen between the call to
efi_get_memory_map() and the client reusing the buffer.
Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
We do not need to add the randomization offset when the microcode is
built in.
Reported-and-tested-by: Emanuel Czirai <icanrealizeum@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20160904093736.GA11939@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit 833f2cbf70 ("ARM: dts: imx6: change the core clock of spdif")
changed many more clocks than only the SPDIF core clock as stated in
the commit message.
The MLB clock has been added and this causes SPDIF regression as
reported by Xavi Drudis Ferran and also in this forum post:
https://forum.digikey.com/thread/34240
The MX6Q Reference Manual does not mention that MLB is a clock related
to SPDIF, so change it back to a dummy clock to restore SPDIF
functionality.
Thanks to Ambika for providing the fix at:
https://community.nxp.com/thread/387131
Fixes: 833f2cbf70 ("ARM: dts: imx6: change the core clock of spdif")
Cc: <stable@vger.kernel.org> # 4.4.x
Reported-by: Xavi Drudis Ferran <xdrudis@tinet.cat>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Tested-by: Xavi Drudis Ferran <xdrudis@tinet.cat>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Pull x86 fix from Thomas Gleixner:
"A single fix for an AMD erratum so machines without a BIOS fix work"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/AMD: Apply erratum 665 on machines without a BIOS fix