KPROBES_ON_FTRACE avoids much of the overhead of regular kprobes as it
eliminates the need for a trap, as well as the need to emulate or single-step
instructions.
Though OPTPROBES provides us with similar performance, we have limited
optprobes trampoline slots. As such, when asked to probe at a function
entry, default to using the ftrace infrastructure.
With:
# cd /sys/kernel/debug/tracing
# echo 'p _do_fork' > kprobe_events
before patch:
# cat ../kprobes/list
c0000000000daf08 k _do_fork+0x8 [DISABLED]
c000000000044fc0 k kretprobe_trampoline+0x0 [OPTIMIZED]
and after patch:
# cat ../kprobes/list
c0000000000d074c k _do_fork+0xc [DISABLED][FTRACE]
c0000000000412b0 k kretprobe_trampoline+0x0 [OPTIMIZED]
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
kprobe_lookup_name() is specific to the kprobe subsystem and may not always
return the function entry point (in a subsequent patch for KPROBES_ON_FTRACE).
For looking up function entry points, introduce a separate helper and use it
in optprobes.c
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Allow kprobes to be placed on ftrace _mcount() call sites. This optimization
avoids the use of a trap, by riding on ftrace infrastructure.
This depends on HAVE_DYNAMIC_FTRACE_WITH_REGS which depends on MPROFILE_KERNEL,
which is only currently enabled on powerpc64le with newer toolchains.
Based on the x86 code by Masami.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pass the real LR to the ftrace handler. This is needed for KPROBES_ON_FTRACE for
the pre handlers.
Also, with KPROBES_ON_FTRACE, the link register may be updated by the pre
handlers or by a registed kretprobe. Honor updated LR by restoring it from
pt_regs, rather than from the stack save area.
Live patch and function graph continue to work fine with this change.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This reverts commit 42ae2922a6. It
causes a regression with older versions of gcc. The consensus is
that this should instead be fixed in clang.
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull RAS fix from Thomas Gleixner:
"The MCE atomic notifier callchain invokes callbacks which might sleep.
Convert it to a blocking notifier and prevent calls from atomic
context"
* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Make the MCE notifier a blocking one
Blacklist all the exception common/OOL handlers as the kernel stack is not yet
setup, which means we can't take a trap at this point.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Introduce __head_end to mark end of the early fixed sections and use it to
blacklist all exception handlers from kprobes.
mpe: We do not need to do anything special for relocatable kernels, where the
exception vectors are split from the main kernel, as the split vectors are
already excluded by the check for kernel_text_address().
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Move __head_end outside #ifdef 64-bit to unbreak the 32-bit build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Along similar lines as commit 9326638cbe ("kprobes, x86: Use NOKPROBE_SYMBOL()
instead of __kprobes annotation"), convert __kprobes annotation to either
NOKPROBE_SYMBOL() or nokprobe_inline. The latter forces inlining, in which case
the caller needs to be added to NOKPROBE_SYMBOL().
Also:
- blacklist arch_deref_entry_point(), and
- convert a few regular inlines to nokprobe_inline in lib/sstep.c
A key benefit is the ability to detect such symbols as being
blacklisted. Before this patch:
$ cat /sys/kernel/debug/kprobes/blacklist | grep read_mem
$ perf probe read_mem
Failed to write event: Invalid argument
Error: Failed to add events.
$ dmesg | tail -1
[ 3736.112815] Could not insert probe at _text+10014968: -22
After patch:
$ cat /sys/kernel/debug/kprobes/blacklist | grep read_mem
0xc000000000072b50-0xc000000000072d20 read_mem
$ perf probe read_mem
read_mem is blacklisted function, skip it.
Added new events:
(null):(null) (on read_mem)
probe:read_mem (on read_mem)
You can now use it in all perf tools, such as:
perf record -e probe:read_mem -aR sleep 1
$ grep " read_mem" /proc/kallsyms
c000000000072b50 t read_mem
c0000000005f3b40 t read_mem
$ cat /sys/kernel/debug/kprobes/list
c0000000005f3b48 k read_mem+0x8 [DISABLED]
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Minor change log formatting, fix up some conflicts]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move the stack setup and teardown code into ftrace_graph_caller(). This way, we
don't incur the cost of setting it up unless function graph is enabled for this
function.
Also, remove the extraneous LR restore code after the function graph stub. LR
has previously been restored and neither livepatch_handler() nor
ftrace_graph_caller() return back here.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Drop bad change to non-mprofile-kernel version of ftrace_graph_caller]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The idle workaround does not need to load PACATOC, and it does not
need to be called within a nested function that requires LR to be
saved.
Load the PACATOC at entry to the idle wakeup. It does not matter which
PACA this comes from, so it's okay to call before the workaround. Then
apply the workaround to get the right PACA.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If not all threads were in winkle, full state loss recovery is not
necessary and can be avoided. A previous patch removed this optimisation
due to some complexity with the implementation. Re-implement it by
counting the number of threads in winkle with the per-core idle state.
Only restore full state loss if all threads were in winkle.
This has a small window of false positives right before threads execute
winkle and just after they wake up, when the winkle count does not
reflect the true number of threads in winkle. This is not a significant
problem in comparison with even the minimum winkle duration. For
correctness, a false positive is not a problem (only false negatives
would be).
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When taking the core idle state lock, grab it immediately like a regular
lock, rather than adding more tests in there. Holding the lock keeps it
stable, so there is no need to do it whole holding the reservation.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In preparation for adding more bits to the core idle state word, move
the lock bit up, and unlock by flipping the lock bit rather than masking
off all but the thread bits.
Add branch hints for atomic operations while we're here.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The ISA specifies power save wakeup due to a machine check exception can
cause a machine check interrupt (rather than the usual system reset
interrupt).
The machine check handler copes with this by doing low level machine
check recovery without restoring full state from idle, then queues up a
machine check event for logging, then directly executes the same idle
instruction it woke from. This minimises the work done before recovery
is performed.
The problem is that it requires machine specific instructions and
knowledge of the book3s idle code. Currently it only has code to handle
POWER8 idle, so POWER9 crashes when trying to execute the P8 idle
instructions which don't exist in ISAv3.0B.
cpu 0x0: Vector: e40 (Emulation Assist) at [c0000000008f3810]
pc: c000000000008380: machine_check_handle_early+0x130/0x2f0
lr: c00000000053a098: stop_loop+0x68/0xd0
sp: c0000000008f3a90
msr: 9000000000081001
current = 0xc0000000008a1080
paca = 0xc00000000ffd0000 softe: 0 irq_happened: 0x01
pid = 0, comm = swapper/0
Instead of going to sleep after recovery, do the usual idle wakeup and
state restoration by calling into the normal idle wakeup path. This
reuses the normal idle wakeup paths.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This reduces the number of nops for POWER8.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The POWER8 idle code has a neat trick of programming the power on engine
to restore a low bit into HSPRG0, so idle wakeup code can test and see
if it has been programmed this way and therefore lost all state. Restore
time can be reduced if winkle has not been reached.
However this messes with our r13 PACA pointer, and requires HSPRG0 to be
written to. It also optimizes the slowest and most uncommon case at the
expense of another SPR write in the common nap state wakeup.
Remove this complexity and assume winkle sleeps always require a state
restore. This speedup could be made entirely contained within the winkle
idle code by counting per-core winkles and setting a thread bitmap when
all have gone to winkle.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
No functional change.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This reverts commit 2947ba054a.
Dan Williams reported dax-pmem kernel warnings with the following signature:
WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic
... and bisected it to this commit, which suggests possible memory corruption
caused by the x86 fast-GUP conversion.
He also pointed out:
"
This is similar to the backtrace when we were not properly handling
pud faults and was fixed with this commit: 220ced1676 "mm: fix
get_user_pages() vs device-dax pud mappings"
I've found some missing _devmap checks in the generic
get_user_pages_fast() path, but this does not fix the regression
[...]
"
So given that there are known bugs, and a pretty robust looking bisection
points to this commit suggesting that are unknown bugs in the conversion
as well, revert it for the time being - we'll re-try in v4.13.
Reported-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: aneesh.kumar@linux.vnet.ibm.com
Cc: dann.frazier@canonical.com
Cc: dave.hansen@intel.com
Cc: steve.capper@linaro.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The system reset idle handler system_reset_idle_common is relocated, so
relocation is not required to branch to kvm_start_guest. The superfluous
relocation does not result in incorrect code, but it does not compile
outside of exception-64s.S (with fixed section definitions).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This is an eBPF JIT for sparc64. All major features are supported.
All tests under tools/testing/selftests/bpf/ pass.
Signed-off-by: David S. Miller <davem@davemloft.net>
Both conflict were simple overlapping changes.
In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the
conversion of the driver in net-next to use in-netdev stats.
Signed-off-by: David S. Miller <davem@davemloft.net>
Just two fixes. The first fixes kprobing a stdu, and is marked for stable as
it's been broken for ~ever. In hindsight this could have gone in next.
The other is a fix for a change we merged this cycle, where if we take a certain
exception when the kernel is running relocated (currently only used for kdump),
we checkstop the box.
Thanks to:
Ravi Bangoria.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJY+YyZAAoJEFHr6jzI4aWA6iYP/iwHOwFrAVMKl/zN8f5Vq/Ci
pNPhOJoWFNkKfNkjzqGZXleoB76jc342d3uDDCuGI65YZVFIrlCFc1k93hCrlZYS
jU0dJX+RLFEqcqXbwGhsBLEUjT17R8kxzgQ1J8gHzsFUjvbo7c2u49e7WvGrmqB9
+ksoqy81XfN9nkW4xDw+ME6bUcodW5rkxYNIPuZ0BUdnarPC/sdVvLPVzKuPwcRj
56wlry8kIwTdhqUSA9pYDaq1BY80AEp8d2VFEVsibhiNyJjyDFVHX6t4k/9an7oD
IXiqBVuMX7RnYnAI86aaaoqkZ8EOVeNX0A4U2XtQjGuu+avnwAlYaJ+cFvhbzWXX
zfjX8XanuFc7+Yok4G5W6Rqlye6DB6Ep4Asj3S1Nihv/UHKToVfvtAd46pXmUf2e
3Y+ut69AhT4aJZV4QGpJdUuh98xVR5dnmiAV/Yx+vKkcf3Bz2FzJ3OA/PGkevE0C
M6hY8kjMI9cKFgN6WO5ziNFwAj4t2JHf78F7A5Fkp3I3H0FbDKqhU31Gp/Gnrv3L
Giavyms78Z2+XVg+uxvXUIakKfnrWLao8HxwwCsfKKh9uhPoltM+I+5FI9mG3FSq
XVyA81XqcSkH3Gq3Y5aYZI3cq7YOk3auiWKazQ9H4Fbpi342LiT0v9eIxaP4XxbC
cY/QV6StaJ8cvQA/p2oL
=yUj5
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Just two fixes.
The first fixes kprobing a stdu, and is marked for stable as it's been
broken for ~ever. In hindsight this could have gone in next.
The other is a fix for a change we merged this cycle, where if we take
a certain exception when the kernel is running relocated (currently
only used for kdump), we checkstop the box.
Thanks to Ravi Bangoria"
* tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y
powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction
Add powerpc support for mmap_rnd_bits and mmap_rnd_compat_bits, which are two
sysctls that allow a user to configure the number of bits of randomness used for
ASLR.
Because of the way the Kconfig for ARCH_MMAP_RND_BITS is defined, we have to
construct at least the MIN value in Kconfig, vs in a header which would be more
natural. Given that we just go ahead and do it all in Kconfig.
At least according to the code (the documentation makes no mention of it), the
value is defined as the number of bits of randomisation *of the page*, not the
address. This makes some sense, with larger page sizes more of the low bits are
forced to zero, which would reduce the randomisation if we didn't take the
PAGE_SIZE into account. However it does mean the min/max values have to change
depending on the PAGE_SIZE in order to actually limit the amount of address
space consumed by the randomisation.
The result of that is that we have to define the default values based on both
32-bit vs 64-bit, but also the configured PAGE_SIZE. Furthermore now that we
have 128TB address space support on Book3S, we also have to take that into
account.
Finally we can wire up the value in arch_mmap_rnd().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
In crct10dif_vpmsum() we call enable_kernel_altivec() without first
disabling preemption, which is not allowed.
It used to be sufficient just to call pagefault_disable(), because that
also disabled preemption. But the two were decoupled in commit 8222dbe21e
("sched/preempt, mm/fault: Decouple preemption from the page fault
logic") in mid 2015.
The crct10dif-vpmsum code inherited this bug from the crc32c-vpmsum code
on which it was modelled.
So add the missing preempt_disable/enable(). We should also call
disable_kernel_fp(), although it does nothing by default, there is a
debug switch to make it active and all enables should be paired with
disables.
Fixes: b01df1c16c ("crypto: powerpc - Add CRC-T10DIF acceleration")
Acked-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Few parts of kernel define their own macro for aligning down so provide
a common define for this, with the same usage and assumptions as existing
ALIGN.
Convert also three existing implementations to this one.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The default implementation of ioremap_cache() is aliased to ioremap().
On powerpc ioremap() creates cache-inhibited mappings by default which
is almost certainly not what you wanted.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The guarded storage interface allows to register a control block for
each thread that is activated with the guarded storage broadcast event.
To retrieve the complete state of a process from the kernel a register
set for the stored broadcast control block is required.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fengguang Wu's zero day bot triggered a stack unwinder dump. This can
be easily triggered when CONFIG_FRAME_POINTERS is enabled and -mfentry
is in use on x86_32.
># cd /sys/kernel/debug/tracing
># echo 'p:schedule schedule' > kprobe_events
># echo stacktrace > events/kprobes/schedule/trigger
This is because the code that implemented fentry in the ftrace_regs_caller
tried to use the least amount of #ifdefs, and modified ebp when
CC_USE_FENTRY was defined to point to the parent ip as it does when
CC_USE_FENTRY is not defined. But when CONFIG_FRAME_POINTERS is set, it
corrupts the ebp register for this frame while doing the tracing.
NOTE, it does not corrupt ebp in any other way. It is just a bad frame
pointer when calling into the tracing infrastructure. The original ebp is
restored before returning from the fentry call. But if a stack trace is
performed inside the tracing, the unwinder will notice the bad ebp.
Instead of toying with ebp with CC_USING_FENTRY, just slap the parent ip
into the second parameter (%edx), and have an #else that does it the
original way.
The unwinder will unfortunately miss the function being traced, as the
stack frame is not set up yet for it, as it is for x86_64. But fixing that
is a bit more complex and did not work before anyway.
This has been tested with and without FRAME_POINTERS being set while using
-mfentry, as well as using an older compiler that uses mcount.
Analyzed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Fixes: 644e0e8dc7 ("x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lists.01.org/pipermail/lkp/2017-April/006165.html
Link: http://lkml.kernel.org/r/20170420172236.7af7f6e5@gandalf.local.home
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Accumulator is present in configs with FPU and/or DSP MPY (mpy > 6)
Instead of doing this in pt_regs (and thus every kernel entry/exit),
this could have been done in context switch (and for user task only) as
currently kernel doesn't clobber these registers for its own accord.
However we will soon start using 64-bit multiply instructions for kernel
which can clobber these. Also gcc folks also plan to start using these
as GPRs, hence better to always save/restore them
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Pull s390 fix from Martin Schwidefsky:
"There is one more fix I would like to see in 4.11: The combination of
KVM, CMMA and heavy paging can cause data corruption, the fix is to
clear the _PAGE_UNUSED bit in set_pte_at()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: fix CMMA vs KSM vs others
A function in kernel/bpf/syscall.c which got a bug fix in 'net'
was moved to kernel/bpf/verifier.c in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
When schemata parses the resource names it does not return an error if it
detects incorrect resource names and fails quietly.
This happens because for_each_enabled_rdt_resource(r) leaves "r" pointing
beyond the end of the rdt_resources_all[] array, and the check for !r->name
results in an out of bounds access.
Split the resource parsing part into a helper function to avoid the issue.
[ tglx: Made it readable by splitting the parser loop out into a function ]
Reported-by: Prakhya, Sai Praneeth <sai.praneeth.prakhya@intel.com>
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Tested-by: Prakhya, Sai Praneeth <sai.praneeth.prakhya@intel.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1492645804-17465-4-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Before offlining a CPU its required to check whether there are enough free
irq vectors available so interrupts can be migrated away from the CPU.
This check is executed whether its required or not and neither stops
searching when the number of required free vectors are reached.
Bypass the free vector check if the current CPU has no irq to migrate and
leave the for_each_online_cpu() loop when the free vector count reaches the
number of required vectors.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Len Brown <lenb@kernel.orq>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/1492357410-510-1-git-send-email-yu.c.chen@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
On kprobe handler re-entry, try to emulate the instruction rather than single
stepping always.
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Factor out code to emulate instruction into a try_to_emulate()
helper function. This makes no functional changes.
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With ABIv2, we offset 8 bytes into a function to get at the local entry
point.
mpe: NB this function is currently not called, the change to generic code to
call it is being merged via the tip tree.
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
commit 239aeba764 ("perf powerpc: Fix kprobe and kretprobe handling with
kallsyms on ppc64le") changed how we use the offset field in struct kprobe on
ABIv2. perf now offsets from the global entry point if an offset is specified
and otherwise chooses the local entry point.
Fix the same in kernel for kprobe API users. We do this by extending
kprobe_lookup_name() to accept an additional parameter to indicate the offset
specified with the kprobe registration. If offset is 0, we return the local
function entry and return the global entry point otherwise.
With:
# cd /sys/kernel/debug/tracing/
# echo "p _do_fork" >> kprobe_events
# echo "p _do_fork+0x10" >> kprobe_events
before this patch:
# cat ../kprobes/list
c0000000000d0748 k _do_fork+0x8 [DISABLED]
c0000000000d0758 k _do_fork+0x18 [DISABLED]
c0000000000412b0 k kretprobe_trampoline+0x0 [OPTIMIZED]
and after:
# cat ../kprobes/list
c0000000000d04c8 k _do_fork+0x8 [DISABLED]
c0000000000d04d0 k _do_fork+0x10 [DISABLED]
c0000000000412b0 k kretprobe_trampoline+0x0 [OPTIMIZED]
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The macro is now pretty long and ugly on powerpc. In the light of further
changes needed here, convert it to a __weak variant to be over-ridden with a
nicer looking function.
Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The Malta platform is the only platform remaining to probe the GIC
clocksource via gic_clocksource_init. This route hardcodes an expected
virq number based on MIPS_GIC_IRQ_BASE, which can be fragile to the
eventual virq layout. Instread, probe the driver using the preferred and
more modern devicetree method.
Before the driver is probed, set the "clock-frequency" property of the
devicetree node to the value detected by Malta platform code.
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Link: http://lkml.kernel.org/r/1492604806-23420-1-git-send-email-matt.redfearn@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add use_cmma field to mm_context_t, like we do for storage keys.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Acked-by: Janosch Frank <frankja@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add PGSTE manipulation functions:
* set_pgste_bits sets specific bits in a PGSTE
* get_pgste returns the whole PGSTE
* pgste_perform_essa manipulates a PGSTE to set specific storage states
* ESSA_[SG]ET_* macros used to indicate the action for manipulate_pgste
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: Janosch Frank <frankja@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Modify the reschedule warning to output the offline CPU number and
use a better debug message.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1492518305-3808-1-git-send-email-prarit@redhat.com
[ Tweaked the warning message. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A CPU in VMX root mode will ignore INIT signals and will fail to bring
up the APs after reboot. Therefore, on a panic we disable VMX on all
CPUs before rebooting or triggering kdump.
Do this when halting the machine as well, in case a firmware-level reboot
does not perform a cold reset for all processors. Without doing this,
rebooting the host may hang.
Signed-off-by: Tiantian Feng <fengtiantian@huawei.com>
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
[ Rewritten commit message. ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/20170419161839.30550-1-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The minimum size for a new stack (512 bytes) setup for arch/x86/boot components
when the bootloader does not setup/provide a stack for the early boot components
is not "enough".
The setup code executing as part of early kernel startup code, uses the stack
beyond 512 bytes and accidentally overwrites and corrupts part of the BSS
section. This is exposed mostly in the early video setup code, where
it was corrupting BSS variables like force_x, force_y, which in-turn affected
kernel parameters such as screen_info (screen_info.orig_video_cols) and
later caused an exception/panic in console_init().
Most recent boot loaders setup the stack for early boot components, so this
stack overwriting into BSS section issue has not been exposed.
Signed-off-by: Ashish Kalra <ashish@bluestacks.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170419152015.10011-1-ashishkalra@Ashishs-MacBook-Pro.local
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The alloc/free metaphor used for IOP messages is misleading and can
cause mistakes like the pointless msg2 temporary variable. Use a more
meaningful name to help simplify the code.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
These changes save 1014 bytes according to scripts/bloat-o-meter.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Hypervisor Virtualization and Directed Hypervisor Doorbell interrupt handlers
use the macro EXC_VIRT_OOL_MASKABLE_HV for their relocation-on handlers, which
calls MASKABLE_RELON_EXCEPTION_HV_OOL, which uses the *real mode* interrupt
prolog. This means we needlessly rfid from virtual mode to virtual mode.
For POWER8 it only affects doorbell IPIs. Context switch microbenchmark between
threads with snooze disabled (which causes IPI) gets about 3% faster, about 370
cycles. Should be more important on POWER9 with global doorbells and HVI for
host interrupts.
Use the RELON variant instead to reduce overhead.
Fixes: 1707dd1613 ("powerpc: Save CFAR before branching in interrupt entry paths")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fold some more detail into the change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
- arch_timer cleanups and refactoring
- new common GTDT parser
- GTDT-based MMIO arch_timer support
- GTDT-based SBSA watchdog support
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJY94xuAAoJEN97mGXxSWJIydYQAIoanIR8RmDLjtIfV4RMkjIj
BKr3IADjihtJY3B418OAoK8WzkVGwWILNw6rCMr7HvvDqBEEjcyukeO8vT4CMMBp
pkh/2tZFpNLLOBJidDPKlw/m1bUlCsIyC2TwuiqEyvlUQtkSmqVCYJCSgmuVHF0e
bMydjxslfI1v6HvhkVTN+y95ocakv3YfEpWrwRTEIqcwIzvQPo6bfjzrGFluoDRO
8lJT3Z4Q4ibvr8C3Y/UL/nOX2xBaGFR51G2/ILsLxzt7alStqnzXfL6h1dl5ySY2
rTXgbg5iH8k/8lQjYLLh1/UV3w0vdCg5HXFnbxPcfIA9BjaQvH4+CJowk69anHIC
4R7kiHsc6SOEyiQFWdjVGqy8vajokgskKbqEhktGiZUA4IIUd7jvFzK1l9r6IJnc
X/zM49fxR7cYwjJcNtEqzCtZmUt5h8JMWGqAm/Ei38+HWroVlnc4/o+hTAU4vgj9
+054NjdJp+0idm6PjIhCdufqH6AzCWRpyt8I9Qo/P+VaJyytiwn/6QBLNs79KA+q
QrLCYsPUWiTX8yR/y6v4obZSn0j0DnZM9tZOzr/cZH0D9Z0pXSoybspbS7vUMczd
4JhlSYKEycUqcu8J+jvh8HKANFZcMdyegO3bUKQ1ZlhpoyLml4H10dbOSHcIcU3c
eN6zY6RmvxvqwFJf9P2v
=KBll
-----END PGP SIGNATURE-----
Merge tag 'arch-timer-gtdt' of git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux into timers/core
Pull arch timer GTDT support from Mark Rutland
- arch_timer cleanups and refactoring
- new common GTDT parser
- GTDT-based MMIO arch_timer support
- GTDT-based SBSA watchdog support
Fix up a trivial pr_err() conflict.
Dan Carpenter noticed that the code in __xive_native_disable_queue() has a for
loop with an unconditional break in the middle, which doesn't make a lot of
sense.
What the code's supposed to do is loop as long as OPAL says it's busy, if we get
any other return code, either success or failure, then we should break the loop.
So add the missing check.
Fixes: 243e25112d ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Setup a dax_device to have the same lifetime as the axon_ram block
device and add a ->direct_access() method that is equivalent to
axon_ram_direct_access(). Once fs/dax.c has been converted to use
dax_operations the old axon_ram_direct_access() will be removed.
Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch adds support for parsing arch timer info in GTDT,
provides some kernel APIs to parse all the PPIs and
always-on info in GTDT and export them.
By this driver, we can simplify arm_arch_timer drivers, and
separate the ACPI GTDT knowledge from it.
Signed-off-by: Fu Wei <fu.wei@linaro.org>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Power9 DD1 does not implement SAO. Although it's not widely used, its presence
or absence is visible to user space via arch_validate_prot() so it's moderately
important that we get the value right.
Fixes: 7dccfbc325 ("powerpc/book3s: Add a cpu table entry for different POWER9 revs")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Power9 does not implement the icswx instruction. This CPU feature is not visible
to userspace and is only used in the CONFIG_PPC_ICSWX code, which is generally
not enabled, and can only be triggered by other code using icswx, which should
not happen on Power9 systems in the first place. So impact should be minimal.
Fixes: c3ab300ea5 ("powerpc: Add POWER9 cputable entry")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
mce_usable_address() does a bunch of basic sanity checks to verify
whether the address reported with the error is usable for further
processing. However, we do check MCi_STATUS[MISCV] and that is not
needed on AMD as that bit says that there's additional information about
the logged error in the MCi_MISCj banks.
But we don't need that to know whether the address is usable - we only
need to know whether the physical address is valid - i.e., ADDRV.
On Intel the MISCV bit is needed to perform additional checks to determine
whether the reported address is a physical one, etc.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170418183924.6agjkebilwqj26or@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Patch add "mem_access" event to sysfs. This as-is not a raw event
supported by Power8 pmu. Instead, it is formed based on
raw event encoding specificed in isa207-common.h.
Primary PMU event used here is PM_MRK_INST_CMPL.
This event tracks only the completed marked instructions.
Random sampling mode (MMCRA[SM]) with Random Instruction
Sampling (RIS) is enabled to mark type of instructions.
With Random sampling in RLS mode with PM_MRK_INST_CMPL event,
the LDST /DATA_SRC fields in SIER identifies the memory
hierarchy level (eg: L1, L2 etc) statisfied a data-cache
miss for a marked instruction.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Patch to export SIER bits to userspace via
perf_mem_data_src and perf_sample_data struct.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Patch to export SIER bits to userspace via
perf_mem_data_src and perf_sample_data struct.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Threshold feature when used with MMCRA [Threshold Event Counter Event],
MMCRA[Threshold Start event] and MMCRA[Threshold End event] will update
MMCRA[Threashold Event Counter Exponent] and MMCRA[Threshold Event
Counter Multiplier] with the corresponding threshold event count values.
Patch to export MMCRA[TECX/TECM] to userspace in 'weight' field of
struct perf_sample_data.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The LDST field and DATA_SRC in SIER identifies the memory hierarchy level
(eg: L1, L2 etc), from which a data-cache miss for a marked instruction
was satisfied. Use the 'perf_mem_data_src' object to export this
hierarchy level to user space.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The CMA pages migration code does not support compound pages at
the moment so it performs few tests before proceeding to actual page
migration.
One of the tests - PageTransHuge() - has VM_BUG_ON_PAGE(PageTail()) as
it is designed to be called on head pages only. Since we also test for
PageCompound(), and it contains PageTail() and PageHead(), we can
simplify the check by leaving just PageCompound() and therefore avoid
possible VM_BUG_ON_PAGE.
Fixes: 2e5bbb5461 ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
As part of the new large address space support, processes start out life with a
128TB virtual address space. However when calling mmap() a process can pass a
hint address, and if that hint is > 128TB the kernel will use the full 512TB
address space to try and satisfy the mmap() request.
Currently we have a check that the hint is > 128TB and < 512TB (TASK_SIZE),
which was added as an optimisation to avoid updating addr_limit unnecessarily
and also to avoid calling slice_flush_segments() on all CPUs more than
necessary.
However this has the user-visible side effect that an mmap() hint above 512TB
does not search the full address space unless a preceding mmap() used a hint
value > 128TB && < 512TB.
So fix it to treat any hint above 128TB as a hint to search the full address
space, instead of checking the hint against TASK_SIZE, we instead check if the
addr_limit is already == TASK_SIZE.
This also brings the ABI in-line with what is proposed on x86. ie, that a hint
address above 128TB up to and including (2^64)-1 is an indication to search the
full address space.
Fixes: f4ea6dcb08 (powerpc/mm: Enable mappings above 128TB)
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The TLB flush for radix first flushes TLB for radix configuration,
then flushes for hash configuration. The second flush is unnecessary
but does not affect correctness.
Fixes: 1a472c9dba ("powerpc/mm/radix: Add tlbflush routines")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We don't init addr_limit correctly for 32 bit applications. So default to using
mm->task_size for boundary condition checking. We use addr_limit to only control
free space search. This makes sure that we do the right thing with 32 bit
applications.
We should consolidate the usage of TASK_SIZE/mm->task_size and
mm->context.addr_limit later.
This partially reverts commit fbfef9027c (powerpc/mm: Switch some
TASK_SIZE checks to use mm_context addr_limit).
Fixes: fbfef9027c ("powerpc/mm: Switch some TASK_SIZE checks to use mm_context addr_limit")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The XIVE enablement patches included a change to set the LPES (Logical
Partitioning Environment Selector) bit (bit # 3) in LPCR (Logical Partitioning
Control Register) on POWER9 hosts. This bit sets external interrupts to guest
delivery mode, which uses SRR0/1. The host's EE interrupt handler is written to
expect HSRR0/1 (for earlier CPUs). This should be fine because XIVE is
configured not to deliver EEs to the host (Hypervisor Virtulization Interrupt is
used instead) so the EE handler should never be executed.
However a bug in interrupt controller code, hardware, or odd configuration of a
simulator could result in the host getting an EE incorrectly. Keeping the EE
delivery mode matching the host EE handler prevents strange crashes due to using
the wrong exception registers.
KVM will configure the LPCR to set LPES prior to running a guest so that EEs are
delivered to the guest using SRR0/1.
Fixes: 08a1e650cc ("powerpc: Fixup LPCR:PECE and HEIC setting on POWER9")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Massage change log to avoid referring to LPES0 which is now renamed LPES]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In unwind_dump(), the stack mask value is printed in hex, but is
confusingly not prepended with '0x'.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e7fe41be19d73c9f99f53082486473febfe08ffa.1492520933.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On x86-32, 32-bit stack values printed by unwind_dump() are confusingly
zero-padded to 16 characters (64 bits):
unwind stack type:0 next_sp: (null) mask:a graph_idx:0
f50cdebc: 00000000f50cdec4 (0xf50cdec4)
f50cdec0: 00000000c40489b7 (irq_exit+0x87/0xa0)
...
Instead, base the field width on the size of a long integer so that it
looks right on both x86-32 and x86-64.
x86-32:
unwind stack type:1 next_sp: (null) mask:0x2 graph_idx:0
c0ee9d98: c0ee9de0 (init_thread_union+0x1de0/0x2000)
c0ee9d9c: c043fd90 (__save_stack_trace+0x50/0xe0)
...
x86-64:
unwind stack type:1 next_sp: (null) mask:0x2 graph_idx:0
ffffffff81e03b88: ffffffff81e03c10 (init_thread_union+0x3c10/0x4000)
ffffffff81e03b90: ffffffff81048f8e (__save_stack_trace+0x5e/0x100)
...
Reported-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/36b743812e7eb291d74af4e5067736736622daad.1492520933.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For pre-4.6.0 versions of GCC, which don't have '-mfentry', the
'-maccumulate-outgoing-args' option is required for function graph
tracing in order to avoid GCC bug 42109.
However, GCC ignores '-maccumulate-outgoing-args' when '-Os' is
also set.
Currently we force a build error to prevent that scenario, but that
breaks randconfigs. So change the error to a warning which also
disables CONFIG_CC_OPTIMIZE_FOR_SIZE.
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: kbuild-all@01.org
Link: http://lkml.kernel.org/r/20170418214429.o7fbwbmf4nqosezy@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJY881cAAoJEHm+PkMAQRiGG4UH+wa2z6Qet36Uc4nXFZuSMYrO
ErUWs1QpTDDv4a+LE4fgyMvM3j9XqtpfQLy1n70jfD14IqPBhHe4gytasAf+8lg1
YvddFx0Yl3sygVu3dDBNigWeVDbfwepW59coN0vI5nrMo+wrei8aVIWcFKOxdMuO
n72u9vuhrkEnLJuQk7SF+t4OQob9McXE3s7QgyRopmlKhKo7mh8On7K2BRI5uluL
t0j5kZM0a43EUT5rq9xR8f5pgtyfTMG/FO2MuzZn43MJcZcyfmnOP/cTSIvAKA5U
1i12lxlokYhURNUe+S6jm8A47TrqSRSJxaQJZRlfGJksZ0LJa8eUaLDCviBQEoE=
=6QWZ
-----END PGP SIGNATURE-----
Merge tag 'v4.11-rc7' into drm-next
Backmerge Linux 4.11-rc7 from Linus tree, to fix some
conflicts that were causing problems with the rerere cache
in drm-tip.
The NFIT MCE handler callback (for handling media errors on NVDIMMs)
takes a mutex to add the location of a memory error to a list. But since
the notifier call chain for machine checks (x86_mce_decoder_chain) is
atomic, we get a lockdep splat like:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
[..]
Call Trace:
dump_stack
___might_sleep
__might_sleep
mutex_lock_nested
? __lock_acquire
nfit_handle_mce
notifier_call_chain
atomic_notifier_call_chain
? atomic_notifier_call_chain
mce_gen_pool_process
Convert the notifier to a blocking one which gets to run only in process
context.
Boris: remove the notifier call in atomic context in print_mce(). For
now, let's print the MCE on the atomic path so that we can make sure
they go out and get logged at least.
Fixes: 6839a6d96f ("nfit: do an ARS scrub on hitting a latent media error")
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170411224457.24777-1-vishal.l.verma@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make sure the start adderess is aligned to PMD_SIZE
boundary when freeing page table backing a hugepage
region. The issue was causing segfaults when a region
backed by 64K pages was unmapped since such a region
is in general not PMD_SIZE aligned.
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_PROVE_LOCKING_SMALL shrinks the memory usage of lockdep so the
kernel text, data, and bss fit in the required 32MB limit, but this
option is not set for every config that enables lockdep.
A 4.10 kernel fails to boot with the console output
Kernel: Using 8 locked TLB entries for main kernel image.
hypervisor_tlb_lock[2000000:0:8000000071c007c3:1]: errors with f
Program terminated
with these config options
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_PROVE_LOCKING=n
To fix, rename CONFIG_PROVE_LOCKING_SMALL to CONFIG_LOCKDEP_SMALL, and
enable this option with CONFIG_LOCKDEP=y so we get the reduced memory
usage every time lockdep is turned on.
Tested that CONFIG_LOCKDEP_SMALL is set to 'y' if and only if
CONFIG_LOCKDEP is set to 'y'. When other lockdep-related config options
that select CONFIG_LOCKDEP are enabled (e.g. CONFIG_LOCK_STAT or
CONFIG_PROVE_LOCKING), verified that CONFIG_LOCKDEP_SMALL is also
enabled.
Fixes: e6b5f1be7a ("config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frameworks that may want to enumerate CMA heaps (e.g. Ion) will find it
useful to have an explicit name attached to each region. Store the name
in each CMA structure.
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When switching rotary controlelr from plain IRQ number to IRQ resource, I
messed up the syntax.
Fixes: d422be5f62 ("Input: eeti_ts - expect platform code to set ... ")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Prior to commit 2337d20728 ("powerpc/64: CONFIG_RELOCATABLE support for hmi
interrupts"), the branch from hmi_exception_early() to hmi_exception_realmode()
was just a bl hmi_exception_realmode, which the linker would turn into a bl to
the local entry point of hmi_exception_realmode. This was broken when
CONFIG_RELOCATABLE=y because hmi_exception_realmode() is not in the low part of
the kernel text that is copied down to 0x0.
But in fixing that, we added a new bug on little endian kernels. Because the
branch is now a bctrl when CONFIG_RELOCATABLE=y, we branch to the global entry
point of hmi_exception_realmode(). The global entry point must be called with
r12 containing the address of hmi_exception_realmode(), because it uses that
value to calculate the TOC value (r2).
This may manifest as a checkstop, because we take a junk value from r12 which
came from HSRR1, add a small constant to it and then use that as the TOC
pointer. The HSRR1 value will have 0x9 as the top nibble, which puts it above
RAM and somewhere in MMIO space.
Fix it by changing the BRANCH_LINK_TO_FAR() macro to always use r12 to load the
label we're branching to. This means r12 will be setup correctly on LE, fixing
this bug, and r12 is also volatile across function calls on BE so it's a good
choice anyway.
Fixes: 2337d20728 ("powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts")
Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If we set a kprobe on a 'stdu' instruction on powerpc64, we see a kernel
OOPS:
Bad kernel stack pointer cd93c840 at c000000000009868
Oops: Bad kernel stack pointer, sig: 6 [#1]
...
GPR00: c000001fcd93cb30 00000000cd93c840 c0000000015c5e00 00000000cd93c840
...
NIP [c000000000009868] resume_kernel+0x2c/0x58
LR [c000000000006208] program_check_common+0x108/0x180
On a 64-bit system when the user probes on a 'stdu' instruction, the kernel does
not emulate actual store in emulate_step() because it may corrupt the exception
frame. So the kernel does the actual store operation in exception return code
i.e. resume_kernel().
resume_kernel() loads the saved stack pointer from memory using lwz, which only
loads the low 32-bits of the address, causing the kernel crash.
Fix this by loading the 64-bit value instead.
Fixes: be96f63375 ("powerpc: Split out instruction analysis part of emulate_step()")
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
[mpe: Change log massage, add stable tag]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
KASLR is mature (and important) enough to be enabled by default on x86.
Also enable it by default in the defconfigs.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: dan.j.williams@intel.com
Cc: dave.jiang@intel.com
Cc: dyoung@redhat.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With frame pointers disabled, on some older versions of GCC (like
4.8.3), it's possible for the stack pointer to get aligned at a
half-word boundary:
00000000000004d0 <fib_table_lookup>:
4d0: 41 57 push %r15
4d2: 41 56 push %r14
4d4: 41 55 push %r13
4d6: 41 54 push %r12
4d8: 55 push %rbp
4d9: 53 push %rbx
4da: 48 83 ec 24 sub $0x24,%rsp
In such a case, the unwinder ends up reading the entire stack at the
wrong alignment. Then the last read goes past the end of the stack,
hitting the stack guard page:
BUG: stack guard page was hit at ffffc900217c4000 (stack is ffffc900217c0000..ffffc900217c3fff)
kernel stack overflow (page fault): 0000 [#1] SMP
...
Fix it by ensuring the stack pointer is properly aligned before
unwinding.
Reported-by: Jirka Hladky <jhladky@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 7c7900f897 ("x86/unwind: Add new unwind interface and implementations")
Link: http://lkml.kernel.org/r/cff33847cc9b02fa548625aa23268ac574460d8d.1492436590.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Update the check which enforces the registration of MCE decoder notifier
callbacks with valid priority only, to include mcelog's priority.
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lkp@01.org
Link: http://lkml.kernel.org/r/20170418073820.i6kl5tggcntwlisa@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
These defines are never used so remove them to make it clear there is not assembly
code that needs to be updated that uses those fields.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Merge branch 'i2c/for-INT33FE' of
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git
to prepare for an incoming INT33FE driver.
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Pull parisc fix from Helge Deller:
"One patch which fixes get_user() for 64-bit values on 32-bit kernels.
Up to now we lost the upper 32-bits of the returned 64-bit value"
* 'parisc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix get_user() for 64-bit value on 32-bit kernel
Johan Hedberg says:
====================
pull request: bluetooth-next 2017-04-14
Here's the main batch of Bluetooth & 802.15.4 patches for the 4.12
kernel.
- Many fixes to 6LoWPAN, in particular for BLE
- New CA8210 IEEE 802.15.4 device driver (accounting for most of the
lines of code added in this pull request)
- Added Nokia Bluetooth (UART) HCI driver
- Some serdev & TTY changes that are dependencies for the Nokia
driver (with acks from relevant maintainers and an agreement that
these come through the bluetooth tree)
- Support for new Intel Bluetooth device
- Various other minor cleanups/fixes here and there
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike normal compat syscall variants, it is needed only for
biarch architectures that have different alignement requirements for
u64 in 32bit and 64bit ABI *and* have __put_user() that won't handle
a store of 64bit value at 32bit-aligned address. We used to have one
such (ia64), but its biarch support has been gone since 2010 (after
being broken in 2008, which went unnoticed since nobody had been using
it).
It had escaped removal at the same time only because back in 2004
a patch that switched several syscalls on amd64 from private wrappers to
generic compat ones had switched to use of compat_sys_getdents64(), which
hadn't needed (or used) a compat wrapper on amd64.
Let's bury it - it's at least 7 years overdue.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
clang currently does not support these optimizations, only enable them
when they are available.
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michael Davidson <md@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: grundler@chromium.org
Link: http://lkml.kernel.org/r/20170413172609.118122-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull clockevents updates from Daniel Lezcano
- Provide a framework to handle errata gracefuly for arm_arch_timer (Mark
Zyngier)
- Clarify the DT properties for the rockchip timer and add the clocksource as
an alternative to the bogus architected timer (Alexander Kochetkov)
- Rename the Gemini timer to Faraday timer fttmr010 and provide a specific
initialization for Gemini (Linus Walleij)
- Add missing newlines in the error message in the timers (Rafał Miłecki)
- Read the clock once and implement the delay timer on Orion (Russell King)
gcc-4.4.3 fails to statically initialize members of a anon union.
See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676
The storage saving is not really worth it and aside of that it will catch
usage of the cache member for bandwidth and vice versa easier.
Fixes: 05b93417ce ("x86/intel_rdt/mba: Add primary support for Memory Bandwidth Allocation (MBA)")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Again, a batch that's been sitting a couple of weeks, mostly because I
anticipated a bit more material but it didn't show up -- which is good.
These are all your garden variety fixes for ARM platforms. Most visible issue
fixed here is probably the SMP reset issue on OMAP, the rest are minor stuff.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJY877PAAoJEIwa5zzehBx3TaUQAKtE8CSmvdWIk5grVb4MBDfF
rXQ2R7WJF17LtVhIGwz9BPCQXMK+/XcSuXZ8bvUIoMsmOFXutDao7m7COD0z4aNo
CPWk7ceQRzIWvsmturla6aYGmpAbWzdDgQj61LnaFbQrzmJOHVvcJN685+L5NGkZ
8GBrzNmrvVkWz5N+msnrZRIcKpSqGCXrkjUU1EfHgUgNNdTf6BQRQSzVEYBtILEb
9bC3WdS1fosmSdbsBjcxHsAtWMyO64KqhA691+gGTR93wyOuCRgqv+/ucoFB1oi+
yjuhUVHW7Y5/8KOi67+97PwoKpZYpUFHge1/5iPbgEN6SqhOLzPAALGCKT4ONEQz
KmLl//kX1IJZTD/87OegSLDbpS7h2sRYS7FpwgAa1kyYYOHdsZsECzKfhEvgZNHX
Gtl2XLmtELM9quKQ2X8xWvjU5grfSLkng9OhDJ6ZCFhGddvjtHegq8J5diFyq281
qf4n/Gp/OLC9IoPUyZefn5ya77K8UZPNppqtiWTBbT1IuXWbPUVPKmZoCpq3Qwu+
2fdcFa3sIfpzDg9vkTszFAUCFkRqH5Jzy8BfNIQtfSq+pxwyIRWh88a5CV2RvYFB
uHSt5B88YR2PwoyhV2oFpYNkx3vVu9g+A35ClzIRTLhSMi+Ngh2+DcppHY+LficP
m9Hes98dZOv92JgiSaoQ
=C+Hz
-----END PGP SIGNATURE-----
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Again, a batch that's been sitting a couple of weeks, mostly because
I anticipated a bit more material but it didn't show up -- which is
good.
These are all your garden variety fixes for ARM platforms.
The most visible issue fixed here is probably the SMP reset issue on
OMAP, the rest are minor stuff"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm64: allwinner: a64: add pmu0 regs for USB PHY
ARM: OMAP2+: omap_device: Sync omap_device and pm_runtime after probe defer
reset: add exported __reset_control_get, return NULL if optional
ARM: orion5x: only call into phylib when available
ARM: omap2+: Revert omap-smp.c changes resetting CPU1 during boot
ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend
ARM: dts: ti: fix PCI bus dtc warnings
ARM: dts: am335x-baltos: disable EEE for Atheros 8035 PHY
ARM: dts: OMAP3: Fix MFG ID EEPROM
ARM: sun8i: a33: add operating-points-v2 property to all nodes
ARM: sun8i: a33: remove highest OPP to fix CPU crashes
Without this fix we can get PM related warnings for devices that
use deferred probe. If necessary, this fix can wait for the
v4.12 merge window no problem.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEkgNvrZJU/QSQYIcQG9Q+yVyrpXMFAljruuwRHHRvbnlAYXRv
bWlkZS5jb20ACgkQG9Q+yVyrpXMWBBAA0kYBB9IA+OinjpgLBB9ltKX21HBXKAHn
JCiygnR6KzDxnBzsPJk0v0GfRq7FixsEbstyDqfGXDpK5pOFTlgffGFeaMLQt+jU
4PcjLtiXklS9j3jJyUS7SAAjh5sPmR8v5q+NO0ELGi5H2q3c7J7X7VojD9LCB9gm
z/4t133EEPwRdTUghoqxTaB+11ROrbctr0eZUNIafytxsnX4kkpcVGuQxRBu740j
rLXxd3lIJbqasJHj+4v/IkE5CT61OslqEEA8QDaP1oK4d6M4+JVGCBAPXIl6AZ1b
vQEjUl1YMgU1QeF/cnQwf6n6fM1DqhjbdySotDRDlZvWExexTG1BXBcKr1mATfNp
zSGJuAne/zG0AHcqfpTZD3VWbrO1iGw5RmifwwtcmbsAmKu6K7ezMx2QO4L8VBma
N407KOdhcnzsGQnTn3iQNwK36b/lc6ph1DA02TeS41vYB6MBrIp7uugP30k7eOf2
Smv3LClY0H9+db9/IMDyuap8Os3QlGEwXwnVyC67TE3dRP3Js5r7Fm3i9WGmV5Oy
5pU79OXMYzphKUd/11QUSnQUO+SMNH7/fU/dSeqLQMfwe2a5oY4FDuM5Y8uFDoZ7
2KPGLEGWFmn8kxuZvBw/nHiwPexe3y91dIfvA+S7kTQnqFGmM0IzlkMnfcZeV5Zu
AaRkd2G7654=
=07Xj
-----END PGP SIGNATURE-----
Merge tag 'omap-for-v4.11/fixes-rc6-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
Regression fix for omap interconnect code for deferred probe.
Without this fix we can get PM related warnings for devices that
use deferred probe. If necessary, this fix can wait for the
v4.12 merge window no problem.
* tag 'omap-for-v4.11/fixes-rc6-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: omap_device: Sync omap_device and pm_runtime after probe defer
ARM: omap2+: Revert omap-smp.c changes resetting CPU1 during boot
ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend
ARM: dts: ti: fix PCI bus dtc warnings
ARM: dts: am335x-baltos: disable EEE for Atheros 8035 PHY
ARM: dts: OMAP3: Fix MFG ID EEPROM
Signed-off-by: Olof Johansson <olof@lixom.net>
This fixes a bug in which the upper 32-bits of a 64-bit value which is
read by get_user() was lost on a 32-bit kernel.
While touching this code, split out pre-loading of %sr2 space register
and clean up code indent.
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
Conflicts were simply overlapping changes. In the net/ipv4/route.c
case the code had simply moved around a little bit and the same fix
was made in both 'net' and 'net-next'.
In the net/sched/sch_generic.c case a fix in 'net' happened at
the same time that a new argument was added to qdisc_hash_add().
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull nvdimm fixes from Dan Williams:
"A small crop of lockdep, sleeping while atomic, and other fixes /
band-aids in advance of the full-blown reworks targeting the next
merge window. The largest change here is "libnvdimm: fix blk free
space accounting" which deletes a pile of buggy code that better
testing would have caught before merging. The next change that is
borderline too big for a late rc is switching the device-dax locking
from rcu to srcu, I couldn't think of a smaller way to make that fix.
The __copy_user_nocache fix will have a full replacement in 4.12 to
move those pmem special case considerations into the pmem driver. The
"libnvdimm: band aid btt vs clear poison locking" commit admits that
our error clearing support for btt went in broken, so we just disable
it in 4.11 and -stable. A replacement / full fix is in the pipeline
for 4.12
Some of these would have been caught earlier had DEBUG_ATOMIC_SLEEP
been enabled on my development station. I wonder if we should have:
config DEBUG_ATOMIC_SLEEP
default PROVE_LOCKING
...since I mistakenly thought I got both with PROVE_LOCKING=y.
These have received a build success notification from the 0day robot,
and some have appeared in a -next release with no reported issues"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
device-dax: switch to srcu, fix rcu_read_lock() vs pte allocation
libnvdimm: band aid btt vs clear poison locking
libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat
libnvdimm: fix blk free space accounting
acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison)
The patch 554bfeceb8 ("parisc: Fix access
fault handling in pa_memcpy()") reimplements the pa_memcpy function.
Unfortunatelly, it makes the kernel unbootable. The crash happens in the
function ide_complete_cmd where memcpy is called with the same source
and destination address.
This patch fixes a few bugs in pa_memcpy:
* When jumping to .Lcopy_loop_16 for the first time, don't skip the
instruction "ldi 31,t0" (this bug made the kernel unbootable)
* Use the COND macro when comparing length, so that the comparison is
64-bit (a theoretical issue, in case the length is greater than
0xffffffff)
* Don't use the COND macro after the "extru" instruction (the PA-RISC
specification says that the upper 32-bits of extru result are undefined,
although they are set to zero in practice)
* Fix exception addresses in .Lcopy16_fault and .Lcopy8_fault
* Rename .Lcopy_loop_4 to .Lcopy_loop_8 (so that it is consistent with
.Lcopy8_fault)
Cc: <stable@vger.kernel.org> # v4.9+
Fixes: 554bfeceb8 ("parisc: Fix access fault handling in pa_memcpy()")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
The mmustat_enable sysfs file accessor functions must run code on the
target CPU. This is achieved by temporarily setting the affinity of the
calling user space thread to the requested CPU and reset it to the original
affinity afterwards.
That's racy vs. concurrent affinity settings for that thread resulting in
code executing on the wrong CPU and overwriting the new affinity setting.
Replace it by using work_on_cpu() which guarantees to run the code on the
requested CPU.
Protection against CPU hotplug is not required as the open sysfs file
already prevents the removal from the CPU offline callback. Using the
hotplug protected version would actually be wrong because it would deadlock
against a CPU hotplug operation of the CPU associated to the sysfs file in
progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: herbert@gondor.apana.org.au
Cc: rjw@rjwysocki.net
Cc: peterz@infradead.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: jiangshanlai@gmail.com
Cc: sparclinux@vger.kernel.org
Cc: viresh.kumar@linaro.org
Cc: mpe@ellerman.id.au
Cc: tj@kernel.org
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1704131001270.2408@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Init task invokes smp_ops->setup_cpu() from smp_cpus_done(). Init task can
run on any online CPU at this point, but the setup_cpu() callback requires
to be invoked on the boot CPU. This is achieved by temporarily setting the
affinity of the calling user space thread to the requested CPU and reset it
to the original affinity afterwards.
That's racy vs. CPU hotplug and concurrent affinity settings for that
thread resulting in code executing on the wrong CPU and overwriting the
new affinity setting.
That's actually not a problem in this context as neither CPU hotplug nor
affinity settings can happen, but the access to task_struct::cpus_allowed
is about to restricted.
Replace it with a call to work_on_cpu_safe() which achieves the same result.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/20170412201042.518053336@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
sn_hwperf_op_cpu() which is invoked from an ioctl requires to run code on
the requested cpu. This is achieved by temporarily setting the affinity of
the calling user space thread to the requested CPU and reset it to the
original affinity afterwards.
That's racy vs. CPU hotplug and concurrent affinity settings for that
thread resulting in code executing on the wrong CPU and overwriting the
new affinity setting.
Replace it by using work_on_cpu_safe() which guarantees to run the code on
the requested CPU or to fail in case the CPU is offline.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1704122251450.2548@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some of the file operations in /proc/sal require to run code on the
requested cpu. This is achieved by temporarily setting the affinity of the
calling user space thread to the requested CPU and reset it to the original
affinity afterwards.
That's racy vs. CPU hotplug and concurrent affinity settings for that
thread resulting in code executing on the wrong CPU and overwriting the
new affinity setting.
Replace it by using work_on_cpu_safe() which guarantees to run the code on
the requested CPU or to fail in case the CPU is offline.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/20170412201042.341863457@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The CPU hotplug callback fiddles with the cpus_allowed pointer to pin the
calling thread on the plugged CPU. That's already guaranteed by the hotplug
core code.
Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/20170412201042.174518069@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x86 fixes from Thomas Gleixner:
"A set of small fixes for x86:
- fix locking in RDT to prevent memory leaks and freeing in use
memory
- prevent setting invalid values for vdso32_enabled which cause
inconsistencies for user space resulting in application crashes.
- plug a race in the vdso32 code between fork and sysctl which causes
inconsistencies for user space resulting in application crashes.
- make MPX signal delivery work in compat mode
- make the dmesg output of traps and faults readable again"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt: Fix locking in rdtgroup_schemata_write()
x86/debug: Fix the printk() debug output of signal_fault(), do_trap() and do_general_protection()
x86/vdso: Plug race between mapping and ELF header setup
x86/vdso: Ensure vdso32_enabled gets set to valid values only
x86/signals: Fix lower/upper bound reporting in compat siginfo
Pull perf fixes from Thomas Gleixner:
"Two small fixes for perf:
- the move to support cross arch annotation introduced per arch
initialization requirements, fullfill them for s/390 (Christian
Borntraeger)
- add the missing initialization to the LBR entries to avoid exposing
random or stale data"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
perf annotate s390: Fix perf annotate error -95 (4.10 regression)
Pull EFI fixes from Thomas Gleixner:
"Three fixes from EFI land:
- prevent accessing a Graphic Output Device (GOP) which the kernel
does not know to handle
- prevent PCI reconfiguration to modify a BAR which covers the
framebuffer because that's already in use through the EFI GOP
interface
- avoid reserving EFI runtime regions as this results in bogus memory
mappings"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Don't try to reserve runtime regions
efi/fb: Avoid reconfiguration of BAR that covers the framebuffer
efi/libstub: Skip GOP with PIXEL_BLT_ONLY format
Merge timer updates from John Stultz:
A preparatory patch series for correcting the clock event devices via NTP
to avoid early timer expiry and reprogramming.
The call to irq_ctx_init() is wrapped in #ifdef CONFIG_X86_32.
The declaration of irq_ctx_init in irq.h provides already a stub inline for
the X86_32=n case.
Remove the redundant #ifdef in the code.
[ tglx: Massaged changelog ]
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/1491811500-30307-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The !CONFIG_X86_LOCAL_APIC section in smp.h wraps the define of
hard_smp_processor_id() into #ifndef CONFIG_SMP. But Kconfig has:
config X86_LOCAL_APIC
def_bool y
depends on X86_64 || SMP || X86_32_NON_STANDARD ...
Therefore SMP can't be 'y' when X86_LOCAL_APIC == 'n'.
Remove the redundant #ifndef CONFIG_SMP.
[ tglx: Massaged changelog ]
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: jaswinder@infradead.org
Link: http://lkml.kernel.org/r/1491734806-15413-2-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Currently, the x86's uv rtc clockevent device is initialized as follows:
clock_event_device_uv.min_delta_ns = NSEC_PER_SEC /
sn_rtc_cycles_per_second;
clock_event_device_uv.max_delta_ns = clocksource_uv.mask *
(NSEC_PER_SEC / sn_rtc_cycles_per_second);
This translates to a ->min_delta_ticks value of 1 and a ->max_delta_ticks
value of clocksource_uv.mask.
Initialize ->min_delta_ticks and ->max_delta_ticks with these values
respectively.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Mike Travis <travis@sgi.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the unicore32 arch's clockevent driver initialize these fields
properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the uml arch's clockevent driver initialize these fields properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: user-mode-linux-devel@lists.sourceforge.net
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Currently, the tile's timer clockevent device is initialized as follows:
evt->max_delta_ns = clockevent_delta2ns(MAX_TICK, evt);
and
.min_delta_ns = 1000,
The first one translates to a ->max_delta_ticks value of MAX_TICK.
For the latter, note that the clockevent core will superimpose a
minimum of 1us by itself -- setting ->min_delta_ticks to 1 is safe here.
Initialize ->min_delta_ticks and ->max_delta_ticks with these values.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the score arch's clockevent driver initialize these fields properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Currently, the s390's CPU timer clockevent device is initialized as
follows:
cd->min_delta_ns = 1;
cd->max_delta_ns = LONG_MAX;
Note that the device's time to cycle conversion factor, i.e.
cd->mult / (2^cd->shift), is approx. equal to 4.
Hence, this would translate to
cd->min_delta_ticks = 4;
cd->max_delta_ticks = 4 * LONG_MAX;
However, a minimum value of 1ns is in the range of noise anyway and the
clockevent core will take care of this by increasing it to 1us or so.
Furthermore, 4*LONG_MAX would overflow the unsigned long argument the
clockevent devices gets programmed with.
Thus, initialize ->min_delta_ticks with 1 and ->max_delta_ticks with
ULONG_MAX.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the mn10300 arch's clockevent driver initialize these fields properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-am33-list@redhat.com
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the c6x arch's clockevent driver initialize these fields properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: linux-c6x-dev@linux-c6x.org
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the blackfin arch's clockevent driver initialize these fields
properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Steven Miao <realmz6@gmail.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the x86 arch's apic clockevent driver initialize these fields
properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
CC: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the MIPS arch's clockevent drivers initialize these fields properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from these
drivers.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Keguang Zhang <keguang.zhang@gmail.com>
Cc: John Crispin <john@phrozen.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the hexagon arch's clockevent driver initialize these fields
properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: linux-hexagon@vger.kernel.org
Acked-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the x86 arch's lguest clockevent driver initialize these fields
properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: lguest@lists.ozlabs.org
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the sparc arch's clockevent drivers initialize these fields properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from these
drivers.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the powerpc arch's clockevent driver initialize these fields properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the m68k arch's coldfire clockevent driver initialize these fields
properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
CC: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Make the x86 arch's xen clockevent driver initialize these fields properly.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: xen-devel@lists.xenproject.org
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
For better maintenance keep it sorted by numeric model ID. Add new lines to
seperate model groups.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20170316155045.50389-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
area on x86 to avoid exposing RAM or tripping hardened usercopy.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Kees Cook <kees@outflux.net>
iQIcBAABCgAGBQJY7ncoAAoJEIly9N/cbcAm6o0QAJKhA+/CnTRr/knMv0ZE7EW0
AuP/Hxdxfu/OCIc+BMDApdfme4yGWLjiD2Jx6GNDy9o1FaKCdJ3MaCOlPNlNa/5o
V+n6z2d7CNDpaiNelhUs38JZGK2aSTYC9a0xQ9JEsQnaunwfHUiirZkdL+ajJI4p
4XOlajWq/mvnBetv8EyZRmBSy51HghNQmk+I0OtyerufZCwwOsbKeDcYr2lqxe7R
WtBtvKJF1p55nsNMXG8L62+q4gY5NGtspwQ/7MLrYwmHI9eOdRLzXZdrqH52PvuF
H1sk6xQ4Xl89Fp43akybaGu6UyTPU09r1Y9LSpgxNApvqdDOsqB+zpD7gq3iWX/c
dtORmMOV3JHyATZkDISX8dN/Qx6bXnsfpfempFd/d+YvdOyh8yRw+ZMCy/2Zx1XP
EaEzHMn6DuOGaROhtDGywXylw1CXFzohnfbeCJ2wiQuPWXPDkyFyqmWjwntkP+TD
jzx+M6glP0Vq7UHScLcJ6mvu65UnfMdNSo+/t4mS2Xg2xsyG5maQ4GQoxAbpmW26
uSZIrxSFlq0kffeyoG9l5lnbTKI24pDf9O98ZiyBM2fOytdQ2LBtxbHI9I6DPHYS
u9QQDsETuWj8LPqcy2stp8BNloTIUdbWwIcuCT/MME/s5qpdkRMEwmLQlXVz64zk
BcmDSmhY7ohAaD8dAnlz
=5z+N
-----END PGP SIGNATURE-----
Merge tag 'devmem-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull CONFIG_STRICT_DEVMEM fix from Kees Cook:
"Fixes /dev/mem to read back zeros for System RAM areas in the 1MB
exception area on x86 to avoid exposing RAM or tripping hardened
usercopy"
* tag 'devmem-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
mm: Tighten x86 /dev/mem with zeroing reads
The files in the info directory for MBA are as follows:
num_closids
The maximum number of CLOSids available for MBA
min_bandwidth
The minimum memory bandwidth percentage value
bandwidth_gran
The granularity of the bandwidth control in percent for the
particular CPU SKU. Intermediate values entered are rounded off
to the previous control step available. Available bandwidth
control steps are minimum_bandwidth + N * bandwidth_gran.
delay_linear
When set, the OS writes a linear percentage based value to the
control MSRs ranging from minimum_bandwidth to 100 percent.
This value is informational and has no influence on the values
written to the schemata files. The values written to the
schemata are always bandwidth percentage that is requested.
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-7-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Memory bandwidth allocation requires different information than cache
allocation.
To avoid a lump of data in struct rdt_resource, move all cache related
information into a seperate structure and add that to struct rdt_resource.
Sanitize the data types while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
On x86-32, with CONFIG_FIRMWARE and multiple CPUs, if you enable function
graph tracing and then suspend to RAM, it will triple fault and reboot when
it resumes.
The first fault happens when booting a secondary CPU:
startup_32_smp()
load_ucode_ap()
prepare_ftrace_return()
ftrace_graph_is_dead()
(accesses 'kill_ftrace_graph')
The early head_32.S code calls into load_ucode_ap(), which has an an
ftrace hook, so it calls prepare_ftrace_return(), which calls
ftrace_graph_is_dead(), which tries to access the global
'kill_ftrace_graph' variable with a virtual address, causing a fault
because the CPU is still in real mode.
The fix is to add a check in prepare_ftrace_return() to make sure it's
running in protected mode before continuing. The check makes sure the
stack pointer is a virtual kernel address. It's a bit of a hack, but
it's not very intrusive and it works well enough.
For reference, here are a few other (more difficult) ways this could
have potentially been fixed:
- Move startup_32_smp()'s call to load_ucode_ap() down to *after* paging
is enabled. (No idea what that would break.)
- Track down load_ucode_ap()'s entire callee tree and mark all the
functions 'notrace'. (Probably not realistic.)
- Pause graph tracing in ftrace_suspend_notifier_call() or bringup_cpu()
or __cpu_up(), and ensure that the pause facility can be queried from
real mode.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: linux-acpi@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: stable@kernel.org
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/5c1272269a580660703ed2eccf44308e790c7a98.1492123841.git.jpoimboe@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove a redundant self assignment of table->nr_entries, it does
nothing and is an artifact of code simplification re-work.
Detected by CoverityScan, CID#1428450 ("Self assignment")
Fixes: 441ac2f33d ("x86/boot/e820: Simplify e820__update_table()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: kernel-janitors@vger.kernel.org
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Link: http://lkml.kernel.org/r/20170413155912.12078-1-colin.king@canonical.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Spurious NMIs will be observed with the following command:
while :; do
perf record -bae "cpu/umask=0x01,event=0xcd,ldlat=0x80/pp"
-e "cpu/umask=0x03,event=0x0/"
-e "cpu/umask=0x02,event=0x0/"
-e cycles,branches,cache-misses
-e cache-references -- sleep 10
done
The bug was introduced by commit:
8077eca079 ("perf/x86/pebs: Add workaround for broken OVFL status on HSW+")
That commit clears the status bits for the counters used for PEBS
events, by masking the whole 64 bits pebs_enabled. However, only the
low 32 bits of both status and pebs_enabled are reserved for PEBS-able
counters.
For status bits 32-34 are fixed counter overflow bits. For
pebs_enabled bits 32-34 are for PEBS Load Latency.
In the test case, the PEBS Load Latency event and fixed counter event
could overflow at the same time. The fixed counter overflow bit will
be cleared by mistake. Once it is cleared, the fixed counter overflow
never be processed, which finally trigger spurious NMI.
Correct the PEBS enabled mask by ignoring the non-PEBS bits.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 8077eca079 ("perf/x86/pebs: Add workaround for broken OVFL status on HSW+")
Link: http://lkml.kernel.org/r/1491333246-3965-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A few people have reported unwinder warnings like the following:
WARNING: kernel stack frame pointer at ffffc90000fe7ff0 in rsync:1157 has bad value (null)
unwind stack type:0 next_sp: (null) mask:2 graph_idx:0
ffffc90000fe7f98: ffffc90000fe7ff0 (0xffffc90000fe7ff0)
ffffc90000fe7fa0: ffffffffb7000f56 (trace_hardirqs_off_thunk+0x1a/0x1c)
ffffc90000fe7fa8: 0000000000000246 (0x246)
ffffc90000fe7fb0: 0000000000000000 ...
ffffc90000fe7fc0: 00007ffe3af639bc (0x7ffe3af639bc)
ffffc90000fe7fc8: 0000000000000006 (0x6)
ffffc90000fe7fd0: 00007f80af433fc5 (0x7f80af433fc5)
ffffc90000fe7fd8: 00007ffe3af638e0 (0x7ffe3af638e0)
ffffc90000fe7fe0: 00007ffe3af638e0 (0x7ffe3af638e0)
ffffc90000fe7fe8: 00007ffe3af63970 (0x7ffe3af63970)
ffffc90000fe7ff0: 0000000000000000 ...
ffffc90000fe7ff8: ffffffffb7b74b9a (entry_SYSCALL_64_after_swapgs+0x17/0x4f)
This warning can happen when unwinding a code path where an interrupt
occurred in x86 entry code before it set up the first stack frame.
Silently ignore any warnings for this case.
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: c32c47c68a ("x86/unwind: Warn on bad frame pointer")
Link: http://lkml.kernel.org/r/dbd6838826466a60dc23a52098185bc973ce2f1e.1492020577.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Instead of reading the return address when unwind_get_return_address()
is called, read it from update_stack_state() and store it in the unwind
state. This enables the next patch to check the return address from
unwind_next_frame() so it can detect an entry code frame.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/af0c5e4560c49c0343dca486ea26c4fa92bc4e35.1492020577.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The __unwind_start() and unwind_next_frame() functions have some
duplicated functionality. They both call decode_frame_pointer() and set
state->regs and state->bp accordingly. Move that functionality to a
common place in update_stack_state().
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/a2ee4801113f6d2300d58f08f6b69f85edf4eb43.1492020577.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The ia64 build generates many warnings like this:
WARNING: EXPORT symbol "empty_zero_page" [vmlinux] version generation failed, symbol will not be versioned.
Besides adding the necessary header this also requires fiddling with
some explicit .S -> .o rules.
Cc: IA64-ML <linux-ia64@vger.kernel.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds the serial slave device for the WL1835 Bluetooth interface.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Wei Xu <xuwei5@hisilicon.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
A new call to SCU intel_scu_ipc_raw_command() writes SPTR and DPTR
registers before sending a command.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
The pseries platform supports Power4 and later CPUs, all of which are
multithreaded and/or multicore.
In practice no one ever builds a SMP=n kernel for these machines. So as
we did for powernv, have the pseries platform imply SMP=y.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The powernv platform supports Power7 and later CPUs, all of which are
multithreaded and multicore.
As such we never build a SMP=n kernel for those machines, other than
possibly for debugging or running in a simulator.
In the debugging case we can get a similar effect by booting with
nr_cpus=1, or there's always the option of building a custom kernel with
SMP hacked out.
For running in simulators the code size reduction from building without
SMP is not particularly important, what matters is the number of
instructions executed. A quick test shows that a SMP=y kernel takes ~6%
more instructions to boot to a shell. Booting with nr_cpus=1 recovers
about half that deficit.
On the flip side, keeping the SMP=n kernel building can be a pain at
times. And although we've mostly kept it building in recent years, no
one is regularly testing that the SMP=n kernel actually boots and works
well on these machines.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Of the 64-bit Book3S platforms, only powermac supports booting on an
actual non-SMP system. The other platforms can be built with SMP
disabled, but it doesn't make a lot of sense given the CPUs they support
are all multicore or multithreaded.
So give platforms the option of forcing SMP=y.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently powerpc's asm/io.h includes linux/io.h, and linux/io.h
includes asm/io.h.
This can cause problems because depending on which is included first the
order of definitions between the two files will change.
The include of linux/io.h was added back in 2008 in commit b41e5fffe8
("[POWERPC] devres: Add devm_ioremap_prot()"). It's not entirely clear
it was needed then, but devm_ioremap_prot() has since been removed
entirely as unused, in dedd24a12f ("powerpc: Remove unused
devm_ioremap_prot()").
So it seems to be unnecessary and can potentially cause problems, so
remove the include of linux/io.h from asm/io.h
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
POWER9 requires msgsync for receiver-side synchronization, and a DD1
workaround restricts IPIs to core-local.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop no longer needed asm feature macro changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
IPIs are a pretty hot path and we already have the ability to do asm feature
patching, so use it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Change log detail]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
POWER9 changes requirements and adds new instructions for
synchronization.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Change the doorbell callers to know about their msgsnd addressing,
rather than have them set a per-cpu target data tag at boot that gets
sent to the cause_ipi functions. The data is only used for doorbell IPI
functions, no other IPI types, so it makes sense to keep that detail
local to doorbell.
Have the platform code understand doorbell IPIs, rather than the
interrupt controller code understand them. Platform code can look at
capabilities it has available and decide which to use.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add the bit definition and use it in facility_unavailable_exception() so we can
intelligently report the cause if we take a fault for SCV. This doesn't actually
enable SCV.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop whitespace changes to the existing entries, flush out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have a #define for it, so use it.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reserving a runtime region results in splitting the EFI memory
descriptors for the runtime region. This results in runtime region
descriptors with bogus memory mappings, leading to interesting crashes
like the following during a kexec:
general protection fault: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1 #53
Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM05 09/30/2016
RIP: 0010:virt_efi_set_variable()
...
Call Trace:
efi_delete_dummy_variable()
efi_enter_virtual_mode()
start_kernel()
? set_init_arg()
x86_64_start_reservations()
x86_64_start_kernel()
start_cpu()
...
Kernel panic - not syncing: Fatal exception
Runtime regions will not be freed and do not need to be reserved, so
skip the memmap modification in this case.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 8e80632fb2 ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
Link: http://lkml.kernel.org/r/20170412152719.9779-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With commit 23dac14d05 ("MIPS: PCI: Use struct list_head lists") new
controllers are added after the specified head where they where added
before the specified head previously.
Use list_add_tail to restore the former order.
This patches fixes the following PCI error on lantiq:
pci 0000:01:00.0: BAR 0: error updating (0x1c000004 != 0x000000)
Fixes: 23dac14d05 ("MIPS: PCI: Use struct list_head lists")
Signed-off-by: Mathias Kresin <dev@kresin.me>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15808/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Commit fdd3d8ce0e ("x86/dump_pagetables: Add support for 5-level
paging") introduced an error for dumping with only 4 levels by setting
PGD_LEVEL_MULT to a wrong value.
This is leading to e.g. addresses printed as "(null)" for ranges:
x86/mm: Found insecure W+X mapping at address (null)/(null)
Make PGD_LEVEL_MULT a multiple of PTRS_PER_P4D instead of PTRS_PER_PUD
Fixes: fdd3d8ce0e ("x86/dump_pagetables: Add support for 5-level paging")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20170412143634.6846-1-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Before we rework the "pmem api" to stop abusing __copy_user_nocache()
for memcpy_to_pmem() we need to fix cases where we may strand dirty data
in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
for arbitrary data transfers from userspace. There is no guarantee that
these transfers, performed by dax_iomap_actor(), will have aligned
destinations or aligned transfer lengths. Backstop the usage
__copy_user_nocache() with explicit cache management in these unaligned
cases.
Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
that is saved for a later patch that moves the entirety of the "pmem
api" into the pmem driver directly.
Fixes: 5de490daec ("pmem: add copy_from_iter_pmem() and clear_pmem()")
Cc: <stable@vger.kernel.org>
Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
KGDB is a kernel debug stub and it can't be used to debug userland as it
can only safely access kernel memory.
On MIPS however KGDB has always got the register state of sleeping
processes from the userland register context at the beginning of the
kernel stack. This is meaningless for kernel threads (which never enter
userland), and for user threads it prevents the user seeing what it is
doing while in the kernel:
(gdb) info threads
Id Target Id Frame
...
3 Thread 2 (kthreadd) 0x0000000000000000 in ?? ()
2 Thread 1 (init) 0x000000007705c4b4 in ?? ()
1 Thread -2 (shadowCPU0) 0xffffffff8012524c in arch_kgdb_breakpoint () at arch/mips/kernel/kgdb.c:201
Get the register state instead from the (partial) kernel register
context stored in the task's thread_struct for resume() to restore. All
threads now correctly appear to be in context_switch():
(gdb) info threads
Id Target Id Frame
...
3 Thread 2 (kthreadd) context_switch (rq=<optimized out>, cookie=..., next=<optimized out>, prev=0x0) at kernel/sched/core.c:2903
2 Thread 1 (init) context_switch (rq=<optimized out>, cookie=..., next=<optimized out>, prev=0x0) at kernel/sched/core.c:2903
1 Thread -2 (shadowCPU0) 0xffffffff8012524c in arch_kgdb_breakpoint () at arch/mips/kernel/kgdb.c:201
Call clobbered registers which aren't saved and exception registers
(BadVAddr & Cause) which can't be easily determined without stack
unwinding are reported as 0. The PC is taken from the return address,
such that the state presented matches that found immediately after
returning from resume().
Fixes: 8854700115 ("[MIPS] kgdb: add arch support for the kernel's kgdb core")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15829/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Turning on DEBUG in smp-cps.c, or compiling the kernel with
CONFIG_DYNAMIC_DEBUG enabled results the build error:
arch/mips/kernel/smp-cps.c: In function 'play_dead':
./include/linux/dynamic_debug.h:126:3: error: 'core' may be used
uninitialized in this function [-Werror=maybe-uninitialized]
Fix this by always initialising the variable.
Fixes: 0d2808f338 ("MIPS: smp-cps: Add support for CPU hotplug of MIPSr6 processors")
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15848/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Under CONFIG_STRICT_DEVMEM, reading System RAM through /dev/mem is
disallowed. However, on x86, the first 1MB was always allowed for BIOS
and similar things, regardless of it actually being System RAM. It was
possible for heap to end up getting allocated in low 1MB RAM, and then
read by things like x86info or dd, which would trip hardened usercopy:
usercopy: kernel memory exposure attempt detected from ffff880000090000 (dma-kmalloc-256) (4096 bytes)
This changes the x86 exception for the low 1MB by reading back zeros for
System RAM areas instead of blindly allowing them. More work is needed to
extend this to mmap, but currently mmap doesn't go through usercopy, so
hardened usercopy won't Oops the kernel.
Reported-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Tested-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
After the split of linux/sched.h, KASLR stopped building.
Fix this by including the correct header file for init_thread_union
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Cc: Steven J. Hill <Steven.Hill@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15849/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
We found out that HW checksum generation only works from AST2500
onward. This disables it on AST2400 and removes the "no-hw-checksum"
properties in the device-trees. The problem we had wasn't related
to NC-SI.
Also rework the logic testing for that property so it can be used
to disable HW checksum generation and checking regardless of whether
NC-SI is used or not in case other variants out there need this.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We test for aspeed chips to handle a couple of special cases,
but we do that by checking the machine type which isn't right.
Instead check the actual device compatible property. This also
updates the dtsi files for the aspeed SoC to match.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
arch_check_elf contains a usage of current_cpu_data that will call
smp_processor_id() with preemption enabled and therefore triggers a
"BUG: using smp_processor_id() in preemptible" warning when an fpxx
executable is loaded.
As a follow-up to commit b244614a60 ("MIPS: Avoid a BUG warning during
prctl(PR_SET_FP_MODE, ...)"), apply the same fix to arch_check_elf by
using raw_current_cpu_data instead. The rationale quoted from the previous
commit:
"It is assumed throughout the kernel that if any CPU has an FPU, then
all CPUs would have an FPU as well, so it is safe to perform the check
with preemption enabled - change the code to use raw_ variant of the
check to avoid the warning."
Fixes: 46490b5725 ("MIPS: kernel: elf: Improve the overall ABI and FPU mode checks")
Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
CC: <stable@vger.kernel.org> # 4.0+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15951/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
In commit 827456e710 ("MIPS: Export _mcount alongside its definition")
the EXPORT_SYMBOL macro exporting _mcount was moved from C code into
assembly. Unlike C, exported assembly symbols need to have a function
prototype in asm/asm-prototypes.h for modversions to work properly.
Without this, modpost prints out this warning:
WARNING: EXPORT symbol "_mcount" [vmlinux] version generation failed,
symbol will not be versioned.
Fix by including asm/ftrace.h (where _mcount is declared) in
asm/asm-prototypes.h.
Fixes: 827456e710 ("MIPS: Export _mcount alongside its definition")
Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15952/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The current behaviour of the hash table dump assumes that memory is contiguous
and iterates from the start of memory to (start + size of memory). When memory
isn't physically contiguous, this doesn't work.
If memory exists at 0-5 GB and 6-10 GB then the current approach will check if
entries exist in the hash table from 0GB to 9GB. This patch changes the
behaviour to iterate over any holes up to the end of memory.
Fixes: 1515ab9321 ("powerpc/mm: Dump hash table")
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The current page table dumper scans the Linux page tables and coalesces mappings
with adjacent virtual addresses and similar PTE flags. This behaviour is
somewhat broken when you consider the IOREMAP space where entirely unrelated
mappings will appear to be virtually contiguous. This patch modifies the range
coalescing so that only ranges that are both physically and virtually contiguous
are combined. This patch also adds to the dump output the physical address at
the start of each range.
Fixes: 8eb07b1870 ("powerpc/mm: Dump linux pagetables")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Print the physicall address with 0x like the other addresses]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On Book3s we have two PTE flags used to mark cache-inhibited mappings:
_PAGE_TOLERANT and _PAGE_NON_IDEMPOTENT. Currently the kernel page table dumper
only looks at the generic _PAGE_NO_CACHE which is defined to be _PAGE_TOLERANT.
This patch modifies the dumper so both flags are shown in the dump.
Fixes: 8eb07b1870 ("powerpc/mm: Dump linux pagetables")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently sys_mmap() and sys_mmap2() (32-bit only), are not visible to the
syscall tracing machinery. This means users are not able to see the execution of
mmap() syscalls using the syscall tracer.
Fix that by using SYSCALL_DEFINE6 for sys_mmap() and sys_mmap2() so that the
meta-data associated with these syscalls is visible to the syscall tracer.
A side-effect of this change is that the return type has changed from unsigned
long to long. However this should have no effect, the only code in the kernel
which uses the result of these syscalls is in the syscall return path, which is
written in asm and treats the result as unsigned regardless.
Example output:
cat-3399 [001] .... 196.542410: sys_mmap(addr: 7fff922a0000, len: 20000, prot: 3, flags: 812, fd: 3, offset: 1b0000)
cat-3399 [001] .... 196.542443: sys_mmap -> 0x7fff922a0000
cat-3399 [001] .... 196.542668: sys_munmap(addr: 7fff922c0000, len: 6d2c)
cat-3399 [001] .... 196.542677: sys_munmap -> 0x0
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Massage change log, add detail on return type change]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Recently in commit f6eedbba7a ("powerpc/mm/hash: Increase VA range to 128TB"),
we increased H_PGD_INDEX_SIZE to 15 when we're building with 64K pages. This
makes it larger than RADIX_PGD_INDEX_SIZE (13), which means the logic to
calculate MAX_PGD_INDEX_SIZE in book3s/64/pgtable.h is wrong.
The end result is that the PGD (Page Global Directory, ie top level page table)
of the kernel (aka. swapper_pg_dir), is too small.
This generally doesn't lead to a crash, as we don't use the full range in normal
operation. However if we try to dump the kernel pagetables we can trigger a
crash because we walk off the end of the pgd into other memory and eventually
try to dereference something bogus:
$ cat /sys/kernel/debug/kernel_pagetables
Unable to handle kernel paging request for data at address 0xe8fece0000000000
Faulting instruction address: 0xc000000000072314
cpu 0xc: Vector: 380 (Data SLB Access) at [c0000000daa13890]
pc: c000000000072314: ptdump_show+0x164/0x430
lr: c000000000072550: ptdump_show+0x3a0/0x430
dar: e802cf0000000000
seq_read+0xf8/0x560
full_proxy_read+0x84/0xc0
__vfs_read+0x6c/0x1d0
vfs_read+0xbc/0x1b0
SyS_read+0x6c/0x110
system_call+0x38/0xfc
The root cause is that MAX_PGD_INDEX_SIZE isn't actually computed to be
the max of H_PGD_INDEX_SIZE or RADIX_PGD_INDEX_SIZE. To fix that move
the calculation into asm-offsets.c where we can do it easily using
max().
Fixes: f6eedbba7a ("powerpc/mm/hash: Increase VA range to 128TB")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This merges the arch part of the XIVE support, leaving the final commit
with the KVM specific pieces dangling on the branch for Paul to merge
via the kvm-ppc tree.
Since bbb56c2722 ("arm64: Add detection code for broken .inst support
in binutils"), running any make target that doesn't involve the cross
compiler results in a spurious warning:
$ make ARCH=arm64 menuconfig
arch/arm64/Makefile:43: Detected assembler with broken .inst; disassembly will be unreliable
while
$ make ARCH=arm64 CROSS_COMPILE=aarch64-arm-linux- menuconfig
is silent (assuming your compiler is not affected). That's because
the code that tests for the workaround is always run, irrespective
of the current configuration being available or not.
An easy fix is to make the detection conditional on CONFIG_ARM64
being defined, which is only the case when actually building
something.
Fixes: bbb56c2722 ("arm64: Add detection code for broken .inst support in binutils")
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Consolidate x86 instruction decoder users on the path of
copying original code for kprobes.
Kprobes decodes the same instruction a maximum of 3 times when
preparing the instruction buffer:
- The first time for getting the length of the instruction,
- the 2nd for adjusting displacement,
- and the 3rd for checking whether the instruction is boostable or not.
For each time, the actual decoding target address is slightly
different (1st is original address or recovered instruction buffer,
2nd and 3rd are pointing to the copied buffer), but all have
the same instruction.
Thus, this patch also changes the target address to the copied
buffer at first and reuses the decoded "insn" for displacement
adjusting and checking boostability.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076389643.22469.13151892839998777373.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use probe_kernel_read() for avoiding unexpected faults while
copying kernel text in __recover_probed_insn(),
__recover_optprobed_insn() and __copy_instruction().
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076382624.22469.10091613887942958518.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Set the pages which is used for kprobes' singlestep buffer
and optprobe's trampoline instruction buffer to readonly.
This can prevent unexpected (or unintended) instruction
modification.
This also passes rodata_test as below.
Without this patch, rodata_test shows a warning:
WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x7a9/0xa20
x86/mm: Found insecure W+X mapping at address ffffffffa0000000/0xffffffffa0000000
With this fix, no W+X pages are found:
x86/mm: Checked W+X mappings: passed, no W+X pages found.
rodata_test: all tests were successful
Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076375592.22469.14174394514338612247.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make arch_specific_insn.boostable to boolean, since it has
only 2 states, boostable or not. So it is better to use
boolean from the viewpoint of code readability.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076368566.22469.6322906866458231844.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Do not modify singlestep execution buffer (kprobe.ainsn.insn)
while resuming from single-stepping, instead, modifies
the buffer to add a jump back instruction at preparing
buffer.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076361560.22469.1610155860343077495.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use x86 instruction decoder for checking whether the probed
instruction is able to boost or not, instead of hand-written
code.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076354563.22469.13379472209338986858.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix the description comment of __copy_instruction() function
since it has already been changed to return the length of the
copied instruction.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076347582.22469.3775133607244923462.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix the kprobe-booster not to boost far call instruction,
because a call may store the address in the single-step
execution buffer to the stack, which should be modified
after single stepping.
Currently, this instruction will be filtered as not
boostable in resume_execution(), so this is not a
critical issue.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076340615.22469.14066273186134229909.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The CAD instruction never worked quite as expected for the spinlock
code. It has been disabled by default with git commit 61b0b01686,
if the "cad" kernel parameter is specified it is enabled for both user
space and the spinlock code. Leave the option to enable the instruction
for user space but remove it from the spinlock code.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a couple more __atomic_xxx function to atomic_ops.h and use them
to replace the compare-and-swap inlines in the spinlock code. This
changes the type of the lock value from unsigned int to int.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When this function fails it just sends a SIGSEGV signal to
user-space using force_sig(). This signal is missing
essential information about the cause, e.g. the trap_nr or
an error code.
Fix this by propagating the error to the only caller of
mpx_handle_bd_fault(), do_bounds(), which sends the correct
SIGSEGV signal to the process.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: fe3d197f84 ('x86, mpx: On-demand kernel allocation of bounds tables')
Link: http://lkml.kernel.org/r/1491488362-27198-1-git-send-email-joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On heavy paging with KSM I see guest data corruption. Turns out that
KSM will add pages to its tree, where the mapping return true for
pte_unused (or might become as such later). KSM will unmap such pages
and reinstantiate with different attributes (e.g. write protected or
special, e.g. in replace_page or write_protect_page)). This uncovered
a bug in our pagetable handling: We must remove the unused flag as
soon as an entry becomes present again.
Cc: stable@vger.kernel.org
Signed-of-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Now that we have a framework to handle the ACPI bits, make the PMUv3
code use this. The framework is a little different to what was
originally envisaged, and we can drop some unused support code in the
process of moving over to it.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
[will: make armv8_pmu_driver_init static]
Signed-off-by: Will Deacon <will.deacon@arm.com>
When probing via ACPI, we won't know up-front whether a CPU has a PMUv3
compatible PMU. Thus we need to consult ID registers during probe time.
This patch updates our PMUv3 probing code to test for the presence of
PMUv3 functionality before touching an PMUv3-specific registers, and
before updating the struct arm_pmu with PMUv3 data.
When a PMUv3-compatible PMU is not present, probing will return -ENODEV.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently the ACPI parking protocol code needs to parse each CPU's MADT
GICC table to extract the mailbox address and so on. Each time we parse
a GICC table, we call back to the parking protocol code to parse it.
This has been fine so far, but we're about to have more code that needs
to extract data from the GICC tables, and adding a callback for each
user is going to get unwieldy.
Instead, this patch ensures that we stash a copy of each CPU's GICC
table at boot time, such that anything needing to parse it can later
request it. This will allow for other parsers of GICC, and for
simplification to the ACPI parking protocol code. Note that we must
store a copy, rather than a pointer, since the core ACPI code
temporarily maps/unmaps tables while iterating over them.
Since we parse the MADT before we know how many CPUs we have (and hence
before we setup the percpu areas), we must use an NR_CPUS sized array.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When specifying a generic defconfig target with O=... option set, make
is invoked in the output location before a target makefile wrapper is
created. Ensure that the correct makefile is used by specifying the
kernel source makefile during make invocation.
This fixes the either of the following errors:
$ make sead3_defoncifg ARCH=mips O=test
make[1]: Entering directory '/mnt/ssd/MIPS/linux-next/test'
make[2]: *** No rule to make target '32r2el_defconfig'. Stop.
arch/mips/Makefile:506: recipe for target 'sead3_defconfig' failed
make[1]: *** [sead3_defconfig] Error 2
make[1]: Leaving directory '/mnt/ssd/MIPS/linux-next/test'
Makefile:152: recipe for target 'sub-make' failed
make: *** [sub-make] Error 2
$ make 32r2el_defconfig ARCH=mips O=test
make[1]: Entering directory '/mnt/ssd/MIPS/linux-next/test'
Using ../arch/mips/configs/generic_defconfig as base
Merging ../arch/mips/configs/generic/32r2.config
Merging ../arch/mips/configs/generic/el.config
Merging ../arch/mips/configs/generic/board-sead-3.config
!
! merged configuration written to .config (needs make)
!
make[2]: *** No rule to make target 'olddefconfig'. Stop.
arch/mips/Makefile:489: recipe for target '32r2el_defconfig' failed
make[1]: *** [32r2el_defconfig] Error 2
make[1]: Leaving directory '/mnt/ssd/MIPS/linux-next/test'
Makefile:152: recipe for target 'sub-make' failed
make: *** [sub-make] Error 2
Fixes: eed0eabd12 ('MIPS: generic: Introduce generic DT-based board support')
Fixes: 3f5f0a4475 ('MIPS: generic: Convert SEAD-3 to a generic board')
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15464/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The schemata lock is released before freeing the resource's temporary
tmp_cbms allocation. That's racy versus another write which allocates and
uses new temporary storage, resulting in memory leaks, freeing in use
memory, double a free or any combination of those.
Move the unlock after the release code.
Fixes: 60ec2440c6 ("x86/intel_rdt: Add schemata file")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Shaohua Li <shli@fb.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170411071446.15241-1-jolsa@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Since commit:
4bcc595ccd "printk: reinstate KERN_CONT for printing"
... the debug output of signal_fault(), do_trap() and do_general_protection()
looks garbled, e.g.:
traps: conftest[9335] trap invalid opcode ip:400428 sp:7ffeaba1b0d8 error:0
in conftest[400000+1000]
(note the unintended line break.)
Fix the bug by adding KERN_CONTs.
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's a conflict between ongoing level-5 paging support and
the E820 rewrite. Since the E820 rewrite is essentially ready,
merge it into x86/mm to reduce tree conflicts.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The E820 rework in WIP.x86/boot has gone through a couple of weeks
of exposure in -tip, merge it in a wider fashion.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fam16h is the same as the default one, remove it. Turn the switch-case
into a simple if-else.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170410122047.3026-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up
from stop 0,1,2 with ESL=1 can endup being misplaced in the core. Thus
the HSPRG0 of a thread waking up from can contain the paca pointer of
its sibling.
This patch implements a context recovery framework within threads of a
core, by provisioning space in paca_struct for saving every sibling
threads's paca pointers. Basically, we should be able to arrive at the
right paca pointer from any of the thread's existing paca pointer.
At bootup, during powernv idle-init, we save the paca address of every
CPU in each one its siblings paca_struct in the slot corresponding to
this CPU's index in the core.
On wakeup from a stop, the thread will determine its index in the core
from the TIR register and recover its PACA pointer by indexing into
the correct slot in the provisioned space in the current PACA.
Furthermore, ensure that the NVGPRs are restored from the stack on the
way out by setting the NAPSTATELOST in paca.
[Changelog written with inputs from svaidy@linux.vnet.ibm.com]
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Call it a bug]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently during idle-init on power9, if we don't find suitable stop
states in the device tree that can be used as the
default_stop/deepest_stop, we set stop0 (ESL=1,EC=1) as the default
stop state psscr to be used by power9_idle and deepest stop state
which is used by CPU-Hotplug.
However, if the platform firmware has not configured or enabled a stop
state, the kernel should not make any assumptions and fallback to a
default choice.
If the kernel uses a stop state that is not configured by the platform
firmware, it may lead to further failures which should be avoided.
In this patch, we modify the init code to ensure that the kernel uses
only the stop states exposed by the firmware through the device
tree. When a suitable default stop state isn't found, we disable
ppc_md.power_save for power9. Similarly, when a suitable
deepest_stop_state is not found in the device tree exported by the
firmware, fall back to the default busy-wait loop in the CPU-Hotplug
code.
[Changelog written with inputs from svaidy@linux.vnet.ibm.com]
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, the powernv cpu-offline function assumes that platform idle
states such as stop on POWER9, winkle/sleep/nap on POWER8 are always
available. On POWER8, it picks nap as the default state if other deep
idle states like sleep/winkle are not available and enabled in the
platform.
On POWER9, nap is not available and all idle states are managed by
STOP instruction. The parameters to the idle state are passed through
processor stop status control register (PSSCR). Hence as such
executing STOP would take parameters from current PSSCR. We do not
want to make any assumptions in kernel on what STOP states and PSSCR
features are configured by the platform.
Ideally platform will configure a good set of stop states that can be
used in the kernel. We would like to start with a clean slate, if the
platform choose to not configure any state or there is an error in
platform firmware that lead to no stop states being configured or
allowed to be requested.
This patch adds a fallback method for CPU-Hotplug that is similar to
snooze loop at idle where the threads are left to spin at low priority
and hence reduce the cycles consumed.
This is a safe fallback mechanism in the case when no stop state would
be requested if the platform firmware did not configure them most
likely due to an error condition.
Requesting a stop state when the platform has not configured them or
enabled them would lead to further error conditions which could be
difficult to debug.
[Changelog written with inputs from svaidy@linux.vnet.ibm.com]
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move the piece of code in powernv/smp.c::pnv_smp_cpu_kill_self() which
transitions the CPU to the deepest available platform idle state to a
new function named pnv_cpu_offline() in powernv/idle.c. The rationale
behind this code movement is that the data required to determine the
deepest available platform state resides in powernv/idle.c.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add user space exported API definitions for 512KB, 1MB, 2MB, 8MB, 16MB,
1GB, 16GB non default huge page sizes to be used with mmap() system
call.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Reword the comment to emphasise that these are only needed to use
the non-default huge page size, and updated the change log.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Make sparsemem the default on all 64-bit Book3S platforms. It already is
for pseries and ps3, and we need to enable it for powernv because on
POWER9 memory between chips is discontiguous.
For the other platforms sparsemem should work fine, though it might add
a small amount of overhead. We can always force FLATMEM in the
defconfigs if necessary.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
setup_initial_memory_limit() is called from early_init_devtree(), which
runs prior to feature patching. If the kernel is built with CONFIG_JUMP_LABEL=y
and CONFIG_JUMP_LABEL_FEATURE_CHECKS=y then we will potentially get the
wrong value.
If we also have CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG=y we get a warning
and backtrace:
Warning! mmu_has_feature() used prior to jump label init!
CPU: 0 PID: 0 Comm: swapper Not tainted 4.11.0-rc4-gccN-next-20170331-g6af2434 #1
Call Trace:
[c000000000fc3d50] [c000000000a26c30] .dump_stack+0xa8/0xe8 (unreliable)
[c000000000fc3de0] [c00000000002e6b8] .setup_initial_memory_limit+0xa4/0x104
[c000000000fc3e60] [c000000000d5c23c] .early_init_devtree+0xd0/0x2f8
[c000000000fc3f00] [c000000000d5d3b0] .early_setup+0x90/0x11c
[c000000000fc3f90] [c000000000000520] start_here_multiplatform+0x68/0x80
Fix it by using early_mmu_has_feature().
Fixes: c12e6f24d4 ("powerpc: Add option to use jump label for mmu_has_feature()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
These files don't seem to have any need for asm/debug.h, now that all it
includes are the debugger hooks and breakpoint definitions.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc_debugfs_root is the dentry representing the root of the
"powerpc" directory tree in debugfs.
Currently it sits in asm/debug.h, a long with some other things that
have "debug" in the name, but are otherwise unrelated.
Pull it out into a separate header, which also includes linux/debugfs.h,
and convert all the users to include debugfs.h instead of debug.h.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In the recent commit 1ab66d1fba ("powerpc/powernv: Introduce address
translation services for Nvlink2") the NPU code gained a dependency on MMU
notifiers.
All our defconfigs have KVM enabled, which selects MMU_NOTIFIER, but if KVM is
not enabled then the build breaks.
Fix it by always selecting MMU_NOTIFIER when we're building powernv.
Fixes: 1ab66d1fba ("powerpc/powernv: Introduce address translation services for Nvlink2")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
[mpe: Reword change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For a tlbiel with pid, we need to issue tlbiel with set number encoded. We
don't need to do ptesync for each of those. Instead we need one for the entire
tlbiel pid operation.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For fullmm tlb flush, we do a flush with RIC_FLUSH_ALL which will invalidate all
related caches (radix__tlb_flush()). Hence the pwc flush is not needed.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJY6mY1AAoJEHm+PkMAQRiGB14IAImsH28JPjxJVDasMIRPBxVc
euPPlZgoBieu7sNt+kEsEqdkXuu0MLk6gln0IGxWLeoB2S+u3Tz5LMa2YArVqV9Z
tWzOnI9auE73P2Pz/tUMOdyMs5tO0PolQxX3uljbULBozOHjHRh13fsXchX2yQvl
mFeFCDqpPV0KhWRH/ciA8uIHdvYPhMpkKgRtmR8jXL0yzqLp6+2J+Bs8nHG4NNng
HMVxZPC8jOE/TgWq6k/GmXgxh3H/AideFdHFbLKYnIFJW41ZGOI8a262zq3NmjPd
lywpVU7O7RMhSITY5PnuR3LpNV8ftw1hz2y6t35unyFK1P02adOSj5GJ3hGdhaQ=
=Xz5O
-----END PGP SIGNATURE-----
Backmerge tag 'v4.11-rc6' into drm-next
Linux 4.11-rc6
drm-misc needs 4.11-rc5, may as well fix conflicts with rc6.
The resource control filesystem provides only a bitmask based cpus file for
assigning CPUs to a resource group. That's cumbersome with large cpumasks
and non-intuitive when modifying the file from the command line.
Range based cpu lists are commonly used along with bitmask based cpu files
in various subsystems throughout the kernel.
Add 'cpus_list' file which is CPU range based.
# cd /sys/fs/resctrl/
# echo 1-10 > krava/cpus_list
# cat krava/cpus_list
1-10
# cat krava/cpus
0007fe
# cat cpus
fffff9
# cat cpus_list
0,3-23
[ tglx: Massaged changelog and replaced "bitmask lists" by "CPU ranges" ]
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20170410145232.GF25354@krava
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The kernel-doc is inconsistently formatted. Fix it up.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
The vsyscall32 sysctl can racy against a concurrent fork when it switches
from disabled to enabled:
arch_setup_additional_pages()
if (vdso32_enabled)
--> No mapping
sysctl.vsysscall32()
--> vdso32_enabled = true
create_elf_tables()
ARCH_DLINFO_IA32
if (vdso32_enabled) {
--> Add VDSO entry with NULL pointer
Make ARCH_DLINFO_IA32 check whether the VDSO mapping has been set up for
the newly forked process or not.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170410151723.602367196@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
vdso_enabled can be set to arbitrary integer values via the kernel command
line 'vdso32=' parameter or via 'sysctl abi.vsyscall32'.
load_vdso32() only maps VDSO if vdso_enabled == 1, but ARCH_DLINFO_IA32
merily checks for vdso_enabled != 0. As a consequence the AT_SYSINFO_EHDR
auxiliary vector for the VDSO_ENTRY is emitted with a NULL pointer which
causes a segfault when the application tries to use the VDSO.
Restrict the valid arguments on the command line and the sysctl to 0 and 1.
Fixes: b0b49f2673 ("x86, vdso: Remove compat vdso support")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Cc: Roland McGrath <roland@redhat.com>
Link: http://lkml.kernel.org/r/1491424561-7187-1-git-send-email-minipli@googlemail.com
Link: http://lkml.kernel.org/r/20170410151723.518412863@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add the new hdmi phandle to exynos4.dtsi. This phandle is needed by the
s5p-cec driver to initialize the CEC notifier framework.
Tested with my Odroid U3.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: linux-samsung-soc@vger.kernel.org
CC: devicetree@vger.kernel.org
CC: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
To use CEC notifier sti CEC driver needs to get phandle
of the hdmi device.
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
CC: Patrice CHOTARD <patrice.chotard@st.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Move all the EDAC core functionality behind CONFIG_EDAC and get rid of
that indirection. Update defconfigs which had it.
While at it, fix dependencies such that EDAC depends on RAS for the
tracepoints.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: linux-edac@vger.kernel.org
Apparently, some machines used to report DRAM errors through a PCI SERR
NMI. This is why we have a call into EDAC in the NMI handler. See
c0d1217202 ("drivers/edac: add new nmi rescan").
From looking at the patch above, that's two drivers: e752x_edac.c and
e7xxx_edac.c. Now, I wanna say those are old machines which are probably
decommissioned already.
Tony says that "[t]the newest CPU supported by either of those drivers
is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some
folks are still using these ... but people that hold onto h/w for 7
years generally cling to old s/w too ... so I'd guess it unlikely that
we will get complaints for breaking these in upstream."
So even if there is a small number still in use, we did load EDAC with
edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact)
which means a default EDAC setup without any parameters supplied on the
command line or otherwise would never even log the error in the NMI
handler because we're polling by default:
inline int edac_handler_set(void)
{
if (edac_op_state == EDAC_OPSTATE_POLL)
return 0;
return atomic_read(&edac_handlers);
}
So, long story short, I'd like to get rid of that nastiness called
edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If
we ever have to do stuff like that again, it should be notifiers we're
using and not some insanity like this one.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
We need to set LPES in order for normal external interrupts (0x500)
to be directed to the guest while running in guest state.
We also need HEIC set to prevent them to be sent to the host while
in host state.
With XIVE the host never gets one of these and wouldn't know how to
handle it. All host external interrupts come in via the new
hypervisor virtualization interrupts vector.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have all sort of variants of MMIO accessors for the real mode
instructions. This creates a clean set of accessors based on
Linux normal naming conventions, replacing all occurrences of
the old ones in the tree.
I have purposefully removed the "out/in" variants in favor of
only including __raw variants. Any code using these is already
pretty much hand tuned to operate in a very specific environment.
I've fixed up the 2 users (only one of them actually needed
a barrier in the first place).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The function doesn't exist anymore
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It's only used within the same file it's defined
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We traditionally have linux/ before asm/
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The XIVE interrupt controller is the new interrupt controller
found in POWER9. It supports advanced virtualization capabilities
among other things.
Currently we use a set of firmware calls that simulate the old
"XICS" interrupt controller but this is fairly inefficient.
This adds the framework for using XIVE along with a native
backend which OPAL for configuration. Later, a backend allowing
the use in a KVM or PowerVM guest will also be provided.
This disables some fast path for interrupts in KVM when XIVE is
enabled as these rely on the firmware emulation code which is no
longer available when the XIVE is used natively by Linux.
A latter patch will make KVM also directly exploit the XIVE, thus
recovering the lost performance (and more).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Fixup pr_xxx("XIVE:"...), don't split pr_xxx() strings,
tweak Kconfig so XIVE_NATIVE selects XIVE and depends on POWERNV,
fix build errors when SMP=n, fold in fixes from Ben:
Don't call cpu_online() on an invalid CPU number
Fix irq target selection returning out of bounds cpu#
Extra sanity checks on cpu numbers
]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
calculate_min_delta() may incorrectly access a 4th element of buf2[]
which only has 3 elements. This may trigger undefined behaviour and has
been reported to cause strange crashes in start_kernel() sometime after
timer initialization when built with GCC 5.3, possibly due to
register/stack corruption:
sched_clock: 32 bits at 200MHz, resolution 5ns, wraps every 10737418237ns
CPU 0 Unable to handle kernel paging request at virtual address ffffb0aa, epc == 8067daa8, ra == 8067da84
Oops[#1]:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.18 #51
task: 8065e3e0 task.stack: 80644000
$ 0 : 00000000 00000001 00000000 00000000
$ 4 : 8065b4d0 00000000 805d0000 00000010
$ 8 : 00000010 80321400 fffff000 812de408
$12 : 00000000 00000000 00000000 ffffffff
$16 : 00000002 ffffffff 80660000 806a666c
$20 : 806c0000 00000000 00000000 00000000
$24 : 00000000 00000010
$28 : 80644000 80645ed0 00000000 8067da84
Hi : 00000000
Lo : 00000000
epc : 8067daa8 start_kernel+0x33c/0x500
ra : 8067da84 start_kernel+0x318/0x500
Status: 11000402 KERNEL EXL
Cause : 4080040c (ExcCode 03)
BadVA : ffffb0aa
PrId : 0501992c (MIPS 1004Kc)
Modules linked in:
Process swapper/0 (pid: 0, threadinfo=80644000, task=8065e3e0, tls=00000000)
Call Trace:
[<8067daa8>] start_kernel+0x33c/0x500
Code: 24050240 0c0131f9 24849c64 <a200b0a8> 41606020 000000c0 0c1a45e6 00000000 0c1a5f44
UBSAN also detects the same issue:
================================================================
UBSAN: Undefined behaviour in arch/mips/kernel/cevt-r4k.c:85:41
load of address 80647e4c with insufficient space
for an object of type 'unsigned int'
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.18 #47
Call Trace:
[<80028f70>] show_stack+0x88/0xa4
[<80312654>] dump_stack+0x84/0xc0
[<8034163c>] ubsan_epilogue+0x14/0x50
[<803417d8>] __ubsan_handle_type_mismatch+0x160/0x168
[<8002dab0>] r4k_clockevent_init+0x544/0x764
[<80684d34>] time_init+0x18/0x90
[<8067fa5c>] start_kernel+0x2f0/0x500
=================================================================
buf2[] is intentionally only 3 elements so that the last element is the
median once 5 samples have been inserted, so explicitly prevent the
possibility of comparing against the 4th element rather than extending
the array.
Fixes: 1fa405552e ("MIPS: cevt-r4k: Dynamically calculate min_delta_ns")
Reported-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Tested-by: Rabin Vincent <rabinv@axis.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 4.7.x-
Patchwork: https://patchwork.linux-mips.org/patch/15892/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The operand is an integer constant, make the constness explicit by
adding the modifier. This is needed for clang to generate valid code
and also works with gcc.
Also change the constraint of the operand from 'I' ("Integer constant
that is valid as an immediate operand in an ADD instruction", AArch64)
to 'i' ("An immediate integer operand").
Based-on-patch-from: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>