It's possible for iretq to userspace to fail. This can happen because
of a bad CS, SS, or RIP.
Historically, we've handled it by fixing up an exception from iretq to
land at bad_iret, which pretends that the failed iret frame was really
the hardware part of #GP(0) from userspace. To make this work, there's
an extra fixup to fudge the gs base into a usable state.
This is suboptimal because it loses the original exception. It's also
buggy because there's no guarantee that we were on the kernel stack to
begin with. For example, if the failing iret happened on return from an
NMI, then we'll end up executing general_protection on the NMI stack.
This is bad for several reasons, the most immediate of which is that
general_protection, as a non-paranoid idtentry, will try to deliver
signals and/or schedule from the wrong stack.
This patch throws out bad_iret entirely. As a replacement, it augments
the existing swapgs fudge into a full-blown iret fixup, mostly written
in C. It's should be clearer and more correct.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On a 32-bit kernel, this has no effect, since there are no IST stacks.
On a 64-bit kernel, #SS can only happen in user code, on a failed iret
to user space, a canonical violation on access via RSP or RBP, or a
genuine stack segment violation in 32-bit kernel code. The first two
cases don't need IST, and the latter two cases are unlikely fatal bugs,
and promoting them to double faults would be fine.
This fixes a bug in which the espfix64 code mishandles a stack segment
violation.
This saves 4k of memory per CPU and a tiny bit of code.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's nothing special enough about the espfix64 double fault fixup to
justify writing it in assembly. Move it to C.
This also fixes a bug: if the double fault came from an IST stack, the
old asm code would return to a partially uninitialized stack frame.
Fixes: 3891a04aaf
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The PCI/MSI irq chip callbacks mask/unmask_msi_irq have been renamed
to pci_msi_mask/unmask_irq to mark them PCI specific. Rename all usage
sites. The conversion helper functions are kept around to avoid
conflicts in next and will be removed after merging into mainline.
Coccinelle assisted conversion. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: x86@kernel.org
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Mohit Kumar <mohit.kumar@st.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Rename write_msi_msg() to pci_write_msi_msg() to mark it as PCI
specific.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x86 fixes from Thomas Gleixner:
"Misc fixes:
- gold linker build fix
- noxsave command line parsing fix
- bugfix for NX setup
- microcode resume path bug fix
- _TIF_NOHZ versus TIF_NOHZ bugfix as discussed in the mysterious
lockup thread"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1
x86, kaslr: Handle Gold linker for finding bss/brk
x86, mm: Set NX across entire PMD at boot
x86, microcode: Update BSPs microcode on resume
x86: Require exact match for 'noxsave' command line option
Pull perf fixes from Ingo Molnar:
"Misc fixes: two Intel uncore driver fixes, a CPU-hotplug fix and a
build dependencies fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Fix boot crash on SBOX PMU on Haswell-EP
perf/x86/intel/uncore: Fix IRP uncore register offsets on Haswell EP
perf: Fix corruption of sibling list with hotplug
perf/x86: Fix embarrasing typo
TIF_NOHZ is 19 (i.e. _TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME |
_TIF_SINGLESTEP), not (1<<19).
This code is involved in Dave's trinity lockup, but I don't see why
it would cause any of the problems he's seeing, except inadvertently
by causing a different path through entry_64.S's syscall handling.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/a6cd3b60a3f53afb6e1c8081b0ec30ff19003dd7.1416434075.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Recover original IP register if the pre_handler doesn't change it.
Since current kprobes doesn't expect that another ftrace handler
may change regs->ip, it sets kprobe.addr + MCOUNT_INSN_SIZE to
regs->ip and returns to ftrace.
This seems wrong behavior since kprobes can recover regs->ip
and safely pass it to another handler.
This adds code which recovers original regs->ip passed from
ftrace right before returning to ftrace, so that another ftrace
user can change regs->ip.
Link: http://lkml.kernel.org/r/20141009130106.4698.26362.stgit@kbuild-f20.novalocal
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When trigger_all_cpu_backtrace() is called on x86, it will trigger an
NMI on each CPU and call show_regs(). But this can lead to a hard lock
up if the NMI comes in on another printk().
In order to avoid this, when the NMI triggers, it switches the printk
routine for that CPU to call a NMI safe printk function that records the
printk in a per_cpu seq_buf descriptor. After all NMIs have finished
recording its data, the seq_bufs are printed in a safe context.
Link: http://lkml.kernel.org/p/20140619213952.360076309@goodmis.org
Link: http://lkml.kernel.org/r/20141115050605.055232587@goodmis.org
Tested-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Stack traces that happen from function tracing check if the address
on the stack is a __kernel_text_address(). That is, is the address
kernel code. This calls core_kernel_text() which returns true
if the address is part of the builtin kernel code. It also calls
is_module_text_address() which returns true if the address belongs
to module code.
But what is missing is ftrace dynamically allocated trampolines.
These trampolines are allocated for individual ftrace_ops that
call the ftrace_ops callback functions directly. But if they do a
stack trace, the code checking the stack wont detect them as they
are neither core kernel code nor module address space.
Adding another field to ftrace_ops that also stores the size of
the trampoline assigned to it we can create a new function called
is_ftrace_trampoline() that returns true if the address is a
dynamically allocate ftrace trampoline. Note, it ignores trampolines
that are not dynamically allocated as they will return true with
the core_kernel_text() function.
Link: http://lkml.kernel.org/r/20141119034829.497125839@goodmis.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When CONFIG_FRAME_POINTERS are enabled, it is required that the
ftrace_caller and ftrace_regs_caller trampolines set up frame pointers
otherwise a stack trace from a function call wont print the functions
that called the trampoline. This is due to a check in
__save_stack_address():
#ifdef CONFIG_FRAME_POINTER
if (!reliable)
return;
#endif
The "reliable" variable is only set if the function address is equal to
contents of the address before the address the frame pointer register
points to. If the frame pointer is not set up for the ftrace caller
then this will fail the reliable test. It will miss the function that
called the trampoline. Worse yet, if fentry is used (gcc 4.6 and
beyond), it will also miss the parent, as the fentry is called before
the stack frame is set up. That means the bp frame pointer points
to the stack of just before the parent function was called.
Link: http://lkml.kernel.org/r/20141119034829.355440340@goodmis.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org # 3.7+
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Uncorrected no action required (UCNA) - is a uncorrected recoverable
machine check error that is not signaled via a machine check exception
and, instead, is reported to system software as a corrected machine
check error. UCNA errors indicate that some data in the system is
corrupted, but the data has not been consumed and the processor state
is valid and you may continue execution on this processor. UCNA errors
require no action from system software to continue execution. Note that
UCNA errors are supported by the processor only when IA32_MCG_CAP[24]
(MCG_SER_P) is set.
-- Intel SDM Volume 3B
Deferred errors are errors that cannot be corrected by hardware, but
do not cause an immediate interruption in program flow, loss of data
integrity, or corruption of processor state. These errors indicate
that data has been corrupted but not consumed. Hardware writes information
to the status and address registers in the corresponding bank that
identifies the source of the error if deferred errors are enabled for
logging. Deferred errors are not reported via machine check exceptions;
they can be seen by polling the MCi_STATUS registers.
-- AMD64 APM Volume 2
Above two items, both UCNA and Deferred errors belong to detected
errors, but they can't be corrected by hardware, and this is very
similar to Software Recoverable Action Optional (SRAO) errors.
Therefore, we can take some actions that have been used for handling
SRAO errors to handle UCNA and Deferred errors.
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Until now, the mce_severity mechanism can only identify the severity
of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
out DEFERRED error for AMD platform.
This patch extends the mce_severity mechanism for handling
UCNA/DEFERRED error. In order to do this, the patch introduces a new
severity level - MCE_UCNA/DEFERRED_SEVERITY.
In addition, mce_severity is specific to machine check exception,
and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
mechanism in non-exception context, the patch also introduces a new
argument (is_excp) for mce_severity. `is_excp' is used to explicitly
specify the calling context of mce_severity.
Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
In the situation when we apply early microcode but do *not* apply late
microcode, we fail to update the BSP's microcode on resume because we
haven't initialized the uci->mc microcode pointer. So, in order to
alleviate that, we go and dig out the stashed microcode patch during
early boot. It is basically the same thing that is done on the APs early
during boot so do that too here.
Tested-by: alex.schnaidt@gmail.com
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88001
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20141118094657.GA6635@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is really the meat of the MPX patch set. If there is one patch to
review in the entire series, this is the one. There is a new ABI here
and this kernel code also interacts with userspace memory in a
relatively unusual manner. (small FAQ below).
Long Description:
This patch adds two prctl() commands to provide enable or disable the
management of bounds tables in kernel, including on-demand kernel
allocation (See the patch "on-demand kernel allocation of bounds tables")
and cleanup (See the patch "cleanup unused bound tables"). Applications
do not strictly need the kernel to manage bounds tables and we expect
some applications to use MPX without taking advantage of this kernel
support. This means the kernel can not simply infer whether an application
needs bounds table management from the MPX registers. The prctl() is an
explicit signal from userspace.
PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to
require kernel's help in managing bounds tables.
PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace don't
want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel
won't allocate and free bounds tables even if the CPU supports MPX.
PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds
directory out of a userspace register (bndcfgu) and then cache it into
a new field (->bd_addr) in the 'mm_struct'. PR_MPX_DISABLE_MANAGEMENT
will set "bd_addr" to an invalid address. Using this scheme, we can
use "bd_addr" to determine whether the management of bounds tables in
kernel is enabled.
Also, the only way to access that bndcfgu register is via an xsaves,
which can be expensive. Caching "bd_addr" like this also helps reduce
the cost of those xsaves when doing table cleanup at munmap() time.
Unfortunately, we can not apply this optimization to #BR fault time
because we need an xsave to get the value of BNDSTATUS.
==== Why does the hardware even have these Bounds Tables? ====
MPX only has 4 hardware registers for storing bounds information.
If MPX-enabled code needs more than these 4 registers, it needs to
spill them somewhere. It has two special instructions for this
which allow the bounds to be moved between the bounds registers
and some new "bounds tables".
They are similar conceptually to a page fault and will be raised by
the MPX hardware during both bounds violations or when the tables
are not present. This patch handles those #BR exceptions for
not-present tables by carving the space out of the normal processes
address space (essentially calling the new mmap() interface indroduced
earlier in this patch set.) and then pointing the bounds-directory
over to it.
The tables *need* to be accessed and controlled by userspace because
the instructions for moving bounds in and out of them are extremely
frequent. They potentially happen every time a register pointing to
memory is dereferenced. Any direct kernel involvement (like a syscall)
to access the tables would obviously destroy performance.
==== Why not do this in userspace? ====
This patch is obviously doing this allocation in the kernel.
However, MPX does not strictly *require* anything in the kernel.
It can theoretically be done completely from userspace. Here are
a few ways this *could* be done. I don't think any of them are
practical in the real-world, but here they are.
Q: Can virtual space simply be reserved for the bounds tables so
that we never have to allocate them?
A: As noted earlier, these tables are *HUGE*. An X-GB virtual
area needs 4*X GB of virtual space, plus 2GB for the bounds
directory. If we were to preallocate them for the 128TB of
user virtual address space, we would need to reserve 512TB+2GB,
which is larger than the entire virtual address space today.
This means they can not be reserved ahead of time. Also, a
single process's pre-popualated bounds directory consumes 2GB
of virtual *AND* physical memory. IOW, it's completely
infeasible to prepopulate bounds directories.
Q: Can we preallocate bounds table space at the same time memory
is allocated which might contain pointers that might eventually
need bounds tables?
A: This would work if we could hook the site of each and every
memory allocation syscall. This can be done for small,
constrained applications. But, it isn't practical at a larger
scale since a given app has no way of controlling how all the
parts of the app might allocate memory (think libraries). The
kernel is really the only place to intercept these calls.
Q: Could a bounds fault be handed to userspace and the tables
allocated there in a signal handler instead of in the kernel?
A: (thanks to tglx) mmap() is not on the list of safe async
handler functions and even if mmap() would work it still
requires locking or nasty tricks to keep track of the
allocation state there.
Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand in
the kernel.
Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The current x86 instruction decoder steps along through the
instruction stream but always ensures that it never steps farther
than the largest possible instruction size (MAX_INSN_SIZE).
The MPX code is now going to be doing some decoding of userspace
instructions. We copy those from userspace in to the kernel and
they're obviously completely untrusted coming from userspace. In
addition to the constraint that instructions can only be so long,
we also have to be aware of how long the buffer is that came in
from userspace. This _looks_ to be similar to what the perf and
kprobes is doing, but it's unclear to me whether they are
affected.
The whole reason we need this is that it is perfectly valid to be
executing an instruction within MAX_INSN_SIZE bytes of an
unreadable page. We should be able to gracefully handle short
reads in those cases.
This adds support to the decoder to record how long the buffer
being decoded is and to refuse to "validate" the instruction if
we would have gone over the end of the buffer to decode it.
The kprobes code probably needs to be looked at here a bit more
carefully. This patch still respects the MAX_INSN_SIZE limit
there but the kprobes code does look like it might be able to
be a bit more strict than it currently is.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We have some very similarly named command-line options:
arch/x86/kernel/cpu/common.c:__setup("noxsave", x86_xsave_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaveopt", x86_xsaveopt_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaves", x86_xsaves_setup);
__setup() is designed to match options that take arguments, like
"foo=bar" where you would have:
__setup("foo", x86_foo_func...);
The problem is that "noxsave" actually _matches_ "noxsaves" in
the same way that "foo" matches "foo=bar". If you boot an old
kernel that does not know about "noxsaves" with "noxsaves" on the
command line, it will interpret the argument as "noxsave", which
is not what you want at all.
This makes the "noxsave" handler only return success when it finds
an *exact* match.
[ tglx: We really need to make __setup() more robust. ]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20141111220133.FE053984@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
PEBS can capture machine state regs at retiremnt of the sampled
instructions. When precise sampling is enabled on an event, PEBS
is used, so substitute the interrupted state with the PEBS state.
Note that not all registers are captured by PEBS. Those missing
are replaced by the interrupt state counter-parts.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1411559322-16548-3-git-send-email-eranian@google.com
Cc: cebbert.lkml@gmail.com
Cc: jolsa@redhat.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Disallow setting inv/cmask/etc. flags for all PEBS events
on these CPUs, except for the UOPS_RETIRED.* events on Nehalem/Westmere,
which are needed for cycles:p. This avoids an undefined situation
strongly discouraged by the Intle SDM. The PLD_* events were already
covered. This follows the earlier changes for Sandy Bridge and alter.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411569288-5627-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
My earlier commit:
86a04461a9 ("perf/x86: Revamp PEBS event selection")
made nearly all PEBS on Sandy/IvyBridge/Haswell to reject non zero flags.
However this wasn't done for the INST_RETIRED.PREC_DIST event
because no suitable macro existed. Now that we have
INTEL_FLAGS_UEVENT_CONSTRAINT enforce zero flags for
INST_RETIRED.PREC_DIST too.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411569288-5627-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add a FLAGS_UEVENT_CONSTRAINT macro that allows us to
match on event+umask, and in additional all flags.
This is needed to ensure the INV and CMASK fields
are zero for specific events, as this can cause undefined
behavior.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Mark Davies <junk@eslaf.co.uk>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1411569288-5627-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add scaling to MB/s to the memory controller read/write
events for Sandy/IvyBridge/Haswell-EP similar to how the client
does. This makes the events easier to use from the
standard perf tool.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There were several reports that on some systems writing the SBOX0 PMU
initialization MSR would #GP at boot. This did not happen on all
systems -- my two test systems booted fine.
Writing the three initialization bits bit-by-bit seems to avoid the
problem. So add a special callback to do just that.
This replaces an earlier patch that disabled the SBOX.
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-and-Tested-by: Patrick Lu <patrick.lu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-4-git-send-email-andi@firstfloor.org
[ Fixed a whitespace error and added attribution tags that were left out inexplicably. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The counter register offsets for the IRP box PMU for Haswell-EP
were incorrect. The offsets actually changed over IvyBridge EP.
Fix them to the correct values. For this we need to fork the read
function from the IVB and use an own counter array.
Tested-by: patrick.lu@intel.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The only place where the time is invalid is when the ACPI_CSTATE_FFH entry
method is not set. Otherwise for all the drivers, the time can be correctly
measured.
Instead of duplicating the CPUIDLE_FLAG_TIME_VALID flag in all the drivers
for all the states, just invert the logic by replacing it by the flag
CPUIDLE_FLAG_TIME_INVALID, hence we can set this flag only for the acpi idle
driver, remove the former flag from all the drivers and invert the logic with
this flag in the different governor.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
New Fam15h models carry extra feature bits and extend
the MSR register space for IBS ops. Adding them here.
While at it, add functionality to read IbsBrTarget and
OpData4 depending on their availability if user wants a
PERF_SAMPLE_RAW.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <paulus@samba.org>
Cc: <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415651066-13523-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJUX/DqAAoJEHm+PkMAQRiGLtQH/iAt3fRHlYDXjaJian/KG1Cb
wVP0I+HWZmvVmmd0PzyaxCZLgRNwdmmYHEH4QLy2JwZ3jZfFHlxhy+hDWCgz+67t
bIzkLs0Pf1T4kJ2+r8qW2kBEz9PWJHGTQw7NTqZ++Ts3rPptBA6Fg4mEJ6fQigXy
qRIY68DpipUkXV9BWBWijnTmrvP5tt7JtPzBr4DC8frMjvWct8+XwYhc2k2tEv2j
LwLYb1OW6PUpPv2BQBfWjqqH77vYNQVhJwuwGcDe2YZdI0UFkDheL24+RbbPcZ4f
OnrLjJSSgzv6lBWkAaXZK7/WJ/JZbXxEqHzWZQ3xXoQov97bm7lEYJqqi5gDasQ=
=6Qpa
-----END PGP SIGNATURE-----
Merge tag 'v3.18-rc4' into x86/cleanups, to refresh the tree before pulling new changes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add support of Hardware Managed Performance States (HWP) described in Volume 3
section 14.4 of the SDM.
One bit CPUID.06H:EAX[bit 7] expresses the presence of the HWP feature on
the processor. The remaining bits CPUID.06H:EAX[bit 8-11] denote the
presense of various HWP features.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The problem fixed by 0e4ccb1505 ("PCI: Add x86_msi.msi_mask_irq() and
msix_mask_irq()") has been fixed in a simpler way by a previous commit
("PCI/MSI: Add pci_msi_ignore_mask to prevent writes to MSI/MSI-X Mask
Bits").
The msi_mask_irq() and msix_mask_irq() x86_msi_ops added by 0e4ccb1505
are no longer needed, so revert the commit.
default_msi_mask_irq() and default_msix_mask_irq() were added by
0e4ccb1505 and are still used by s390, so keep them for now.
[bhelgaas: changelog]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: xen-devel@lists.xenproject.org
With the introduction of the dynamic trampolines, it is useful that if
things go wrong that ftrace_bug() produces more information about what
the current state is. This can help debug issues that may arise.
Ftrace has lots of checks to make sure that the state of the system it
touchs is exactly what it expects it to be. When it detects an abnormality
it calls ftrace_bug() and disables itself to prevent any further damage.
It is crucial that ftrace_bug() produces sufficient information that
can be used to debug the situation.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When the static ftrace_ops (like function tracer) enables tracing, and it
is the only callback that is referencing a function, a trampoline is
dynamically allocated to the function that calls the callback directly
instead of calling a loop function that iterates over all the registered
ftrace ops (if more than one ops is registered).
But when it comes to dynamically allocated ftrace_ops, where they may be
freed, on a CONFIG_PREEMPT kernel there's no way to know when it is safe
to free the trampoline. If a task was preempted while executing on the
trampoline, there's currently no way to know when it will be off that
trampoline.
But this is not true when it comes to !CONFIG_PREEMPT. The current method
of calling schedule_on_each_cpu() will force tasks off the trampoline,
becaues they can not schedule while on it (kernel preemption is not
configured). That means it is safe to free a dynamically allocated
ftrace ops trampoline when CONFIG_PREEMPT is not configured.
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Previous versions of the espfix had a single function which did setup
the pagetables. It was later split into BSP and AP version. Drop unused
leftovers after that split.
Signed-off-by: Borislav Petkov <bp@suse.de>
* access the dis_ucode_ldr chicken bit properly
* fix patch stashing on AMD on 32-bit
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUYNWUAAoJEBLB8Bhh3lVKU1sQAKIj1LVBtNAeaMaC9O8AUkUN
SWfskslf0uU2OS4RvV0QjDbr/chivIKMs7rbeMb521lHqWULRV/ZSR0kReB1JL45
yF7Dnz/YZX4VXx7O1lUSBhczN+Xp2jlPGuaeV1Q7iE0S1Focwxe8B24n6ye3dyto
o3dOH9tSna1U5KZqzHSaXWI4LJg3VrVNmf70IbYQFYyINHEtxI3oEtRWUlfFBA6C
+RbA3cUksBhYkNLfpkoA9o9ODbdSh5oSNkKFV8R26GCYw+pBQp27FhSECaEDEYIe
sdMTLgQd3ZWo5zh2zm3U12j8hf0hsfz4TjpDuozXmBlHRJSi/cLbFyEUOAbaCHpQ
Coaxgs8iiGcFVcZnMGmis9WGM41Q4O3UyxYVVpVEyMYLcrOxysKB0j1L2ycMGHV1
YHVL6Ex2MYxxqbK6NoC2ZK0OWWm1KNl4O2NAYsT4ICBxsDyxc9JzA6vidKM7VBU6
VYtOo21fYYbDgxogF6N/C95PA6nRxCm5coJ6X2QENg9DWSQHWkQ/q4Jp3yTrW4Dn
h/vY+Y5FkmVGoPBITg6BjtG9Sl3wrsqpIz2umWEeRmNCbcQm+KNQWSctvzzmOWDW
yYHyPQUgwxVX5qK5VVrTEvtDBn7E0gLEnwJLy4AdwkHf7YESxwbnYv+xXkiAubLH
dDlDNEEv1Fi3wzwc4/6g
=BamU
-----END PGP SIGNATURE-----
Merge tag 'microcode_fixes_for_3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent
Pull two fixes for early microcode loader on 32-bit from Borislav Petkov:
- access the dis_ucode_ldr chicken bit properly
- fix patch stashing on AMD on 32-bit
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Save the patch while we're running on the BSP instead of later, before
the initrd has been jettisoned. More importantly, on 32-bit we need to
access the physical address instead of the virtual.
This way we actually do find it on the APs instead of having to go
through the initrd each time.
Tested-by: Richard Hendershot <rshendershot@mchsi.com>
Fixes: 5335ba5cf4 ("x86, microcode, AMD: Fix early ucode loading")
Cc: <stable@vger.kernel.org> # v3.13+
Signed-off-by: Borislav Petkov <bp@suse.de>
Commit 2ed53c0d6c ("x86/smpboot: Speed up suspend/resume by
avoiding 100ms sleep for CPU offline during S3") introduced
completions to CPU offlining process. These completions are not
initialized on Xen kernels causing a panic in
play_dead_common().
Move handling of die_complete into common routines to make them
available to Xen guests.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: tianyu.lan@intel.com
Cc: konrad.wilk@oracle.com
Cc: xen-devel@lists.xenproject.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414770572-7950-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The vsyscall emulation code sets orig_ax for seccomp's benefit,
but it forgot to set it back.
I'm not sure that this is observable at all, but it could cause
confusion to various /proc or ptrace users, and it's possible
that it could cause minor artifacts if a signal were to be
delivered on return from vsyscall emulation.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/cdc6a564517a4df09235572ee5f530ccdcf933f7.1415144089.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Many sysfs *_show function use cpu{list,mask}_scnprintf to copy cpumap
to the buffer aligned to PAGE_SIZE, append '\n' and '\0' to return null
terminated buffer with newline.
This patch creates a new helper function cpumap_print_to_pagebuf in
cpumask.h using newly added bitmap_print_to_pagebuf and consolidates
most of those sysfs functions using the new helper function.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Suggested-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: x86@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We should be accessing it through a pointer, like on the BSP.
Tested-by: Richard Hendershot <rshendershot@mchsi.com>
Fixes: 65cef1311d ("x86, microcode: Add a disable chicken bit")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Borislav Petkov <bp@suse.de>
Observing that per-CPU data (in the SMP case) is reachable by
exploiting 64-bit address wraparound (building on the default kernel
load address being at 16Mb), the one byte shorter RIP-relative
addressing form can be used for most per-CPU accesses. The one
exception are the "stable" reads, where the use of the "P" operand
modifier prevents the compiler from using RIP-relative addressing, but
is unavoidable due to the use of the "p" constraint (side note: with
gcc 4.9.x the intended effect of this isn't being achieved anymore,
see gcc bug 63637).
With the dependency on the minimum kernel load address, arbitrarily
low values for CONFIG_PHYSICAL_START are now no longer possible. A
link time assertion is being added, directing to the need to increase
that value when it triggers.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5458A1780200007800044A9D@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Both this_cpu_off and cpu_info aren't getting modified post boot, yet
are being accessed on enough code paths that grouping them with other
frequently read items seems desirable. For cpu_info this at the same
time implies removing the cache line alignment (which afaict became
pointless when it got converted to per-CPU data years ago).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/54589BD20200007800044A84@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Drop printing that serves no purpose, as it's printing fixed or known
values, and mark constant structure appropriately.
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Cc: Steffen Persvold <sp@numascale.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1415089784-28779-3-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The default self-IPI path polls the ICR to delay sending the IPI until
there is no IPI in progress. This is redundant on x86-86 APICs, since
IPIs are queued. See the AMD64 Architecture Programmer's Manual, vol 2,
p525.
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Cc: Steffen Persvold <sp@numascale.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1415089784-28779-2-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Prevent 16-bit APIC IDs being truncated by using correct mask. This fixes
booting large systems, where the wrong core would receive the startup and
init IPIs, causing hanging.
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Cc: Steffen Persvold <sp@numascale.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1415089784-28779-1-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This adds CONFIG_X86_VSYSCALL_EMULATION, guarded by CONFIG_EXPERT.
Turning it off completely disables vsyscall emulation, saving ~3.5k
for vsyscall_64.c, 4k for vsyscall_emu_64.S (the fake vsyscall
page), some tiny amount of core mm code that supports a gate area,
and possibly 4k for a wasted pagetable. The latter is because the
vsyscall addresses are misaligned and fit poorly in the fixmap.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/406db88b8dd5f0cbbf38216d11be34bbb43c7eae.1414618407.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
vsyscall_64.c is just vsyscall emulation. Tidy it up accordingly.
[ tglx: Preserved the original copyright notices ]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/9c448d5643d0fdb618f8cde9a54c21d2bcd486ce.1414618407.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
I see no point in having an unusable read-only page sitting at
0xffffffffff600000 when vsyscall=none. Instead, skip mapping it and
remove it from /proc/PID/maps.
I kept the ratelimited warning when programs try to use a vsyscall
in this mode, since it may help admins avoid confusion.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/0dddbadc1d4e3bfbaf887938ff42afc97a7cc1f2.1414618407.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Konrad triggered the following splat below in a 32-bit guest on an AMD
box. As it turns out, in save_microcode_in_initrd_amd() we're using the
*physical* address of the container *after* we have enabled paging and
thus we #PF in load_microcode_amd() when trying to access the microcode
container in the ramdisk range.
Because the ramdisk is exactly there:
[ 0.000000] RAMDISK: [mem 0x35e04000-0x36ef9fff]
and we fault at 0x35e04304.
And since this guest doesn't relocate the ramdisk, we don't do the
computation which will give us the correct virtual address and we end up
with the PA.
So, we should actually be using virtual addresses on 32-bit too by the
time we're freeing the initrd. Do that then!
Unpacking initramfs...
BUG: unable to handle kernel paging request at 35d4e304
IP: [<c042e905>] load_microcode_amd+0x25/0x4a0
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.1-302.fc21.i686 #1
Hardware name: Xen HVM domU, BIOS 4.4.1 10/01/2014
task: f5098000 ti: f50d0000 task.ti: f50d0000
EIP: 0060:[<c042e905>] EFLAGS: 00010246 CPU: 0
EIP is at load_microcode_amd+0x25/0x4a0
EAX: 00000000 EBX: f6e9ec4c ECX: 00001ec4 EDX: 00000000
ESI: f5d4e000 EDI: 35d4e2fc EBP: f50d1ed0 ESP: f50d1e94
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 35d4e304 CR3: 00e33000 CR4: 000406d0
Stack:
00000000 00000000 f50d1ebc f50d1ec4 f5d4e000 c0d7735a f50d1ed0 15a3d17f
f50d1ec4 00600f20 00001ec4 bfb83203 f6e9ec4c f5d4e000 c0d7735a f50d1ed8
c0d80861 f50d1ee0 c0d80429 f50d1ef0 c0d889a9 f5d4e000 c0000000 f50d1f04
Call Trace:
? unpack_to_rootfs
? unpack_to_rootfs
save_microcode_in_initrd_amd
save_microcode_in_initrd
free_initrd_mem
populate_rootfs
? unpack_to_rootfs
do_one_initcall
? unpack_to_rootfs
? repair_env_string
? proc_mkdir
kernel_init_freeable
kernel_init
ret_from_kernel_thread
? rest_init
Reported-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
References: https://bugzilla.redhat.com/show_bug.cgi?id=1158204
Fixes: 75a1ba5b2c ("x86, microcode, AMD: Unify valid container checks")
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v3.14+
Link: http://lkml.kernel.org/r/20141101100100.GA4462@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There are some AMD CPU models which have thresholding banks but which
cannot generate a thresholding interrupt. This is denoted by the bit
MCi_MISC[IntP]. Make sure to check that bit before assigning the
thresholding interrupt handler.
Signed-off-by: Chen Yucong <slaoub@gmail.com>
[ Boris: save an indentation level and rewrite commit message. ]
Link: http://lkml.kernel.org/r/1412662128.28440.18.camel@debian
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull x86 fixes from Ingo Molnar:
"Fixes from all around the place:
- hyper-V 32-bit PAE guest kernel fix
- two IRQ allocation fixes on certain x86 boards
- intel-mid boot crash fix
- intel-quark quirk
- /proc/interrupts duplicate irq chip name fix
- cma boot crash fix
- syscall audit fix
- boot crash fix with certain TSC configurations (seen on Qemu)
- smpboot.c build warning fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi()
x86, intel-mid: Create IRQs for APB timers and RTC timers
x86: Don't enable F00F workaround on Intel Quark processors
x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts
x86, cma: Reserve DMA contiguous area after initmem_init()
i386/audit: stop scribbling on the stack frame
x86, apic: Handle a bad TSC more gracefully
x86: ACPI: Do not translate GSI number if IOAPIC is disabled
x86/smpboot: Move data structure to its primary usage scope
The file /sys/kernel/debug/tracing/eneabled_functions is used to debug
ftrace function hooks. Add to the output what function is being called
by the trampoline if the arch supports it.
Add support for this feature in x86_64.
Cc: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The current method of handling multiple function callbacks is to register
a list function callback that calls all the other callbacks based on
their hash tables and compare it to the function that the callback was
called on. But this is very inefficient.
For example, if you are tracing all functions in the kernel and then
add a kprobe to a function such that the kprobe uses ftrace, the
mcount trampoline will switch from calling the function trace callback
to calling the list callback that will iterate over all registered
ftrace_ops (in this case, the function tracer and the kprobes callback).
That means for every function being traced it checks the hash of the
ftrace_ops for function tracing and kprobes, even though the kprobes
is only set at a single function. The kprobes ftrace_ops is checked
for every function being traced!
Instead of calling the list function for functions that are only being
traced by a single callback, we can call a dynamically allocated
trampoline that calls the callback directly. The function graph tracer
already uses a direct call trampoline when it is being traced by itself
but it is not dynamically allocated. It's trampoline is static in the
kernel core. The infrastructure that called the function graph trampoline
can also be used to call a dynamically allocated one.
For now, only ftrace_ops that are not dynamically allocated can have
a trampoline. That is, users such as function tracer or stack tracer.
kprobes and perf allocate their ftrace_ops, and until there's a safe
way to free the trampoline, it can not be used. The dynamically allocated
ftrace_ops may, although, use the trampoline if the kernel is not
compiled with CONFIG_PREEMPT. But that will come later.
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
These patches:
86a349a28b ("perf/x86/intel: Add Broadwell core support")
c46e665f03 ("perf/x86: Add INST_RETIRED.ALL workarounds")
fdda3c4aac ("perf/x86/intel: Use Broadwell cache event list for Haswell")
introduced magic constants and unexplained changes:
https://lkml.org/lkml/2014/10/28/1128https://lkml.org/lkml/2014/10/27/325https://lkml.org/lkml/2014/8/27/546https://lkml.org/lkml/2014/10/28/546
Peter Zijlstra has attempted to help out, to clean up the mess:
https://lkml.org/lkml/2014/10/28/543
But has not received helpful and constructive replies which makes
me doubt wether it can all be finished in time until v3.18 is
released.
Despite various review feedback the author (Andi Kleen) has answered
only few of the review questions and has generally been uncooperative,
only giving replies when prompted repeatedly, and only giving minimal
answers instead of constructively explaining and helping along the effort.
That kind of behavior is not acceptable.
There's also a boot crash on Intel E5-1630 v3 CPUs reported for another
commit from Andi Kleen:
e735b9db12 ("perf/x86/intel/uncore: Add Haswell-EP uncore support")
https://lkml.org/lkml/2014/10/22/730
Which is not yet resolved. The uncore driver is independent in theory,
but the crash makes me worry about how well all these patches were
tested and makes me uneasy about the level of interminging that the
Broadwell and Haswell code has received by the commits above.
As a first step to resolve the mess revert the Broadwell client commits
back to the v3.17 version, before we run out of time and problematic
code hits a stable upstream kernel.
( If the Haswell-EP crash is not resolved via a simple fix then we'll have
to revert the Haswell-EP uncore driver as well. )
The Broadwell client series has to be submitted in a clean fashion, with
single, well documented changes per patch. If they are submitted in time
and are accepted during review then they can possibly go into v3.19 but
will need additional scrutiny due to the rocky history of this patch set.
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Function mp_register_gsi() returns blindly the GSI number for the ACPI
SCI interrupt. That causes a regression when the GSI for ACPI SCI is
shared with other devices.
The regression was caused by commit 84245af729 "x86, irq, ACPI:
Change __acpi_register_gsi to return IRQ number instead of GSI" and
exposed on a SuperMicro system, which shares one GSI between ACPI SCI
and PCI device, with following failure:
http://sourceforge.net/p/linux1394/mailman/linux1394-user/?viewmonth=201410
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 20 low
level)
[ 2.699224] firewire_ohci 0000:06:00.0: failed to allocate interrupt
20
Return mp_map_gsi_to_irq(gsi, 0) instead of the GSI number.
Reported-and-Tested-by: Daniel Robbins <drobbins@funtoo.org>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1414387308-27148-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Intel MID platforms has no legacy interrupts, so no IRQ descriptors
preallocated. We need to call mp_map_gsi_to_irq() to create IRQ
descriptors for APB timers and RTC timers, otherwise it may cause
invalid memory access as:
[ 0.116839] BUG: unable to handle kernel NULL pointer dereference at
0000003a
[ 0.123803] IP: [<c1071c0e>] setup_irq+0xf/0x4d
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1414387308-27148-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The Intel Quark processor is a part of family 5, but does not have the
F00F bug present in Pentiums of the same family.
Pentiums were models 0 through 8, Quark is model 9.
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Link: http://lkml.kernel.org/r/20141028175753.GA12743@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix duplicate XT-PIC seen in /proc/interrupts on x86 systems
that make use of 8259A Programmable Interrupt Controllers.
Specifically convert output like this:
CPU0
0: 76573 XT-PIC-XT-PIC timer
1: 11 XT-PIC-XT-PIC i8042
2: 0 XT-PIC-XT-PIC cascade
4: 8 XT-PIC-XT-PIC serial
6: 3 XT-PIC-XT-PIC floppy
7: 0 XT-PIC-XT-PIC parport0
8: 1 XT-PIC-XT-PIC rtc0
10: 448 XT-PIC-XT-PIC fddi0
12: 23 XT-PIC-XT-PIC eth0
14: 2464 XT-PIC-XT-PIC ide0
NMI: 0 Non-maskable interrupts
ERR: 0
to one like this:
CPU0
0: 122033 XT-PIC timer
1: 11 XT-PIC i8042
2: 0 XT-PIC cascade
4: 8 XT-PIC serial
6: 3 XT-PIC floppy
7: 0 XT-PIC parport0
8: 1 XT-PIC rtc0
10: 145 XT-PIC fddi0
12: 31 XT-PIC eth0
14: 2245 XT-PIC ide0
NMI: 0 Non-maskable interrupts
ERR: 0
that is one like we used to have from ~2.2 till it was changed
sometime.
The rationale is there is no value in this duplicate
information, it merely clutters output and looks ugly. We only
have one handler for 8259A interrupts so there is no need to
give it a name separate from the name already given to
irq_chip.
We could define meaningful names for handlers based on bits in
the ELCR register on systems that have it or the value of the
LTIM bit we use in ICW1 otherwise (hardcoded to 0 though with
MCA support gone), to tell edge-triggered and level-triggered
inputs apart. While that information does not affect 8259A
interrupt handlers it could help people determine which lines
are shareable and which are not. That is material for a
separate change though.
Any tools that parse /proc/interrupts are supposed not to be
affected since it was many years we used the format this change
converts back to.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.11.1410260147190.21390@eddie.linux-mips.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Drop double entry for pt_regs_bx.
This seems to be a typo - resulting in a double entry in the
generated include/generated/asm-offsets.h, which is not necessary.
Build-tested and booted on x86 64 box to make sure it was not
doing any strange magic.... after all it was in the kernel in
this form for almost 10 years.
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141027172805.GA19760@opentech.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy spotted the fail in what was intended as a conditional printk level.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Fixes: cc6cd47e73 ("perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141007124757.GH19379@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fengguang Wu reported a boot crash on the x86 platform
via the 0-day Linux Kernel Performance Test:
cma: dma_contiguous_reserve: reserving 31 MiB for global area
BUG: Int 6: CR2 (null)
[<41850786>] dump_stack+0x16/0x18
[<41d2b1db>] early_idt_handler+0x6b/0x6b
[<41072227>] ? __phys_addr+0x2e/0xca
[<41d4ee4d>] cma_declare_contiguous+0x3c/0x2d7
[<41d6d359>] dma_contiguous_reserve_area+0x27/0x47
[<41d6d4d1>] dma_contiguous_reserve+0x158/0x163
[<41d33e0f>] setup_arch+0x79b/0xc68
[<41d2b7cf>] start_kernel+0x9c/0x456
[<41d2b2ca>] i386_start_kernel+0x79/0x7d
(See details at: https://lkml.org/lkml/2014/10/8/708)
It is because dma_contiguous_reserve() is called before
initmem_init() in x86, the variable high_memory is not
initialized but accessed by __pa(high_memory) in
dma_contiguous_reserve().
This patch moves dma_contiguous_reserve() after initmem_init()
so that high_memory is initialized before accessed.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: iamjoonsoo.kim@lge.com
Cc: 'Linux-MM' <linux-mm@kvack.org>
Cc: 'Weijie Yang' <weijie.yang.kh@gmail.com>
Link: http://lkml.kernel.org/r/000101cfef69%2431e528a0%2495af79e0%24%25yang@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
git commit b4f0d3755c was very very dumb.
It was writing over %esp/pt_regs semi-randomly on i686 with the expected
"system can't boot" results. As noted in:
https://bugs.freedesktop.org/show_bug.cgi?id=85277
This patch stops fscking with pt_regs. Instead it sets up the registers
for the call to __audit_syscall_entry in the most obvious conceivable
way. It then does just a tiny tiny touch of magic. We need to get what
started in PT_EDX into 0(%esp) and PT_ESI into 4(%esp). This is as easy
as a pair of pushes.
After the call to __audit_syscall_entry all we need to do is get that
now useless junk off the stack (pair of pops) and reload %eax with the
original syscall so other stuff can keep going about it's business.
Reported-by: Paulo Zanoni <przanoni@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Link: http://lkml.kernel.org/r/1414037043-30647-1-git-send-email-eparis@redhat.com
Cc: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJURGCfAAoJEHm+PkMAQRiG6toH/RUazjqZxqMvLlm1y+O6+7s9
OpFdcDl4ZQtrvymBRYipu46pbDUoAAsVbxQJllaLNtHE0UrvaQE76WihBQYM8qW/
WoESLsZRbNQqQYQixf55pOozX7uIuG+9LKHagC8JNfD1Bw/nQ+RleSXqFsBCdpMW
i7SzcZBu2Iv+LnVmjvoGMOQa+loKzO6Pj1MpoHxxJQmeypH3dZR7mLVeBJNZQtLE
BGY47gYraVzb9EjKnSbjrIKjpM9o0MIihoanrrjnq0JMrfm4pi6W5GgaGDUiaBVH
w7Vmr5S2pjzrS41gKSVK9/XO1CrDG8tsp3QwA2+iIbjdR3wBDynyeG3UfnLABec=
=hwbG
-----END PGP SIGNATURE-----
Merge tag 'v3.18-rc1' into x86/urgent
Reason:
Need to apply audit patch on top of v3.18-rc1.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
If the TSC is unusable or disabled, then this patch fixes:
- Confusion while trying to clear old APIC interrupts.
- Division by zero and incorrect programming of the TSC deadline
timer.
This fixes boot if the CPU has a TSC deadline timer but a missing or
broken TSC. The failure to boot can be observed with qemu using
-cpu qemu64,-tsc,+tsc-deadline
This also happens to me in nested KVM for unknown reasons.
With this patch, I can boot cleanly (although without a TSC).
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Bandan Das <bsd@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/e2fa274e498c33988efac0ba8b7e3120f7f92d78.1413393027.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Aravind had the good question about why we're assigning a
software-defined bank when reporting error thresholding errors instead
of simply using the bank which reports the last error causing the
overflow.
Digging through git history, it pointed to
9526866439 ("[PATCH] x86_64: mce_amd support for family 0x10 processors")
which added that functionality. The problem with this, however, is that
tools don't know about software-defined banks and get puzzled. So drop
that K8_MCE_THRESHOLD_BASE and simply use the hw bank reporting the
thresholding interrupt.
Save us a couple of MSR reads while at it.
Reported-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: https://lkml.kernel.org/r/5435B206.60402@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Assigning to mce_threshold_vector is loop-invariant code in
mce_amd_feature_init(). So do it only once, out of loop body.
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Link: http://lkml.kernel.org/r/1412263212.8085.6.camel@debian
[ Boris: commit message corrections. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
mce_setup() does not gather the content of IA32_MCG_STATUS, so it
should be read explicitly. Moreover, we need to clear IA32_MCx_STATUS
to avoid that mce_log() logs the processed threshold event again
at next time.
But we do the logging ourselves and machine_check_poll() is completely
useless there. So kill it.
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Avoid open coded calculations for bank MSRs by hiding the index
of higher bank MSRs in well-defined macros.
No semantic changes.
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Link: http://lkml.kernel.org/r/1411438561-24319-1-git-send-email-slaoub@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
When IOAPIC is disabled, acpi_gsi_to_irq() should return gsi directly
instead of calling mp_map_gsi_to_irq() to translate gsi to IRQ by IOAPIC.
It fixes https://bugzilla.kernel.org/show_bug.cgi?id=84381.
This regression was introduced with commit 6b9fb70824 "x86, ACPI,
irq: Consolidate algorithm of mapping (ioapic, pin) to IRQ number"
Reported-and-Tested-by: Thomas Richter <thor@math.tu-berlin.de>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Richter <thor@math.tu-berlin.de>
Cc: rui.zhang@intel.com
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1413816327-12850-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull audit updates from Eric Paris:
"So this change across a whole bunch of arches really solves one basic
problem. We want to audit when seccomp is killing a process. seccomp
hooks in before the audit syscall entry code. audit_syscall_entry
took as an argument the arch of the given syscall. Since the arch is
part of what makes a syscall number meaningful it's an important part
of the record, but it isn't available when seccomp shoots the
syscall...
For most arch's we have a better way to get the arch (syscall_get_arch)
So the solution was two fold: Implement syscall_get_arch() everywhere
there is audit which didn't have it. Use syscall_get_arch() in the
seccomp audit code. Having syscall_get_arch() everywhere meant it was
a useless flag on the stack and we could get rid of it for the typical
syscall entry.
The other changes inside the audit system aren't grand, fixed some
records that had invalid spaces. Better locking around the task comm
field. Removing some dead functions and structs. Make some things
static. Really minor stuff"
* git://git.infradead.org/users/eparis/audit: (31 commits)
audit: rename audit_log_remove_rule to disambiguate for trees
audit: cull redundancy in audit_rule_change
audit: WARN if audit_rule_change called illegally
audit: put rule existence check in canonical order
next: openrisc: Fix build
audit: get comm using lock to avoid race in string printing
audit: remove open_arg() function that is never used
audit: correct AUDIT_GET_FEATURE return message type
audit: set nlmsg_len for multicast messages.
audit: use union for audit_field values since they are mutually exclusive
audit: invalid op= values for rules
audit: use atomic_t to simplify audit_serial()
kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0]
audit: reduce scope of audit_log_fcaps
audit: reduce scope of audit_net_id
audit: arm64: Remove the audit arch argument to audit_syscall_entry
arm64: audit: Add audit hook in syscall_trace_enter/exit()
audit: x86: drop arch from __audit_syscall_entry() interface
sparc: implement is_32bit_task
sparc: properly conditionalize use of TIF_32BIT
...
Makes the code more readable by moving variable and usage closer
to each other, which also avoids this build warning in the
!CONFIG_HOTPLUG_CPU case:
arch/x86/kernel/smpboot.c:105:42: warning: ‘die_complete’ defined but not used [-Wunused-variable]
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Lan Tianyu <tianyu.lan@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: srostedt@redhat.com
Cc: toshi.kani@hp.com
Cc: imammedo@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1409039025-32310-1-git-send-email-tianyu.lan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull percpu consistent-ops changes from Tejun Heo:
"Way back, before the current percpu allocator was implemented, static
and dynamic percpu memory areas were allocated and handled separately
and had their own accessors. The distinction has been gone for many
years now; however, the now duplicate two sets of accessors remained
with the pointer based ones - this_cpu_*() - evolving various other
operations over time. During the process, we also accumulated other
inconsistent operations.
This pull request contains Christoph's patches to clean up the
duplicate accessor situation. __get_cpu_var() uses are replaced with
with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().
Unfortunately, the former sometimes is tricky thanks to C being a bit
messy with the distinction between lvalues and pointers, which led to
a rather ugly solution for cpumask_var_t involving the introduction of
this_cpu_cpumask_var_ptr().
This converts most of the uses but not all. Christoph will follow up
with the remaining conversions in this merge window and hopefully
remove the obsolete accessors"
* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
irqchip: Properly fetch the per cpu offset
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
Revert "powerpc: Replace __get_cpu_var uses"
percpu: Remove __this_cpu_ptr
clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
sparc: Replace __get_cpu_var uses
avr32: Replace __get_cpu_var with __this_cpu_write
blackfin: Replace __get_cpu_var uses
tile: Use this_cpu_ptr() for hardware counters
tile: Replace __get_cpu_var uses
powerpc: Replace __get_cpu_var uses
alpha: Replace __get_cpu_var
ia64: Replace __get_cpu_var uses
s390: cio driver &__get_cpu_var replacements
s390: Replace __get_cpu_var uses
mips: Replace __get_cpu_var uses
MIPS: Replace __get_cpu_var uses in FPU emulator.
arm: Replace __this_cpu_ptr with raw_cpu_ptr
...
Merge second patch-bomb from Andrew Morton:
- a few hotfixes
- drivers/dma updates
- MAINTAINERS updates
- Quite a lot of lib/ updates
- checkpatch updates
- binfmt updates
- autofs4
- drivers/rtc/
- various small tweaks to less used filesystems
- ipc/ updates
- kernel/watchdog.c changes
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits)
mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
kernel/param: consolidate __{start,stop}___param[] in <linux/moduleparam.h>
ia64: remove duplicate declarations of __per_cpu_start[] and __per_cpu_end[]
frv: remove unused declarations of __start___ex_table and __stop___ex_table
kvm: ensure hard lockup detection is disabled by default
kernel/watchdog.c: control hard lockup detection default
staging: rtl8192u: use %*pEn to escape buffer
staging: rtl8192e: use %*pEn to escape buffer
staging: wlan-ng: use %*pEhp to print SN
lib80211: remove unused print_ssid()
wireless: hostap: proc: print properly escaped SSID
wireless: ipw2x00: print SSID via %*pE
wireless: libertas: print esaped string via %*pE
lib/vsprintf: add %*pE[achnops] format specifier
lib / string_helpers: introduce string_escape_mem()
lib / string_helpers: refactoring the test suite
lib / string_helpers: move documentation to c-file
include/linux: remove strict_strto* definitions
arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable
fs: check bh blocknr earlier when searching lru
...
Pull x86 ras, uv and vdso fixlets from Ingo Molnar:
"ras: tone down a kernel message to only occur during initial bootup,
not during suspend/resume cycles.
uv: a cleanup commit
vdso: a fix to error checking"
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Avoid showing repetitive message from intel_init_thermal()
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic/uv: Remove unnecessary #ifdef
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Fix vdso2c's special_pages[] error checking
Pull x86 fixes from Ingo Molnar:
"Misc smaller fixes that missed the v3.17 cycle"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Add arch/x86/purgatory/ make generated files to gitignore
x86: Fix section conflict for numachip
x86: Reject x32 executables if x32 ABI not supported
x86_64, entry: Filter RFLAGS.NT on entry from userspace
x86, boot, kaslr: Fix nuisance warning on 32-bit builds
Pull x86 seccomp changes from Ingo Molnar:
"This tree includes x86 seccomp filter speedups and related preparatory
work, which touches core seccomp facilities as well.
The main idea is to split seccomp into two phases, to be able to enter
a simple fast path for syscalls with ptrace side effects.
There's no substantial user-visible (and ABI) effects expected from
this, except a change in how we emit a better audit record for
SECCOMP_RET_TRACE events"
* 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls
x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls
x86: Split syscall_trace_enter into two phases
x86, entry: Only call user_exit if TIF_NOHZ
x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
seccomp: Document two-phase seccomp and arch-provided seccomp_data
seccomp: Allow arch code to provide seccomp_data
seccomp: Refactor the filter callback and the API
seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing
Pull x86 platform updates from Ingo Molnar:
"The main changes in this tree are:
- fix and update Intel Quark [Galileo] SoC platform support
- update IOSF chipset side band interface and make it available via
debugfs
- enable HPETs on Soekris net6501 and other e6xx based systems"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Add cpu_detect_cache_sizes to init_intel() add Quark legacy_cache()
x86: Quark: Comment setup_arch() to document TLB/PGE bug
x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead
x86/platform/intel/iosf: Add debugfs config option for IOSF
x86/platform/intel/iosf: Add better description of IOSF driver in config
x86/platform/intel/iosf: Add Braswell PCI ID
x86/platform/pmc_atom: Fix warning when CONFIG_DEBUG_FS=n
x86: HPET force enable for e6xx based systems
x86/iosf: Add debugfs support
x86/iosf: Add Kconfig prompt for IOSF_MBI selection
Pull x86 mm updates from Ingo Molnar:
"This tree includes the following changes:
- fix memory hotplug
- fix hibernation bootup memory layout assumptions
- fix hyperv numa guest kernel messages
- remove dead code
- update documentation"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Update memory map description to list hypervisor-reserved area
x86/mm, hibernate: Do not assume the first e820 area to be RAM
x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data()
x86/mm/hotplug: Modify PGD entry when removing memory
x86/mm/hotplug: Pass sync_global_pgds() a correct argument in remove_pagetable()
x86: Remove set_pmd_pfn
Pull x86 FPU updates from Ingo Molnar:
"x86 FPU handling fixes, cleanups and enhancements from Oleg.
The signal handling race fix and the __restore_xstate_sig() preemption
fix for eager-mode is marked for -stable as well"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: copy_thread: Don't nullify ->ptrace_bps twice
x86, fpu: Shift "fpu_counter = 0" from copy_thread() to arch_dup_task_struct()
x86, fpu: copy_process: Sanitize fpu->last_cpu initialization
x86, fpu: copy_process: Avoid fpu_alloc/copy if !used_math()
x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu()
x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
Pull x86 cpufeature updates from Ingo Molnar:
"This tree includes the following changes:
- Introduce DISABLED_MASK to list disabled CPU features, to simplify
CPU feature handling and avoid excessive #ifdefs
- Remove the lightly used cpu_has_pae() primitive"
* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Add more disabled features
x86: Introduce disabled-features
x86: Axe the lightly-used cpu_has_pae
Use watchdog_enable_hardlockup_detector() to set hard lockup detection's
default value to false. It's risky to run this detection in a guest, as
false positives are easy to trigger, especially if the host is
overcommitted.
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
x86_64 allnoconfig:
arch/x86/kernel/cpu/common.c:968: warning: 'syscall32_cpu_init' defined but not used
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells brought to my attention the mails generated by kbuild test
bot and following sparse warnings were present. This patch fixes these
warnings.
arch/x86/kernel/kexec-bzimage64.c:270:5: warning: symbol 'bzImage64_probe' was not declared. Should it be static?
arch/x86/kernel/kexec-bzimage64.c:328:6: warning: symbol 'bzImage64_load' was not declared. Should it be static?
arch/x86/kernel/kexec-bzimage64.c:517:5: warning: symbol 'bzImage64_cleanup' was not declared. Should it be static?
arch/x86/kernel/kexec-bzimage64.c:531:5: warning: symbol 'bzImage64_verify_sig' was not declared. Should it be static?
arch/x86/kernel/kexec-bzimage64.c:546:23: warning: symbol 'kexec_bzImage64_ops' was not declared. Should it be static?
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>