WARNING: arch/x86/built-in.o(.text+0x1bb74): Section mismatch in reference from the function kvm_guest_cpu_online() to the function .cpuinit.text:kvm_guest_cpu_init()
The function kvm_guest_cpu_online() references
the function __cpuinit kvm_guest_cpu_init().
This is often because kvm_guest_cpu_online lacks a __cpuinit
annotation or the annotation of kvm_guest_cpu_init is wrong.
This patch fixes the warning.
Tested with linux-next (next-20101231)
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix binutils-2.21 symbol related build failures
x86-64, trampoline: Remove unused variable
x86, reboot: Fix the use of passed arguments in 32-bit BIOS reboot
x86, reboot: Move the real-mode reboot code to an assembly file
x86: Make the GDT_ENTRY() macro in <asm/segment.h> safe for assembly
x86, trampoline: Use the unified trampoline setup for ACPI wakeup
x86, trampoline: Common infrastructure for low memory trampolines
Fix up trivial conflicts in arch/x86/kernel/Makefile
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: (21 commits)
PM / Hibernate: Reduce autotuned default image size
PM / Core: Introduce struct syscore_ops for core subsystems PM
PM QoS: Make pm_qos settings readable
PM / OPP: opp_find_freq_exact() documentation fix
PM: Documentation/power/states.txt: fix repetition
PM: Make system-wide PM and runtime PM treat subsystems consistently
PM: Simplify kernel/power/Kconfig
PM: Add support for device power domains
PM: Drop pm_flags that is not necessary
PM: Allow pm_runtime_suspend() to succeed during system suspend
PM: Clean up PM_TRACE dependencies and drop unnecessary Kconfig option
PM: Remove CONFIG_PM_OPS
PM: Reorder power management Kconfig options
PM: Make CONFIG_PM depend on (CONFIG_PM_SLEEP || CONFIG_PM_RUNTIME)
PM / ACPI: Remove references to pm_flags from bus.c
PM: Do not create wakeup sysfs files for devices that cannot wake up
USB / Hub: Do not call device_set_wakeup_capable() under spinlock
PM: Use appropriate printk() priority level in trace.c
PM / Wakeup: Don't update events_check_enabled in pm_get_wakeup_count()
PM / Wakeup: Make pm_save_wakeup_count() work as documented
...
* 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu, x86: Add arch-specific this_cpu_cmpxchg_double() support
percpu: Generic support for this_cpu_cmpxchg_double()
alpha: use L1_CACHE_BYTES for cacheline size in the linker script
percpu: align percpu readmostly subsection to cacheline
Fix up trivial conflict in arch/x86/kernel/vmlinux.lds.S due to the
percpu alignment having changed ("x86: Reduce back the alignment of the
per-CPU data section")
Support for Always Running APIC timer (ARAT) was introduced in
commit db954b5898. This feature
allows us to avoid switching timers from LAPIC to something else
(e.g. HPET) and go into timer broadcasts when entering deep
C-states.
AMD processors don't provide a CPUID bit for that feature but
they also keep APIC timers running in deep C-states (except for
cases when the processor is affected by erratum 400). Therefore
we should set ARAT feature bit on AMD CPUs.
Tested-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
LKML-Reference: <1300205624-4813-1-git-send-email-ostr@amd64.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 7f74f8f28a
(x86 quirk: Fix polarity for IRQ0 pin2 override on SB800
systems) introduced a regression. It removed some SB600 specific
code to determine the revision ID without adapting a
corresponding revision ID check for SB600.
See this mail thread:
http://marc.info/?l=linux-kernel&m=129980296006380&w=2
This patch adapts the corresponding check to cover all SB600
revisions.
Tested-by: Wang Lei <f3d27b@gmail.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org # 38.x, 37.x, 32.x
LKML-Reference: <20110315143137.GD29499@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
x86: Clean up apic.c and apic.h
x86: Remove superflous goal definition of tsc_sync
x86: dt: Correct local apic documentation in device tree bindings
x86: dt: Cleanup local apic setup
x86: dt: Fix OLPC=y/INTEL_CE=n build
rtc: cmos: Add OF bindings
x86: ce4100: Use OF to setup devices
x86: ioapic: Add OF bindings for IO_APIC
x86: dtb: Add generic bus probe
x86: dtb: Add support for PCI devices backed by dtb nodes
x86: dtb: Add device tree support for HPET
x86: dtb: Add early parsing of IO_APIC
x86: dtb: Add irq domain abstraction
x86: dtb: Add a device tree for CE4100
x86: Add device tree support
x86: e820: Remove conditional early mapping in parse_e820_ext
x86: OLPC: Make OLPC=n build again
x86: OLPC: Remove extra OLPC_OPENFIRMWARE_DT indirection
x86: OLPC: Cleanup config maze completely
x86: OLPC: Hide OLPC_OPENFIRMWARE config switch
...
Fix up conflicts in arch/x86/platform/ce4100/ce4100.c
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (93 commits)
x86, tlb, UV: Do small micro-optimization for native_flush_tlb_others()
x86-64, NUMA: Don't call numa_set_distanc() for all possible node combinations during emulation
x86-64, NUMA: Don't assume phys node 0 is always online in numa_emulation()
x86-64, NUMA: Clean up initmem_init()
x86-64, NUMA: Fix numa_emulation code with node0 without RAM
x86-64, NUMA: Revert NUMA affine page table allocation
x86: Work around old gas bug
x86-64, NUMA: Better explain numa_distance handling
x86-64, NUMA: Fix distance table handling
mm: Move early_node_map[] reverse scan helpers under HAVE_MEMBLOCK
x86-64, NUMA: Fix size of numa_distance array
x86: Rename e820_table_* to pgt_buf_*
bootmem: Move __alloc_memory_core_early() to nobootmem.c
bootmem: Move contig_page_data definition to bootmem.c/nobootmem.c
bootmem: Separate out CONFIG_NO_BOOTMEM code into nobootmem.c
x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance()
x86-64, NUMA: Add proper function comments to global functions
x86-64, NUMA: Move NUMA emulation into numa_emulation.c
x86-64, NUMA: Prepare numa_emulation() for moving NUMA emulation into a separate file
x86-64, NUMA: Do not scan two times for setup_node_bootmem()
...
Fix up conflicts in arch/x86/kernel/smpboot.c
* 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64, mem: Convert memmove() to assembly file and fix return value bug
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (116 commits)
x86: Enable forced interrupt threading support
x86: Mark low level interrupts IRQF_NO_THREAD
x86: Use generic show_interrupts
x86: ioapic: Avoid redundant lookup of irq_cfg
x86: ioapic: Use new move_irq functions
x86: Use the proper accessors in fixup_irqs()
x86: ioapic: Use irq_data->state
x86: ioapic: Simplify irq chip and handler setup
x86: Cleanup the genirq name space
genirq: Add chip flag to force mask on suspend
genirq: Add desc->irq_data accessor
genirq: Add comments to Kconfig switches
genirq: Fixup fasteoi handler for oneshot mode
genirq: Provide forced interrupt threading
sched: Switch wait_task_inactive to schedule_hrtimeout()
genirq: Add IRQF_NO_THREAD
genirq: Allow shared oneshot interrupts
genirq: Prepare the handling of shared oneshot interrupts
genirq: Make warning in handle_percpu_event useful
x86: ioapic: Move trigger defines to io_apic.h
...
Fix up trivial(?) conflicts in arch/x86/pci/xen.c due to genirq name
space changes clashing with the Xen cleanups. The set_irq_msi() had
moved to xen_bind_pirq_msi_to_irq().
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Combine printk()s in show_regs_common()
x86: Don't call dump_stack() from arch_trigger_all_cpu_backtrace_handler()
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix and clean up generic_processor_info()
x86: Don't copy per_cpu cpuinfo for BSP two times
x86: Move llc_shared_map out of cpu_info
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, binutils, xen: Fix another wrong size directive
x86: Remove dead config option X86_CPU
x86: Really print supported CPUs if PROCESSOR_SELECT=y
x86: Fix a bogus unwind annotation in lib/semaphore_32.S
um, x86-64: Fix UML build after adding CFI annotations to lib/rwsem_64.S
x86: Remove unused bits from lib/thunk_*.S
x86: Use {push,pop}_cfi in more places
x86-64: Add CFI annotations to lib/rwsem_64.S
x86, asm: Cleanup unnecssary macros in asm-offsets.c
x86, system.h: Drop unused __SAVE/__RESTORE macros
x86: Use bitmap library functions
x86: Partly unify asm-offsets_{32,64}.c
x86: Reduce back the alignment of the per-CPU data section
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits)
posix-clocks: Check write permissions in posix syscalls
hrtimer: Remove empty hrtimer_init_hres_timer()
hrtimer: Update hrtimer->state documentation
hrtimer: Update base[CLOCK_BOOTTIME].offset correctly
timers: Export CLOCK_BOOTTIME via the posix timers interface
timers: Add CLOCK_BOOTTIME hrtimer base
time: Extend get_xtime_and_monotonic_offset() to also return sleep
time: Introduce get_monotonic_boottime and ktime_get_boottime
hrtimers: extend hrtimer base code to handle more then 2 clockids
ntp: Remove redundant and incorrect parameter check
mn10300: Switch do_timer() to xtimer_update()
posix clocks: Introduce dynamic clocks
posix-timers: Cleanup namespace
posix-timers: Add support for fd based clocks
x86: Add clock_adjtime for x86
posix-timers: Introduce a syscall for clock tuning.
time: Splitout compat timex accessors
ntp: Add ADJ_SETOFFSET mode bit
time: Introduce timekeeping_inject_offset
posix-timer: Update comment
...
Fix up new system-call-related conflicts in
arch/x86/ia32/ia32entry.S
arch/x86/include/asm/unistd_32.h
arch/x86/include/asm/unistd_64.h
arch/x86/kernel/syscall_table_32.S
(name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some
due to movement of get_jiffies_64() in:
kernel/time.c
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (184 commits)
perf probe: Clean up probe_point_lazy_walker() return value
tracing: Fix irqoff selftest expanding max buffer
tracing: Align 4 byte ints together in struct tracer
tracing: Export trace_set_clr_event()
tracing: Explain about unstable clock on resume with ring buffer warning
ftrace/graph: Trace function entry before updating index
ftrace: Add .ref.text as one of the safe areas to trace
tracing: Adjust conditional expression latency formatting.
tracing: Fix event alignment: skb:kfree_skb
tracing: Fix event alignment: mce:mce_record
tracing: Fix event alignment: kvm:kvm_hv_hypercall
tracing: Fix event alignment: module:module_request
tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup
tracing: Remove lock_depth from event entry
perf header: Stop using 'self'
perf session: Use evlist/evsel for managing perf.data attributes
perf top: Don't let events to eat up whole header line
perf top: Fix events overflow in top command
ring-buffer: Remove unused #include <linux/trace_irq.h>
tracing: Add an 'overwrite' trace_option.
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits)
tidy the trailing symlinks traversal up
Turn resolution of trailing symlinks iterative everywhere
simplify link_path_walk() tail
Make trailing symlink resolution in path_lookupat() iterative
update nd->inode in __do_follow_link() instead of after do_follow_link()
pull handling of one pathname component into a helper
fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH
Allow passing O_PATH descriptors via SCM_RIGHTS datagrams
readlinkat(), fchownat() and fstatat() with empty relative pathnames
Allow O_PATH for symlinks
New kind of open files - "location only".
ext4: Copy fs UUID to superblock
ext3: Copy fs UUID to superblock.
vfs: Export file system uuid via /proc/<pid>/mountinfo
unistd.h: Add new syscalls numbers to asm-generic
x86: Add new syscalls for x86_64
x86: Add new syscalls for x86_32
fs: Remove i_nlink check from file system link callback
fs: Don't allow to create hardlink for deleted file
vfs: Add open by file handle support
...
Intel Archiecture Software Developer's Manual section 7.1.3 specifies that a
core serializing instruction such as "cpuid" should be executed on _each_ core
before the new instruction is made visible.
Failure to do so can lead to unspecified behavior (Intel XMC erratas include
General Protection Fault in the list), so we should avoid this at all cost.
This problem can affect modified code executed by interrupt handlers after
interrupt are re-enabled at the end of stop_machine, because no core serializing
instruction is executed between the code modification and the moment interrupts
are reenabled.
Because stop_machine_text_poke performs the text modification from the first CPU
decrementing stop_machine_first, modified code executed in thread context is
also affected by this problem. To explain why, we have to split the CPUs in two
categories: the CPU that initiates the text modification (calls text_poke_smp)
and all the others. The scheduler, executed on all other CPUs after
stop_machine, issues an "iret" core serializing instruction, and therefore
handles core serialization for all these CPUs. However, the text modification
initiator can continue its execution on the same thread and access the modified
text without any scheduler call. Given that the CPU that initiates the code
modification is not guaranteed to be the one actually performing the code
modification, it falls into the XMC errata.
Q: Isn't this executed from an IPI handler, which will return with IRET (a
serializing instruction) anyway?
A: No, now stop_machine uses per-cpu workqueue, so that handler will be
executed from worker threads. There is no iret anymore.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
LKML-Reference: <20110303160137.GB1590@Krystal>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: <stable@kernel.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This patch adds new syscalls to x86_32
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The variable pm_flags is used to prevent APM from being enabled
along with ACPI, which would lead to problems. However, acpi_init()
is always called before apm_init() and after acpi_init() has
returned, it is known whether or not ACPI will be used. Namely, if
acpi_disabled is not set after acpi_init() has returned, this means
that ACPI is enabled. Thus, it is sufficient to check acpi_disabled
in apm_init() to prevent APM from being enabled in parallel with
ACPI.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Len Brown <len.brown@intel.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: ce4100: Set pci ops via callback instead of module init
x86/mm: Fix pgd_lock deadlock
x86/mm: Handle mm_fault_error() in kernel space
x86: Don't check for BIOS corruption in first 64K when there's no need to
The caller of ioapic_register_intr() has a pointer to the irq_cfg for
the irq already. Hand it in to avoid a full lookup.
In msi_compose_msg() the pointer to irq_cfg is already available. No
need to look it up again.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use the functions which take irq_data. We already have a pointer to
irq_data. That avoids a sparse irq lookup in move_*_irq.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use the state information in irq_data. That avoids a radix-tree lookup
from apic_ack_level() and simplifies setup_ioapic_dest().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
genirq is switching to a consistent name space for the irq related
functions. Convert x86. Conversion was done with coccinelle.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The latest binutils (2.21.0.20110302/Ubuntu) breaks the build
yet another time, under CONFIG_XEN=y due to a .size directive that
refers to a slightly differently named (hence, to the now very
strict and unforgiving assembler, non-existent) symbol.
[ mingo:
This unnecessary build breakage caused by new binutils
version 2.21 gets escallated back several kernel releases spanning
several years of Linux history, affecting over 130,000 upstream
kernel commits (!), on CONFIG_XEN=y 64-bit kernels (i.e. essentially
affecting all major Linux distro kernel configs).
Git annotate tells us that this slight debug symbol code mismatch
bug has been introduced in 2008 in commit 3d75e1b8:
3d75e1b8 (Jeremy Fitzhardinge 2008-07-08 15:06:49 -0700 1231) ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs)
The 'bug' is just a slight assymetry in ENTRY()/END()
debug-symbols sequences, with lots of assembly code between the
ENTRY() and the END():
ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs)
...
END(do_hypervisor_callback)
Human reviewers almost never catch such small mismatches, and binutils
never even warned about it either.
This new binutils version thus breaks the Xen build on all upstream kernels
since v2.6.27, out of the blue.
This makes a straightforward Git bisection of all 64-bit Xen-enabled kernels
impossible on such binutils, for a bisection window of over hundred
thousand historic commits. (!)
This is a major fail on the side of binutils and binutils needs to turn
this show-stopper build failure into a warning ASAP. ]
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <kees.cook@canonical.com>
LKML-Reference: <1299877178-26063-1-git-send-email-heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch moves some functions and variables into init
sections, makes a function static and removes some lines of
cruft.
Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <1299826956-8607-2-git-send-email-henne@nachtwindheim.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The extra tsc_sync.o goal definition is superflous.
CONFIG_X86_64_SMP depends on CONFIG_SMP
and tsc_sync.o is already in the definition of CONFIG_SMP.
Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <1299826956-8607-1-git-send-email-henne@nachtwindheim.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently the index to the ret_stack is updated and the real return address
is saved in the ret_stack. Then we call the trace function. The trace
function could decide that it doesn't want to trace this function
(ex. set_graph_function does not match) and it will return 0 which means
not to trace this call.
The normal function graph tracer has this code:
if (!(trace->depth || ftrace_graph_addr(trace->func)) ||
ftrace_graph_ignore_irqs())
return 0;
What this states is, if the trace depth (which is curr_ret_stack)
is zero (top of nested functions) then test if we want to trace this
function. If this function is not to be traced, then return 0 and
the rest of the function graph tracer logic will not trace this function.
The problem arises when an interrupt comes in after we updated the
curr_ret_stack. The next function that gets called will have a trace->depth
of 1. Which fools this trace code into thinking that we are in a nested
function, and that we should trace. This causes interrupts to be traced
when they should not be.
The solution is to trace the function first and then update the ret_stack.
Reported-by: zhiping zhong <xzhong86@163.com>
Reported-by: wu zhangjin <wuzhangjin@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Return 0 on failure. This will cause the initialization of the driver
to fail and prevent the driver from loading if the BIOS cannot handle
the PCC interface command to "get frequency". Otherwise, the driver
will load and display a very high value like "4294967274" (which is
actually -EINVAL) for frequency:
# cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
4294967274
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
CC: stable@kernel.org
Signed-off-by: Dave Jones <davej@redhat.com>
Due to commit 781c5a67f1 it is
likely that the number of areas to scan for BIOS corruption is 0
-- especially when the first 64K is already reserved
(X86_RESERVE_LOW is 64K by default).
If that's the case then don't set up the scan.
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: <stable@kernel.org>
LKML-Reference: <20110225202838.2229.71011.sendpatchset@nchumbalkar.americas.hpqcorp.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
New binutils version 2.21.0.20110302-1 started checking that the symbol
parameter to the .size directive matches the entry name's
symbol parameter, unearthing two mismatches:
AS arch/x86/kernel/acpi/wakeup_rm.o
arch/x86/kernel/acpi/wakeup_rm.S: Assembler messages:
arch/x86/kernel/acpi/wakeup_rm.S:12: Error: .size expression with symbol `wakeup_code_start' does not evaluate to a constant
arch/x86/kernel/entry_32.S: Assembler messages:
arch/x86/kernel/entry_32.S:1421: Error: .size expression with
symbol `apf_page_fault' does not evaluate to a constant
The problem was discovered while using Debian's binutils
(2.21.0.20110302-1) and experimenting with binutils from
upstream.
Thanks Alexander and H.J. for the vital help.
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
LKML-Reference: <1299620364-21644-1-git-send-email-sedat.dilek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
You can crash the kernel (with root/admin privileges) using kprobe tracer by running:
echo "p system_call_after_swapgs" > ./kprobe_events
echo 1 > ./events/kprobes/enable
The reason is that at the system_call_after_swapgs label, the
kernel stack is not set up. If optimized kprobes are enabled,
the user space stack is being used in this case (see optimized
kprobe template) and this might result in a crash.
There are several places like this over the entry code
(entry_$BIT). As it seems there's no any reasonable/maintainable
way to disable only those places where the stack is not ready, I
switched off the whole entry code from kprobe optimizing.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: acme@redhat.com
Cc: fweisbec@gmail.com
Cc: ananth@in.ibm.com
Cc: davem@davemloft.net
Cc: a.p.zijlstra@chello.nl
Cc: eric.dumazet@gmail.com
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <1298298313-5980-3-git-send-email-jolsa@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Put x86 entry code into a separate link section: .entry.text.
Separating the entry text section seems to have performance
benefits - caused by more efficient instruction cache usage.
Running hackbench with perf stat --repeat showed that the change
compresses the icache footprint. The icache load miss rate went
down by about 15%:
before patch:
19417627 L1-icache-load-misses ( +- 0.147% )
after patch:
16490788 L1-icache-load-misses ( +- 0.180% )
The motivation of the patch was to fix a particular kprobes
bug that relates to the entry text section, the performance
advantage was discovered accidentally.
Whole perf output follows:
- results for current tip tree:
Performance counter stats for './hackbench/hackbench 10' (500 runs):
19417627 L1-icache-load-misses ( +- 0.147% )
2676914223 instructions # 0.497 IPC ( +- 0.079% )
5389516026 cycles ( +- 0.144% )
0.206267711 seconds time elapsed ( +- 0.138% )
- results for current tip tree with the patch applied:
Performance counter stats for './hackbench/hackbench 10' (500 runs):
16490788 L1-icache-load-misses ( +- 0.180% )
2717734941 instructions # 0.502 IPC ( +- 0.079% )
5414756975 cycles ( +- 0.148% )
0.206747566 seconds time elapsed ( +- 0.137% )
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: masami.hiramatsu.pt@hitachi.com
Cc: ananth@in.ibm.com
Cc: davem@davemloft.net
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20110307181039.GB15197@jolsa.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I'm sure it was a mere oversight that the CONFIG_ prefixes are
missing.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Dave Jones <davej@redhat.com>
LKML-Reference: <4D7118D30200007800034F79@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1299119690-13991-5-git-send-email-ming.m.lin@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Intel Nehalem and Westmere CPUs the generic perf LLC-* events count the
L2 caches, not the real L3 LLC - this was inconsistent with behavior on
other CPUs.
Fixing this requires the use of the special OFFCORE_RESPONSE
events which need a separate mask register.
This has been implemented by the previous patch, now use this infrastructure
to set correct events for the LLC-* on Nehalem and Westmere.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1299119690-13991-3-git-send-email-ming.m.lin@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change logs against Andi's original version:
- Extends perf_event_attr:config to config{,1,2} (Peter Zijlstra)
- Fixed a major event scheduling issue. There cannot be a ref++ on an
event that has already done ref++ once and without calling
put_constraint() in between. (Stephane Eranian)
- Use thread_cpumask for percore allocation. (Lin Ming)
- Use MSR names in the extra reg lists. (Lin Ming)
- Remove redundant "c = NULL" in intel_percore_constraints
- Fix comment of perf_event_attr::config1
Intel Nehalem/Westmere have a special OFFCORE_RESPONSE event
that can be used to monitor any offcore accesses from a core.
This is a very useful event for various tunings, and it's
also needed to implement the generic LLC-* events correctly.
Unfortunately this event requires programming a mask in a separate
register. And worse this separate register is per core, not per
CPU thread.
This patch:
- Teaches perf_events that OFFCORE_RESPONSE needs extra parameters.
The extra parameters are passed by user space in the
perf_event_attr::config1 field.
- Adds support to the Intel perf_event core to schedule per
core resources. This adds fairly generic infrastructure that
can be also used for other per core resources.
The basic code has is patterned after the similar AMD northbridge
constraints code.
Thanks to Stephane Eranian who pointed out some problems
in the original version and suggested improvements.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1299119690-13991-2-git-send-email-ming.m.lin@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch updates PEBS event constraints for Intel Atom, Nehalem, Westmere.
This patch also reorganizes the PEBS format/constraint detection code. It is
now based on processor model and not PEBS format. Two processors may use the
same PEBS format without have the same list of PEBS events.
In this second version, we simplified the initialization of the PEBS
constraints by leveraging the existing switch() statement in perf_event_intel.c.
We also renamed the constraint tables to be more consistent with regular
constraints.
In this 3rd version, we drop BR_INST_RETIRED.MISPRED from Intel Atom as it does
not seem to work. Use MISPREDICTED_BRANCH_RETIRED instead. Also add FP_ASSIST.*
o both Intel Nehalem and Westmere. I misssed those in the earlier patches.
Events were tested using libpfm4 perf_examples.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4d6e6b02.815bdf0a.637b.07a7@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>