Commit Graph

9933 Commits

Author SHA1 Message Date
Alex Chiang 47817254b8 ACPI: processor: unify arch_acpi_processor_cleanup_pdc
The x86 and ia64 implementations of the function in $subject are
exactly the same.

Also, since the arch-specific implementations of setting _PDC have
been completely hollowed out, remove the empty shells.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-12-22 03:24:14 -05:00
Alex Chiang 6c5807d7bc ACPI: processor: finish unifying arch_acpi_processor_init_pdc()
The only thing arch-specific about calling _PDC is what bits get
set in the input obj_list buffer.

There's no need for several levels of indirection to twiddle those
bits. Additionally, since we're just messing around with a buffer,
we can simplify the interface; no need to pass around the entire
struct acpi_processor * just to get at the buffer.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-12-22 03:24:13 -05:00
Alex Chiang 08ea48a326 ACPI: processor: factor out common _PDC settings
Both x86 and ia64 initialize _PDC with mostly common bit settings.

Factor out the common settings and leave the arch-specific ones alone.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-12-22 03:24:12 -05:00
Alex Chiang 407cd87c54 ACPI: processor: unify arch_acpi_processor_init_pdc
The x86 and ia64 implementations of arch_acpi_processor_init_pdc()
are almost exactly the same. The only difference is in what bits
they set in obj_list buffer.

Combine the boilerplate memory management code, and leave the
arch-specific bit twiddling in separate implementations.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-12-22 03:24:11 -05:00
Alex Chiang 1d9cb470a7 ACPI: processor: introduce arch_has_acpi_pdc
arch dependent helper function that tells us if we should attempt to
evaluate _PDC on this machine or not.

The x86 implementation assumes that the CPUs in the machine must be
homogeneous, and that you cannot mix CPUs of different vendors.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-12-22 03:24:10 -05:00
Joerg Roedel 0f76480643 x86/amd-iommu: Fix initialization failure panic
The assumption that acpi_table_parse passes the return value
of the hanlder function to the caller proved wrong
recently. The return value of the handler function is
totally ignored. This makes the initialization code for AMD
IOMMU buggy in a way that could cause a kernel panic on
initialization. This patch fixes the issue in the AMD IOMMU
driver.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-12-21 15:51:23 +01:00
Linus Torvalds eca9dfcd00 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf session: Make events_stats u64 to avoid overflow on 32-bit arches
  hw-breakpoints: Fix hardware breakpoints -> perf events dependency
  perf events: Dont report side-band events on each cpu for per-task-per-cpu events
  perf events, x86/stacktrace: Fix performance/softlockup by providing a special frame pointer-only stack walker
  perf events, x86/stacktrace: Make stack walking optional
  perf events: Remove unused perf_counter.h header file
  perf probe: Check new event name
  kprobe-tracer: Check new event/group name
  perf probe: Check whether debugfs path is correct
  perf probe: Fix libdwarf include path for Debian
2009-12-19 09:48:42 -08:00
Linus Torvalds 3981e15286 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system
  Makefile: Unexport LC_ALL instead of clearing it
  x86: Fix objdump version check in arch/x86/tools/chkobjdump.awk
  x86: Reenable TSC sync check at boot, even with NONSTOP_TSC
  x86: Don't use POSIX character classes in gen-insn-attr-x86.awk
  Makefile: set LC_CTYPE, LC_COLLATE, LC_NUMERIC to C
  x86: Increase MAX_EARLY_RES; insufficient on 32-bit NUMA
  x86: Fix checking of SRAT when node 0 ram is not from 0
  x86, cpuid: Add "volatile" to asm in native_cpuid()
  x86, msr: msrs_alloc/free for CONFIG_SMP=n
  x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space
  x86: Add IA32_TSC_AUX MSR and use it
  x86, msr/cpuid: Register enough minors for the MSR and CPUID drivers
  initramfs: add missing decompressor error check
  bzip2: Add missing checks for malloc returning NULL
  bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure
2009-12-19 09:48:14 -08:00
Masami Hiramatsu 8bee738bb1 x86: Fix objdump version check in chkobjdump.awk for different formats.
Different version of objdump says its version in different way;

GNU objdump 2.16.1

or

GNU objdump version 2.19.51.0.14-1.fc11 20090722

This patch uses the first argument which starts with a number
as version string.

Changes in v2:
 - Remove unneeded increment.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <20091218154012.16960.5113.stgit@dhcp-100-2-132.bos.redhat.com>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-18 09:26:56 -08:00
Frederic Weisbecker 99e8c5a3b8 hw-breakpoints: Fix hardware breakpoints -> perf events dependency
The kbuild's select command doesn't propagate through the config
dependencies.

Hence the current rules of hardware breakpoint's config can't
ensure perf can never be disabled under us.

We have:

config X86
	selects HAVE_HW_BREAKPOINTS

config HAVE_HW_BREAKPOINTS
	select PERF_EVENTS

config PERF_EVENTS
	[...]

x86 will select the breakpoints but that won't propagate to perf
events. The user can still disable the latter, but it is
necessary for the breakpoints.

What we need is:

 - x86 selects HAVE_HW_BREAKPOINTS and PERF_EVENTS
 - HAVE_HW_BREAKPOINTS depends on PERF_EVENTS

so that we ensure PERF_EVENTS is enabled and frozen for x86.

This fixes the following kind of build errors:

 In file included from arch/x86/kernel/hw_breakpoint.c:31:
 include/linux/hw_breakpoint.h: In function 'hw_breakpoint_addr':
 include/linux/hw_breakpoint.h:39: error: 'struct perf_event' has no member named 'attr'

v2: Select also ANON_INODES from x86, required for perf

Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Michal Marek <mmarek@suse.cz>
Reported-by: Andrew Randrianasulu <randrik_a@yahoo.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <1261010034-7786-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-18 13:11:51 +01:00
Suresh Siddha 18374d89e5 x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system
John Blackwood reported:
> on an older Dell PowerEdge 6650 system with 8 cpus (4 are hyper-threaded),
> and  32 bit (x86) kernel, once you change the irq smp_affinity of an irq
> to be less than all cpus in the system, you can never change really the
> irq smp_affinity back to be all cpus in the system (0xff) again,
> even though no error status is returned on the "/bin/echo ff >
> /proc/irq/[n]/smp_affinity" operation.
>
> This is due to that fact that BAD_APICID has the same value as
> all cpus (0xff) on 32bit kernels, and thus the value returned from
> set_desc_affinity() via the cpu_mask_to_apicid_and() function is treated
> as a failure in set_ioapic_affinity_irq_desc(), and no affinity changes
> are made.

set_desc_affinity() is already checking if the incoming cpu mask
intersects with the cpu online mask or not. So there is no need
for the apic op cpu_mask_to_apicid_and() to check again
and return BAD_APICID.

Remove the BAD_APICID return value from cpu_mask_to_apicid_and()
and also fix set_desc_affinity() to return -1 instead of using BAD_APICID
to represent error conditions (as cpu_mask_to_apicid_and() can return
logical or physical apicid values and BAD_APICID is really to represent
bad physical apic id).

Reported-by: John Blackwood <john.blackwood@ccur.com>
Root-caused-by: John Blackwood <john.blackwood@ccur.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1261103386.2535.409.camel@sbs-t61>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-17 22:03:06 -08:00
Linus Torvalds 55db493b65 Merge branch 'cpumask-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* 'cpumask-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  cpumask: rename tsk_cpumask to tsk_cpus_allowed
  cpumask: don't recommend set_cpus_allowed hack in Documentation/cpu-hotplug.txt
  cpumask: avoid dereferencing struct cpumask
  cpumask: convert drivers/idle/i7300_idle.c to cpumask_var_t
  cpumask: use modern cpumask style in drivers/scsi/fcoe/fcoe.c
  cpumask: avoid deprecated function in mm/slab.c
  cpumask: use cpu_online in kernel/perf_event.c
2009-12-17 17:00:20 -08:00
akpm@linux-foundation.org 8c63450718 x86: Fix objdump version check in arch/x86/tools/chkobjdump.awk
It says

Warning: objdump version  is older than 2.19
Warning: Skipping posttest.

because it used the wrong field from `objdump -v':

akpm:/usr/src/25> /opt/crosstool/gcc-4.0.2-glibc-2.3.6/x86_64-unknown-linux-gnu/bin/x86_64-unknown-linux-gnu-objdump -v
GNU objdump 2.16.1
Copyright 2005 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License.  This program has absolutely no warranty.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <200912172326.nBHNQaQl024796@imap1.linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
2009-12-17 15:34:29 -08:00
Pallipadi, Venkatesh 6c56ccecf0 x86: Reenable TSC sync check at boot, even with NONSTOP_TSC
Commit 83ce4009 did the following change
If the TSC is constant and non-stop, also set it reliable.

But, there seems to be few systems that will end up with TSC warp across
sockets, depending on how the cpus come out of reset. Skipping TSC sync
test on such systems may result in time inconsistency later.

So, reenable TSC sync test even on constant and non-stop TSC systems.
Set, sched_clock_stable to 1 by default and reset it in
mark_tsc_unstable, if TSC sync fails.

This change still gives perf benefit mentioned in 83ce4009 for systems
where TSC is reliable.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091217202702.GA18015@linux-os.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-17 14:44:35 -08:00
Russ Anderson b76365a18f x86, uv: Add serial number parameter to uv_bios_get_sn_info()
Add system_serial_number to the information returned by
uv_bios_get_sn_info() UV BIOS call.

Signed-off-by: Russ Anderson <rja@sgi.com>
LKML-Reference: <20091217165323.GA30774@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-17 11:33:51 -08:00
Linus Torvalds 5a865c0606 Merge branch 'for-33' of git://repo.or.cz/linux-kbuild
* 'for-33' of git://repo.or.cz/linux-kbuild: (29 commits)
  net: fix for utsrelease.h moving to generated
  gen_init_cpio: fixed fwrite warning
  kbuild: fix make clean after mismerge
  kbuild: generate modules.builtin
  genksyms: properly consider  EXPORT_UNUSED_SYMBOL{,_GPL}()
  score: add asm/asm-offsets.h wrapper
  unifdef: update to upstream revision 1.190
  kbuild: specify absolute paths for cscope
  kbuild: create include/generated in silentoldconfig
  scripts/package: deb-pkg: use fakeroot if available
  scripts/package: add KBUILD_PKG_ROOTCMD variable
  scripts/package: tar-pkg: use tar --owner=root
  Kbuild: clean up marker
  net: add net_tstamp.h to headers_install
  kbuild: move utsrelease.h to include/generated
  kbuild: move autoconf.h to include/generated
  drop explicit include of autoconf.h
  kbuild: move compile.h to include/generated
  kbuild: drop include/asm
  kbuild: do not check for include/asm-$ARCH
  ...

Fixed non-conflicting clean merge of modpost.c as per comments from
Stephen Rothwell (modpost.c had grown an include of linux/autoconf.h
that needed to be changed to generated/autoconf.h)
2009-12-17 07:23:42 -08:00
Linus Torvalds 04a1e62c2c x86/ptrace: make genregs[32]_get/set more robust
The loop condition is fragile: we compare an unsigned value to zero, and
then decrement it by something larger than one in the loop.  All the
callers should be passing in appropriately aligned buffer lengths, but
it's better to just not rely on it, and have some appropriate defensive
loop limits.

Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-17 07:04:56 -08:00
Roland Dreier 4beb3d6d14 x86: Don't use POSIX character classes in gen-insn-attr-x86.awk
Not all awk implementations (including the default awk in Ubuntu 9.10)
support POSIX character classes.  Since x86-opcode-map.txt is plain
ASCII, we can just use explicit ranges for lower case, alphabetic, and
alphanumeric characters instead.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <adabphy750b.fsf@roland-alpha.cisco.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-17 07:03:21 -08:00
Frederic Weisbecker 06d65bda75 perf events, x86/stacktrace: Fix performance/softlockup by providing a special frame pointer-only stack walker
It's just wasteful for stacktrace users like perf to walk
through every entries on the stack whereas these only accept
reliable ones, ie: that the frame pointer validates.

Since perf requires pure reliable stacktraces, it needs a stack
walker based on frame pointers-only to optimize the stacktrace
processing.

This might solve some near-lockup scenarios that can be triggered
by call-graph tracing timer events.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1261024834-5336-2-git-send-regression-fweisbec@gmail.com>
[ v2: fix for modular builds and small detail tidyup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-17 10:42:52 +01:00
Frederic Weisbecker 61c1917f47 perf events, x86/stacktrace: Make stack walking optional
The current print_context_stack helper that does the stack
walking job is good for usual stacktraces as it walks through
all the stack and reports even addresses that look unreliable,
which is nice when we don't have frame pointers for example.

But we have users like perf that only require reliable
stacktraces, and those may want a more adapted stack walker, so
lets make this function a callback in stacktrace_ops that users
can tune for their needs.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1261024834-5336-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-17 09:56:19 +01:00
Rusty Russell a4636818f8 cpumask: rename tsk_cpumask to tsk_cpus_allowed
Noone uses this wrapper yet, and Ingo asked that it be kept consistent
with current task_struct usage.

(One user crept in via linux-next: fixed)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <tj@kernel.org>
2009-12-17 11:43:30 +10:30
Yinghai Lu 6a1e008a09 x86: Increase MAX_EARLY_RES; insufficient on 32-bit NUMA
Due to recent changes wakeup and mptable, we run out of early
reservations on 32-bit NUMA.  Thus, adjust the available number.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B22D754.2020706@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-16 16:46:23 -08:00
Yinghai Lu 3299625036 x86: Fix checking of SRAT when node 0 ram is not from 0
Found one system that boot from socket1 instead of socket0, SRAT get rejected...

[    0.000000] SRAT: Node 1 PXM 0 0-a0000
[    0.000000] SRAT: Node 1 PXM 0 100000-80000000
[    0.000000] SRAT: Node 1 PXM 0 100000000-2080000000
[    0.000000] SRAT: Node 0 PXM 1 2080000000-4080000000
[    0.000000] SRAT: Node 2 PXM 2 4080000000-6080000000
[    0.000000] SRAT: Node 3 PXM 3 6080000000-8080000000
[    0.000000] SRAT: Node 4 PXM 4 8080000000-a080000000
[    0.000000] SRAT: Node 5 PXM 5 a080000000-c080000000
[    0.000000] SRAT: Node 6 PXM 6 c080000000-e080000000
[    0.000000] SRAT: Node 7 PXM 7 e080000000-10080000000
...
[    0.000000] NUMA: Allocated memnodemap from 500000 - 701040
[    0.000000] NUMA: Using 20 for the hash shift.
[    0.000000] Adding active range (0, 0x2080000, 0x4080000) 0 entries of 3200 used
[    0.000000] Adding active range (1, 0x0, 0x96) 1 entries of 3200 used
[    0.000000] Adding active range (1, 0x100, 0x7f750) 2 entries of 3200 used
[    0.000000] Adding active range (1, 0x100000, 0x2080000) 3 entries of 3200 used
[    0.000000] Adding active range (2, 0x4080000, 0x6080000) 4 entries of 3200 used
[    0.000000] Adding active range (3, 0x6080000, 0x8080000) 5 entries of 3200 used
[    0.000000] Adding active range (4, 0x8080000, 0xa080000) 6 entries of 3200 used
[    0.000000] Adding active range (5, 0xa080000, 0xc080000) 7 entries of 3200 used
[    0.000000] Adding active range (6, 0xc080000, 0xe080000) 8 entries of 3200 used
[    0.000000] Adding active range (7, 0xe080000, 0x10080000) 9 entries of 3200 used
[    0.000000] SRAT: PXMs only cover 917504MB of your 1048566MB e820 RAM. Not used.
[    0.000000] SRAT: SRAT not used.

the early_node_map is not sorted because node0 with non zero start come first.

so try to sort it right away after all regions are registered.

also fixs refression by 8716273c (x86: Export srat physical topology)

-v2: make it more solid to handle cross node case like node0 [0,4g), [8,12g) and node1 [4g, 8g), [12g, 16g)
-v3: update comments.

Reported-and-tested-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B2579D2.3010201@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-16 16:43:37 -08:00
Suresh Siddha 45a94d7cd4 x86, cpuid: Add "volatile" to asm in native_cpuid()
xsave_cntxt_init() does something like:

	cpuid(0xd, ..);	// find out what features FP/SSE/.. etc are supported

	xsetbv();	// enable the features known to OS

	cpuid(0xd, ..);	// find out the size of the context for features enabled

Depending on what features get enabled in xsetbv(), value of the
cpuid.eax=0xd.ecx=0.ebx changes correspondingly (representing the
size of the context that is enabled).

As we don't have volatile keyword for native_cpuid(), gcc 4.1.2
optimizes away the second cpuid and the kernel continues to use
the cpuid information obtained before xsetbv(), ultimately leading to kernel
crash on processors supporting more state than the legacy FP/SSE.

Add "volatile" for native_cpuid().

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1261009542.2745.55.camel@sbs-t61.sc.intel.com>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-16 16:30:57 -08:00
Borislav Petkov 6ede31e030 x86, msr: msrs_alloc/free for CONFIG_SMP=n
Randy Dunlap reported the following build error:

"When CONFIG_SMP=n, CONFIG_X86_MSR=m:

ERROR: "msrs_free" [drivers/edac/amd64_edac_mod.ko] undefined!
ERROR: "msrs_alloc" [drivers/edac/amd64_edac_mod.ko] undefined!"

This is due to the fact that <arch/x86/lib/msr.c> is conditioned on
CONFIG_SMP and in the UP case we have only the stubs in the header.
Fork off SMP functionality into a new file (msr-smp.c) and build
msrs_{alloc,free} unconditionally.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
LKML-Reference: <20091216231625.GD27228@liondog.tnic>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-16 15:36:32 -08:00
Andreas Herrmann 9d260ebc09 x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space
Use NodeId MSR to get NodeId and number of nodes per processor.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20091216144355.GB28798@alberich.amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-16 15:06:23 -08:00
Jiri Slaby 5714868812 PCI: fix section mismatch on update_res()
Remark update_res from __init to __devinit as it is called also
from __devinit functions.

This patch removes the following warning message:

  WARNING: vmlinux.o(.devinit.text+0x774a): Section mismatch
  in reference from the function pci_root_bus_res() to the
  function .init.text:update_res()
  The function __devinit pci_root_bus_res() references
  a function __init update_res().
  If update_res is only used by pci_root_bus_res then
  annotate update_res with a matching annotation.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Aristeu Sergio <arozansk@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: linux-pci@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-16 13:37:52 -08:00
Linus Torvalds 288f02bbb6 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (117 commits)
  ACPI processor: Fix section mismatch for processor_add()
  ACPI: Add platform-wide _OSC support.
  ACPI: cleanup pci_root _OSC code.
  ACPI: Add a generic API for _OSC -v2
  msi-wmi: depend on backlight and fix corner-cases problems
  msi-wmi: switch to using input sparse keymap library
  msi-wmi: replace one-condition switch-case with if statement
  msi-wmi: remove unused field 'instance' in key_entry structure
  msi-wmi: remove custom runtime debug implementation
  msi-wmi: rework init
  msi-wmi: remove useless includes
  X86 drivers: Introduce msi-wmi driver
  Toshiba Bluetooth Enabling driver (RFKill handler v3)
  ACPI: fix for lapic_timer_propagate_broadcast()
  acpi_pad: squish warning
  ACPI: dock: minor whitespace and style cleanups
  ACPI: dock: add struct dock_station * directly to platform device data
  ACPI: dock: dock_add - hoist up platform_device_register_simple()
  ACPI: dock: remove global 'dock_device_name'
  ACPI: dock: combine add|alloc_dock_dependent_device (v2)
  ...
2009-12-16 12:33:19 -08:00
Linus Torvalds bac5e54c29 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
  direct I/O fallback sync simplification
  ocfs: stop using do_sync_mapping_range
  cleanup blockdev_direct_IO locking
  make generic_acl slightly more generic
  sanitize xattr handler prototypes
  libfs: move EXPORT_SYMBOL for d_alloc_name
  vfs: force reval of target when following LAST_BIND symlinks (try #7)
  ima: limit imbalance msg
  Untangling ima mess, part 3: kill dead code in ima
  Untangling ima mess, part 2: deal with counters
  Untangling ima mess, part 1: alloc_file()
  O_TRUNC open shouldn't fail after file truncation
  ima: call ima_inode_free ima_inode_free
  IMA: clean up the IMA counts updating code
  ima: only insert at inode creation time
  ima: valid return code from ima_inode_alloc
  fs: move get_empty_filp() deffinition to internal.h
  Sanitize exec_permission_lite()
  Kill cached_lookup() and real_lookup()
  Kill path_lookup_open()
  ...

Trivial conflicts in fs/direct-io.c
2009-12-16 12:04:02 -08:00
Linus Torvalds 61ecdb84c1 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix kprobes build with non-gawk awk
  x86: Split swiotlb initialization into two stages
  x86: Regex support and known-movable symbols for relocs, fix _end
  x86, msr: Remove incorrect, duplicated code in the MSR driver
  x86: Merge kernel_thread()
  x86: Sync 32/64-bit kernel_thread
  x86, 32-bit: Use same regs as 64-bit for kernel_thread_helper
  x86, 64-bit: Use user_mode() to determine new stack pointer in copy_thread()
  x86, 64-bit: Move kernel_thread to C
  x86-64, paravirt: Call set_iopl_mask() on 64 bits
  x86-32: Avoid pipeline serialization in PTREGSCALL1 and 2
  x86: Merge sys_clone
  x86, 32-bit: Convert sys_vm86 & sys_vm86old
  x86: Merge sys_sigaltstack
  x86: Merge sys_execve
  x86: Merge sys_iopl
  x86-32: Add new pt_regs stubs
  cpumask: Use modern cpumask style in arch/x86/kernel/cpu/mcheck/mce-inject.c
2009-12-16 12:02:37 -08:00
Linus Torvalds 74f3ae7434 Merge branch 'module' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* 'module' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  modpost: fix segfault with short symbol names
  module: handle ppc64 relocating kcrctabs when CONFIG_RELOCATABLE=y
  Kbuild: clear marker out of modpost
  module: make MODULE_SYMBOL_PREFIX into a CONFIG option
  ARM: unexport symbols used to implement floating point emulation
  ARM: use unified discard definition in linker script
  x86: don't export inline function
  sparc64: don't export static inline pci_ functions
2009-12-16 10:47:24 -08:00
Al Viro 853b3da10d sanitize do_pipe_flags() callers in arch
* hpux_pipe() - no need to take BKL
* sys32_pipe() in arch/x86/ia32 and xtensa_pipe() in arch/xtensa -
	no need at all, since both functions are open-coded sys_pipe()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:40 -05:00
Akinobu Mita a66022c457 iommu-helper: use bitmap library
Use bitmap library and kill some unused iommu helper functions.

1. s/iommu_area_free/bitmap_clear/

2. s/iommu_area_reserve/bitmap_set/

3. Use bitmap_find_next_zero_area instead of find_next_zero_area

  This cannot be simple substitution because find_next_zero_area
  doesn't check the last bit of the limit in bitmap

4. Remove iommu_area_free, iommu_area_reserve, and find_next_zero_area

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:18 -08:00
Jack Steiner 56abcf24ff gru: function to generate chipset IPI values
Create a function to generate the value that is written to the UV hub MMR
to cause an IPI interrupt to be sent.  The function will be used in the
GRU message queue error recovery code that sends IPIs to nodes in remote
partitions.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:17 -08:00
Robin Holt c2c9f11574 x86: uv: update XPC to handle updated BIOS interface
The UV BIOS has moved the location of some of their pointers to the
"partition reserved page" from memory into a uv hub MMR.  The GRU does not
support bcopy operations from MMR space so we need to special case the MMR
addresses using VLOAD operations.

Additionally, the BIOS call for registering a message queue watchlist has
removed the 'blade' value and eliminated the structure that was being
passed in.  This is also reflected in this patch.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:14 -08:00
Robin Holt fae419f2ab x86: uv: introduce uv_gpa_is_mmr
Provide a mechanism for determining if a global physical address is
pointing to a UV hub MMR.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:13 -08:00
Robin Holt 729d69e699 x86: uv: introduce a means to translate from gpa -> socket_paddr
The UV BIOS has been updated to implement some of our interface
functionality differently than originally expected.  These patches update
the kernel to the bios implementation and include a few minor bug fixes
which prevent us from doing significant testing on real hardware.

This patch:

For SGI UV systems, translate from a global physical address back to a
socket physical address.  This does nothing to ensure the socket physical
address is actually addressable by the kernel.  That is the responsibility
of the user of the function.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:13 -08:00
Jan Beulich ac2b3e67dd dma-mapping: fix off-by-one error in dma_capable()
dma_mask is, when interpreted as address, the last valid byte, and hence
comparison msut also be done using the last valid of the buffer in
question.

Also fix the open-coded instances in lib/swiotlb.c.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Becky Bruce <beckyb@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:12 -08:00
Christoph Hellwig 698ba7b5a3 elf: kill USE_ELF_CORE_DUMP
Currently all architectures but microblaze unconditionally define
USE_ELF_CORE_DUMP.  The microblaze omission seems like an error to me, so
let's kill this ifdef and make sure we are the same everywhere.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: <linux-arch@vger.kernel.org>
Cc: Michal Simek <michal.simek@petalogix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:12 -08:00
Oleg Nesterov d519650373 ptrace: x86: change syscall_trace_leave() to rely on tracehook when stepping
Suggested by Roland.

Unlike powepc, x86 always calls tracehook_report_syscall_exit(step) with
step = 0, and sends the trap by hand.

This results in unnecessary SIGTRAP when PTRACE_SINGLESTEP follows the
syscall-exit stop.

Change syscall_trace_leave() to pass the correct "step" argument to
tracehook and remove the send_sigtrap() logic.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:08 -08:00
Oleg Nesterov 7f38551fc3 ptrace: x86: implement user_single_step_siginfo()
Suggested by Roland.

Implement user_single_step_siginfo() for x86.  Extract this code from
send_sigtrap().

Since x86 calls tracehook_report_syscall_exit(step => 0) the new helper is
not used yet.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:08 -08:00
Sheng Yang 5df974009f x86: Add IA32_TSC_AUX MSR and use it
Clean up write_tsc() and write_tscp_aux() by replacing
hardcoded values.

No change in functionality.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
LKML-Reference: <1260942485-19156-4-git-send-email-sheng@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-16 09:02:42 +01:00
Len Brown 0ceafc33af Merge branch 'bugzilla-14700' into release 2009-12-15 22:35:21 -05:00
H. Peter Anvin 0b962d473a x86, msr/cpuid: Register enough minors for the MSR and CPUID drivers
register_chrdev() hardcodes registering 256 minors, presumably to
avoid breaking old drivers.  However, we need to register enough
minors so that we have all possible CPUs.

checkpatch warns on this patch, but the patch is correct: NR_CPUS here
is a static *upper bound* on the *maximum CPU index* (not *number of
CPUs!*) and that is what we want.

Reported-and-tested-by: Russ Anderson <rja@sgi.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <tip-*@git.kernel.org>
2009-12-15 15:13:07 -08:00
Jonathan Nieder 23637568ad x86: Fix kprobes build with non-gawk awk
The instruction attribute table generator fails when run by mawk
or original-awk:

 $ mawk -f arch/x86/tools/gen-insn-attr-x86.awk \
	arch/x86/lib/x86-opcode-map.txt > /dev/null
 Semantic error at 240: Second IMM error
 $ echo $?
 1

Line 240 contains "c8: ENTER Iw,Ib", which indicates that this
instruction has two immediate operands, the second of which is
one byte.  The script loops through the immediate operands using
a for loop.

Unfortunately, there is no guarantee in awk that a for (variable
in array) loop will return the indices in increasing order.
Internally, both original-awk and mawk iterate over a hash table
for this purpose, and both implementations happen to produce the
index 2 before 1.  The supposed second immediate operand is more
than one byte wide, producing the error.

So loop over the indices in increasing order instead.  As a
side-effect, with mawk this means the silly two-entry hash table
never has to be built.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091213220437.GA27718@progeny.tock>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-15 20:35:49 +01:00
Ingo Molnar bf08b3b1a1 Merge branch 'x86/mce' into x86/urgent
Merge reason: Leftover mini-topic from the merge window - merge it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-15 20:33:53 +01:00
Ingo Molnar ab1eebe77d Merge branch 'x86/asm' into x86/urgent
Merge reason: it's stable so lets push it upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-15 20:33:28 +01:00
Linus Torvalds 8f0ddf91f2 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
  clockevents: Convert to raw_spinlock
  clockevents: Make tick_device_lock static
  debugobjects: Convert to raw_spinlocks
  perf_event: Convert to raw_spinlock
  hrtimers: Convert to raw_spinlocks
  genirq: Convert irq_desc.lock to raw_spinlock
  smp: Convert smplocks to raw_spinlocks
  rtmutes: Convert rtmutex.lock to raw_spinlock
  sched: Convert pi_lock to raw_spinlock
  sched: Convert cpupri lock to raw_spinlock
  sched: Convert rt_runtime_lock to raw_spinlock
  sched: Convert rq->lock to raw_spinlock
  plist: Make plist debugging raw_spinlock aware
  bkl: Fixup core_lock fallout
  locking: Cleanup the name space completely
  locking: Further name space cleanups
  alpha: Fix fallout from locking changes
  locking: Implement new raw_spinlock
  locking: Convert raw_rwlock functions to arch_rwlock
  locking: Convert raw_rwlock to arch_rwlock
  ...
2009-12-15 09:02:01 -08:00
André Goddard Rosa e7d2860b69 tree-wide: convert open calls to remove spaces to skip_spaces() lib function
Makes use of skip_spaces() defined in lib/string.c for removing leading
spaces from strings all over the tree.

It decreases lib.a code size by 47 bytes and reuses the function tree-wide:
   text    data     bss     dec     hex filename
  64688     584     592   65864   10148 (TOTALS-BEFORE)
  64641     584     592   65817   10119 (TOTALS-AFTER)

Also, while at it, if we see (*str && isspace(*str)), we can be sure to
remove the first condition (*str) as the second one (isspace(*str)) also
evaluates to 0 whenever *str == 0, making it redundant. In other words,
"a char equals zero is never a space".

Julia Lawall tried the semantic patch (http://coccinelle.lip6.fr) below,
and found occurrences of this pattern on 3 more files:
    drivers/leds/led-class.c
    drivers/leds/ledtrig-timer.c
    drivers/video/output.c

@@
expression str;
@@

( // ignore skip_spaces cases
while (*str &&  isspace(*str)) { \(str++;\|++str;\) }
|
- *str &&
isspace(*str)
)

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Cc: Julia Lawall <julia@diku.dk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: David Howells <dhowells@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:32 -08:00
Andres Salomon c95d1e53ed cs5535: drop the Geode-specific MFGPT/GPIO code
With generic modular drivers handling all of this stuff, the
geode-specific code can go away.  The cs5535-gpio, cs5535-mfgpt, and
cs5535-clockevt drivers now handle this.

Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:28 -08:00
Andres Salomon f3a57a60d3 cs5535: define lxfb/gxfb MSRs in linux/cs5535.h
..and include them in the lxfb/gxfb drivers rather than asm/geode.h (where
possible).

Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:28 -08:00
Andres Salomon f060f27007 cs5535: move VSA2 checks into linux/cs5535.h
Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:28 -08:00
Andres Salomon 2e8c12436f cs5535: move the DIVIL MSR definition into linux/cs5535.h
The only thing that uses this is the reboot_fixups code.

Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:28 -08:00
Andres Salomon 82dca611bb cs5535: add a generic MFGPT driver
This is based on the old code on arch/x86/kernel/mfgpt_32.c, except it's
not x86 specific, it's modular, and it makes use of a PCI BAR rather than
a random MSR.  Currently module unloading is not supported; it's uncertain
whether or not it can be made work with the hardware.

[akpm@linux-foundation.org: add X86 dependency]
Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:28 -08:00
Andres Salomon 3c55494670 ALSA: cs5535audio: free OLPC quirks from reliance on MGEODE_LX cpu optimization
Previously, OLPC support for the mic extensions was only enabled in the
ALSA driver if CONFIG_OLPC and CONFIG_MGEODE_LX were both set.  This was
because the old geode GPIO code was written in a manner that assumed
CONFIG_MGEODE_LX.  With the new cs553x-gpio driver, this is no longer the
case; as such, we can drop the requirement on CONFIG_MGEODE_LX and instead
include a requirement on GPIOLIB.

We use the generic GPIO API rather than the cs553x-specific API.

Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:27 -08:00
Andres Salomon 5f0a96b044 cs5535-gpio: add AMD CS5535/CS5536 GPIO driver support
This creates a CS5535/CS5536 GPIO driver which uses a gpio_chip backend
(allowing GPIO users to use the generic GPIO API if desired) while also
allowing architecture-specific users directly (via the cs5535_gpio_*
functions).

Tested on an OLPC machine.  Some Leemotes also use CS5536 (with a mips
cpu), which is why this is in drivers/gpio rather than arch/x86.
Currently, it conflicts with older geode GPIO support; once MFGPT support
is reworked to also be more generic, the older geode code will be removed.

Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: David Brownell <david-b@pacbell.net>
Reviewed-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:27 -08:00
Lee Schermerhorn 4e25b2576e hugetlb: add generic definition of NUMA_NO_NODE
Move definition of NUMA_NO_NODE from ia64 and x86_64 arch specific headers
to generic header 'linux/numa.h' for use in generic code.  NUMA_NO_NODE
replaces bare '-1' where it's used in this series to indicate "no node id
specified".  Ultimately, it can be used to replace the -1 elsewhere where
it is used similarly.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:12 -08:00
FUJITA Tomonori 186a25026c x86: Split swiotlb initialization into two stages
The commit f4780ca005 moves
swiotlb initialization before dma32_free_bootmem(). It's
supposed to fix a bug that the commit
75f1cdf1dd introduced, we
initialize SWIOTLB right after dma32_free_bootmem so we wrongly
steal memory area allocated for GART with broken BIOS earlier.

However, the above commit introduced another problem, which
likely breaks machines with huge amount of memory. Such a box
use the majority of DMA32_ZONE so there is no memory for
swiotlb.

With this patch, the x86 IOMMU initialization sequence are:

1. We set swiotlb to 1 in the case of (max_pfn > MAX_DMA32_PFN
   && !no_iommu). If swiotlb usage is forced by the boot option,
   we go to the step 3 and finish (we don't try to detect IOMMUs).

2. We call the detection functions of all the IOMMUs. The
   detection function sets x86_init.iommu.iommu_init to the IOMMU
   initialization function (so we can avoid calling the
   initialization functions of all the IOMMUs needlessly).

3. We initialize swiotlb (and set dma_ops to swiotlb_dma_ops) if
   swiotlb is set to 1.

4. If the IOMMU initialization function doesn't need swiotlb
   (e.g. the initialization is sucessful) then sets swiotlb to zero.

5. If we find that swiotlb is set to zero, we free swiotlb
   resource.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <20091215204729A.fujita.tomonori@lab.ntt.co.jp>
Tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-15 13:01:57 +01:00
Rusty Russell e642804772 x86: don't export inline function
For CONFIG_PARAVIRT, load_gs_index is an inline function (it's #defined
to native_load_gs_index otherwise).

Exporting an inline function breaks the new assembler-based alphabetical
sorted symbol list:

Today's linux-next build (x86_64 allmodconfig) failed like this:

	.tmp_exports-asm.o: In function `__ksymtab_load_gs_index':
	(__ksymtab_sorted+0x5b40): undefined reference to `load_gs_index'

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: x86@kernel.org
Cc: alan-jenkins@tuffmail.co.uk
2009-12-15 16:28:15 +10:30
Zhao Yakui 03a05ed115 ACPI: Use the ARB_DISABLE for the CPU which model id is less than 0x0f.
Currently, ARB_DISABLE is a NOP on all of the recent Intel platforms.
For such platforms, reduce contention on c3_lock by skipping the fake
ARB_DISABLE.

The cpu model id on one laptop is 14. If we disable ARB_DISABLE on this box,
the box can't be booted correctly. But if we still enable ARB_DISABLE on this
box, the box can be booted correctly.

So we still use the ARB_DISABLE for the cpu which mode id is less than 0x0f.

http://bugzilla.kernel.org/show_bug.cgi?id=14700

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>
cc: stable@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
2009-12-14 21:54:30 -05:00
Thomas Gleixner 239007b844 genirq: Convert irq_desc.lock to raw_spinlock
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:33 +01:00
Thomas Gleixner e5931943d0 locking: Convert raw_rwlock functions to arch_rwlock
Name space cleanup for rwlock functions. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner fb3a6bbc91 locking: Convert raw_rwlock to arch_rwlock
Not strictly necessary for -rt as -rt does not have non sleeping
rwlocks, but it's odd to not have a consistent naming convention.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner 0199c4e68d locking: Convert __raw_spin* functions to arch_spin*
Name space cleanup. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner edc35bd72e locking: Rename __RAW_SPIN_LOCK_UNLOCKED to __ARCH_SPIN_LOCK_UNLOCKED
Further name space cleanup. No functional change

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner 445c89514b locking: Convert raw_spinlock to arch_spinlock
The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.

Linus suggested to convert the raw_ to arch_ locks and cleanup the
name space instead of using an artifical name like core_spin,
atomic_spin or whatever

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
H. Peter Anvin 873b5271f8 x86: Regex support and known-movable symbols for relocs, fix _end
This adds a new category of symbols to the relocs program: symbols
which are known to be relative, even though the linker emits them as
absolute; this is the case for symbols that live in the linker script,
which currently applies to _end.

Unfortunately the previous workaround of putting _end in its own empty
section was defeated by newer binutils, which remove empty sections
completely.

This patch also changes the symbol matching to use regular expressions
instead of hardcoded C for specific patterns.

This is a decidedly non-minimal patch: a modified version of the
relocs program is used as part of the Syslinux build, and this 	is
basically a backport to Linux of some of those changes; they have
thus been well tested.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <4AF86211.3070103@zytor.com>
Acked-by: Michal Marek <mmarek@suse.cz>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
2009-12-14 13:55:20 -08:00
Linus Torvalds 75b08038ce Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mce: Clean up thermal init by introducing intel_thermal_supported()
  x86, mce: Thermal monitoring depends on APIC being enabled
  x86: Gart: fix breakage due to IOMMU initialization cleanup
  x86: Move swiotlb initialization before dma32_free_bootmem
  x86: Fix build warning in arch/x86/mm/mmio-mod.c
  x86: Remove usedac in feature-removal-schedule.txt
  x86: Fix duplicated UV BAU interrupt vector
  nvram: Fix write beyond end condition; prove to gcc copy is safe
  mm: Adjust do_pages_stat() so gcc can see copy_from_user() is safe
  x86: Limit the number of processor bootup messages
  x86: Remove enabling x2apic message for every CPU
  doc: Add documentation for bootloader_{type,version}
  x86, msr: Add support for non-contiguous cpumasks
  x86: Use find_e820() instead of hard coded trampoline address
  x86, AMD: Fix stale cpuid4_info shared_map data in shared_cpu_map cpumasks

Trivial percpu-naming-introduced conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c
2009-12-14 12:36:46 -08:00
H. Peter Anvin 494c2ebfb2 x86, msr: Remove incorrect, duplicated code in the MSR driver
The MSR driver would compute the values for cpu and c at declaration,
and then again in the body of the function.  This isn't merely
redundant, but unsafe, since cpu might not refer to a valid CPU at
that point.

Remove the unnecessary and dangerous references in the declarations.
This code now matches the equivalent code in the CPUID driver.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-14 10:05:05 -08:00
Linus Torvalds d0316554d3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
  m68k: rename global variable vmalloc_end to m68k_vmalloc_end
  percpu: add missing per_cpu_ptr_to_phys() definition for UP
  percpu: Fix kdump failure if booted with percpu_alloc=page
  percpu: make misc percpu symbols unique
  percpu: make percpu symbols in ia64 unique
  percpu: make percpu symbols in powerpc unique
  percpu: make percpu symbols in x86 unique
  percpu: make percpu symbols in xen unique
  percpu: make percpu symbols in cpufreq unique
  percpu: make percpu symbols in oprofile unique
  percpu: make percpu symbols in tracer unique
  percpu: make percpu symbols under kernel/ and mm/ unique
  percpu: remove some sparse warnings
  percpu: make alloc_percpu() handle array types
  vmalloc: fix use of non-existent percpu variable in put_cpu_var()
  this_cpu: Use this_cpu_xx in trace_functions_graph.c
  this_cpu: Use this_cpu_xx for ftrace
  this_cpu: Use this_cpu_xx in nmi handling
  this_cpu: Use this_cpu operations in RCU
  this_cpu: Use this_cpu ops for VM statistics
  ...

Fix up trivial (famous last words) global per-cpu naming conflicts in
	arch/x86/kvm/svm.c
	mm/slab.c
2009-12-14 09:58:24 -08:00
Hidetoshi Seto 70fe440718 x86, mce: Clean up thermal init by introducing intel_thermal_supported()
It looks better to have a common function. No change in functionality.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <4B25FDDC.407@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
2009-12-14 10:38:41 +01:00
Cyrill Gorcunov 485a2e1973 x86, mce: Thermal monitoring depends on APIC being enabled
Add check if APIC is not disabled since thermal
monitoring depends on it. As only apic gets disabled
we should not try to install "thermal monitor" vector,
print out that thermal monitoring is enabled and etc...

Note that "Intel Correct Machine Check Interrupts" already
has such a check.

Also I decided to not add cpu_has_apic check into
mcheck_intel_therm_init since even if it'll call apic_read on
disabled apic -- it's safe here and allow us to save a few code
bytes.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
LKML-Reference: <4B25FDC2.3020401@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 10:38:41 +01:00
Yinghai Lu f3eee54276 x86: Gart: fix breakage due to IOMMU initialization cleanup
This fixes the following breakage of the commit
75f1cdf1dda92cae037ec848ae63690d91913eac:

- GART systems that don't AGP with broken BIOS and more than 4GB
  memory are forced to use swiotlb. They can allocate aperture by
  hand and use GART.

- GART systems without GAP must disable GART on shutdown.

- swiotlb usage is forced by the boot option,
  gart_iommu_hole_init() is not called, so we disable GART
  early_gart_iommu_check().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1260759135-6450-3-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 08:57:40 +01:00
FUJITA Tomonori f4780ca005 x86: Move swiotlb initialization before dma32_free_bootmem
The commit 75f1cdf1dd introduced a
bug that we initialize SWIOTLB right after dma32_free_bootmem so
we wrongly steal memory area allocated for GART with broken BIOS
earlier.

This moves swiotlb initialization before dma32_free_bootmem().

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: yinghai@kernel.org
LKML-Reference: <1260759135-6450-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 08:57:40 +01:00
Joe Perches eba11d6da7 x86: Fix build warning in arch/x86/mm/mmio-mod.c
Stephen Rothwell reported these warnings:

 arch/x86/mm/mmio-mod.c: In function 'print_pte':
 arch/x86/mm/mmio-mod.c💯 warning: too many arguments for format
 arch/x86/mm/mmio-mod.c:106: warning: too many arguments for format

The 'fmt' was left out accidentally.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus <torvalds@linux-foundation.org>
LKML-Reference: <1260775443.18538.16.camel@Joe-Laptop.home>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 08:55:43 +01:00
Cliff Wickman 1d865fb728 x86: Fix duplicated UV BAU interrupt vector
Interrupt vector 0xec has been doubly defined in irq_vectors.h

It seems arbitrary whether LOCAL_PENDING_VECTOR or
UV_BAU_MESSAGE is the higher number.  As long as they are
unique. If they are not unique we'll hit a BUG in
alloc_system_vector().

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <E1NJ9Pe-0004P7-0Q@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-13 08:17:40 +01:00
Sam Ravnborg 273b281fa2 kbuild: move utsrelease.h to include/generated
Fix up all users of utsrelease.h

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2009-12-12 13:08:15 +01:00
Sam Ravnborg 9204595405 kbuild: move compile.h to include/generated
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2009-12-12 13:08:14 +01:00
Sam Ravnborg 559df2e021 kbuild: move asm-offsets.h to include/generated
The simplest method was to add an extra asm-offsets.h
file in arch/$ARCH/include/asm that references the generated file.

We can now migrate the architectures one-by-one to reference
the generated file direct - and when done we can delete the
temporary arch/$ARCH/include/asm/asm-offsets.h file.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2009-12-12 13:08:14 +01:00
Linus Torvalds 756300983f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/amd-iommu: Fix PCI hotplug with passthrough mode
  x86/amd-iommu: Fix passthrough mode
  x86: mmio-mod.c: Use pr_fmt
  x86: kmmio.c: Add and use pr_fmt(fmt)
  x86: i8254.c: Add pr_fmt(fmt)
  x86: setup_percpu.c: Use pr_<level> and add pr_fmt(fmt)
  x86: es7000_32.c: Use pr_<level> and add pr_fmt(fmt)
  x86: Print DMI_BOARD_NAME as well as DMI_PRODUCT_NAME from __show_regs()
  x86: Factor duplicated code out of __show_regs() into show_regs_common()
  arch/x86/kernel/microcode*: Use pr_fmt() and remove duplicated KERN_ERR prefix
  x86, mce: fix confusion between bank attributes and mce attributes
  x86/mce: Set up timer unconditionally
  x86: Fix bogus warning in apic_noop.apic_write()
  x86: Fix typo in arch/x86/mm/kmmio.c
  x86: ASUS P4S800 reboot=bios quirk
2009-12-11 20:47:59 -08:00
Linus Torvalds 6f696eb17b Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (57 commits)
  x86, perf events: Check if we have APIC enabled
  perf_event: Fix variable initialization in other codepaths
  perf kmem: Fix unused argument build warning
  perf symbols: perf_header__read_build_ids() offset'n'size should be u64
  perf symbols: dsos__read_build_ids() should read both user and kernel buildids
  perf tools: Align long options which have no short forms
  perf kmem: Show usage if no option is specified
  sched: Mark sched_clock() as notrace
  perf sched: Add max delay time snapshot
  perf tools: Correct size given to memset
  perf_event: Fix perf_swevent_hrtimer() variable initialization
  perf sched: Fix for getting task's execution time
  tracing/kprobes: Fix field creation's bad error handling
  perf_event: Cleanup for cpu_clock_perf_event_update()
  perf_event: Allocate children's perf_event_ctxp at the right time
  perf_event: Clean up __perf_event_init_context()
  hw-breakpoints: Modify breakpoints without unregistering them
  perf probe: Update perf-probe document
  perf probe: Support --del option
  trace-kprobe: Support delete probe syntax
  ...
2009-12-11 20:47:30 -08:00
Linus Torvalds 2fe77b81c7 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interface
  [CPUFREQ] make internal cpufreq_add_dev_* static
  [CPUFREQ] use an enum for speedstep processor identification
  [CPUFREQ] Document units for transition latency
  [CPUFREQ] Use global sysfs cpufreq structure for conservative governor tunings
  [CPUFREQ] Documentation: ABI: /sys/devices/system/cpu/cpu#/cpufreq/
  [CPUFREQ] powernow-k6: set transition latency value so ondemand governor can be used
  [CPUFREQ] cpumask: don't put a cpumask on the stack in x86...cpufreq/powernow-k8.c
2009-12-11 15:59:23 -08:00
Linus Torvalds 3126c136bc Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (21 commits)
  ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks()
  ext3: Fix data / filesystem corruption when write fails to copy data
  ext4: Support for 64-bit quota format
  ext3: Support for vfsv1 quota format
  quota: Implement quota format with 64-bit space and inode limits
  quota: Move definition of QFMT_OCFS2 to linux/quota.h
  ext2: fix comment in ext2_find_entry about return values
  ext3: Unify log messages in ext3
  ext2: clear uptodate flag on super block I/O error
  ext2: Unify log messages in ext2
  ext3: make "norecovery" an alias for "noload"
  ext3: Don't update the superblock in ext3_statfs()
  ext3: journal all modifications in ext3_xattr_set_handle
  ext2: Explicitly assign values to on-disk enum of filetypes
  quota: Fix WARN_ON in lookup_one_len
  const: struct quota_format_ops
  ubifs: remove manual O_SYNC handling
  afs: remove manual O_SYNC handling
  kill wait_on_page_writeback_range
  vfs: Implement proper O_SYNC semantics
  ...
2009-12-11 15:31:13 -08:00
Linus Torvalds 880188b243 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  kgdb: Always process the whole breakpoint list on activate or deactivate
  kgdb: continue and warn on signal passing from gdb
  kgdb,x86: do not set kgdb_single_step on x86
  kgdb: allow for cpu switch when single stepping
  kgdb,i386: Fix corner case access to ss with NMI watch dog exception
  kgdb: Replace strstr() by strchr() for single-character needles
  kgdbts: Read buffer overflow
  kgdb: Read buffer overflow
  kgdb,x86: remove redundant test
2009-12-11 15:19:56 -08:00
Mike Travis 2eaad1fddd x86: Limit the number of processor bootup messages
When there are a large number of processors in a system, there
is an excessive amount of messages sent to the system console.
It's estimated that with 4096 processors in a system, and the
console baudrate set to 56K, the startup messages will take
about 84 minutes to clear the serial port.

This set of patches limits the number of repetitious messages
which contain no additional information.  Much of this information
is obtainable from the /proc and /sysfs.   Some of the messages
are also sent to the kernel log buffer as KERN_DEBUG messages so
dmesg can be used to examine more closely any details specific to
a problem.

The new cpu bootup sequence for system_state == SYSTEM_BOOTING:

Booting Node   0, Processors  #1 #2 #3 #4 #5 #6 #7 Ok.
Booting Node   1, Processors  #8 #9 #10 #11 #12 #13 #14 #15 Ok.
...
Booting Node   3, Processors  #56 #57 #58 #59 #60 #61 #62 #63 Ok.
Brought up 64 CPUs

After the system is running, a single line boot message is displayed
when CPU's are hotplugged on:

    Booting Node %d Processor %d APIC 0x%x

Status of the following lines:

    CPU: Physical Processor ID:		printed once (for boot cpu)
    CPU: Processor Core ID:		printed once (for boot cpu)
    CPU: Hyper-Threading is disabled	printed once (for boot cpu)
    CPU: Thermal monitoring enabled	printed once (for boot cpu)
    CPU %d/0x%x -> Node %d:		removed
    CPU %d is now offline:		only if system_state == RUNNING
    Initializing CPU#%d:		KERN_DEBUG

Signed-off-by: Mike Travis <travis@sgi.com>
LKML-Reference: <4B219E28.8080601@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-11 15:16:00 -08:00
Mike Travis 450b1e8dd1 x86: Remove enabling x2apic message for every CPU
Print only once that the system is supporting x2apic mode.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <4B226E92.5080904@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-11 14:32:31 -08:00
Linus Torvalds aad3bf04dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/mmap
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/mmap:
  Add missing alignment check in arch/score sys_mmap()
  fix broken aliasing checks for MAP_FIXED on sparc32, mips, arm and sh
  Get rid of open-coding in ia64_brk()
  sparc_brk() is not needed anymore
  switch do_brk() to get_unmapped_area()
  Take arch_mmap_check() into get_unmapped_area()
  fix a struct file leak in do_mmap_pgoff()
  Unify sys_mmap*
  Cut hugetlb case early for 32bit on ia64
  arch_mmap_check() on mn10300
  Kill ancient crap in s390 compat mmap
  arm: add arch_mmap_check(), get rid of sys_arm_mremap()
  file ->get_unmapped_area() shouldn't duplicate work of get_unmapped_area()
  kill useless checks in sparc mremap variants
  fix pgoff in "have to relocate" case of mremap()
  fix the arch checks in MREMAP_FIXED case
  fix checks for expand-in-place mremap
  do_mremap() untangling, part 3
  do_mremap() untangling, part 2
  untangling do_mremap(), part 1
2009-12-11 12:23:29 -08:00
Linus Torvalds 11bd04f6f3 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (109 commits)
  PCI: fix coding style issue in pci_save_state()
  PCI: add pci_request_acs
  PCI: fix BUG_ON triggered by logical PCIe root port removal
  PCI: remove ifdefed pci_cleanup_aer_correct_error_status
  PCI: unconditionally clear AER uncorr status register during cleanup
  x86/PCI: claim SR-IOV BARs in pcibios_allocate_resource
  PCI: portdrv: remove redundant definitions
  PCI: portdrv: remove unnecessary struct pcie_port_data
  PCI: portdrv: minor cleanup for pcie_port_device_register
  PCI: portdrv: add missing irq cleanup
  PCI: portdrv: enable device before irq initialization
  PCI: portdrv: cleanup service irqs initialization
  PCI: portdrv: check capabilities first
  PCI: portdrv: move PME capability check
  PCI: portdrv: remove redundant pcie type calculation
  PCI: portdrv: cleanup pcie_device registration
  PCI: portdrv: remove redundant pcie_port_device_probe
  PCI: Always set prefetchable base/limit upper32 registers
  PCI: read-modify-write the pcie device control register when initiating pcie flr
  PCI: show dma_mask bits in /sys
  ...

Fixed up conflicts in:
	arch/x86/kernel/amd_iommu_init.c
	drivers/pci/dmar.c
	drivers/pci/hotplug/acpiphp_glue.c
2009-12-11 12:18:16 -08:00
Borislav Petkov 505422517d x86, msr: Add support for non-contiguous cpumasks
The current rd/wrmsr_on_cpus helpers assume that the supplied
cpumasks are contiguous. However, there are machines out there
like some K8 multinode Opterons which have a non-contiguous core
enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see
http://www.gossamer-threads.com/lists/linux/kernel/1160268.

This patch fixes out-of-bounds writes (see URL above) by adding per-CPU
msr structs which are used on the respective cores.

Additionally, two helpers, msrs_{alloc,free}, are provided for use by
the callers of the MSR accessors.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091211171440.GD31998@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-11 10:59:21 -08:00
H. Peter Anvin 5c6baba84e Merge commit 'linus/master' into x86/urgent 2009-12-11 10:57:42 -08:00
Jason Wessel 8097551d9a kgdb,x86: do not set kgdb_single_step on x86
On an SMP system the kgdb_single_step flag has the possibility to
indefinitely hang the system in the case.  Consider the case where,
CPU 1 has the schedule lock and CPU 0 is set to single step, there is
no way for CPU 0 to run another task.

The easy way to observe the problem is to make 2 cpus busy, and run
the kgdb test suite.  You will see that it hangs the system very
quickly.

while [ 1 ] ; do find /proc > /dev/null 2>&1 ; done &
while [ 1 ] ; do find /proc > /dev/null 2>&1 ; done &
echo V1 > /sys/module/kgdbts/parameters/kgdbts

The side effect of this patch is that there is the possibility
to miss a breakpoint in the case that a single step operation
was executed to step over a breakpoint in common code.

The trade off of the missed breakpoint is preferred to
hanging the kernel.  This can be fixed in the future by
using kprobes or another strategy to step over planted
breakpoints with out of line execution.

CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2009-12-11 08:43:18 -06:00
Jason Wessel cf6f196d11 kgdb,i386: Fix corner case access to ss with NMI watch dog exception
It is possible for the user_mode_vm(regs) check to return true on the
i368 arch for a non master kgdb cpu or when the master kgdb cpu
handles the NMI watch dog exception.

The solution is simply to select the correct gdb_ss location
based on the check to user_mode_vm(regs).

CC: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2009-12-11 08:43:16 -06:00
Roel Kluin a5d09d6833 kgdb,x86: remove redundant test
The for loop starts with a breakno of 0, and ends when it's 4. so
this test is always true.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2009-12-11 08:43:12 -06:00
Al Viro f8b7256096 Unify sys_mmap*
New helper - sys_mmap_pgoff(); switch syscalls to using it.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-11 06:44:29 -05:00
Yinghai Lu 893f38d144 x86: Use find_e820() instead of hard coded trampoline address
Jens found the following crash/regression:

[    0.000000] found SMP MP-table at [ffff8800000fdd80] fdd80
[    0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 0-fff BIOS data page

and

[    0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 6000-7fff TRAMPOLINE

and bisected it to b24c2a9 ("x86: Move find_smp_config()
earlier and avoid bootmem usage").

It turns out the BIOS is using the first 64k for mptable,
without reserving it.

So try to find good range for the real-mode trampoline instead of
hard coding it, in case some bios tries to use that range for sth.

Reported-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <4B21630A.6000308@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-11 09:28:22 +01:00
Prarit Bhargava ebb682f522 x86, AMD: Fix stale cpuid4_info shared_map data in shared_cpu_map cpumasks
The per_cpu cpuid4_info shared_map can contain stale data when CPUs are added
and removed.

The stale data can lead to a NULL pointer derefernce panic on a remove of a
CPU that has had siblings previously removed.

This patch resolves the panic by verifying a cpu is actually online before
adding it to the shared_cpu_map, only examining cpus that are part of
the same lower level cache, and by updating other siblings lowest level cache
maps when a cpu is added.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
LKML-Reference: <20091209183336.17855.98708.sendpatchset@prarit.bos.redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-10 17:19:03 -08:00
Brian Gerst df59e7bf43 x86: Merge kernel_thread()
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260380084-3707-6-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-10 16:41:31 -08:00
Brian Gerst f443ff4201 x86: Sync 32/64-bit kernel_thread
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260380084-3707-5-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-10 15:55:39 -08:00
Brian Gerst e840227c14 x86, 32-bit: Use same regs as 64-bit for kernel_thread_helper
The arg should be in %eax, but that is clobbered by the return value
of clone.  The function pointer can be in any register.  Also, don't
push args onto the stack, since regparm(3) is the normal calling
convention now.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260380084-3707-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-10 15:55:36 -08:00
Brian Gerst fa4b8f8438 x86, 64-bit: Use user_mode() to determine new stack pointer in copy_thread()
Use user_mode() instead of a magic value for sp to determine when returning
to kernel mode.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260380084-3707-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-10 15:55:30 -08:00
Brian Gerst 3bd95dfb18 x86, 64-bit: Move kernel_thread to C
Prepare for merging with 32-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260380084-3707-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-10 15:55:26 -08:00
Linus Torvalds d71cb81af3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Add debugobjects support
2009-12-10 09:35:44 -08:00
Linus Torvalds ab1831b0b8 Merge branch 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen: try harder to balloon up under memory pressure.
  Xen balloon: fix totalram_pages counting.
  xen: explicitly create/destroy stop_machine workqueues outside suspend/resume region.
  xen: improve error handling in do_suspend.
  xen: don't leak IRQs over suspend/resume.
  xen: call clock resume notifier on all CPUs
  xen: use iret for return from 64b kernel to 32b usermode
  xen: don't call dpm_resume_noirq() with interrupts disabled.
  xen: register runstate info for boot CPU early
  xen: register runstate on secondary CPUs
  xen: register timer interrupt with IRQF_TIMER
  xen: correctly restore pfn_to_mfn_list_list after resume
  xen: restore runstate_info even if !have_vcpu_info_placement
  xen: re-register runstate area earlier on resume.
  xen: wait up to 5 minutes for device connetion
  xen: improvement to wait_for_devices()
  xen: fix is_disconnected_device/exists_disconnected_device
  xen/xenbus: make DEVICE_ATTR()s static
2009-12-10 09:35:02 -08:00
Cyrill Gorcunov 125580380f x86, perf events: Check if we have APIC enabled
Ralf Hildebrandt reported this boot warning:

| Running a vanilla 2.6.32 as Xen DomU, I'm getting:
|
| [    0.000999] CPU: Physical Processor ID: 0
| [    0.000999] CPU: Processor Core ID: 1
| [    0.000999] Performance Events: AMD PMU driver.
| [    0.000999] ------------[ cut here ]------------
| [    0.000999] WARNING: at arch/x86/kernel/apic/apic.c:249 native_apic_write_dummy

So we need to check if APIC functionality is available, and
not just in the P6 driver but elsewhere as well.

Reported-by: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091210165634.GF5086@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-10 18:00:30 +01:00
Xiao Guangrong 5e855db5d8 perf_event: Fix variable initialization in other codepaths
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <4B20BAA6.7010609@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-10 17:23:02 +01:00
Christoph Hellwig 6b2f3d1f76 vfs: Implement proper O_SYNC semantics
While Linux provided an O_SYNC flag basically since day 1, it took until
Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
since that day we had generic_osync_around with only minor changes and the
great "For now, when the user asks for O_SYNC, we'll actually give
O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC
patches which are required before this patch it's actually surprisingly
simple, we just need to figure out when to set the datasync flag to
vfs_fsync_range and when not.

This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's
numerical value to keep binary compatibility, and adds a new real O_SYNC
flag.  To guarantee backwards compatiblity it is defined as expanding to
both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
sure we are backwards-compatible when compiled against the new headers.

This also means that all places that don't care about the differences can
just check O_DSYNC and get the right behaviour for O_SYNC, too - only
places that actuall care need to check __O_SYNC in addition.  Drivers and
network filesystems have been updated in a fail safe way to always do the
full sync magic if O_DSYNC is set.  The few places setting O_SYNC for
lower layers are kept that way for now to stay failsafe.

We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
to make sure we always get these sane options.

Note that parisc really screwed up their headers as they already define a
O_DSYNC that has always been a no-op.  We try to repair it by using it for
the new O_DSYNC and redefinining O_SYNC to send both the traditional
O_SYNC numerical value _and_ the O_DSYNC one.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 15:02:50 +01:00
Ingo Molnar 9cf7826743 Merge branch 'amd-iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2009-12-10 14:25:48 +01:00
Joerg Roedel 8638c4914f x86/amd-iommu: Fix PCI hotplug with passthrough mode
The device change notifier is initialized in the dma_ops
initialization path. But this path is never executed for
iommu=pt. Move the notifier initialization to IOMMU hardware
init code to fix this.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-12-10 12:23:47 +01:00
Joerg Roedel b7cc9554bc x86/amd-iommu: Fix passthrough mode
The data structure changes to use dev->archdata.iommu field
broke the iommu=pt mode because in this case the
dev->archdata.iommu was left uninitialized. This moves the
inititalization of the devices into the main init function
and fixes the problem.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-12-10 12:21:31 +01:00
Joe Perches 3a0340be06 x86: mmio-mod.c: Use pr_fmt
- Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 - Remove #define NAME
 - Remove NAME from pr_<level>

Signed-off-by: Joe Perches <joe@perches.com>
LKML-Reference: <009cb214c45ef932df0242856228f4739cc91408.1260383912.git.joe@perches.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-10 08:57:51 +01:00
Joe Perches 1bd591a5f1 x86: kmmio.c: Add and use pr_fmt(fmt)
- Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 - Strip "kmmio: " from pr_<level>s

Signed-off-by: Joe Perches <joe@perches.com>
LKML-Reference: <7aa509f8a23933036d39f54bd51e9acc52068049.1260383912.git.joe@perches.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-10 08:57:51 +01:00
Joe Perches a78d9626f4 x86: i8254.c: Add pr_fmt(fmt)
- Add pr_fmt(fmt) "pit: " fmt
 - Strip pit: prefixes from pr_debug

Signed-off-by: Joe Perches <joe@perches.com>
LKML-Reference: <bbd4de532f18bb7c11f64ba20d224c08291cb126.1260383912.git.joe@perches.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-10 08:57:50 +01:00
Joe Perches 40685236b3 x86: setup_percpu.c: Use pr_<level> and add pr_fmt(fmt)
- Added #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 - Stripped PERCPU: from a pr_warning

Signed-off-by: Joe Perches <joe@perches.com>
LKML-Reference: <7ead24eccbea8f2b11795abad3e2893a98e1e111.1260383912.git.joe@perches.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-10 08:57:50 +01:00
Joe Perches 5cd476effe x86: es7000_32.c: Use pr_<level> and add pr_fmt(fmt)
- Added #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 - Converted a few printk(KERN_INFO to pr_info(
 - Stripped "es7000_mipcfg" from pr_debug

Signed-off-by: Joe Perches <joe@perches.com>
LKML-Reference: <3b4375af246dec5941168858910210937c110af9.1260383912.git.joe@perches.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-10 08:57:49 +01:00
Linus Torvalds 3067e02f8f Merge branch 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPICA: Update version to 20091112.
  ACPICA: Add additional module-level code support
  ACPICA: Deploy new create integer interface where appropriate
  ACPICA: New internal utility function to create Integer objects
  ACPICA: Add repair for predefined methods that must return sorted lists
  ACPICA: Fix possible fault if return Package objects contain NULL elements
  ACPICA: Add post-order callback to acpi_walk_namespace
  ACPICA: Change package length error message to an info message
  ACPICA: Reduce severity of predefined repair messages, Warning to Info
  ACPICA: Update version to 20091013
  ACPICA: Fix possible memory leak for Scope ASL operator
  ACPICA: Remove possibility of executing _REG methods twice
  ACPICA: Add repair for bad _MAT buffers
  ACPICA: Add repair for bad _BIF/_BIX packages
2009-12-09 19:57:06 -08:00
Linus Torvalds 4ef58d4e2a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
  tree-wide: fix misspelling of "definition" in comments
  reiserfs: fix misspelling of "journaled"
  doc: Fix a typo in slub.txt.
  inotify: remove superfluous return code check
  hdlc: spelling fix in find_pvc() comment
  doc: fix regulator docs cut-and-pasteism
  mtd: Fix comment in Kconfig
  doc: Fix IRQ chip docs
  tree-wide: fix assorted typos all over the place
  drivers/ata/libata-sff.c: comment spelling fixes
  fix typos/grammos in Documentation/edac.txt
  sysctl: add missing comments
  fs/debugfs/inode.c: fix comment typos
  sgivwfb: Make use of ARRAY_SIZE.
  sky2: fix sky2_link_down copy/paste comment error
  tree-wide: fix typos "couter" -> "counter"
  tree-wide: fix typos "offest" -> "offset"
  fix kerneldoc for set_irq_msi()
  spidev: fix double "of of" in comment
  comment typo fix: sybsystem -> subsystem
  ...
2009-12-09 19:43:33 -08:00
H. Peter Anvin fc380ceed7 x86-64, paravirt: Call set_iopl_mask() on 64 bits
set_iopl_mask() is a no-op on 64 bits, but it is also a paravirt hook,
so call it even on 64 bits.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-3-git-send-email-brgerst@gmail.com>
2009-12-09 16:54:08 -08:00
H. Peter Anvin ce9119ad90 x86-32: Avoid pipeline serialization in PTREGSCALL1 and 2
In the PTREGSCALL1 and 2 macros, we can trivially avoid an unnecessary
pipeline serialization, so do so.

In PTREGSCALLS3 this is much less clear-cut since we have to push a
new value to the stack.  Leave it alone for now assuming it is as good
as it is going to be; may want to check on Atom or another in-order
x86 to see if we can do better.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-2-git-send-email-brgerst@gmail.com>
2009-12-09 16:33:44 -08:00
Brian Gerst f839bbc5c8 x86: Merge sys_clone
Change 32-bit sys_clone to new PTREGSCALL stub, and merge with 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-7-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-09 16:29:42 -08:00
Brian Gerst f1382f157f x86, 32-bit: Convert sys_vm86 & sys_vm86old
Convert these to new PTREGSCALL stubs.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-6-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-09 16:29:23 -08:00
Brian Gerst 052acad48a x86: Merge sys_sigaltstack
Change 32-bit sys_sigaltstack to PTREGSCALL2, and merge with 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-5-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-09 16:28:59 -08:00
Brian Gerst 11cf88bd0b x86: Merge sys_execve
Change 32-bit sys_execve to PTREGSCALL3, and merge with 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-09 16:28:34 -08:00
Brian Gerst 27f59559d6 x86: Merge sys_iopl
Change 32-bit sys_iopl to PTREGSCALL1, and merge with 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-09 16:28:10 -08:00
Brian Gerst e258e4e0b4 x86-32: Add new pt_regs stubs
Add new stubs which add the pt_regs pointer as the last arg, matching
64-bit.  This will allow these syscalls to be easily merged.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-09 16:27:49 -08:00
Andy Isaacson a1884b8e55 x86: Print DMI_BOARD_NAME as well as DMI_PRODUCT_NAME from __show_regs()
Robert Hancock observes that DMI_BOARD_NAME is often more useful
than DMI_PRODUCT_NAME, especially on standalone motherboards.
So, print both.

Signed-off-by: Andy Isaacson <adi@hexapodia.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Robert Hancock <hancockrwd@gmail.com>
Cc: Richard Zidlicky <rz@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20091208083021.GB27174@hexapodia.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09 10:17:59 +01:00
Andy Isaacson 814e2c84a7 x86: Factor duplicated code out of __show_regs() into show_regs_common()
Unify x86_32 and x86_64 implementations of __show_regs() header,
standardizing on the x86_64 format string in the process. Also,
32-bit will now call print_modules.

Signed-off-by: Andy Isaacson <adi@hexapodia.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Robert Hancock <hancockrwd@gmail.com>
Cc: Richard Zidlicky <rz@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20091208082942.GA27174@hexapodia.org>
[ v2: resolved conflict ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09 10:17:58 +01:00
Frederic Weisbecker 44234adcdc hw-breakpoints: Modify breakpoints without unregistering them
Currently, when ptrace needs to modify a breakpoint, like disabling
it, changing its address, type or len, it calls
modify_user_hw_breakpoint(). This latter will perform the heavy and
racy task of unregistering the old breakpoint and registering a new
one.

This is racy as someone else might steal the reserved breakpoint
slot under us, which is undesired as the breakpoint is only
supposed to be modified, sometimes in the middle of a debugging
workflow. We don't want our slot to be stolen in the middle.

So instead of unregistering/registering the breakpoint, just
disable it while we modify its breakpoint fields and re-enable it
after if necessary.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <1260347148-5519-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09 09:48:20 +01:00
Joe Perches f58e1f53de arch/x86/kernel/microcode*: Use pr_fmt() and remove duplicated KERN_ERR prefix
- Use #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
- Remove "microcode: " prefix from each pr_<level>
- Fix duplicated KERN_ERR prefix
- Coalesce pr_<level> format strings
- Add a space after an exclamation point

No other change in output.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
LKML-Reference: <1260340250.27677.191.camel@Joe-Laptop.home>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09 08:25:57 +01:00
Ingo Molnar 4c68db38c8 Merge branch 'linus' into x86/urgent
Merge reason: We want to queue up a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09 08:24:57 +01:00
Linus Torvalds fbf07eac7b Merge branch 'timers-for-linus-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hrtimer: Fix /proc/timer_list regression
  itimers: Fix racy writes to cpu_itimer fields
  timekeeping: Fix clock_gettime vsyscall time warp
2009-12-08 19:28:09 -08:00
Linus Torvalds 60d8ce2cd6 Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  timers, init: Limit the number of per cpu calibration bootup messages
  posix-cpu-timers: optimize and document timer_create callback
  clockevents: Add missing include to pacify sparse
  x86: vmiclock: Fix printk format
  x86: Fix printk format due to variable type change
  sparc: fix printk for change of variable type
  clocksource/events: Fix fallout of generic code changes
  nohz: Allow 32-bit machines to sleep for more than 2.15 seconds
  nohz: Track last do_timer() cpu
  nohz: Prevent clocksource wrapping during idle
  nohz: Type cast printk argument
  mips: Use generic mult/shift factor calculation for clocks
  clocksource: Provide a generic mult/shift factor calculation
  clockevents: Use u32 for mult and shift factors
  nohz: Introduce arch_needs_cpu
  nohz: Reuse ktime in sub-functions of tick_check_idle.
  time: Remove xtime_cache
  time: Implement logarithmic time accumulation
2009-12-08 19:27:08 -08:00
Linus Torvalds 849e8dea09 Merge branch 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: hpet: Make WARN_ON understandable
  x86: arch specific support for remapping HPET MSIs
  intr-remap: generic support for remapping HPET MSIs
  x86, hpet: Simplify the HPET code
  x86, hpet: Disable per-cpu hpet timer if ARAT is supported
2009-12-08 19:26:55 -08:00
Linus Torvalds e069efb6bb Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  hwrng: core - Prevent too-small buffer sizes
  hwrng: virtio-rng - Convert to new API
  hwrng: core - Replace u32 in driver API with byte array
  crypto: ansi_cprng - Move FIPS functions under CONFIG_CRYPTO_FIPS
  crypto: testmgr - Add ghash algorithm test before provide to users
  crypto: ghash-clmulni-intel - Put proper .data section in place
  crypto: ghash-clmulni-intel - Use gas macro for PCLMULQDQ-NI and PSHUFB
  crypto: aesni-intel - Use gas macro for AES-NI instructions
  x86: Generate .byte code for some new instructions via gas macro
  crypto: ghash-intel - Fix irq_fpu_usable usage
  crypto: ghash-intel - Add PSHUFB macros
  crypto: ghash-intel - Hard-code pshufb
  crypto: ghash-intel - Fix building failure on x86_32
  crypto: testmgr - Fix warning
  crypto: ansi_cprng - Fix test in get_prng_bytes
  crypto: hash - Remove cra_u.{digest,hash}
  crypto: api - Remove digest case from procfs show handler
  crypto: hash - Remove legacy hash/digest code
  crypto: ansi_cprng - Add FIPS wrapper
  crypto: ghash - Add PCLMULQDQ accelerated implementation
2009-12-08 15:55:13 -08:00
Linus Torvalds 3ff6a468b4 Merge branch 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  cpumask: Use modern cpumask style in Xen
2009-12-08 13:38:33 -08:00
Linus Torvalds 4646575daf Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: UV RTC: Always enable RTC clocksource
  x86: UV RTC: Rename generic_interrupt to x86_platform_ipi
  x86: UV RTC: Clean up error handling
  x86: UV RTC: Add clocksource only boot option
  x86: UV RTC: Fix early expiry handling
2009-12-08 13:38:21 -08:00
Linus Torvalds 86ed4aa457 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mce: don't restart timer if disabled
  x86: Use -maccumulate-outgoing-args for sane mcount prologues
  x86: Prevent GCC 4.4.x (pentium-mmx et al) function prologue wreckage
  x86: AMD Northbridge: Verify NB's node is online
  x86 VSDO: Fix Kconfig help
  x86: Fix typo in Intel CPU cache size descriptor
  x86: Add new Intel CPU cache size descriptors
2009-12-08 13:38:11 -08:00
Linus Torvalds 830cd2ac6e Merge branch 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  vgacon: Add support for setting the default cursor state
  vc: Add support for hiding the cursor when creating VTs
  x86, setup: Store the boot cursor state
2009-12-08 13:35:29 -08:00
Linus Torvalds 64227cd83d Merge branch 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/reboot: Add pci_dev_put in reboot_fixup_32.c for consistency
2009-12-08 13:35:18 -08:00
Linus Torvalds e7522ed5c0 Merge branch 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-64: merge the standard and compat start_thread() functions
  x86-64: make compat_start_thread() match start_thread()
2009-12-08 13:34:53 -08:00
Linus Torvalds b391738bd1 Merge branch 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: pat: Remove ioremap_default()
  x86: pat: Clean up req_type special case for reserve_memtype()
  x86: Relegate CONFIG_PAT and CONFIG_MTRR configurability to EMBEDDED
2009-12-08 13:34:17 -08:00
Linus Torvalds e33c019722 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
  x86, mm: Correct the implementation of is_untracked_pat_range()
  x86/pat: Trivial: don't create debugfs for memtype if pat is disabled
  x86, mtrr: Fix sorting of mtrr after subtracting
  x86: Move find_smp_config() earlier and avoid bootmem usage
  x86, platform: Change is_untracked_pat_range() to bool; cleanup init
  x86: Change is_ISA_range() into an inline function
  x86, mm: is_untracked_pat_range() takes a normal semiclosed range
  x86, mm: Call is_untracked_pat_range() rather than is_ISA_range()
  x86: UV SGI: Don't track GRU space in PAT
  x86: SGI UV: Fix BAU initialization
  x86, numa: Use near(er) online node instead of roundrobin for NUMA
  x86, numa, bootmem: Only free bootmem on NUMA failure path
  x86: Change crash kernel to reserve via reserve_early()
  x86: Eliminate redundant/contradicting cache line size config options
  x86: When cleaning MTRRs, do not fold WP into UC
  x86: remove "extern" from function prototypes in <asm/proto.h>
  x86, mm: Report state of NX protections during boot
  x86, mm: Clean up and simplify NX enablement
  x86, pageattr: Make set_memory_(x|nx) aware of NX support
  x86, sleep: Always save the value of EFER
  ...

Fix up conflicts (added both iommu_shutdown and is_untracked_pat_range)
to 'struct x86_platform_ops') in
	arch/x86/include/asm/x86_init.h
	arch/x86/kernel/x86_init.c
2009-12-08 13:27:33 -08:00
Linus Torvalds 343036cea2 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: ucode-amd: Move family check to microcde_amd.c's init function
  x86, ucode-amd: Ensure ucode update on suspend/resume after CPU off/online cycle
  x86: ucode-amd: Convert printk(KERN_*...) to pr_*(...)
  x86: ucode-amd: Don't warn when no ucode is available for a CPU revision
  x86: ucode-amd: Load ucode-patches once and not separately of each CPU
  x86, amd-ucode: Remove needless log messages
2009-12-08 13:25:10 -08:00
Hidetoshi Seto 5c0e9f28da x86, mce: fix confusion between bank attributes and mce attributes
Commit cebe182033 had an unnecessary,
wrong change: &mce_banks[i].attr is equivalent to the former
bank_attrs[i], not to mce_attrs[i].

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <4B1E05CC.4040703@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-08 12:11:20 -08:00
Linus Torvalds 6035ccd8e9 Merge branch 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block: (113 commits)
  cfq-iosched: Do not access cfqq after freeing it
  block: include linux/err.h to use ERR_PTR
  cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit
  blkio: Allow CFQ group IO scheduling even when CFQ is a module
  blkio: Implement dynamic io controlling policy registration
  blkio: Export some symbols from blkio as its user CFQ can be a module
  block: Fix io_context leak after failure of clone with CLONE_IO
  block: Fix io_context leak after clone with CLONE_IO
  cfq-iosched: make nonrot check logic consistent
  io controller: quick fix for blk-cgroup and modular CFQ
  cfq-iosched: move IO controller declerations to a header file
  cfq-iosched: fix compile problem with !CONFIG_CGROUP
  blkio: Documentation
  blkio: Wait on sync-noidle queue even if rq_noidle = 1
  blkio: Implement group_isolation tunable
  blkio: Determine async workload length based on total number of queues
  blkio: Wait for cfq queue to get backlogged if group is empty
  blkio: Propagate cgroup weight updation to cfq groups
  blkio: Drop the reference to queue once the task changes cgroup
  blkio: Provide some isolation between groups
  ...
2009-12-08 08:19:16 -08:00
Linus Torvalds ed9216c171 Merge branch 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (84 commits)
  KVM: VMX: Fix comparison of guest efer with stale host value
  KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c
  KVM: Drop user return notifier when disabling virtualization on a cpu
  KVM: VMX: Disable unrestricted guest when EPT disabled
  KVM: x86 emulator: limit instructions to 15 bytes
  KVM: s390: Make psw available on all exits, not just a subset
  KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
  KVM: VMX: Report unexpected simultaneous exceptions as internal errors
  KVM: Allow internal errors reported to userspace to carry extra data
  KVM: Reorder IOCTLs in main kvm.h
  KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
  KVM: only clear irq_source_id if irqchip is present
  KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic
  KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
  KVM: VMX: Remove vmx->msr_offset_efer
  KVM: MMU: update invlpg handler comment
  KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
  KVM: remove duplicated task_switch check
  KVM: powerpc: Fix BUILD_BUG_ON condition
  KVM: VMX: Use shared msr infrastructure
  ...

Trivial conflicts due to new Kconfig options in arch/Kconfig and kernel/Makefile
2009-12-08 08:02:38 -08:00
Linus Torvalds d7fc02c7ba Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
  mac80211: fix reorder buffer release
  iwmc3200wifi: Enable wimax core through module parameter
  iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
  iwmc3200wifi: Coex table command does not expect a response
  iwmc3200wifi: Update wiwi priority table
  iwlwifi: driver version track kernel version
  iwlwifi: indicate uCode type when fail dump error/event log
  iwl3945: remove duplicated event logging code
  b43: fix two warnings
  ipw2100: fix rebooting hang with driver loaded
  cfg80211: indent regulatory messages with spaces
  iwmc3200wifi: fix NULL pointer dereference in pmkid update
  mac80211: Fix TX status reporting for injected data frames
  ath9k: enable 2GHz band only if the device supports it
  airo: Fix integer overflow warning
  rt2x00: Fix padding bug on L2PAD devices.
  WE: Fix set events not propagated
  b43legacy: avoid PPC fault during resume
  b43: avoid PPC fault during resume
  tcp: fix a timewait refcnt race
  ...

Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
	kernel/sysctl_check.c
	net/ipv4/sysctl_net_ipv4.c
	net/ipv6/addrconf.c
	net/sctp/sysctl.c
2009-12-08 07:55:01 -08:00
Linus Torvalds 1557d33007 Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
  security/tomoyo: Remove now unnecessary handling of security_sysctl.
  security/tomoyo: Add a special case to handle accesses through the internal proc mount.
  sysctl: Drop & in front of every proc_handler.
  sysctl: Remove CTL_NONE and CTL_UNNUMBERED
  sysctl: kill dead ctl_handler definitions.
  sysctl: Remove the last of the generic binary sysctl support
  sysctl net: Remove unused binary sysctl code
  sysctl security/tomoyo: Don't look at ctl_name
  sysctl arm: Remove binary sysctl support
  sysctl x86: Remove dead binary sysctl support
  sysctl sh: Remove dead binary sysctl support
  sysctl powerpc: Remove dead binary sysctl support
  sysctl ia64: Remove dead binary sysctl support
  sysctl s390: Remove dead sysctl binary support
  sysctl frv: Remove dead binary sysctl support
  sysctl mips/lasat: Remove dead binary sysctl support
  sysctl drivers: Remove dead binary sysctl support
  sysctl crypto: Remove dead binary sysctl support
  sysctl security/keys: Remove dead binary sysctl support
  sysctl kernel: Remove binary sysctl logic
  ...
2009-12-08 07:38:50 -08:00
Jan Beulich bc09effabf x86/mce: Set up timer unconditionally
mce_timer must be passed to setup_timer() in all cases, no
matter whether it is going to be actually used. Otherwise, when
the CPU gets brought down, its call to del_timer_sync() will
never return, as the timer won't have a base associated, and
hence lock_timer_base() will loop infinitely.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: <stable@kernel.org>
LKML-Reference: <4B1DB831.2030801@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-08 05:34:39 +01:00
Jiri Kosina d014d04386 Merge branch 'for-next' into for-linus
Conflicts:

	kernel/irq/chip.c
2009-12-07 18:36:35 +01:00
Masami Hiramatsu d32ba45503 x86 insn: Delete empty or incomplete inat-tables.c
Delete empty or incomplete inat-tables.c if gen-insn-attr-x86.awk
failed, because it causes a build error if user tries to build
kernel next time.

Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091207170033.19230.37688.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-07 18:33:19 +01:00
Thomas Gleixner a946d8f11f x86: Fix bogus warning in apic_noop.apic_write()
apic_noop is used to provide dummy apic functions. It's installed
when the CPU has no APIC or when the APIC is disabled on the kernel
command line.

The apic_noop implementation of apic_write() warns when the CPU has
an APIC or when the APIC is not disabled.

That's bogus. The warning should only happen when the CPU has an
APIC _AND_ the APIC is not disabled. apic_noop.apic_read() has the
correct check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: <stable@kernel.org> # in <= .32 this typo resides in native_apic_write_dummy()
LKML-Reference: <alpine.LFD.2.00.0912071255420.3089@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-07 13:16:37 +01:00
Ingo Molnar f3d607c6b3 Merge branch 'linus' into x86/urgent
Merge reason: we want to queue up a dependent fix.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-07 13:14:18 +01:00
OGAWA Hirofumi cbe5c34c8c x86: Compile insn.c and inat.c only for KPROBES
At least, insn.c and inat.c is needed for kprobe for now. So,
this compile those only if KPROBES is enabled.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <878wdg8icq.fsf@devron.myhome.or.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-07 08:31:28 +01:00
Jean Delvare be2bf0a2df x86, perf probe: Fix warning in test_get_len()
Fix the following warning:

 arch/x86/tools/test_get_len.c: In function "main":
 arch/x86/tools/test_get_len.c:116: warning: unused variable "c"

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-06 12:53:40 +01:00
Shaun Patterson 8055039c2a x86: Fix typo in arch/x86/mm/kmmio.c
Signed-off-by: Shaun Patterson <shaunpatterson@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: pq@iki.fi
LKML-Reference: <1260027694.10074.170.camel@linux-4lgc.site>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-06 09:23:40 +01:00
Frederic Weisbecker af2d8289f5 x86: Fixup wrong irq frame link in stacktraces
When we enter in irq, two things can happen to preserve the link
to the previous frame pointer:

- If we were in an irq already, we don't switch to the irq stack
  as we are inside. We just need to save the previous frame
  pointer and to link the new one to the previous.

- Otherwise we need another level of indirection. We enter the irq with
  the previous stack. We save the previous bp inside and make bp
  pointing to its saved address. Then we switch to the irq stack and
  push bp another time but to the new stack. This makes two levels to
  dereference instead of one.

In the second case, the current stacktrace code omits the second level
and loses the frame pointer accuracy. The stack that follows will then
be considered as unreliable.

Handling that makes the perf callchain happier.
Before:

43.94%  [k] _raw_read_lock
            |
            --- _read_lock
               |
               |--60.53%-- send_sigio
               |          __kill_fasync
               |          kill_fasync
               |          evdev_pass_event
               |          evdev_event
               |          input_pass_event
               |          input_handle_event
               |          input_event
               |          synaptics_process_byte
               |          psmouse_handle_byte
               |          psmouse_interrupt
               |          serio_interrupt
               |          i8042_interrupt
               |          handle_IRQ_event
               |          handle_edge_irq
               |          handle_irq
               |          __irqentry_text_start
               |          ret_from_intr
               |          |
               |          |--30.43%-- __select
               |          |
               |          |--17.39%-- 0x454f15
               |          |
               |          |--13.04%-- __read
               |          |
               |          |--13.04%-- vread_hpet
               |          |
               |          |--13.04%-- _xcb_lock_io
               |          |
               |           --13.04%-- 0x7f630878ce8

After:

    50.00%  [k] _raw_read_lock
            |
            --- _read_lock
               |
               |--98.97%-- send_sigio
               |          __kill_fasync
               |          kill_fasync
               |          evdev_pass_event
               |          evdev_event
               |          input_pass_event
               |          input_handle_event
               |          input_event
               |          |
               |          |--96.88%-- synaptics_process_byte
               |          |          psmouse_handle_byte
               |          |          psmouse_interrupt
               |          |          serio_interrupt
               |          |          i8042_interrupt
               |          |          handle_IRQ_event
               |          |          handle_edge_irq
               |          |          handle_irq
               |          |          __irqentry_text_start
               |          |          ret_from_intr
               |          |          |
               |          |          |--39.78%-- __const_udelay
               |          |          |          |
               |          |          |          |--91.89%-- ath5k_hw_register_timeout
               |          |          |          |          ath5k_hw_noise_floor_calibration
               |          |          |          |          ath5k_hw_reset
               |          |          |          |          ath5k_reset
               |          |          |          |          ath5k_config
               |          |          |          |          ieee80211_hw_config
               |          |          |          |          |
               |          |          |          |          |--88.24%-- ieee80211_scan_work
               |          |          |          |          |          worker_thread
               |          |          |          |          |          kthread
               |          |          |          |          |          child_rip
               |          |          |          |          |
               |          |          |          |           --11.76%-- ieee80211_scan_completed
               |          |          |          |                     ieee80211_scan_work
               |          |          |          |                     worker_thread
               |          |          |          |                     kthread
               |          |          |          |                     child_rip
               |          |          |          |
               |          |          |           --8.11%-- ath5k_hw_noise_floor_calibration
               |          |          |                     ath5k_hw_reset
               |          |          |                     ath5k_reset
               |          |          |                     ath5k_config

Note: This does not only affect perf events but also x86-64
stacktraces. They were considered as unreliable once we quit
the irq stack frame.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2009-12-06 08:27:24 +01:00
Frederic Weisbecker b625b3b3b7 x86: Fixup wrong debug exception frame link in stacktraces
While dumping a stacktrace, the end of the exception stack won't link
the frame pointer to the previous stack.

The interrupted stack will then be considered as unreliable and ignored
by perf, as the frame pointer is unreliable itself.

This happens because we overwrite the frame pointer that links to the
interrupted frame with the address of the exception stack. This is
done in order to reserve space inside.
But rbp has been chosen here only because it is not a scratch register,
so that the address of the exception stack remains in rbp after calling
do_debug(), we can then release the exception stack space without the
need to retrieve its address again.

But we can pick another non-scratch register to do that, so that we
preserve the link to the interrupted stack frame in the stacktraces.

Just randomly choose r12. Every registers are saved just before and
restored just after calling do_debug(). And r12 is not used in the
middle, which makes it a perfect candidate.

Example: perf record -g -a -c 1 -f -e mem:$(tasklist_lock_addr):rw

Before:
    44.18%  [k] _raw_read_lock
            |
            |
            ---  |--6.31%-- waitid
                 |
                 |--4.26%-- writev
                 |
                 |--3.63%-- __select
                 |
                 |--3.15%-- __waitpid
                 |          |
                 |          |--28.57%-- 0x8b52e00000139f
                 |          |
                 |          |--28.57%-- 0x8b52e0000013c6
                 |          |
                 |          |--14.29%-- 0x7fde786dc000
                 |          |
                 |          |--14.29%-- 0x62696c2f7273752f
                 |          |
                 |           --14.29%-- 0x1ea9df800000000
                 |
                 |--3.00%-- __poll

After:

    43.94%  [k] _raw_read_lock
            |
            --- _read_lock
               |
               |--60.53%-- send_sigio
               |          __kill_fasync
               |          kill_fasync
               |          evdev_pass_event
               |          evdev_event
               |          input_pass_event
               |          input_handle_event
               |          input_event
               |          synaptics_process_byte
               |          psmouse_handle_byte
               |          psmouse_interrupt
               |          serio_interrupt
               |          i8042_interrupt
               |          handle_IRQ_event
               |          handle_edge_irq
               |          handle_irq
               |          __irqentry_text_start
               |          ret_from_intr
               |          |
               |          |--30.43%-- __select
               |          |
               |          |--17.39%-- 0x454f15
               |          |
               |          |--13.04%-- __read
               |          |
               |          |--13.04%-- vread_hpet
               |          |
               |          |--13.04%-- _xcb_lock_io
               |          |
               |           --13.04%-- 0x7f630878ce87

Note: it does not only affect perf events but also other stacktraces in
x86-64. They were considered as unreliable once we quit the debug
stack frame.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2009-12-06 08:27:22 +01:00
Frederic Weisbecker 7f33f9c5cc x86/perf: Exclude the debug stack from the callchains
Dumping the callchains from breakpoint events with perf gives strange
results:

3.75%             perf  [kernel]           [k] _raw_read_unlock
                       |
                       --- _raw_read_unlock
                           perf_callchain
                           perf_prepare_sample
                           __perf_event_overflow
                           perf_swevent_overflow
                           perf_swevent_add
                           perf_bp_event
                           hw_breakpoint_exceptions_notify
                           notifier_call_chain
                           __atomic_notifier_call_chain
                           atomic_notifier_call_chain
                           notify_die
                           do_debug
                           debug
                           munmap

We are infected with all the debug stack. Like the nmi stack, the debug
stack is undesired as it is part of the profiling path, not helpful for
the user.

Ignore it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
2009-12-06 08:27:21 +01:00
Frederic Weisbecker b326e9560a hw-breakpoints: Use overflow handler instead of the event callback
struct perf_event::event callback was called when a breakpoint
triggers. But this is a rather opaque callback, pretty
tied-only to the breakpoint API and not really integrated into perf
as it triggers even when we don't overflow.

We prefer to use overflow_handler() as it fits into the perf events
rules, being called only when we overflow.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
2009-12-06 08:27:18 +01:00
Frederic Weisbecker 2f0993e0fb hw-breakpoints: Drop callback and task parameters from modify helper
Drop the callback and task parameters from modify_user_hw_breakpoint().
For now we have no user that need to modify a breakpoint to the point
of changing its handler or its task context.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
2009-12-06 08:27:17 +01:00
Linus Torvalds 6ec22f9b03 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Limit number of per cpu TSC sync messages
  x86: dumpstack, 64-bit: Disable preemption when walking the IRQ/exception stacks
  x86: dumpstack: Clean up the x86_stack_ids[][] initalization and other details
  x86, cpu: mv display_cacheinfo -> cpu_detect_cache_sizes
  x86: Suppress stack overrun message for init_task
  x86: Fix cpu_devs[] initialization in early_cpu_init()
  x86: Remove CPU cache size output for non-Intel too
  x86: Minimise printk spew from per-vendor init code
  x86: Remove the CPU cache size printk's
  cpumask: Avoid cpumask_t in arch/x86/kernel/apic/nmi.c
  x86: Make sure we also print a Code: line for show_regs()
2009-12-05 15:33:27 -08:00
Linus Torvalds 83be7d764d Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, msr, cpumask: Use struct cpumask rather than the deprecated cpumask_t
  x86, cpuid: Simplify the code in cpuid_open
  x86, cpuid: Remove the bkl from cpuid_open()
  x86, msr: Remove the bkl from msr_open()
  x86: AMD Geode LX optimizations
  x86, msr: Unify rdmsr_on_cpus/wrmsr_on_cpus
2009-12-05 15:32:35 -08:00
Linus Torvalds c2ed69cdc9 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix a section mismatch in arch/x86/kernel/setup.c
  x86: Fixup last users of irq_chip->typename
  x86: Remove BKL from apm_32
  x86: Remove BKL from microcode
  x86: use kernel_stack_pointer() in kprobes.c
  x86: use kernel_stack_pointer() in kgdb.c
  x86: use kernel_stack_pointer() in dumpstack.c
  x86: use kernel_stack_pointer() in process_32.c
2009-12-05 15:32:18 -08:00
Linus Torvalds ef26b1691d Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  include/linux/compiler-gcc4.h: Fix build bug - gcc-4.0.2 doesn't understand __builtin_object_size
  x86/alternatives: No need for alternatives-asm.h to re-invent stuff already in asm.h
  x86/alternatives: Check replacementlen <= instrlen at build time
  x86, 64-bit: Set data segments to null after switching to 64-bit mode
  x86: Clean up the loadsegment() macro
  x86: Optimize loadsegment()
  x86: Add missing might_fault() checks to copy_{to,from}_user()
  x86-64: __copy_from_user_inatomic() adjustments
  x86: Remove unused thread_return label from switch_to()
  x86, 64-bit: Fix bstep_iret jump
  x86: Don't use the strict copy checks when branch profiling is in use
  x86, 64-bit: Move K8 B step iret fixup to fault entry asm
  x86: Generate cmpxchg build failures
  x86: Add a Kconfig option to turn the copy_from_user warnings into errors
  x86: Turn the copy_from_user check into an (optional) compile time warning
  x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
  x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()
2009-12-05 15:32:03 -08:00
Linus Torvalds a77d2e081b Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
  x86, apic: Enable lapic nmi watchdog on AMD Family 11h
  x86: Remove unnecessary mdelay() from cpu_disable_common()
  x86, ioapic: Document another case when level irq is seen as an edge
  x86, ioapic: Fix the EOI register detection mechanism
  x86, io-apic: Move the effort of clearing remoteIRR explicitly before migrating the irq
  x86: SGI UV: Map low MMR ranges
  x86: apic: Print out SRAT table APIC id in hex
  x86: Re-get cfg_new in case reuse/move irq_desc
  x86: apic: Remove not needed #ifdef
  x86: io-apic: IO-APIC MMIO should not fail on resource insertion
  x86: Remove asm/apicnum.h
  x86: apic: Do not use stacked physid_mask_t
  x86, apic: Get rid of apicid_to_cpu_present assign on 64-bit
  x86, ioapic: Use snrpintf while set names for IO-APIC resourses
  x86, apic: Use PAGE_SIZE instead of numbers
  x86: Remove local_irq_enable()/local_irq_disable() in fixup_irqs()
  x86: Use EOI register in io-apic on intel platforms
  x86: Force irq complete move during cpu offline
  x86: Remove move_cleanup_count from irq_cfg
  x86, intr-remap: Avoid irq_chip mask/unmask in fixup_irqs() for intr-remapping
  ...
2009-12-05 15:31:25 -08:00
Linus Torvalds c3fa27d136 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (470 commits)
  x86: Fix comments of register/stack access functions
  perf tools: Replace %m with %a in sscanf
  hw-breakpoints: Keep track of user disabled breakpoints
  tracing/syscalls: Make syscall events print callbacks static
  tracing: Add DEFINE_EVENT(), DEFINE_SINGLE_EVENT() support to docbook
  perf: Don't free perf_mmap_data until work has been done
  perf_event: Fix compile error
  perf tools: Fix _GNU_SOURCE macro related strndup() build error
  trace_syscalls: Remove unused syscall_name_to_nr()
  trace_syscalls: Simplify syscall profile
  trace_syscalls: Remove duplicate init_enter_##sname()
  trace_syscalls: Add syscall_nr field to struct syscall_metadata
  trace_syscalls: Remove enter_id exit_id
  trace_syscalls: Set event_enter_##sname->data to its metadata
  trace_syscalls: Remove unused event_syscall_enter and event_syscall_exit
  perf_event: Initialize data.period in perf_swevent_hrtimer()
  perf probe: Simplify event naming
  perf probe: Add --list option for listing current probe events
  perf probe: Add argv_split() from lib/argv_split.c
  perf probe: Move probe event utility functions to probe-event.c
  ...
2009-12-05 15:30:21 -08:00
David S. Miller 28b4d5cc17 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
Conflicts:
	drivers/net/pcmcia/fmvj18x_cs.c
	drivers/net/pcmcia/nmclan_cs.c
	drivers/net/pcmcia/xirc2ps_cs.c
	drivers/net/wireless/ray_cs.c
2009-12-05 15:22:26 -08:00
Linus Torvalds 96fa2b508d Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (40 commits)
  tracing: Separate raw syscall from syscall tracer
  ring-buffer-benchmark: Add parameters to set produce/consumer priorities
  tracing, function tracer: Clean up strstrip() usage
  ring-buffer benchmark: Run producer/consumer threads at nice +19
  tracing: Remove the stale include/trace/power.h
  tracing: Only print objcopy version warning once from recordmcount
  tracing: Prevent build warning: 'ftrace_graph_buf' defined but not used
  ring-buffer: Move access to commit_page up into function used
  tracing: do not disable interrupts for trace_clock_local
  ring-buffer: Add multiple iterations between benchmark timestamps
  kprobes: Sanitize struct kretprobe_instance allocations
  tracing: Fix to use __always_unused attribute
  compiler: Introduce __always_unused
  tracing: Exit with error if a weak function is used in recordmcount.pl
  tracing: Move conditional into update_funcs() in recordmcount.pl
  tracing: Add regex for weak functions in recordmcount.pl
  tracing: Move mcount section search to front of loop in recordmcount.pl
  tracing: Fix objcopy revision check in recordmcount.pl
  tracing: Check absolute path of input file in recordmcount.pl
  tracing: Correct the check for number of arguments in recordmcount.pl
  ...
2009-12-05 09:53:36 -08:00
Linus Torvalds 7b626acb8f Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
  x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree
  x86/amd-iommu: Remove amd_iommu_pd_table
  x86/amd-iommu: Move reset_iommu_command_buffer out of locked code
  x86/amd-iommu: Cleanup DTE flushing code
  x86/amd-iommu: Introduce iommu_flush_device() function
  x86/amd-iommu: Cleanup attach/detach_device code
  x86/amd-iommu: Keep devices per domain in a list
  x86/amd-iommu: Add device bind reference counting
  x86/amd-iommu: Use dev->arch->iommu to store iommu related information
  x86/amd-iommu: Remove support for domain sharing
  x86/amd-iommu: Rearrange dma_ops related functions
  x86/amd-iommu: Move some pte allocation functions in the right section
  x86/amd-iommu: Remove iommu parameter from dma_ops_domain_alloc
  x86/amd-iommu: Use get_device_id and check_device where appropriate
  x86/amd-iommu: Move find_protection_domain to helper functions
  x86/amd-iommu: Simplify get_device_resources()
  x86/amd-iommu: Let domain_for_device handle aliases
  x86/amd-iommu: Remove iommu specific handling from dma_ops path
  x86/amd-iommu: Remove iommu parameter from __(un)map_single
  x86/amd-iommu: Make alloc_new_range aware of multiple IOMMUs
  ...
2009-12-05 09:49:07 -08:00
David Daney a5fc5eba4d x86: Convert BUG() to use unreachable()
Use the new unreachable() macro instead of for(;;);.  When
allyesconfig is built with a GCC-4.5 snapshot on i686 the size of the
text segment is reduced by 3987 bytes (from 6827019 to 6823032).

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: x86@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-05 09:10:12 -08:00
Leann Ogasawara 4832ddda2e x86: ASUS P4S800 reboot=bios quirk
Bug reporter noted their system with an ASUS P4S800 motherboard would
hang when rebooting unless reboot=b was specified.  Their dmidecode
didn't contain descriptive System Information for Manufacturer or
Product Name, so I used their Base Board Information to create a
reboot quirk patch.  The bug reporter confirmed this patch resolves
the reboot hang.

Handle 0x0001, DMI type 1, 25 bytes
System Information
       Manufacturer: System Manufacturer
       Product Name: System Name
       Version: System Version
       Serial Number: SYS-1234567890
       UUID: E0BFCD8B-7948-D911-A953-E486B4EEB67F
       Wake-up Type: Power Switch

Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
     Manufacturer: ASUSTeK Computer INC.
     Product Name: P4S800
     Version: REV 1.xx
     Serial Number: xxxxxxxxxxx

BugLink: http://bugs.launchpad.net/bugs/366682

ASUS P4S800 will hang when rebooting unless reboot=b is specified.
Add a quirk to reboot through the bios.

Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
LKML-Reference: <1259972107.4629.275.camel@emiko>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
2009-12-04 16:38:59 -08:00
Chris Wright 5d990b6275 PCI: add pci_request_acs
Commit ae21ee65e8 "PCI: acs p2p upsteram
forwarding enabling" doesn't actually enable ACS.

Add a function to pci core to allow an IOMMU to request that ACS
be enabled.  The existing mechanism of using iommu_found() in the pci
core to know when ACS should be enabled doesn't actually work due to
initialization order;  iommu has only been detected not initialized.

Have Intel and AMD IOMMUs request ACS, and Xen does as well during early
init of dom0.

Cc: Allen Kay <allen.m.kay@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 16:19:24 -08:00
Yinghai Lu 575939cf54 x86/PCI: claim SR-IOV BARs in pcibios_allocate_resource
This allows us to use the BIOS SR-IOV allocations rather than assigning
our own later on.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 16:00:17 -08:00
Adam Buchbinder 6070d81eb5 tree-wide: fix misspelling of "definition" in comments
"Definition" is misspelled "defintion" in several comments; this
patch fixes them. No code changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-04 23:41:47 +01:00
André Goddard Rosa af901ca181 tree-wide: fix assorted typos all over the place
That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-04 15:39:55 +01:00
Ian Campbell f6eafe3665 xen: call clock resume notifier on all CPUs
tick_resume() is never called on secondary processors. Presumably this
is because they are offlined for suspend on native and so this is
normally taken care of in the CPU onlining path. Under Xen we keep all
CPUs online over a suspend.

This patch papers over the issue for me but I will investigate a more
generic, less hacky, way of doing to the same.

tick_suspend is also only called on the boot CPU which I presume should
be fixed too.

Signed-off-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
2009-12-03 11:14:55 -08:00
Jeremy Fitzhardinge 6aaf5d633b xen: use iret for return from 64b kernel to 32b usermode
If Xen wants to return to a 32b usermode with sysret it must use the
right form.  When using VCGF_in_syscall to trigger this, it looks at
the code segment and does a 32b sysret if it is FLAT_USER_CS32.
However, this is different from __USER32_CS, so it fails to return
properly if we use the normal Linux segment.

So avoid the whole mess by dropping VCGF_in_syscall and simply use
plain iret to return to usermode.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:54 -08:00
Jeremy Fitzhardinge 499d19b82b xen: register runstate info for boot CPU early
printk timestamping uses sched_clock, which in turn relies on runstate
info under Xen.  So make sure we set it up before any printks can
be called.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:53 -08:00
Ian Campbell 028896721a xen: register runstate on secondary CPUs
The commit "xen: re-register runstate area earlier on resume" caused us
to never try and setup the runstate area for secondary CPUs. Ensure that
we do this...

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:52 -08:00
Ian Campbell f350c7922f xen: register timer interrupt with IRQF_TIMER
Otherwise the timer is disabled by dpm_suspend_noirq() which in turn prevents
correct operation of stop_machine on multi-processor systems and breaks
suspend.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:52 -08:00
Ian Campbell fa24ba62ea xen: correctly restore pfn_to_mfn_list_list after resume
pvops kernels >= 2.6.30 can currently only be saved and restored once. The
second attempt to save results in:

    ERROR Internal error: Frame# in pfn-to-mfn frame list is not in pseudophys
    ERROR Internal error: entry 0: p2m_frame_list[0] is 0xf2c2c2c2, max 0x120000
    ERROR Internal error: Failed to map/save the p2m frame list

I finally narrowed it down to:

    commit cdaead6b4e
        Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
        Date:   Fri Feb 27 15:34:59 2009 -0800

            xen: split construction of p2m mfn tables from registration

            Build the p2m_mfn_list_list early with the rest of the p2m table, but
            register it later when the real shared_info structure is in place.

            Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

The unforeseen side-effect of this change was to cause the mfn list list to not
be rebuilt on resume. Prior to this change it would have been rebuilt via
xen_post_suspend() -> xen_setup_shared_info() -> xen_setup_mfn_list_list().

Fix by explicitly calling xen_build_mfn_list_list() from xen_post_suspend().

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:51 -08:00
Jeremy Fitzhardinge 3905bb2aa7 xen: restore runstate_info even if !have_vcpu_info_placement
Even if have_vcpu_info_placement is not set, we still need to set up
the runstate area on each resumed vcpu.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:51 -08:00
Ian Campbell be012920ec xen: re-register runstate area earlier on resume.
This is necessary to ensure the runstate area is available to
xen_sched_clock before any calls to printk which will require it in
order to provide a timestamp.

I chose to pull the xen_setup_runstate_info out of xen_time_init into
the caller in order to maintain parity with calling
xen_setup_runstate_info separately from calling xen_time_resume.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:50 -08:00
Ingo Molnar d103d01e4b Merge branch 'perf/probes' into perf/core
Merge reason: add these fixes to 'perf probe'.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 20:11:38 +01:00
Ingo Molnar 26fb20d008 Merge branch 'perf/mce' into perf/core
Merge reason: It's ready for v2.6.33.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 20:11:06 +01:00
Mikael Pettersson 7d1849aff6 x86, apic: Enable lapic nmi watchdog on AMD Family 11h
The x86 lapic nmi watchdog does not recognize AMD Family 11h,
resulting in:

  NMI watchdog: CPU not supported

As far as I can see from available documentation (the BKDM),
family 11h looks identical to family 10h as far as the PMU
is concerned.

Extending the check to accept family 11h results in:

  Testing NMI watchdog ... OK.

I've been running with this change on a Turion X2 Ultra ZM-82
laptop for a couple of weeks now without problems.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: <stable@kernel.org>
LKML-Reference: <19223.53436.931768.278021@pilspetsen.it.uu.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 16:25:15 +01:00
Jens Axboe 220d0b1dbf Merge branch 'master' into for-2.6.33 2009-12-03 13:49:39 +01:00
Xiaotian Feng 57fea8f7ab x86/reboot: Add pci_dev_put in reboot_fixup_32.c for consistency
pci_get_device will increase the ref count of found device.
Although we're going to reset soon, we should use pci_dev_put
to decrease the ref count for consistency.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1259838400-23833-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 12:17:55 +01:00
Darrick J. Wong 4528752f49 x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree
On a multi-node x3950M2 system, there's a slight oddity in the
PCI device tree for all secondary nodes:

 30:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
  \-33:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01)
     \-34:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)

...as compared to the primary node:

 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
  \-01:00.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
 03:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01)
  \-04:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)

In both nodes, the LSI RAID controller hangs off a CalIOC2
device, but on the secondary nodes, the BIOS hides the VGA
device and substitutes the device tree ending with the disk
controller.

It would seem that Calgary devices don't necessarily appear at
the top of the PCI tree, which means that the current code to
find the Calgary IOMMU that goes with a particular device is
buggy.

Rather than walk all the way to the top of the PCI
device tree and try to match bus number with Calgary descriptor,
the code needs to examine each parent of the particular device;
if it encounters a Calgary with a matching bus number, simply
use that.

Otherwise, we BUG() when the bus number of the Calgary doesn't
match the bus number of whatever's at the top of the device tree.

Extra note: This patch appears to work correctly for the x3950
that came before the x3950 M2.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Jon D. Mason <jdmason@kudzu.us>
Cc: Corinna Schultz <coschult@us.ibm.com>
Cc: <stable@kernel.org>
LKML-Reference: <20091202230556.GG10295@tux1.beaverton.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 11:44:05 +01:00
Avi Kivity d5696725b2 KVM: VMX: Fix comparison of guest efer with stale host value
update_transition_efer() masks out some efer bits when deciding whether
to switch the msr during guest entry; for example, NX is emulated using the
mmu so we don't need to disable it, and LMA/LME are handled by the hardware.

However, with shared msrs, the comparison is made against a stale value;
at the time of the guest switch we may be running with another guest's efer.

Fix by deferring the mask/compare to the actual point of guest entry.

Noted by Marcelo.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:34:20 +02:00
Avi Kivity 3548bab501 KVM: Drop user return notifier when disabling virtualization on a cpu
This way, we don't leave a dangling notifier on cpu hotunplug or module
unload.  In particular, module unload leaves the notifier pointing into
freed memory.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:26 +02:00
Sheng Yang 046d87103a KVM: VMX: Disable unrestricted guest when EPT disabled
Otherwise would cause VMEntry failure when using ept=0 on unrestricted guest
supported processors.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:25 +02:00
Avi Kivity eb3c79e64a KVM: x86 emulator: limit instructions to 15 bytes
While we are never normally passed an instruction that exceeds 15 bytes,
smp games can cause us to attempt to interpret one, which will cause
large latencies in non-preempt hosts.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:25 +02:00
Jan Kiszka 3cfc3092f4 KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
This new IOCTL exports all yet user-invisible states related to
exceptions, interrupts, and NMIs. Together with appropriate user space
changes, this fixes sporadic problems of vmsave/restore, live migration
and system reset.

[avi: future-proof abi by adding a flags field]

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:25 +02:00
Avi Kivity 65ac726404 KVM: VMX: Report unexpected simultaneous exceptions as internal errors
These happen when we trap an exception when another exception is being
delivered; we only expect these with MCEs and page faults.  If something
unexpected happens, things probably went south and we're better off reporting
an internal error and freezing.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:24 +02:00
Avi Kivity a9c7399d6c KVM: Allow internal errors reported to userspace to carry extra data
Usually userspace will freeze the guest so we can inspect it, but some
internal state is not available.  Add extra data to internal error
reporting so we can expose it to the debugger.  Extra data is specific
to the suberror.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:24 +02:00
Jan Kiszka 4f926bf291 KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
Decouple KVM_GUESTDBG_INJECT_DB and KVM_GUESTDBG_INJECT_BP from
KVM_GUESTDBG_ENABLE, their are actually orthogonal. At this chance,
avoid triggering the WARN_ON in kvm_queue_exception if there is already
an exception pending and reject such invalid requests.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:24 +02:00
Marcelo Tosatti 2204ae3c96 KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic
Otherwise kvm might attempt to dereference a NULL pointer.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:23 +02:00
Marcelo Tosatti 3ddea128ad KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
Otherwise kvm will leak memory on multiple KVM_CREATE_IRQCHIP.
Also serialize multiple accesses with kvm->lock.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:23 +02:00
Avi Kivity 92c0d90015 KVM: VMX: Remove vmx->msr_offset_efer
This variable is used to communicate between a caller and a callee; switch
to a function argument instead.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:23 +02:00
Marcelo Tosatti 5f5c35aad5 KVM: MMU: update invlpg handler comment
Large page translations are always synchronized (either in level 3
or level 2), so its not necessary to properly deal with them
in the invlpg handler.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:23 +02:00
Marcelo Tosatti 7c93be44a4 KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
GUEST_CR3 is updated via kvm_set_cr3 whenever CR3 is modified from
outside guest context. Similarly pdptrs are updated via load_pdptrs.

Let kvm_set_cr3 perform the update, removing it from the vcpu_run
fast path.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Acked-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:22 +02:00
Gleb Natapov 1655e3a3dc KVM: remove duplicated task_switch check
Probably introduced by a bad merge.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:22 +02:00
Avi Kivity 26bb0981b3 KVM: VMX: Use shared msr infrastructure
Instead of reloading syscall MSRs on every preemption, use the new shared
msr infrastructure to reload them at the last possible minute (just before
exit to userspace).

Improves vcpu/idle/vcpu switches by about 2000 cycles (when EFER needs to be
reloaded as well).

[jan: fix slot index missing indirection]

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:22 +02:00
Avi Kivity 18863bdd60 KVM: x86 shared msr infrastructure
The various syscall-related MSRs are fairly expensive to switch.  Currently
we switch them on every vcpu preemption, which is far too often:

- if we're switching to a kernel thread (idle task, threaded interrupt,
  kernel-mode virtio server (vhost-net), for example) and back, then
  there's no need to switch those MSRs since kernel threasd won't
  be exiting to userspace.

- if we're switching to another guest running an identical OS, most likely
  those MSRs will have the same value, so there's little point in reloading
  them.

- if we're running the same OS on the guest and host, the MSRs will have
  identical values and reloading is unnecessary.

This patch uses the new user return notifiers to implement last-minute
switching, and checks the msr values to avoid unnecessary reloading.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:21 +02:00
Avi Kivity 44ea2b1758 KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx autoload msr area
Currently MSR_KERNEL_GS_BASE is saved and restored as part of the
guest/host msr reloading.  Since we wish to lazy-restore all the other
msrs, save and reload MSR_KERNEL_GS_BASE explicitly instead of using
the common code.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:21 +02:00
Eduardo Habkost 3ce672d484 KVM: SVM: init_vmcb(): remove redundant save->cr0 initialization
The svm_set_cr0() call will initialize save->cr0 properly even when npt is
enabled, clearing the NW and CD bits as expected, so we don't need to
initialize it manually for npt_enabled anymore.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:21 +02:00
Eduardo Habkost 18fa000ae4 KVM: SVM: Reset cr0 properly on vcpu reset
svm_vcpu_reset() was not properly resetting the contents of the guest-visible
cr0 register, causing the following issue:
https://bugzilla.redhat.com/show_bug.cgi?id=525699

Without resetting cr0 properly, the vcpu was running the SIPI bootstrap routine
with paging enabled, making the vcpu get a pagefault exception while trying to
run it.

Instead of setting vmcb->save.cr0 directly, the new code just resets
kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().

kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:21 +02:00
Eduardo Habkost fa40052ca0 KVM: VMX: Use macros instead of hex value on cr0 initialization
This should have no effect, it is just to make the code clearer.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:21 +02:00
Glauber Costa afbcf7ab8d KVM: allow userspace to adjust kvmclock offset
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.

Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.

This proposed ioctl will allow userspace to inform us what is the monotonic
clock value in the source host, so we can keep the time skew short, and
more importantly, never goes backwards. Userspace may also need to trigger
the current data, since from the first migration onwards, it won't be
reflected by a simple call to clock_gettime() anymore.

[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]

Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:19 +02:00
Jan Kiszka 6be7d3062b KVM: SVM: Cleanup NMI singlestep
Push the NMI-related singlestep variable into vcpu_svm. It's dealing
with an AMD-specific deficit, nothing generic for x86.

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

 arch/x86/include/asm/kvm_host.h |    1 -
 arch/x86/kvm/svm.c              |   12 +++++++-----
 2 files changed, 7 insertions(+), 6 deletions(-)
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:19 +02:00
Jan Kiszka 94fe45da48 KVM: x86: Fix guest single-stepping while interruptible
Commit 705c5323 opened the doors of hell by unconditionally injecting
single-step flags as long as guest_debug signaled this. This doesn't
work when the guest branches into some interrupt or exception handler
and triggers a vmexit with flag reloading.

Fix it by saving cs:rip when user space requests single-stepping and
restricting the trace flag injection to this guest code position.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:19 +02:00
Ed Swierk ffde22ac53 KVM: Xen PV-on-HVM guest support
Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.

A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future.  Thus this patch
adds special support for the Xen HVM MSR.

I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files.  When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.

I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.

[jan: fix i386 build warning]
[avi: future proof abi with a flags field]

Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:18 +02:00
Jan Kiszka 94c30d9ca6 KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check
This (broken) check dates back to the days when this code was shared
across architectures. x86 has IOMEM, so drop it.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:18 +02:00
Marcelo Tosatti 9fb41ba896 KVM: VMX: fix handle_pause declaration
There's no kvm_run argument anymore.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:18 +02:00
Zachary Amsden 6b7d7e762b KVM: x86: Harden against cpufreq
If cpufreq can't determine the CPU khz, or cpufreq is not compiled in,
we should fallback to the measured TSC khz.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:18 +02:00
Mark Langsdorf 565d0998ec KVM: SVM: Support Pause Filter in AMD processors
New AMD processors (Family 0x10 models 8+) support the Pause
Filter Feature.  This feature creates a new field in the VMCB
called Pause Filter Count.  If Pause Filter Count is greater
than 0 and intercepting PAUSEs is enabled, the processor will
increment an internal counter when a PAUSE instruction occurs
instead of intercepting.  When the internal counter reaches the
Pause Filter Count value, a PAUSE intercept will occur.

This feature can be used to detect contended spinlocks,
especially when the lock holding VCPU is not scheduled.
Rescheduling another VCPU prevents the VCPU seeking the
lock from wasting its quantum by spinning idly.

Experimental results show that most spinlocks are held
for less than 1000 PAUSE cycles or more than a few
thousand.  Default the Pause Filter Counter to 3000 to
detect the contended spinlocks.

Processor support for this feature is indicated by a CPUID
bit.

On a 24 core system running 4 guests each with 16 VCPUs,
this patch improved overall performance of each guest's
32 job kernbench by approximately 3-5% when combined
with a scheduler algorithm thati caused the VCPU to
sleep for a brief period. Further performance improvement
may be possible with a more sophisticated yield algorithm.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:17 +02:00
Zhai, Edwin 4b8d54f972 KVM: VMX: Add support for Pause-Loop Exiting
New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap    - upper bound on the amount of time between two successive
             executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute in
             a PAUSE loop

If the time, between this execution of PAUSE and previous one, exceeds the
PLE_Gap, processor consider this PAUSE belongs to a new loop.
Otherwise, processor determins the the total execution time of this loop(since
1st PAUSE in this loop), and triggers a VM exit if total time exceeds the
PLE_Window.
* Refer SDM volume 3b section 21.6.13 & 22.1.3.

Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is sched-out after hold a spinlock, then other VPs for same lock are sched-in
to waste the CPU time.

Our tests indicate that most spinlocks are held for less than 212 cycles.
Performance tests show that with 2X LP over-commitment we can get +2% perf
improvement for kernel build(Even more perf gain with more LPs).

Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:17 +02:00
Joerg Roedel d36f19e9ec KVM: SVM: Remove nsvm_printk debugging code
With all important informations now delivered through
tracepoints we can savely remove the nsvm_printk debugging
code for nested svm.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:17 +02:00
Joerg Roedel 532a46b989 KVM: SVM: Add tracepoint for skinit instruction
This patch adds a tracepoint for the event that the guest
executed the SKINIT instruction. This information is
important because SKINIT is an SVM extenstion not yet
implemented by nested SVM and we may need this information
for debugging hypervisors that do not yet run on nested SVM.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:16 +02:00
Joerg Roedel ec1ff79084 KVM: SVM: Add tracepoint for invlpga instruction
This patch adds a tracepoint for the event that the guest
executed the INVLPGA instruction.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:16 +02:00
Joerg Roedel 236649de33 KVM: SVM: Add tracepoint for #vmexit because intr pending
This patch adds a special tracepoint for the event that a
nested #vmexit is injected because kvm wants to inject an
interrupt into the guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:16 +02:00
Joerg Roedel 17897f3668 KVM: SVM: Add tracepoint for injected #vmexit
This patch adds a tracepoint for a nested #vmexit that gets
re-injected to the guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:15 +02:00
Joerg Roedel d8cabddf7e KVM: SVM: Add tracepoint for nested #vmexit
This patch adds a tracepoint for every #vmexit we get from a
nested guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:15 +02:00
Joerg Roedel 0ac406de8f KVM: SVM: Add tracepoint for nested vmrun
This patch adds a dedicated kvm tracepoint for a nested
vmrun.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:15 +02:00
Joerg Roedel cd3ff653ae KVM: SVM: Move INTR vmexit out of atomic code
The nested SVM code emulates a #vmexit caused by a request
to open the irq window right in the request function. This
is a bug because the request function runs with preemption
and interrupts disabled but the #vmexit emulation might
sleep. This can cause a schedule()-while-atomic bug and is
fixed with this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:15 +02:00
Alexander Graf 8d23c46624 KVM: SVM: Notify nested hypervisor of lost event injections
If event_inj is valid on a #vmexit the host CPU would write
the contents to exit_int_info, so the hypervisor knows that
the event wasn't injected.

We don't do this in nested SVM by now which is a bug and
fixed by this patch.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:14 +02:00
Glauber Costa e3267cbbbf KVM: x86: include pvclock MSRs in msrs_to_save
For a while now, we are issuing a rdmsr instruction to find out which
msrs in our save list are really supported by the underlying machine.
However, it fails to account for kvm-specific msrs, such as the pvclock
ones.

This patch moves then to the beginning of the list, and skip testing them.

Cc: stable@kernel.org
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:14 +02:00
Jan Kiszka 91586a3b7d KVM: x86: Rework guest single-step flag injection and filtering
Push TF and RF injection and filtering on guest single-stepping into the
vender get/set_rflags callbacks. This makes the whole mechanism more
robust wrt user space IOCTL order and instruction emulations.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:14 +02:00
Marcelo Tosatti a68a6a7282 KVM: x86: disable paravirt mmu reporting
Disable paravirt MMU capability reporting, so that new (or rebooted)
guests switch to native operation.

Paravirt MMU is a burden to maintain and does not bring significant
advantages compared to shadow anymore.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:14 +02:00
Jan Kiszka 355be0b930 KVM: x86: Refactor guest debug IOCTL handling
Much of so far vendor-specific code for setting up guest debug can
actually be handled by the generic code. This also fixes a minor deficit
in the SVM part /wrt processing KVM_GUESTDBG_ENABLE.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:14 +02:00
Juan Quintela 201d945bcf KVM: remove pre_task_link setting in save_state_to_tss16
Now, also remove pre_task_link setting in save_state_to_tss16.

  commit b237ac37a1
  Author: Gleb Natapov <gleb@redhat.com>
  Date:   Mon Mar 30 16:03:24 2009 +0300

    KVM: Fix task switch back link handling.

CC: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:13 +02:00
Zachary Amsden 3230bb4707 KVM: Fix hotplug of CPUs
Both VMX and SVM require per-cpu memory allocation, which is done at module
init time, for only online cpus.

Backend was not allocating enough structure for all possible CPUs, so
new CPUs coming online could not be hardware enabled.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:13 +02:00
Zachary Amsden e6732a5af9 KVM: Fix printk name error in svm.c
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:13 +02:00
Zachary Amsden 0cca790753 KVM: Kill the confusing tsc_ref_khz and ref_freq variables
They are globals, not clearly protected by any ordering or locking, and
vulnerable to various startup races.

Instead, for variable TSC machines, register the cpufreq notifier and get
the TSC frequency directly from the cpufreq machinery.  Not only is it
always right, it is also perfectly accurate, as no error prone measurement
is required.

On such machines, when a new CPU online is brought online, it isn't clear what
frequency it will start with, and it may not correspond to the reference, thus
in hardware_enable we clear the cpu_tsc_khz variable to zero and make sure
it is set before running on a VCPU.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:12 +02:00
Zachary Amsden b820cc0ca2 KVM: Separate timer intialization into an indepedent function
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:12 +02:00
Joerg Roedel e935d48e1b KVM: SVM: Remove remaining occurences of rdtscll
This patch replaces them with native_read_tsc() which can
also be used in expressions and saves a variable on the
stack in this case.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:12 +02:00
Joerg Roedel 33527ad7e1 KVM: SVM: don't copy exit_int_info on nested vmrun
The exit_int_info field is only written by the hardware and
never read. So it does not need to be copied on a vmrun
emulation.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:11 +02:00
Joerg Roedel 7fcdb5103d KVM: SVM: reorganize svm_interrupt_allowed
This patch reorganizes the logic in svm_interrupt_allowed to
make it better to read. This is important because the logic
is a lot more complicated with Nested SVM.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:11 +02:00
Huang Weiyi bfc33beaed KVM: remove duplicated #include
Remove duplicated #include('s) in
  arch/x86/kvm/lapic.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:10 +02:00
Alexander Graf 10474ae894 KVM: Activate Virtualization On Demand
X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).

Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.

To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means, that instead
virtualization is enabled on creation of the first virtual machine
and disabled on destruction of the last one.

So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:10 +02:00
Marcelo Tosatti e8b3433a5c KVM: SVM: remove needless mmap_sem acquision from nested_svm_map
nested_svm_map unnecessarily takes mmap_sem around gfn_to_page, since
gfn_to_page / get_user_pages are responsible for it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:10 +02:00
Mohammed Gamal 80ced186d1 KVM: VMX: Enhance invalid guest state emulation
- Change returned handle_invalid_guest_state() to return relevant exit codes
- Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
- Return to userspace instead of repeatedly trying to emulate instructions that have already failed

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:09 +02:00
Mohammed Gamal abcf14b560 KVM: x86 emulator: Add pusha and popa instructions
This adds pusha and popa instructions (opcodes 0x60-0x61), this enables booting
MINIX with invalid guest state emulation on.

[marcelo: remove unused variable]

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:09 +02:00
Mohammed Gamal 94677e61fd KVM: x86 emulator: Add missing decoder flags for 'or' instructions
Add missing decoder flags for or instructions (0xc-0xd).

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:09 +02:00
Avi Kivity bfd99ff5d4 KVM: Move assigned device code to own file
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:09 +02:00
Avi Kivity 367e1319b2 KVM: Return -ENOTTY on unrecognized ioctls
Not the incorrect -EINVAL.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:08 +02:00
Gleb Natapov 680b3648ba KVM: Drop kvm->irq_lock lock from irq injection path
The only thing it protects now is interrupt injection into lapic and
this can work lockless. Even now with kvm->irq_lock in place access
to lapic is not entirely serialized since vcpu access doesn't take
kvm->irq_lock.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:08 +02:00
Gleb Natapov eba0226bdf KVM: Move IO APIC to its own lock
The allows removal of irq_lock from the injection path.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:08 +02:00
Gleb Natapov 136bdfeee7 KVM: Move irq ack notifier list to arch independent code
Mask irq notifier list is already there.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:07 +02:00
Gleb Natapov 3e71f88bc9 KVM: Maintain back mapping from irqchip/pin to gsi
Maintain back mapping from irqchip/pin to gsi to speedup
interrupt acknowledgment notifications.

[avi: build fix on non-x86/ia64]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:07 +02:00
Gleb Natapov 1a6e4a8c27 KVM: Move irq sharing information to irqchip level
This removes assumptions that max GSIs is smaller than number of pins.
Sharing is tracked on pin level not GSI level.

[avi: no PIC on ia64]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:06 +02:00
Gleb Natapov 79c727d437 KVM: Call pic_clear_isr() on pic reset to reuse logic there
Also move call of ack notifiers after pic state change.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:06 +02:00
Avi Kivity 851ba6922a KVM: Don't pass kvm_run arguments
They're just copies of vcpu->run, which is readily accessible.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:06 +02:00
Mohammed Gamal d8769fedd4 KVM: x86 emulator: Introduce No64 decode option
Introduces a new decode option "No64", which is used for instructions that are
invalid in long mode.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:05 +02:00
Mohammed Gamal 0934ac9d13 KVM: x86 emulator: Add 'push/pop sreg' instructions
[avi: avoid buffer overflow]

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:05 +02:00
Avi Kivity 58988b07cf Merge remote branch 'tip/x86/entry' into kvm-updates/2.6.33
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:30:06 +02:00
Hidetoshi Seto fe5ed91ddc x86, mce: don't restart timer if disabled
Even it is in error path unlikely taken, add_timer_on() at
CPU_DOWN_FAILED* needs to be skipped if mce_timer is disabled.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: <stable@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-02 21:27:32 -08:00
Jan Beulich 99063c0bce x86/alternatives: No need for alternatives-asm.h to re-invent stuff already in asm.h
This at once also gets the alignment specification right for
x86-64.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4B0FF8F80200007800022708@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02 11:39:45 +01:00
Jan Beulich 01be50a308 x86/alternatives: Check replacementlen <= instrlen at build time
Having run into the run-(boot-)time check a couple of times lately,
I finally took time to find a build-time check so that one doesn't
need to analyze the register/stack dump and resolve this (through
manual lookup in vmlinux) to the offending construct.

The assembler will emit a message like "Error: value of <num> too
large for field of 1 bytes at <offset>", which while not pointing
out the source location still makes analysis quite a bit easier.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4B0FF8AA0200007800022703@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02 11:39:45 +01:00
Masami Hiramatsu e859cf8656 x86: Fix comments of register/stack access functions
Fix typos and some redundant comments of register/stack access
functions in asm/ptrace.h.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Wenji Huang <wenji.huang@oracle.com>
Cc: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
LKML-Reference: <20091201000222.7669.7477.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Suggested-by: Wenji Huang <wenji.huang@oracle.com>
2009-12-02 10:22:22 +01:00
Suresh Siddha 6d20792e85 x86: Remove unnecessary mdelay() from cpu_disable_common()
fixup_irqs() already has a mdelay(). Remove the extra and
unnecessary mdelay() from cpu_disable_common().

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: ebiederm@xmission.com
Cc: garyhade@us.ibm.com
LKML-Reference: <20091201233335.232177348@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02 10:11:02 +01:00
Suresh Siddha 1c83995b6c x86, ioapic: Document another case when level irq is seen as an edge
In the case when cpu goes offline, fixup_irqs() will forward any
unhandled interrupt on the offlined cpu to the new cpu
destination that is handling the corresponding interrupt. This
interrupt forwarding is done via IPI's. Hence, in this case also
level-triggered io-apic interrupt will be seen as an edge
interrupt in the cpu's APIC IRR.

Document this scenario in the code which handles this case by doing
an explicit EOI to the io-apic to clear remote IRR of the io-apic RTE.

Requested-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: ebiederm@xmission.com
Cc: garyhade@us.ibm.com
LKML-Reference: <20091201233335.143970505@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02 10:11:01 +01:00
Suresh Siddha c29d9db338 x86, ioapic: Fix the EOI register detection mechanism
Maciej W. Rozycki reported:

> 82093AA I/O APIC has its version set to 0x11 and it
> does not support the EOI register.  Similarly I/O APICs
> integrated into the 82379AB south bridge and the 82374EB/SB
> EISA component.

IO-APIC versions below 0x20 don't support EOI register.

Some of the Intel ICH Specs (ICH2 to ICH5) documents the io-apic
version as 0x2. This is an error with documentation and these
ICH chips use io-apic's of version 0x20 and indeed has a working
EOI register for the io-apic.

Fix the EOI register detection mechanism to check for version
0x20 and beyond.

And also, a platform can potentially  have io-apic's with
different versions. Make the EOI register check per io-apic.

Reported-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: ebiederm@xmission.com
Cc: garyhade@us.ibm.com
LKML-Reference: <20091201233335.065361533@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02 10:11:01 +01:00
Maciej W. Rozycki ca64c47cec x86, io-apic: Move the effort of clearing remoteIRR explicitly before migrating the irq
When the level-triggered interrupt is seen as an edge interrupt,
we try to clear the remoteIRR explicitly (using either an
io-apic eoi register when present or through the idea of
changing trigger mode of the io-apic RTE to edge and then back
to level). But this explicit try also needs to happen before we
try to migrate the irq. Otherwise irq migration attempt will
fail anyhow, as it postpones the irq migration to a later
attempt when it sees the remoteIRR in the io-apic RTE still set.

Signed-off-by: "Maciej W. Rozycki" <macro@linux-mips.org>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: ebiederm@xmission.com
Cc: garyhade@us.ibm.com
LKML-Reference: <20091201233334.975416130@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02 10:11:00 +01:00
Frederic Weisbecker 1cedae7290 hw-breakpoints: Keep track of user disabled breakpoints
When we disable a breakpoint through dr7, we unregister it right
away, making us lose track of its corresponding address
register value.

It means that the following sequence would be unsupported:

 - set address in dr0
 - enable it through dr7
 - disable it through dr7
 - enable it through dr7

because we lost the address register value when we disabled the
breakpoint.

Don't unregister the disabled breakpoints but rather disable
them.

Reported-by: "K.Prasad" <prasad@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259735536-9236-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02 09:59:03 +01:00
David S. Miller ff9c38bba3 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/mac80211/ht.c
2009-12-01 22:13:38 -08:00
Herbert Xu 8386324381 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2009-12-01 15:16:22 +08:00
H. Peter Anvin ccef086454 x86, mm: Correct the implementation of is_untracked_pat_range()
The semantics the PAT code expect of is_untracked_pat_range() is "is
this range completely contained inside the untracked region."  This
means that checkin 8a27138924 was
technically wrong, because the implementation needlessly confusing.

The sane interface is for it to take a semiclosed range like just
about everything else (as evidenced by the sheer number of "- 1"'s
removed by that patch) so change the actual implementation to match.

Reported-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
2009-11-30 21:33:51 -08:00
Helight.Xu 9eaa192d89 x86: Fix a section mismatch in arch/x86/kernel/setup.c
copy_edd() should be __init.
warning msg:
WARNING: vmlinux.o(.text+0x7759): Section mismatch in reference from the
function copy_edd() to the variable .init.data:boot_params
The function copy_edd() references
the variable __initdata boot_params.
This is often because copy_edd lacks a __initdata
annotation or the annotation of boot_params is wrong.

Signed-off-by: ZhenwenXu <helight.xu@gmail.com>
LKML-Reference: <4B139F8F.4000907@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-11-30 11:16:49 -08:00
Thomas Gleixner b8b7d791a8 x86: Use -maccumulate-outgoing-args for sane mcount prologues
commit 746357d (x86: Prevent GCC 4.4.x (pentium-mmx et al) function
prologue wreckage) uses -mtune=generic to work around the function
prologue problem with mcount on -march=pentium-mmx and others.

Jakub pointed out that we can use -maccumulate-outgoing-args instead
which is selected by -mtune=generic and prevents the problem without
losing the -march specific optimizations.

Pointed-out-by: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@kernel.org
2009-11-28 15:08:30 +01:00
Thomas Gleixner 18ed61da98 x86: hpet: Make WARN_ON understandable
Andrew complained rightly that the WARN_ON in hpet_next_event() is
confusing and the code comment not really helpful.

Change it to WARN_ONCE and print the reason in clear text. Change the
comment to explain what kind of hardware wreckage we deal with.

Pointed-out-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
2009-11-27 20:37:41 +01:00
Joerg Roedel 492667dacc x86/amd-iommu: Remove amd_iommu_pd_table
The data that was stored in this table is now available in
dev->archdata.iommu. So this table is not longer necessary.
This patch removes the remaining uses of that variable and
removes it from the code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:37 +01:00
Joerg Roedel 8eed983334 x86/amd-iommu: Move reset_iommu_command_buffer out of locked code
This patch removes the ugly contruct where the
iommu->lock must be released while before calling the
reset_iommu_command_buffer function.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:37 +01:00
Joerg Roedel b00d3bcff4 x86/amd-iommu: Cleanup DTE flushing code
This patch cleans up the code to flush device table entries
in the IOMMU. With this chance the driver can get rid of the
iommu_queue_inv_dev_entry() function.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:36 +01:00
Joerg Roedel 3fa43655d8 x86/amd-iommu: Introduce iommu_flush_device() function
This patch adds a function to flush a DTE entry for a given
struct device and replaces iommu_queue_inv_dev_entry calls
with this function where appropriate.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:35 +01:00
Joerg Roedel 7f760ddd70 x86/amd-iommu: Cleanup attach/detach_device code
This patch cleans up the attach_device and detach_device
paths and fixes reference counting while at it.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:35 +01:00
Joerg Roedel 7c392cbe98 x86/amd-iommu: Keep devices per domain in a list
This patch introduces a list to each protection domain which
keeps all devices associated with the domain. This can be
used later to optimize certain functions and to completly
remove the amd_iommu_pd_table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:34 +01:00
Joerg Roedel 241000556f x86/amd-iommu: Add device bind reference counting
This patch adds a reference count to each device to count
how often the device was bound to that domain. This is
important for single devices that act as an alias for a
number of others. These devices must stay bound to their
domains until all devices that alias to it are unbound from
the same domain.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:33 +01:00
Joerg Roedel 657cbb6b6c x86/amd-iommu: Use dev->arch->iommu to store iommu related information
This patch changes IOMMU code to use dev->archdata->iommu to
store information about the alias device and the domain the
device is attached to.
This allows the driver to get rid of the amd_iommu_pd_table
in the future.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:32 +01:00
Joerg Roedel 8793abeb78 x86/amd-iommu: Remove support for domain sharing
This patch makes device isolation mandatory and removes
support for the amd_iommu=share option. This simplifies the
code in several places.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:32 +01:00
Joerg Roedel 171e7b3739 x86/amd-iommu: Rearrange dma_ops related functions
This patch rearranges two dma_ops related functions so that
their forward declarations are not longer necessary.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:31 +01:00
Joerg Roedel 308973d3b9 x86/amd-iommu: Move some pte allocation functions in the right section
This patch moves alloc_pte() and fetch_pte() into the page
table handling code section so that the forward declarations
for them could be removed.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:30 +01:00
Joerg Roedel 87a64d5238 x86/amd-iommu: Remove iommu parameter from dma_ops_domain_alloc
This function doesn't use the parameter anymore so it can be
removed.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:30 +01:00
Joerg Roedel 98fc5a693b x86/amd-iommu: Use get_device_id and check_device where appropriate
The logic of these two functions is reimplemented (at least
in parts) in places in the code. This patch removes these
code duplications and uses the functions instead. As a side
effect it moves check_device() to the helper function code
section.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:29 +01:00
Joerg Roedel 71c70984e5 x86/amd-iommu: Move find_protection_domain to helper functions
This is a helper function and when its placed in the helper
function section we can remove its forward declaration.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:28 +01:00
Joerg Roedel 94f6d190ee x86/amd-iommu: Simplify get_device_resources()
With the previous changes the get_device_resources function
can be simplified even more. The only important information
for the callers is the protection domain.
This patch renames the function to get_domain() and let it
only return the protection domain for a device.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:21 +01:00
Joerg Roedel 15898bbcb4 x86/amd-iommu: Let domain_for_device handle aliases
If there is no domain associated to a device yet and the
device has an alias device which already has a domain, the
original device needs to have the same domain as the alias
device.
This patch changes domain_for_device to handle this
situation and directly assigns the alias device domain to
the device in this situation.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:17:09 +01:00
Joerg Roedel f3be07da53 x86/amd-iommu: Remove iommu specific handling from dma_ops path
This patch finishes the removal of all iommu specific
handling code in the dma_ops path.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:17:08 +01:00
Joerg Roedel cd8c82e875 x86/amd-iommu: Remove iommu parameter from __(un)map_single
With the prior changes this parameter is not longer
required. This patch removes it from the function and all
callers.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:17:08 +01:00
Joerg Roedel 576175c250 x86/amd-iommu: Make alloc_new_range aware of multiple IOMMUs
Since the assumption that an dma_ops domain is only bound to
one IOMMU was given up we need to make alloc_new_range aware
of it.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:17:01 +01:00
Joerg Roedel 680525e06d x86/amd-iommu: Remove iommu parameter from dma_ops_domain_(un)map
The parameter is unused in these function so remove it from
the parameter list.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:16:31 +01:00
Joerg Roedel f99c0f1c75 x86/amd-iommu: Use check_device in get_device_resources
Every call-place of get_device_resources calls check_device
before it. So call it from get_device_resources directly and
simplify the code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:16:30 +01:00
Joerg Roedel 420aef8a3a x86/amd-iommu: Use check_device for amd_iommu_dma_supported
The check_device logic needs to include the dma_supported
checks to be really sure. Merge the dma_supported logic into
check_device and use it to implement dma_supported.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:16:30 +01:00
Joerg Roedel 318afd41d2 x86/amd-iommu: Make np-cache a global flag
The non-present cache flag was IOMMU local until now which
doesn't make sense. Make this a global flag so we can remove
the lase user of 'struct iommu' in the map/unmap path.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:16:29 +01:00
Joerg Roedel 09b4280439 x86/amd-iommu: Reimplement flush_all_domains_on_iommu()
This patch reimplements the function
flush_all_domains_on_iommu to use the global protection
domain list.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:16:28 +01:00
Joerg Roedel e3306664eb x86/amd-iommu: Reimplement amd_iommu_flush_all_domains()
This patch reimplementes the amd_iommu_flush_all_domains
function to use the global protection domain list instead
of flushing every domain on every IOMMU.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:16:28 +01:00
Joerg Roedel aeb26f5533 x86/amd-iommu: Implement protection domain list
This patch adds code to keep a global list of all protection
domains. This allows to simplify the resume code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:16:27 +01:00
Joerg Roedel 601367d76b x86/amd-iommu: Remove iommu_flush_domain function
This iommu_flush_tlb_pde function does essentially the same.
So the iommu_flush_domain function is redundant and can be
removed.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:16:26 +01:00
Joerg Roedel dcd1e92e40 x86/amd-iommu: Use __iommu_flush_pages for tlb flushes
This patch re-implements iommu_flush_tlb functions to use
the __iommu_flush_pages logic.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:16:26 +01:00
Joerg Roedel 6de8ad9b9e x86/amd-iommu: Make iommu_flush_pages aware of multiple IOMMUs
This patch extends the iommu_flush_pages function to flush
the TLB entries on all IOMMUs the domain has devices on.
This basically gives up the former assumption that dma_ops
domains are only bound to one IOMMU in the system.
For dma_ops domains this is still true but not for
IOMMU-API managed domains. Giving this assumption up for
dma_ops domains too allows code simplification.
Further it splits out the main logic into a generic function
which can be used by iommu_flush_tlb too.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:16:18 +01:00
Joerg Roedel 0518a3a458 x86/amd-iommu: Add function to complete a tlb flush
This patch adds a function to the AMD IOMMU driver which
completes all queued commands an all IOMMUs a specific
domain has devices attached on. This is required in a later
patch when per-domain flushing is implemented.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 11:45:50 +01:00
Joerg Roedel c459611424 x86/amd-iommu: Add per IOMMU reference counting
This patch adds reference counting for protection domains
per IOMMU. This allows a smarter TLB flushing strategy.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 11:45:50 +01:00
Joerg Roedel bb52777ec4 x86/amd-iommu: Add an index field to struct amd_iommu
This patch adds an index field to struct amd_iommu which can
be used to lookup it up in an array. This index will be used
in struct protection_domain to keep track which protection
domain has devices behind which IOMMU.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 11:45:49 +01:00
Joerg Roedel bf3118c127 x86/amd-iommu: Update copyright headers
This patch updates the copyright headers in the relevant AMD
IOMMU driver files to match the date of the latest changes.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 11:45:49 +01:00
Joerg Roedel 6a9401a7ac x86/amd-iommu: Separate internal interface definitions
This patch moves all function declarations which are only
used inside the driver code to a seperate header file.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 11:45:48 +01:00
Frederic Weisbecker 5fa10b28e5 hw-breakpoints: Use struct perf_event_attr to define user breakpoints
In-kernel user breakpoints are created using functions in which
we pass breakpoint parameters as individual variables: address,
length and type.

Although it fits well for x86, this just does not scale across
archictectures that may support this api later as these may have
more or different needs. Pass in a perf_event_attr structure
instead because it is meant to evolve as much as possible into
a generic hardware breakpoint parameter structure.

Reported-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259294154-5197-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-27 06:22:58 +01:00
Xiaotian Feng dd4377b02d x86/pat: Trivial: don't create debugfs for memtype if pat is disabled
If pat is disabled (boot with nopat), there's no need to create
debugfs for it, it's empty all the time.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1259236428-16329-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 15:05:03 +01:00
Jack Steiner 918bc960dc x86: SGI UV: Map low MMR ranges
Explicitly mmap the UV chipset MMR address ranges used to
access blade-local registers. Although these same MMRs are also
mmaped at higher addresses, the low range is more
convenient when accessing blade-local registers.

The low range addresses always alias to the local blade
regardless of the blade id.

Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20091125162018.GA25445@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 10:52:36 +01:00
Brian Gerst 8ec6993d9f x86, 64-bit: Set data segments to null after switching to 64-bit mode
This prevents kernel threads from inheriting non-null segment
selectors, and causing optimizations in __switch_to() to be
ineffective.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Tim Blechmann <tim@klingt.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jan Beulich <JBeulich@novell.com>
LKML-Reference: <1259165856-3512-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 10:44:30 +01:00
Ingo Molnar 64b028b226 x86: Clean up the loadsegment() macro
Make it readable in the source too, not just in the assembly output.
No change in functionality.

Cc: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1259176706-5908-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 10:38:52 +01:00
Brian Gerst 79b0379cee x86: Optimize loadsegment()
Zero the input register in the exception handler instead of
using an extra register to pass in a zero value.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1259176706-5908-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 10:33:58 +01:00
Hidetoshi Seto 767df1bdd8 x86, mce: Add __cpuinit to hotplug callback functions
The mce_disable_cpu() and mce_reenable_cpu() are called only
from mce_cpu_callback() which is marked as __cpuinit.
So these functions can be __cpuinit too.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <4B0E3C4E.4090809@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 10:29:41 +01:00
Mike Travis 9b3660a55a x86: Limit number of per cpu TSC sync messages
Limit the number of per cpu TSC sync messages by only printing
to the console if an error occurs, otherwise print as a DEBUG
message.

The info message "Skipping synchronization ..." is only printed
after the last cpu has booted.

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091118002222.181053000@alcatraz.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 10:17:45 +01:00
Frederic Weisbecker 2c31b7958f x86/hw-breakpoints: Don't lose GE flag while disabling a breakpoint
When we schedule out a breakpoint from the cpu, we also
incidentally remove the "Global exact breakpoint" flag from the
breakpoint control register. It makes us losing the fine grained
precision about the origin of the instructions that may trigger
breakpoint exceptions for the other breakpoints running in this
cpu.

Reported-by: Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259211878-6013-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 09:29:22 +01:00
Frederic Weisbecker 605bfaee90 hw-breakpoints: Simplify error handling in breakpoint creation requests
This simplifies the error handling when we create a breakpoint.
We don't need to check the NULL return value corner case anymore
since we have improved perf_event_create_kernel_counter() to
always return an error code in the failure case.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <1259210142-5714-3-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 09:29:21 +01:00
Ilya Loginov 2d4dc890b5 block: add helpers to run flush_dcache_page() against a bio and a request's pages
Mtdblock driver doesn't call flush_dcache_page for pages in request.  So,
this causes problems on architectures where the icache doesn't fill from
the dcache or with dcache aliases.  The patch fixes this.

The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid
pointless empty cache-thrashing loops on architectures for which
flush_dcache_page() is a no-op.  Every architecture was provided with this
flush pages on architectires where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is
equal 1 or do nothing otherwise.

See "fix mtd_blkdevs problem with caches on some architectures" discussion
on LKML for more information.

Signed-off-by: Ilya Loginov <isloginov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Peter Horton <phorton@bitbox.co.uk>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-26 09:16:19 +01:00
Ingo Molnar 67f2de0bf9 x86: dumpstack, 64-bit: Disable preemption when walking the IRQ/exception stacks
This warning:

[  847.140022] rb_producer   D 0000000000000000  5928   519      2 0x00000000
[  847.203627] BUG: using smp_processor_id() in preemptible [00000000] code: khungtaskd/517
[  847.207360] caller is show_stack_log_lvl+0x2e/0x241
[  847.210364] Pid: 517, comm: khungtaskd Not tainted 2.6.32-rc8-tip+ #13761
[  847.213395] Call Trace:
[  847.215847]  [<ffffffff81413bde>] debug_smp_processor_id+0x1f0/0x20a
[  847.216809]  [<ffffffff81015eae>] show_stack_log_lvl+0x2e/0x241
[  847.220027]  [<ffffffff81018512>] show_stack+0x1c/0x1e
[  847.223365]  [<ffffffff8107b7db>] sched_show_task+0xe4/0xe9
[  847.226694]  [<ffffffff8112f21f>] check_hung_task+0x140/0x199
[  847.230261]  [<ffffffff8112f4a8>] check_hung_uninterruptible_tasks+0x1b7/0x20f
[  847.233371]  [<ffffffff8112f500>] ? watchdog+0x0/0x50
[  847.236683]  [<ffffffff8112f54e>] watchdog+0x4e/0x50
[  847.240034]  [<ffffffff810cee56>] kthread+0x97/0x9f
[  847.243372]  [<ffffffff81012aea>] child_rip+0xa/0x20
[  847.246690]  [<ffffffff81e43494>] ? restore_args+0x0/0x30
[  847.250019]  [<ffffffff81e43083>] ? _spin_lock+0xe/0x10
[  847.253351]  [<ffffffff810cedbf>] ? kthread+0x0/0x9f
[  847.256833]  [<ffffffff81012ae0>] ? child_rip+0x0/0x20

Happens because on preempt-RCU, khungd calls show_stack() with
preemption enabled.

Make sure we are not preemptible while walking the IRQ and exception
stacks on 64-bit. (32-bit stack dumping is preemption safe.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 08:29:10 +01:00
Ingo Molnar b803090615 x86: dumpstack: Clean up the x86_stack_ids[][] initalization and other details
Make the initialization more readable, plus tidy up a few small
visual details as well.

No change in functionality.

LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 08:24:33 +01:00
Tejun Heo 28b4e0d86a x86: Rename global percpu symbol dr7 to cpu_dr7
Percpu symbols now occupy the same namespace as other global
symbols and as such short global symbols without subsystem
prefix tend to collide with local variables.  dr7 percpu
variable used by x86 was hit by this. Rename it to cpu_dr7.

The rename also makes it more consistent with its fellow
cpu_debugreg percpu variable.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20091125115856.GA17856@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
2009-11-25 14:30:04 +01:00
FUJITA Tomonori 273bee27fa x86: Fix iommu=soft boot option
iommu=soft boot option forces the kernel to use swiotlb.

( This has the side-effect of enabling the swiotlb over the
  GART if this boot option is provided. This is the desired
  behavior of the swiotlb boot option and works like that
  for all other hw-IOMMU drivers. )

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: yinghai@kernel.org
LKML-Reference: <20091125084611O.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-25 10:12:51 +01:00
Lin Ming 2263576cfc ACPICA: Add post-order callback to acpi_walk_namespace
The existing interface only has a pre-order callback. This change
adds an additional parameter for a post-order callback which will
be more useful for bus scans. ACPICA BZ 779.

Also update the external calls to acpi_walk_namespace.

http://www.acpica.org/bugzilla/show_bug.cgi?id=779

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-11-24 21:31:10 -05:00
Bjorn Helgaas f6e1d8cc38 x86/PCI: MMCONFIG: add lookup function
This patch factors out the search for an MMCONFIG region, which was
previously implemented in both mmconfig_32 and mmconfig_64.  No functional
change.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:30:36 -08:00
Bjorn Helgaas 8c57786ad3 x86/PCI: MMCONFIG: clean up printks
No functional change; just tidy up printks and make them more consistent
with the rest of PCI.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:30:30 -08:00
Bjorn Helgaas ba2afbabfc x86/PCI: MMCONFIG: add pci_mmconfig_remove() to remove MMCONFIG region
This is only used internally now, but eventually will be used in the
hot-remove path to remove the MMCONFIG region associated with a host bridge.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:30:24 -08:00
Bjorn Helgaas ff097ddd4a x86/PCI: MMCONFIG: manage pci_mmcfg_region as a list, not a table
This changes pci_mmcfg_region from a table to a list, to make it easier
to add and remove MMCONFIG regions for PCI host bridge hotplug.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:30:14 -08:00
Bjorn Helgaas 987c367b4e x86/PCI: MMCONFIG: remove typeof so we can use a list
This replaces "typeof(pci_mmcfg_config[0])" with the actual type because
I plan to convert pci_mmcfg_config to a list, and then "pci_mmcfg_config[0]"
won't mean anything.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:30:08 -08:00
Bjorn Helgaas 3f0f550392 x86/PCI: MMCONFIG: add virtual address to struct pci_mmcfg_region
The virtual address is only used for x86_64, but it's so much simpler
to manage it as part of the pci_mmcfg_region that I think it's worth
wasting a pointer per MMCONFIG region on x86_32.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:30:01 -08:00
Bjorn Helgaas 2f2a8b9c90 x86/PCI: MMCONFIG: trivial is_mmconf_reserved() interface simplification
Since pci_mmcfg_region contains the struct resource, no need to pass the
pci_mmcfg_region *and* the resource start/size.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:29:49 -08:00
Bjorn Helgaas 56ddf4d3cf x86/PCI: MMCONFIG: add resource to struct pci_mmcfg_region
This patch adds a resource and corresponding name to the MMCONFIG
structure.  This makes allocation simpler (we can allocate the
resource and name at the same time we allocate the pci_mmcfg_region),
and gives us a way to hang onto the resource after inserting it.
This will be needed so we can release and free it when hot-removing
a host bridge.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:29:41 -08:00
Bjorn Helgaas 95cf1cf0c5 x86/PCI: MMCONFIG: use pointer to simplify pci_mmcfg_config[] structure access
No functional change, but simplifies a future patch to convert the table
to a list.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:29:34 -08:00
Bjorn Helgaas d7e6b66fe8 x86/PCI: MMCONFIG: rename pci_mmcfg_region structure members
This only renames the struct pci_mmcfg_region members; no functional change.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:29:24 -08:00
Bjorn Helgaas d215a9c8b4 x86/PCI: MMCONFIG: use a private structure rather than the ACPI MCFG one
This adds a struct pci_mmcfg_region with a little more information
than the struct acpi_mcfg_allocation used previously.  The acpi_mcfg
structure is defined by the spec, so we can't change it.

To begin with, struct pci_mmcfg_region is basically the same as the
ACPI MCFG version, but future patches will add more information.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:29:17 -08:00
Bjorn Helgaas df5eb1d67e x86/PCI: MMCONFIG: add PCI_MMCFG_BUS_OFFSET() to factor common expression
This factors out the common "bus << 20" expression used when computing the
MMCONFIG address.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:29:11 -08:00
Bjorn Helgaas f7ca698487 x86/PCI: MMCONFIG: reject MMCONFIG apertures at address zero
Since all MMCONFIG regions go through pci_mmconfig_add(), we can test the
address once there.  If the caller supplies an address of zero, we never
insert it in the pci_mmcfg_config[] table, so no need to test it elsewhere.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:29:03 -08:00
Bjorn Helgaas 463a5df175 x86/PCI: MMCONFIG: simplify tests for empty pci_mmcfg_config table
We never set pci_mmcfg_config unless we increment pci_mmcfg_config_num,
so there's no need to test both pci_mmcfg_config_num and pci_mmcfg_config.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:28:56 -08:00
Bjorn Helgaas 7da7d360ae x86/PCI: MMCONFIG: centralize MCFG structure management
This patch encapsulate pci_mmcfg_config[] updates.  All alloc/free is now
done in pci_mmconfig_add() and free_all_mcfg(), so all updates to
pci_mmcfg_config[] and pci_mmcfg_config_num are in those two functions.

This replaces the previous sequence of extend_mmcfg(), fill_one_mmcfg()
with the single pci_mmconfig_add() interface.  This interface is currently
static but will eventually be used in the host bridge hot-add path.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:28:51 -08:00
Bjorn Helgaas d3578ef7aa x86/PCI: MMCONFIG: step through MCFG table, not pci_mmcfg_config[]
Step through the ACPI MCFG table, not pci_mmcfg_config[].  No functional
change, but simplifies future patches that encapsulate pci_mmcfg_config[].

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:28:43 -08:00
Bjorn Helgaas e823d6ff58 x86/PCI: MMCONFIG: count MCFG structures with local variable
Use a local variable, not pci_mmcfg_config_num, to count MCFG entries.
No functional change, but simplifies future changes.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:28:37 -08:00
Bjorn Helgaas 5663b1b963 x86/PCI: MMCONFIG: remove unused definitions
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:28:30 -08:00
Yinghai Lu 67f241f457 x86/pci: seperate x86_pci_rootbus_res_quirks from amd_bus.c
Those functions are used by intel_bus.c so seperate them to another file. and
make amd_bus a bit smaller.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:25:59 -08:00
Jiri Kosina 7b7a785942 PCI: fix comment typo in bus_numa.h
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:25:20 -08:00
Alex Chiang 2ed7a806d8 x86/PCI: remove early PCI pr_debug statements
commit db635adc turned -DDEBUG for x86/pci on when CONFIG_PCI_DEBUG
is set. In general, I agree with that change.

However, it exposes a bunch of very low level PCI debugging in the
early x86 path, such as:

	0 reading 2 from a: ffff
	1 reading 2 from a: ffff
	2 reading 2 from a: ffff
	3 reading 2 from a: 300
	3 reading 2 from 0: 1002
	3 reading 2 from 2: 515e

These statements add a lot of noise to the boot and aren't likely to
be necessary even when handling random upstream bug reports.

[In contrast, statements such as these:

	pci 0000:02:04.0: found [14e4:164a] class 000200 header type 00
	pci 0000:02:04.0: reg 10: [mem 0xf8000000-0xf9ffffff 64bit]
	pci 0000:02:04.0: reg 30: [mem 0x00000000-0x0001ffff pref]

are indeed useful when remote debugging users' machines]

Remove the noisy printks and save electrons everywhere.

Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:25:19 -08:00
Yinghai Lu 5bf65b9ba6 x86, mtrr: Fix sorting of mtrr after subtracting
In some cases we can coalesce MTRR entries after cleanup; this may
allow us to have more entries.  As such, introduce clean_sort_range to
to sort and coaelsce the MTRR entries.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B0BB9A3.5020908@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-11-24 13:06:16 -08:00
Thomas Renninger e2f74f355e [ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interface
This interface is mainly intended (and implemented) for ACPI _PPC BIOS
frequency limitations, but other cpufreq drivers can also use it for
similar use-cases.

Why is this needed:

Currently it's not obvious why cpufreq got limited.
People see cpufreq/scaling_max_freq reduced, but this could have
happened by:
  - any userspace prog writing to scaling_max_freq
  - thermal limitations
  - hardware (_PPC in ACPI case) limitiations

Therefore export bios_limit (in kHz) to:
  - Point the user that it's the BIOS (broken or intended) which limits
    frequency
  - Export it as a sysfs interface for userspace progs.
    While this was a rarely used feature on laptops, there will appear
    more and more server implemenations providing "Green IT" features like
    allowing the service processor to limit the frequency. People want
    to know about HW/BIOS frequency limitations.

All ACPI P-state driven cpufreq drivers are covered with this patch:
  - powernow-k8
  - powernow-k7
  - acpi-cpufreq

Tested with a patched DSDT which limits the first two cores (_PPC returns 1)
via _PPC, exposed by bios_limit:
# echo 2200000 >cpu2/cpufreq/scaling_max_freq
# cat cpu*/cpufreq/scaling_max_freq
2600000
2600000
2200000
2200000
# #scaling_max_freq shows general user/thermal/BIOS limitations

# cat cpu*/cpufreq/bios_limit
2600000
2600000
2800000
2800000
# #bios_limit only shows the HW/BIOS limitation

CC: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com>
CC: Len Brown <lenb@kernel.org>
CC: davej@codemonkey.org.uk
CC: linux@dominikbrodowski.net

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-24 13:33:34 -05:00
Rusty Russell 1cce76c2ac [CPUFREQ] use an enum for speedstep processor identification
The "unsigned int processor" everywhere confused Rusty, leading to
breakage when he passed in smp_processor_id().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-24 13:33:34 -05:00
Krzysztof Helt db2820dd54 [CPUFREQ] powernow-k6: set transition latency value so ondemand governor can be used
Set the transition latency to value smaller than CPUFREQ_ETERNAL so
governors other than "performance" work (like the "ondemand" one).

The value is found in "AMD PowerNow! Technology Platform Design Guide for
Embedded Processors" dated December 2000 (AMD doc #24267A). There is the
answer to one of FAQs on page 40 which states that suggested complete transition
period is 200 us.

Tested on K6-2+ CPU with K6-3 core (model 13, stepping 4).

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-24 13:33:33 -05:00
Rusty Russell b8cbe7e82e [CPUFREQ] cpumask: don't put a cpumask on the stack in x86...cpufreq/powernow-k8.c
It's still mugging the current process's cpumask, but as comment in
1ff6e97f1d says, it's not a trivial fix.

So, at least we can use a cpumask_var_t to do the Wrong Thing the Right Way :)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: cpufreq@vger.kernel.org
Cc: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-24 13:33:33 -05:00
Harald Welte d77b819745 [CPUFREQ] Enable ACPI PDC handshake for VIA/Centaur CPUs
In commit 0de51088e6, we introduced the
use of acpi-cpufreq on VIA/Centaur CPU's by removing a vendor check for
VENDOR_INTEL.  However, as it turns out, at least the Nano CPU's also
need the PDC (processor driver capabilities) handshake in order to
activate the methods required for acpi-cpufreq.

Since arch_acpi_processor_init_pdc() contains another vendor check for
Intel, the PDC is not initialized on VIA CPU's.  The resulting behavior
of a current mainline kernel on such systems is:  acpi-cpufreq
loads and it indicates CPU frequency changes.  However, the CPU stays at
a single frequency

This trivial patch ensures that init_intel_pdc() is called on Intel and
VIA/Centaur CPU's alike.

Signed-off-by: Harald Welte <HaraldWelte@viatech.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-24 13:33:32 -05:00
Stephane Eranian 1261a02a0c perf_events, x86: Fix validate_event bug
The validate_event() was failing on valid event combinations. The
function was assuming that if x86_schedule_event() returned 0, it
meant error. But x86_schedule_event() returns the counter index and
0 is a perfectly valid value. An error is returned if the function
returns a negative value.

Furthermore, validate_event() was also failing for event groups
because the event->pmu was not set until after
hw_perf_event_init().

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: paulus@samba.org
Cc: perfmon2-devel@lists.sourceforge.net
Cc: eranian@gmail.com
LKML-Reference: <4b0bdf36.1818d00a.07cc.25ae@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--
 arch/x86/kernel/cpu/perf_event.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
2009-11-24 19:23:48 +01:00