Commit Graph

11191 Commits

Author SHA1 Message Date
John Stultz 592913ecb8 time: Kill off CONFIG_GENERIC_TIME
Now that all arches have been converted over to use generic time via
clocksources or arch_gettimeoffset(), we can remove the GENERIC_TIME
config option and simplify the generic code.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-4-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:54 +02:00
John Stultz 8c73626ab2 x86: Fix vtime/file timestamp inconsistencies
Due to vtime calling vgettimeofday(), its possible that an application
could call  time();create("stuff",O_RDRW);  only to see the file's
creation timestamp to be before the value returned by time.

A similar way to reproduce the issue is to compare the vsyscall time()
with the syscall time(), and observe ordering issues.

The modified test case from Oleg Nesterov below can illustrate this:

int main(void)
{
	time_t sec1,sec2;
	do {
		sec1 = time(&sec2);
		sec2 = syscall(__NR_time, NULL);
	} while (sec1 <= sec2);

	printf("vtime: %d.000000\n", sec1);
	printf("time: %d.000000\n", sec2);
	return 0;
}

The proper fix is to make vtime use the same time value as
current_kernel_time() (which is exported via update_vsyscall) instead of
vgettime().

Thanks to Jiri Olsa for bringing up the issue and catching bugs in
earlier verisons of this fix.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
LKML-Reference: <1279068988-21864-2-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:53 +02:00
Jeremy Fitzhardinge b43275d661 xen/pvhvm: fix build problem when !CONFIG_XEN
x86_hyper_xen_hvm is only defined when Xen is enabled in the kernel
config.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-26 23:13:28 -07:00
Stefano Stabellini 5915100106 x86: Call HVMOP_pagetable_dying on exit_mmap.
When a pagetable is about to be destroyed, we notify Xen so that the
hypervisor can clear the related shadow pagetable.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-26 23:13:26 -07:00
Stefano Stabellini c1c5413ad5 x86: Unplug emulated disks and nics.
Add a xen_emul_unplug command line option to the kernel to unplug
xen emulated disks and nics.

Set the default value of xen_emul_unplug depending on whether or
not the Xen PV frontends and the Xen platform PCI driver have
been compiled for this kernel (modules or built-in are both OK).

The user can specify xen_emul_unplug=ignore to enable PV drivers on HVM
even if the host platform doesn't support unplug.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-26 23:13:25 -07:00
Stefano Stabellini 409771d258 x86: Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock.
Use xen_vcpuop_clockevent instead of hpet and APIC timers as main
clockevent device on all vcpus, use the xen wallclock time as wallclock
instead of rtc and use xen_clocksource as clocksource.
The pv clock algorithm needs to work correctly for the xen_clocksource
and xen wallclock to be usable, only modern Xen versions offer a
reliable pv clock in HVM guests (XENFEAT_hvm_safe_pvclock).

Using the hpet as clocksource means a VMEXIT every time we read/write to
the hpet mmio addresses, pvclock give us a better rating without
VMEXITs. Same goes for the xen wallclock and xen_vcpuop_clockevent

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Don Dutile <ddutile@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-26 23:13:25 -07:00
Linus Torvalds 1a041a23da Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Do not try to disable hpet if it hasn't been initialized before
  x86, i8259: Only register sysdev if we have a real 8259 PIC
2010-07-26 16:02:07 -07:00
Linus Torvalds ee13cbdec4 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] powernow-k8: Limit Pstate transition latency check
  [CPUFREQ] Fix PCC driver error path
  [CPUFREQ] fix double freeing in error path of pcc-cpufreq
  [CPUFREQ] pcc driver should check for pcch method before calling _OSC
  [CPUFREQ] fix memory leak in cpufreq_add_dev
  [CPUFREQ] revert "[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)"
2010-07-26 15:35:53 -07:00
Borislav Petkov 3581ced3b6 [CPUFREQ] powernow-k8: Limit Pstate transition latency check
The Pstate transition latency check was added for broken F10h BIOSen
which wrongly contain a value of 0 for transition and bus master
latency. Fam11h and later, however, (will) have similar transition
latency so extend that behavior for them too.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-07-26 15:25:34 -04:00
Matthew Garrett 179ee43465 [CPUFREQ] Fix PCC driver error path
The PCC cpufreq driver unmaps the mailbox address range if any CPUs fail to
initialise, but doesn't do anything to remove the registered CPUs from the
cpufreq core resulting in failures further down the line. We're better off
simply returning a failure - the cpufreq core will unregister us cleanly if
we end up with no successfully registered CPUs. Tidy up the failure path
and also add a sanity check to ensure that the firmware gives us a realistic
frequency - the core deals badly with that being set to 0.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-07-26 15:25:34 -04:00
Daniel J Blueman 3847d223f2 [CPUFREQ] fix double freeing in error path of pcc-cpufreq
Prevent double freeing on error path.

Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-07-26 15:25:34 -04:00
Matthew Garrett 47f8bcf362 [CPUFREQ] pcc driver should check for pcch method before calling _OSC
The pcc specification documents an _OSC method that's incompatible with the
one defined as part of the ACPI spec. This shouldn't be a problem as both
are supposed to be guarded with a UUID. Unfortunately approximately nobody
(including HP, who wrote this spec) properly check the UUID on entry to the
_OSC call. Right now this could result in surprising behaviour if the pcc
driver performs an _OSC call on a machine that doesn't implement the pcc
specification. Check whether the PCCH method exists first in order to reduce
this probability.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-07-26 15:25:34 -04:00
Linus Torvalds 20ba5efb9c Merge branch 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Use kmalloc() instead of vmalloc() for KVM_[GS]ET_MSR
  KVM: MMU: fix conflict access permissions in direct sp
2010-07-26 08:18:18 -07:00
Len Brown 0e1cf38889 Merge branch 'bugzilla-16396' into release 2010-07-24 23:26:22 -04:00
Rafael J. Wysocki 72ad5d77fb ACPI / Sleep: Allow the NVS saving to be skipped during suspend to RAM
Commit 2a6b69765a
(ACPI: Store NVS state even when entering suspend to RAM) caused the
ACPI suspend code save the NVS area during suspend and restore it
during resume unconditionally, although it is known that some systems
need to use acpi_sleep=s4_nonvs for hibernation to work.  To allow
the affected systems to avoid saving and restoring the NVS area
during suspend to RAM and resume, introduce kernel command line
option acpi_sleep=nonvs and make acpi_sleep=s4_nonvs work as its
alias temporarily (add acpi_sleep=s4_nonvs to the feature removal
file).

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16396 .

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-and-tested-by: tomas m <tmezzadra@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-07-24 23:26:09 -04:00
Stefano Stabellini ff4878089e x86: Do not try to disable hpet if it hasn't been initialized before
hpet_disable is called unconditionally on machine reboot if hpet support
is compiled in the kernel.
hpet_disable only checks if the machine is hpet capable but doesn't make
sure that hpet has been initialized.

[ tglx: Made it a one liner and removed the redundant hpet_address check ]

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <alpine.DEB.2.00.1007211726240.22235@kaball-desktop>
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-23 12:53:00 +02:00
Avi Kivity 7a73c0283d KVM: Use kmalloc() instead of vmalloc() for KVM_[GS]ET_MSR
We don't need more than a page, and vmalloc() is slower (much
slower recently due to a regression).

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-07-23 09:07:14 +03:00
Xiao Guangrong 6aa0b9dec5 KVM: MMU: fix conflict access permissions in direct sp
In no-direct mapping, we mark sp is 'direct' when we mapping the
guest's larger page, but its access is encoded form upper page-struct
entire not include the last mapping, it will cause access conflict.

For example, have this mapping:
        [W]
      / PDE1 -> |---|
  P[W]          |   | LPA
      \ PDE2 -> |---|
        [R]

P have two children, PDE1 and PDE2, both PDE1 and PDE2 mapping the
same lage page(LPA). The P's access is WR, PDE1's access is WR,
PDE2's access is RO(just consider read-write permissions here)

When guest access PDE1, we will create a direct sp for LPA, the sp's
access is from P, is W, then we will mark the ptes is W in this sp.

Then, guest access PDE2, we will find LPA's shadow page, is the same as
PDE's, and mark the ptes is RO.

So, if guest access PDE1, the incorrect #PF is occured.

Fixed by encode the last mapping access into direct shadow page

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-07-23 09:07:04 +03:00
Stefano Stabellini 016b6f5fe8 xen: Add suspend/resume support for PV on HVM guests.
Suspend/resume requires few different things on HVM: the suspend
hypercall is different; we don't need to save/restore memory related
settings; except the shared info page and the callback mechanism.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-22 16:46:21 -07:00
Sheng Yang 38e20b07ef x86/xen: event channels delivery on HVM.
Set the callback to receive evtchns from Xen, using the
callback vector delivery mechanism.

The traditional way for receiving event channel notifications from Xen
is via the interrupts from the platform PCI device.
The callback vector is a newer alternative that allow us to receive
notifications on any vcpu and doesn't need any PCI support: we allocate
a vector exclusively to receive events, in the vector handler we don't
need to interact with the vlapic, therefore we avoid a VMEXIT.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-22 16:45:59 -07:00
Sheng Yang bee6ab53e6 x86: early PV on HVM features initialization.
Initialize basic pv on hvm features adding a new Xen HVM specific
hypervisor_x86 structure.

Don't try to initialize xen-kbdfront and xen-fbfront when running on HVM
because the backends are not available.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-22 16:45:35 -07:00
Jeremy Fitzhardinge 18f19aa62a xen: Add support for HVM hypercalls.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-07-22 16:45:31 -07:00
Len Brown 4a973f2495 Merge branch 'bugzilla-15886' into release 2010-07-22 18:18:28 -04:00
Len Brown 718be4aaf3 ACPI: skip checking BM_STS if the BIOS doesn't ask for it
It turns out that there is a bit in the _CST for Intel FFH C3
that tells the OS if we should be checking BM_STS or not.

Linux has been unconditionally checking BM_STS.
If the chip-set is configured to enable BM_STS,
it can retard or completely prevent entry into
deep C-states -- as illustrated by turbostat:

http://userweb.kernel.org/~lenb/acpi/utils/pmtools/turbostat/

ref: Intel Processor Vendor-Specific ACPI Interface Specification
table 4 "_CST FFH GAS Field Encoding"
Bit 1: Set to 1 if OSPM should use Bus Master avoidance for this C-state

https://bugzilla.kernel.org/show_bug.cgi?id=15886

Signed-off-by: Len Brown <len.brown@intel.com>
2010-07-22 16:54:27 -04:00
Thomas Renninger 4c21adf26f x86 cpufreq, perf: Make trace_power_frequency cpufreq driver independent
and fix the broken case if a core's frequency depends on others.

trace_power_frequency was only implemented in a rather ungeneric
way in acpi-cpufreq driver's target() function only.

-> Move the call to trace_power_frequency to
   cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
   notifier is triggered.
   This will support power frequency tracing by all cpufreq
   drivers.

trace_power_frequency did not trace frequency changes correctly
when the userspace governor was used or when CPU cores'
frequency depend on each other.

-> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu
   which gets switched automatically fixes this.

Robert Schoene provided some important fixes on top of my
initial quick shot version which are integrated in this patch:
- Forgot some changes in power_end trace (TP_printk/variable names)
- Variable dummy in power_end must now be cpu_id
- Use static 64 bit variable instead of unsigned int for cpu_id

[akpm@linux-foundation.org: build fix]
Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: davej@codemonkey.org.uk
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Jones <davej@codemonkey.org.uk>
Acked-by: Arjan van de Ven <arjan@infradead.org>
Cc: Robert Schoene <robert.schoene@tu-dresden.de>
Tested-by: Robert Schoene <robert.schoene@tu-dresden.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2010-07-22 12:08:27 +02:00
Brian Gerst 650fb4393d x86-64: Simplify loading initial_gs
Load initial_gs as two 32-bit values instead of splitting a 64-bit value.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1279371808-24804-3-git-send-email-brgerst@gmail.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-07-21 21:23:51 -07:00
Brian Gerst cfaa71ee97 x86: Use symbolic MSR names
Use symbolic MSR names instead of hardcoding the MSR index.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1279371808-24804-2-git-send-email-brgerst@gmail.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-07-21 21:23:40 -07:00
Brian Gerst 8c06585d64 x86: Remove redundant K6 MSRs
MSR_K6_EFER is unused, and MSR_K6_STAR is redundant with MSR_STAR.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1279371808-24804-1-git-send-email-brgerst@gmail.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-07-21 21:23:05 -07:00
Roland McGrath 0327559151 x86: auditsyscall: fix fastpath return value after reschedule
In the CONFIG_AUDITSYSCALL fast-path for x86 64-bit system calls,
we can pass a bad return value and/or error indication for the
system call to audit_syscall_exit().  This happens when
TIF_NEED_RESCHED was set as the system call returned, so we went
out to schedule() and came back to the exit-audit fast-path.  The
fix is to reload the user return value register from the pt_regs
before using it for audit_syscall_exit().

Both the 32-bit kernel's fast path and the 64-bit kernel's 32-bit
system call fast paths work slightly differently, so that they
always leave the fast path entirely to reschedule and don't return
there, so they don't have the analogous bugs.

Reported-by: Alexander Viro <aviro@redhat.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
2010-07-21 17:44:12 -07:00
H. Peter Anvin 1cff92d8fd x86, xsave: Make xstate_enable_boot_cpu() __init, protect on CPU 0
xstate_enable_boot_cpu() is, as the name implies, only used on the
boot CPU; furthermore, it invokes alloc_bootmem(), which is __init;
hence it needs to be tagged __init rather than __cpuinit.

Furthermore, it is *not* safe in the long run to rely on CPU 0 only
coming online during the early boot -- at some point we're going to
support offlining (and re-onlining) the boot CPU, and at that point we
must not call xstate_enable_boot_cpu() again.

The code is a fair bit more obscure than one would like, because the
__ref overrides aren't quite powerful enough.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4C476236.1020302@zytor.com>
2010-07-21 15:33:54 -07:00
Robert Richter 4995b9dba9 x86, xsave: Add __init attribute to setup_xstate_features()
This is called only from initialization code.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279731838-1522-6-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-21 14:06:05 -07:00
Robert Richter 45c2d7f462 x86, xsave: Make init_xstate_buf static
The pointer is only used in xsave.c. Making it static.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279731838-1522-5-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-21 14:06:05 -07:00
Robert Richter ee813d53a8 x86, xsave: Check cpuid level for XSTATE_CPUID (0x0d)
The patch introduces the XSTATE_CPUID macro and adds a check that
tests if XSTATE_CPUID exists.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279731838-1522-4-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-21 14:06:04 -07:00
Robert Richter 97e80a70db x86, xsave: Introduce xstate enable functions
The patch renames xsave_cntxt_init() and __xsave_init() into
xstate_enable_boot_cpu() and xstate_enable() as this names are more
meaningful.

It also removes the duplicate xcr setup for the boot cpu.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279731838-1522-3-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-21 14:06:04 -07:00
Robert Richter 0e49bf66d2 x86, xsave: Separate fpu and xsave initialization
As xsave also supports other than fpu features, it should be
initialized independently of the fpu. This patch moves this out of fpu
initialization.

There is also a lot of cross referencing between fpu and xsave
code. This patch reduces this by making xsave_cntxt_init() and
init_thread_xstate() static functions.

The patch moves the cpu_has_xsave check at the beginning of
xsave_init(). All other checks may removed then.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279731838-1522-2-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-21 14:06:04 -07:00
Borislav Petkov 3f8afb77cd x86, tlb: Clean up and correct used type
smp_processor_id() returns an int and not an unsigned long.
Also, since the function is small enough, there's no need for a
local variable caching its value.

No functionality change, just cleanup.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100721124705.GA674@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-21 21:48:15 +02:00
Ingo Molnar 9dcdbf7a33 Merge branch 'linus' into perf/core
Merge reason: Pick up the latest perf fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-21 21:43:06 +02:00
Cliff Wickman 5edd19af18 x86, UV: Make kdump avoid stack dumps
UV NMI callback's should not write stack dumps when a kdump is to be written.

When invoking the crash kernel to write a dump, kdump_nmi_shootdown_cpus()
uses NMI's to get all the cpu's to save their register context and halt.

But the NMI interrupt handler runs a callback list.  This patch sets a flag
to prevent any of those callbacks from interfering with the halt of the cpu.

For UV, which currently has the only callback to which this is relevant, the
uv_handle_nmi() callback should not do dumping of stacks.

The 'in_crash_kexec' flag is defined as an extern in kdebug.h firstly
because x2apic_uv_x.c includes it.  Secondly because some future callback
might need the flag to know that it should not enter the debugger.
(Such a scenario was in fact present in the 2.6.32 kernel, SuSE distribution,
 where a call to kdb needed to be avoided.)

Signed-off-by: Cliff Wickman <cpw@sgi.com>
LKML-Reference: <E1ObLvt-0005UZ-Va@eag09.americas.sgi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-21 11:33:27 -07:00
Linus Torvalds a4ce96ac35 Fix up trivial spelling errors ('taht' -> 'that')
Pointed out by Lucas who found the new one in a comment in
setup_percpu.c. And then I fixed the others that I grepped
for.

Reported-by: Lucas <canolucas@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-21 09:25:42 -07:00
Michel Lespinasse b4bcb4c28c x86, rwsem: Minor cleanups
Clarified few comments and made initialization of %edx/%rdx more uniform
accross __down_write_nested, __up_read and __up_write functions.

Signed-off-by: Michel Lespinasse <walken@google.com>
LKML-Reference: <201007202219.o6KMJkiA021048@imap1.linux-foundation.org>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Mike Waychison <mikew@google.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-20 17:41:14 -07:00
Michel Lespinasse a751bd858b x86, rwsem: Stay on fast path when count > 0 in __up_write()
When count > 0 there is no need to take the call_rwsem_wake path.  If
we did take that path, it would just return without doing anything due
to the active count not being zero.

Signed-off-by: Michel Lespinasse <walken@google.com>
LKML-Reference: <201007202219.o6KMJj9x021042@imap1.linux-foundation.org>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Mike Waychison <mikew@google.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-20 17:41:00 -07:00
Florian Zumbiehl 468c30f2bb x86, iomap: Fix wrong page aligned size calculation in ioremapping code
x86 early_iounmap(): fix off-by-one error in page alignment of allocation
size for sizes where size%PAGE_SIZE==1.

Signed-off-by: Florian Zumbiehl <florz@florz.de>
LKML-Reference: <201007202219.o6KMJlES021058@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-20 16:56:35 -07:00
Andres Salomon 92851e2fca x86, mm: Create symbolic index into address_markers array
Without this, adding entries into the address_markers array means adding
more and more of an #ifdef maze in pt_dump_init().  By using indices, we
can keep it a bit saner.

Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <201007202219.o6KMJkUs021052@imap1.linux-foundation.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-20 16:56:19 -07:00
Yinghai Lu 9aebbdb637 x86, numa: fix boot without RAM on node0 again
Commit e534c7c5f8 ("numa: x86_64: use generic percpu var
numa_node_id() implementation") broke numa systems that don't have ram
on node0 when MEMORY_HOTPLUG is enabled, because cpu_up() will call
cpu_to_node() before per_cpu(numa_node) is setup for APs.

When Node0 doesn't have RAM, on x86, cpus already round it to nearest
node with RAM in x86_cpu_to_node_map.  and per_cpu(numa_node) is not set
up until in c_init for APs.

When later cpu_up() calling cpu_to_node() will get 0 again, and make it
online even there is no RAM on node0.  so later all APs can not booted up,
and later will have panic.

[    1.611101] On node 0 totalpages: 0
.........
[    2.608558] On node 0 totalpages: 0
[    2.612065] Brought up 1 CPUs
[    2.615199] Total of 1 processors activated (3990.31 BogoMIPS).
...
   93.225341] calling  loop_init+0x0/0x1a4 @ 1
[   93.229314] PERCPU: allocation failed, size=80 align=8, failed to populate
[   93.246539] Pid: 1, comm: swapper Tainted: G        W   2.6.35-rc4-tip-yh-04371-gd64e6c4-dirty #354
[   93.264621] Call Trace:
[   93.266533]  [<ffffffff81125e43>] pcpu_alloc+0x83a/0x8e7
[   93.270710]  [<ffffffff81125f15>] __alloc_percpu+0x10/0x12
[   93.285849]  [<ffffffff8140786c>] alloc_disk_node+0x94/0x16d
[   93.291811]  [<ffffffff81407956>] alloc_disk+0x11/0x13
[   93.306157]  [<ffffffff81503e51>] loop_alloc+0xa7/0x180
[   93.310538]  [<ffffffff8277ef48>] loop_init+0x9b/0x1a4
[   93.324909]  [<ffffffff8277eead>] ? loop_init+0x0/0x1a4
[   93.329650]  [<ffffffff810001f2>] do_one_initcall+0x57/0x136
[   93.345197]  [<ffffffff827486d0>] kernel_init+0x184/0x20e
[   93.348146]  [<ffffffff81034954>] kernel_thread_helper+0x4/0x10
[   93.365194]  [<ffffffff81c7cc3c>] ? restore_args+0x0/0x30
[   93.369305]  [<ffffffff8274854c>] ? kernel_init+0x0/0x20e
[   93.386011]  [<ffffffff81034950>] ? kernel_thread_helper+0x0/0x10
[   93.392047] loop: out of memory
...

Try to assign per_cpu(numa_node) early

[akpm@linux-foundation.org: tidy up code comment]
Signed-off-by: Yinghai <yinghai@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-20 16:25:40 -07:00
Robert Richter 82d4150cec x86, xsave: Move boot cpu initialization to xsave_init()
This patch moves boot cpu initialization to xsave_init(). Now all cpus
are initialized in one single function.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279651857-24639-5-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-20 16:21:44 -07:00
Robert Richter db10db48b2 x86, xsave: 32/64 bit boot cpu check unification in initialization
Boot cpu id is always 0, thus simplifying and unifying boot cpu check.

boot_cpu_id is there for historical reasons and was renamed to
boot_cpu_physical_apicid in patch:

 c70dcb7 x86: change boot_cpu_id to boot_cpu_physical_apicid

However, there are some remaining occurrences of boot_cpu_id that are
never touched in the kernel and thus its value is always 0.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279651857-24639-3-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-20 16:21:38 -07:00
Robert Richter 7aa2b5f8ec x86, xsave: Do not include asm/i387.h in asm/xsave.h
There are no dependencies to asm/i387.h. Instead, if including only
xsave.h the following error occurs:

 .../arch/x86/include/asm/i387.h:110: error: ‘XSTATE_FP’ undeclared (first use in this function)
 .../arch/x86/include/asm/i387.h:110: error: (Each undeclared identifier is reported only once
 .../arch/x86/include/asm/i387.h:110: error: for each function it appears in.)

This patch fixes this.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279651857-24639-2-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-20 16:21:38 -07:00
Andi Kleen fa10ba64ac x86, gcc-4.6: Fix set but not read variables
Just some dead code, no real bugs.

Found by gcc 4.6 -Wall

Signed-off-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <201007202219.o6KMJnQ0021072@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-20 15:38:30 -07:00
Andi Kleen 5f755293ca x86, gcc-4.6: Avoid unused by set variables in rdmsr
Avoids quite a lot of warnings with a gcc 4.6 -Wall build
because this happens in a commonly used header file (apic.h)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <201007202219.o6KMJme6021066@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-20 15:38:18 -07:00
Adam Lackorzynski 087b255a2b x86, i8259: Only register sysdev if we have a real 8259 PIC
My platform makes use of the null_legacy_pic choice and oopses when doing
a shutdown as the shutdown code goes through all the registered sysdevs
and calls their shutdown method which in my case poke on a non-existing
i8259.  Imho the i8259 specific sysdev should only be registered if the
i8259 is actually there.

Do not register the sysdev function when the null_legacy_pic is used so
that the i8259 resume, suspend and shutdown functions are not called.

Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
LKML-Reference: <201007202218.o6KMIJ3m020955@imap1.linux-foundation.org>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: <stable@kernel.org> 2.6.34
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-20 15:27:33 -07:00
Jeremy Fitzhardinge f89e048e76 xen: make sure pages are really part of domain before freeing
Scan the set of pages we're freeing and make sure they're actually
owned by the domain before freeing.  This generally won't happen on a
domU (since Xen gives us contigious memory), but it could happen if
there are some hardware mappings passed through.

We only bother going up to the highest page Xen actually claimed to
give us, since there's definitely nothing of ours above that.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-20 12:57:46 -07:00
Miroslav Rezanina 093d7b4639 xen: release unused free memory
Scan an e820 table and release any memory which lies between e820 entries,
as it won't be used and would just be wasted.  At present this is just to
release any memory beyond the end of the e820 map, but it will also deal
with holes being punched in the map.

Derived from patch by Miroslav Rezanina <mrezanin@redhat.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-20 12:51:22 -07:00
H. Peter Anvin 2decb194e6 x86, cpu: Split addon_cpuid_features.c
addon_cpuid_features.c contains exactly two almost completely
unrelated functions, plus has a long and very generic name.  Split it
into two files, scattered.c for the scattered feature flags, and
topology.c for the topology information.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <tip-*@git.kernel.org>
2010-07-19 19:02:41 -07:00
H. Peter Anvin 278bc5f6ab x86, cpu: Clean up formatting in cpufeature.h, remove override
Clean up the formatting in cpufeature.h, and remove an unnecessary
name override.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <tip-*@git.kernel.org>
2010-07-19 19:02:35 -07:00
Suresh Siddha 6bad06b768 x86, xsave: Use xsaveopt in context-switch path when supported
xsaveopt is a more optimized form of xsave specifically designed
for the context switch usage. xsaveopt doesn't save the state that's not
modified from the prior xrstor. And if a specific feature state gets
modified to the init state, then xsaveopt just updates the header bit
in the xsave memory layout without updating the corresponding memory
layout.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100719230205.604014179@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-19 17:52:24 -07:00
Suresh Siddha 29104e101d x86, xsave: Sync xsave memory layout with its header for user handling
With xsaveopt, if a processor implementation discern that a processor state
component is in its initialized state it may modify the corresponding bit in
the xsave_hdr.xstate_bv as '0', with out modifying the corresponding memory
layout. Hence wHile presenting the xstate information to the user, we always
ensure that the memory layout of a feature will be in the init state if the
corresponding header bit is zero. This ensures the consistency and avoids the
condition of the user seeing some some stale state in the memory layout during
signal handling, debugging etc.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100719230205.351459480@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-19 17:51:30 -07:00
Suresh Siddha a1488f8bf4 x86, xsave: Track the offset, size of state in the xsave layout
Subleaves of the cpuid vector 0xd provides the offset and size of different
feature state that are managed by the xsave/xrstor. Track this for the upcoming
usage during signal handling.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100719230205.262987929@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-19 17:51:17 -07:00
Suresh Siddha 5734f62b66 x86, cpu: Enumerate xsaveopt
Enumerate the xsaveopt feature.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100719230205.604014179@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-19 16:50:40 -07:00
Suresh Siddha 40e1d7a4ff x86, cpu: Add xsaveopt cpufeature
Add cpu feature bit support for the XSAVEOPT instruction.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100719230205.523204988@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-19 16:48:06 -07:00
Suresh Siddha edb18f8ab0 x86, cpu: Make init_scattered_cpuid_features() consider cpuid subleaves
Some cpuid features (like xsaveopt) are enumerated using cpuid
subleaves.

Extend init_scattered_cpuid_features() to take subleaf into account.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100719230205.439900717@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-19 16:47:59 -07:00
Linus Torvalds d0c6f62584 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, pci, mrst: Add extra sanity check in walking the PCI extended cap chain
  x86: Fix x2apic preenabled system with kexec
  x86: Force HPET readback_cmp for all ATI chipsets
2010-07-19 13:19:32 -07:00
Kulikov Vasiliy 6c54aabd5e x86/amd-iommu: Use for_each_pci_dev()
Use for_each_pci_dev() to simplify the code.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-07-19 15:41:13 +02:00
Pavel Machek a2531293db update email address
pavel@suse.cz no longer works, replace it with working address.

Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-07-19 10:56:54 +02:00
Dave Chinner 7f8275d0d6 mm: add context argument to shrinker callback
The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-07-19 14:56:17 +10:00
Linus Torvalds a9f7f2e74a Merge branch 'x86/kprobes' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland
* 'x86/kprobes' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
  x86: kprobes: fix swapped segment registers in kretprobe
2010-07-18 15:13:30 -07:00
Roland McGrath a197479848 x86: kprobes: fix swapped segment registers in kretprobe
In commit f007ea26, the order of the %es and %ds segment registers
got accidentally swapped, so synthesized 'struct pt_regs' frames
have the two values inverted.  It's almost sure that these values
never matter, and that they also never differ.  But wrong is wrong.

Signed-off-by: Roland McGrath <roland@redhat.com>
2010-07-18 15:05:34 -07:00
Cliff Wickman 93a7ca0c3e x86, UV: Initialize BAU MMRs only on hubs with cpus
Remove the initialization of MMRs
UVH_LB_BAU_SB_ACTIVATION_CONTROL and UVH_BAU_DATA_BROADCAST on
UV hubs that have no active cpus. Such initialization on hubs
with no active cpus would result in a kernel page fault.

This is not of real high priority, because we don't have any
such systems (with UV hubs that have no active cpus).  But they
will be coming.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
LKML-Reference: <E1OZmZN-0006cW-RC@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-17 12:11:48 +02:00
Jacob Pan f82c3d71d6 x86, pci, mrst: Add extra sanity check in walking the PCI extended cap chain
The fixed bar capability structure is searched in PCI extended
configuration space.  We need to make sure there is a valid capability
ID to begin with otherwise, the search code may stuck in a infinite
loop which results in boot hang.  This patch adds additional check for
cap ID 0, which is also invalid, and indicates end of chain.

End of chain is supposed to have all fields zero, but that doesn't
seem to always be the case in the field.

Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <1279306706-27087-1-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-16 16:52:15 -07:00
Yinghai Lu fd19dce7ac x86: Fix x2apic preenabled system with kexec
Found one x2apic system kexec loop test failed
when CONFIG_NMI_WATCHDOG=y (old) or CONFIG_LOCKUP_DETECTOR=y (current tip)

first kernel can kexec second kernel, but second kernel can not kexec third one.

it can be duplicated on another system with BIOS preenabled x2apic.
First kernel can not kexec second kernel.

It turns out, when kernel boot with pre-enabled x2apic, it will not execute
disable_local_APIC on shutdown path.

when init_apic_mappings() is called in setup_arch, it will skip setting of
apic_phys when x2apic_mode is set. ( x2apic_mode is much early check_x2apic())
Then later, disable_local_APIC() will bail out early because !apic_phys.

So check !x2apic_mode in x2apic_mode in disable_local_APIC with !apic_phys.

another solution could be updating init_apic_mappings() to set apic_phys even
for preenabled x2apic system. Actually even for x2apic system, that lapic
address is mapped already in early stage.

BTW: is there any x2apic preenabled system with apicid of boot cpu > 255?

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4C3EB22B.3000701@kernel.org>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-16 16:49:41 -07:00
Bjorn Helgaas 58c84eda07 PCI: fall back to original BIOS BAR addresses
If we fail to assign resources to a PCI BAR, this patch makes us try the
original address from BIOS rather than leaving it disabled.

Linux tries to make sure all PCI device BARs are inside the upstream
PCI host bridge or P2P bridge apertures, reassigning BARs if necessary.
Windows does similar reassignment.

Before this patch, if we could not move a BAR into an aperture, we left
the resource unassigned, i.e., at address zero.  Windows leaves such BARs
at the original BIOS addresses, and this patch makes Linux do the same.

This is a bit ugly because we disable the resource long before we try to
reassign it, so we have to keep track of the BIOS BAR address somewhere.
For lack of a better place, I put it in the struct pci_dev.

I think it would be cleaner to attempt the assignment immediately when the
claim fails, so we could easily remember the original address.  But we
currently claim motherboard resources in the middle, after attempting to
claim PCI resources and before assigning new PCI resources, and changing
that is a fairly big job.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16263

Reported-by: Andrew <nitr0@seti.kr.ua>
Tested-by: Andrew <nitr0@seti.kr.ua>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-07-16 11:39:48 -07:00
Jiri Slaby f33ebbe9da unistd: add __NR_prlimit64 syscall numbers
Add __NR_prlimit64 syscall numbers to asm-generic. Add them also to
asm-x86, both 32 and 64-bit.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2010-07-16 09:52:32 +02:00
Joe Perches b2691085d1 x86: Clean up arch/x86/kernel/cpu/mtrr/cleanup.c: use ";" not "," to terminate statements
Also needed if pr_<level> becomes a bit more space efficient.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
LKML-Reference: <1277768808.29157.280.camel@Joe-Laptop.home>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-15 18:21:22 +02:00
Thomas Gleixner 08be97962b x86: Force HPET readback_cmp for all ATI chipsets
commit 30a564be (x86, hpet: Restrict read back to affected ATI
chipset) restricted the workaround for the HPET bug to SMX00
chipsets. This was reasonable as those were the only ones against
which we ever got a bug report.

Stephan Wolf reported now that this patch breaks his IXP400 based
machine. Though it's confirmed to work on other IXP400 based systems.

To error out on the safe side, we force the HPET readback workaround
for all ATI SMbus class chipsets.

Reported-by: Stephan Wolf <stephan@letzte-bankreihe.de>
LKML-Reference: <alpine.LFD.2.00.1007142134140.3321@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Stephan Wolf <stephan@letzte-bankreihe.de>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
2010-07-15 17:10:02 +02:00
Yinghai Lu 70b0d22d58 x86, setup: Only set early_serial_base after port is initialized
putchar is using early_serial_base to check if port is initialized.

So we only assign it after early_serial_init() is called,
in case we need use VGA to debug early serial console.

Also add display for port addr and baud.

-v2: update to current tip

Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4C3E0171.6050008@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-14 12:07:16 -07:00
Linus Torvalds bcefc8d0d3 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  input: i8042 - add runtime check in x86's i8042_platform_init
  Revert "Input: fixup X86_MRST selects"
  Revert "Input: do not force selecting i8042 on Moorestown"
  x86, mrst: Add i8042_detect API for Moorestwon platform
  x86: Add i8042 pre-detection hook to x86_platform_ops
  x86, platform: Export x86_platform to modules
2010-07-13 17:31:11 -07:00
Linus Torvalds 177dd7e1eb Merge branch 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: MMU: flush remote tlbs when overwriting spte with different pfn
  KVM: VMX: Fix host MSR_KERNEL_GS_BASE corruption
2010-07-13 17:30:49 -07:00
H. Peter Anvin 3b770a2128 x86, alternatives: BUG on encountering an invalid CPU feature number
Make the alternatives-patching code BUG on encountering an invalid CPU
feature number.  Should have done this a long time ago.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Yinghai Lu <yinhai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <tip-df378ccfc4dd04e263426ad805516915874774aa@git.kernel.org>
2010-07-13 14:57:50 -07:00
H. Peter Anvin df378ccfc4 x86, alternatives: Fix one more open-coded 8-bit alternative number
Fix a missing case of an 8-bit alternative number, buried inside an
assembly macro.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Reported-by: Yinghai Lu <yinhai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <4C3BDDA3.2060900@kernel.org>
2010-07-13 14:56:16 -07:00
Yinghai Lu ce0aa5dd20 x86, setup: Make the setup code also accept console=uart8250
Make the boot code also accept the console=uart8250,io,0x2f8,115200n
form of early console.

Also add back simple_guess_base(), otherwise those simple_strtoull(,,0)
are not going to work.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4C3CCE05.4090505@kernel.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-13 14:08:12 -07:00
Pekka Enberg fa97bdf927 x86, setup: Early-boot serial I/O support
This patch adds serial I/O support to the real-mode setup (very early
boot) printf(). It's useful for debugging boot code when running Linux
under KVM, for example. The actual code was lifted from early printk.

Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1278835617-11368-1-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-12 14:46:00 -07:00
Xiao Guangrong 91546356d0 KVM: MMU: flush remote tlbs when overwriting spte with different pfn
After remove a rmap, we should flush all vcpu's tlb

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-07-12 14:05:56 -03:00
Kenji Kaneshige 35be1b716a x86, ioremap: Fix normal ram range check
Check for normal RAM in x86 ioremap() code seems to not work for the
last page frame in the specified physical address range.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <4C1AE6CD.1080704@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-09 11:42:11 -07:00
Kenji Kaneshige ffa71f33a8 x86, ioremap: Fix incorrect physical address handling in PAE mode
Current x86 ioremap() doesn't handle physical address higher than
32-bit properly in X86_32 PAE mode. When physical address higher than
32-bit is passed to ioremap(), higher 32-bits in physical address is
cleared wrongly. Due to this bug, ioremap() can map wrong address to
linear address space.

In my case, 64-bit MMIO region was assigned to a PCI device (ioat
device) on my system. Because of the ioremap()'s bug, wrong physical
address (instead of MMIO region) was mapped to linear address space.
Because of this, loading ioatdma driver caused unexpected behavior
(kernel panic, kernel hangup, ...).

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <4C1AE680.7090408@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-09 11:42:03 -07:00
Ky Srinivasan 9279aa5506 x86: Export the symbol ms_hyperv
This is needed so that the staging hyperv can properly access this
symbol.

Signed-off-by: K. Y. Srinivasan <ksrinivasan@novell.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-08 14:12:03 -07:00
Javier Martinez Canillas 5bbd4a336c x86/apic/es7000_32: Remove unused variable
In today's linux-next I got this compile warning:

 arch/x86/kernel/apic/es7000_32.c:132: warning: 'base' defined but not used

Current patch solves the issue removing the unused variable.

Signed-off-by: Javier Martinez Canillas <martinez.javier@gmail.com>
Cc: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Tejun Heo <tj@kernel.org>
LKML-Reference: <1278546719.9020.4.camel@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-08 09:14:37 +02:00
H. Peter Anvin bdc802dcca x86, cpu: Support the features flags in new CPUID leaf 7
Intel has defined CPUID leaf 7 as the next set of feature flags (see
the AVX specification, version 007).  Add support for this new feature
flags word.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <tip-*@vger.kernel.org>
2010-07-07 17:29:18 -07:00
Feng Tang 6d2cce6201 x86, mrst: Add i8042_detect API for Moorestwon platform
It will just return 0 as there is no i8042 controller

Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <1278342202-10973-3-git-send-email-feng.tang@intel.com>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-07 17:05:06 -07:00
Feng Tang c516ac5839 x86: Add i8042 pre-detection hook to x86_platform_ops
Some x86 platforms like Intel MID platforms don't have i8042 controllers,
and i8042 driver's probe to some legacy IO ports may hang the MID
processor. With this hook, i8042 driver can runtime check and skip the
probe when the pretection fail which also saves some probe time

[ hpa note: this is currently a compile-time check, which breaks the
  i386 allyesconfig build.  This patch series thus does fix a regression. ]

Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <1278342202-10973-2-git-send-email-feng.tang@intel.com>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-07 17:05:06 -07:00
H. Peter Anvin 72550b3ae5 x86, platform: Export x86_platform to modules
Export x86_platform to modules in preparation of using it for i8042
discovery control.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <1278342202-10973-1-git-send-email-feng.tang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
2010-07-07 17:05:05 -07:00
H. Peter Anvin 83a7a2ad2a x86, alternatives: Use 16-bit numbers for cpufeature index
We already have cpufeature indicies above 255, so use a 16-bit number
for the alternatives index.  This consumes a padding field and so
doesn't add any size, but it means that abusing the padding field to
create assembly errors on overflow no longer works.  We can retain the
test simply by redirecting it to the .discard section, however.

[ v3: updated to include open-coded locations ]

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <tip-f88731e3068f9d1392ba71cc9f50f035d26a0d4f@git.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-07-07 10:36:28 -07:00
H. Peter Anvin 24da9c26f3 x86, cpu: Add CPU flags for F16C and RDRND
Add support for the newly documented F16C (16-bit floating point
conversions) and RDRND (RDRAND instruction) CPU feature flags.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-07-07 10:15:12 -07:00
Linus Torvalds 47a716cf0c Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rbtree: Undo augmented trees performance damage and regression
  x86, Calgary: Limit the max PHB number to 256
2010-07-06 17:16:09 -07:00
Suresh Siddha 8e221b6db4 x86: Avoid unnecessary __clear_user() and xrstor in signal handling
fxsave/xsave doesn't touch all the bytes in the memory layout used by
these instructions. Specifically SW reserved (bytes 464..511) fields
in the fxsave frame and the reserved fields in the xsave header.

To present a clean context for the signal handling, just clear these fields
instead of clearing the complete fxsave/xsave memory layout, when we dump these
registers directly to the user signal frame.

Also avoid the call to second xrstor (which inits the state not passed
in the signal frame) in restore_user_xstate() if all the state has already
been restored by the first xrstor.

These changes improve the performance of signal handling(by ~3-5% as measured
by the lat_sig).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1277249017.2847.85.camel@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-06 16:31:04 -07:00
Avi Kivity da38f43859 KVM: VMX: Fix host MSR_KERNEL_GS_BASE corruption
enter_lmode() and exit_lmode() modify the guest's EFER.LMA before calling
vmx_set_efer().  However, the latter function depends on the value of EFER.LMA
to determine whether MSR_KERNEL_GS_BASE needs reloading, via
vmx_load_host_state().  With EFER.LMA changing under its feet, it took the
wrong choice and corrupted userspace's %gs.

This causes 32-on-64 host userspace to fault.

Fix not touching EFER.LMA; instead ask vmx_set_efer() to change it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-07-06 11:41:31 +03:00
Peter Zijlstra b945d6b255 rbtree: Undo augmented trees performance damage and regression
Reimplement augmented RB-trees without sprinkling extra branches
all over the RB-tree code (which lives in the scheduler hot path).

This approach is 'borrowed' from Fabio's BFQ implementation and
relies on traversing the rebalance path after the RB-tree-op to
correct the heap property for insertion/removal and make up for
the damage done by the tree rotations.

For insertion the rebalance path is trivially that from the new
node upwards to the root, for removal it is that from the deepest
node in the path from the to be removed node that will still
be around after the removal.

[ This patch also fixes a video driver regression reported by
  Ali Gholami Rudi - the memtype->subtree_max_end was updated
  incorrectly. ]

Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Ali Gholami Rudi <ali@rudi.ir>
Cc: Fabio Checconi <fabio@gandalf.sssup.it>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1275414172.27810.27961.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-05 14:43:50 +02:00
Cyrill Gorcunov 39ef13a4ac perf, x86: P4 PMU -- redesign cache events
To support cache events we have reserved the low 6 bits in
hw_perf_event::config (which is a part of CCCR register
configuration actually).

These bits represent Replay Event mertic enumerated in
enum P4_PEBS_METRIC. The caller should not care about
which exact bits should be set and how -- the caller
just chooses one P4_PEBS_METRIC entity and puts it into
the config. The kernel will track it and set appropriate
additional MSR registers (metrics) when needed.

The reason for this redesign was the PEBS enable bit, which
should not be set until DS (and PEBS sampling) support will
be implemented properly.

TODO
====

 - PEBS sampling (note it's tricky and works with _one_ counter only
   so for HT machines it will be not that easy to handle both threads)

 - tracking of PEBS registers state, a user might need to turn
   PEBS off completely (ie no PEBS enable, no UOP_tag) but some
   other event may need it, such events clashes and should not
   run simultaneously, at moment we just don't support such events

 - eventually export user space bits in separate header which will
   allow user apps to configure raw events more conveniently.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1278295769.9540.15.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-05 08:34:36 +02:00
Ingo Molnar 08f8ba0799 Merge commit 'v2.6.35-rc4' into perf/core
Merge reason: Pick up the latest perf fixes

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-05 08:30:58 +02:00
Linus Torvalds 3f7d7b4bde Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Fix incorrect branches event on AMD CPUs
  perf tools: Fix find tids routine by excluding "." and ".."
  x86: Send a SIGTRAP for user icebp traps
2010-07-04 20:20:53 -07:00
Vince Weaver f287d332ce perf, x86: Fix incorrect branches event on AMD CPUs
While doing some performance counter validation tests on some
assembly language programs I noticed that the "branches:u"
count was very wrong on AMD machines.

It looks like the wrong event was selected.

Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@kernel.org>
LKML-Reference: <alpine.DEB.2.00.1007011526010.23160@cl320.eecs.utk.edu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-03 15:19:34 +02:00
Alexander Duyck 7475271004 x86: Drop CONFIG_MCORE2 check around setting of NET_IP_ALIGN
This patch removes the CONFIG_MCORE2 check from around NET_IP_ALIGN.  It is
based on a suggestion from Andi Kleen.  The assumption is that there are
not any x86 cores where unaligned access is really slow, and this change
would allow for a performance improvement to still exist on configurations
that are not necessarily optimized for Core 2.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-01 22:45:54 -07:00
Ingo Molnar 0a54cec0c2 Merge branch 'linus' into core/rcu
Conflicts:
	fs/fs-writeback.c

Merge reason: Resolve the conflict

Note, i picked the version from Linus's tree, which effectively reverts
the fs-writeback.c bits of:

  b97181f: fs: remove all rcu head initializations, except on_stack initializations

As the upstream changes to this file changed this code heavily and the
first attempt to resolve the conflict resulted in a non-booting kernel.
It's safer to re-try this portion of the commit cleanly.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-01 09:31:25 +02:00
Darrick J. Wong d596043d71 x86, Calgary: Limit the max PHB number to 256
The x3950 family can have as many as 256 PCI buses in a single system, so
change the limits to the maximum.  Since there can only be 256 PCI buses in one
domain, we no longer need the BUG_ON check.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
LKML-Reference: <20100701004519.GQ15515@tux1.beaverton.ibm.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-06-30 22:41:42 -07:00
Alexander Duyck ea812ca1b0 x86: Align skb w/ start of cacheline on newer core 2/Xeon Arch
x86 architectures can handle unaligned accesses in hardware, and it has
been shown that unaligned DMA accesses can be expensive on Nehalem
architectures.  As such we should overwrite NET_IP_ALIGN to resolve
this issue.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-30 14:34:09 -07:00
Frederic Weisbecker a1e80fafc9 x86: Send a SIGTRAP for user icebp traps
Before we had a generic breakpoint layer, x86 used to send a
sigtrap for any debug event that happened in userspace,
except if it was caused by lazy dr7 switches.

Currently we only send such signal for single step or breakpoint
events.

However, there are three other kind of debug exceptions:

- debug register access detected: trigger an exception if the
  next instruction touches the debug registers. We don't use
  it.
- task switch, but we don't use tss.
- icebp/int01 trap. This instruction (0xf1) is undocumented and
  generates an int 1 exception. Unlike single step through TF
  flag, it doesn't set the single step origin of the exception
  in dr6.

icebp then used to be reported in userspace using trap signals
but this have been incidentally broken with the new breakpoint
code. Reenable this. Since this is the only debug event that
doesn't set anything in dr6, this is all we have to check.

This fixes a regression in Wine where World Of Warcraft got broken
as it uses this for software protection checks purposes. And
probably other apps do.

Reported-and-tested-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: 2.6.33.x 2.6.34.x <stable@kernel.org>
2010-06-30 16:16:20 +02:00
Masami Hiramatsu 567a9fd867 kprobes/x86: Fix kprobes to skip prefixes correctly
Fix resume_execution() and is_IF_modifier() to skip x86
instruction prefixes correctly by using x86 instruction
attribute.

Without this fix, resume_execution() can't handle instructions
which have non-REX prefixes (REX prefixes are skipped). This
will cause unexpected kernel panic by hitting bad address when a
kprobe hits on two-byte ret (e.g. "repz ret" generated for
Athlon/K8 optimization), because it just checks "repz" and can't
recognize the "ret" instruction.

These prefixes can be found easily with x86 instruction
attribute. This patch introduces skip_prefixes() and uses it in
resume_execution() and is_IF_modifier() to skip prefixes.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <4C298A6E.8070609@hitachi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-29 10:43:41 +02:00
Tejun Heo b71ab8c202 workqueue: increase max_active of keventd and kill current_is_keventd()
Define WQ_MAX_ACTIVE and create keventd with max_active set to half of
it which means that keventd now can process upto WQ_MAX_ACTIVE / 2 - 1
works concurrently.  Unless some combination can result in dependency
loop longer than max_active, deadlock won't happen and thus it's
unnecessary to check whether current_is_keventd() before trying to
schedule a work.  Kill current_is_keventd().

(Lockdep annotations are broken.  We need lock_map_acquire_read_norecurse())

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
2010-06-29 10:07:14 +02:00
Thomas Gleixner f384c954c9 Merge branch 'linus' into perf/core
Reason: Further changes conflict with upstream fixes

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-06-28 22:33:24 +02:00
Linus Torvalds 5904b3b81d Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Fix undeclared ENOSYS in include/linux/tracepoint.h
  perf record: prevent kill(0, SIGTERM);
  perf session: Remove threads from tree on PERF_RECORD_EXIT
  perf/tracing: Fix regression of perf losing kprobe events
  perf_events: Fix Intel Westmere event constraints
  perf record: Don't call newt functions when not initialized
2010-06-28 12:24:43 -07:00
Linus Torvalds ab8aadbda7 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, Calgary: Increase max PHB number
  x86: Fix rebooting on Dell Precision WorkStation T7400
  x86: Fix vsyscall on gcc 4.5 with -Os
  x86, pat: Proper init of memtype subtree_max_end
  um, hweight: Fix UML boot crash due to x86 optimized hweight
  x86, setup: Set ax register in boot vga query
  percpu, x86: Avoid warnings of unused variables in per cpu
  x86, irq: Rename gsi_end gsi_top, and fix off by one errors
  x86: use __ASSEMBLY__ rather than __ASSEMBLER__
2010-06-28 12:06:25 -07:00
Darrick J. Wong 499a00e92d x86, Calgary: Increase max PHB number
Newer systems (x3950M2) can have 48 PHBs per chassis and 8
chassis, so bump the limits up and provide an explanation
of the requirements for each class.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Corinna Schultz <cschultz@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
LKML-Reference: <20100624212647.GI15515@tux1.beaverton.ibm.com>
[ v2: Fixed build bug, added back PHBS_PER_CALGARY == 4 ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-25 16:14:58 +02:00
Frederic Weisbecker f7809daf64 x86: Support for instruction breakpoints
Instruction breakpoints need to have a specific length of 0 to
be working. Bring this support but also take care the user is not
trying to set an unsupported length, like a range breakpoint for
example.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
2010-06-24 23:35:27 +02:00
Frederic Weisbecker 0c4519e825 x86: Set resume bit before returning from breakpoint exception
Instruction breakpoints trigger before the instruction executes,
and returning back from the breakpoint handler brings us again
to the instruction that breakpointed. This naturally bring to
a breakpoint recursion.

To solve this, x86 has the Resume Bit trick. When the cpu flags
have the RF flag set, the next instruction won't trigger any
instruction breakpoint, and once this instruction is executed,
RF is cleared back.

This let's us jump back to the instruction that triggered the
breakpoint without recursion.

Use this when an instruction breakpoint triggers.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jason Wessel <jason.wessel@windriver.com>
2010-06-24 23:35:15 +02:00
Andres Salomon 75a9cac430 x86, olpc: Add comment about implicit optimization barrier
Signed-off-by: Andres Salomon <dilinger@queued.net>
Cc: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <20100618174653.7755a39a@dev.queued.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-24 11:40:47 +02:00
Thomas Backlund 890ffedc7c x86: Fix rebooting on Dell Precision WorkStation T7400
Dell Precision WorkStation T7400 freezes on reboot unless
reboot=b is used.

Reference: https://qa.mandriva.com/show_bug.cgi?id=58017

Signed-off-by: Thomas Backlund <tmb@mandriva.org>
LKML-Reference: <4C1CC6E9.6000701@mandriva.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-20 09:24:13 +02:00
Andres Salomon fd699c7655 x86, olpc: Add support for calling into OpenFirmware
Add support for saving OFW's cif, and later calling into it to run OFW
commands.  OFW remains resident in memory, living within virtual range
0xff800000 - 0xffc00000.  A single page directory entry points to the
pgdir that OFW actually uses, so rather than saving the entire page
table, we grab and install that one entry permanently in the kernel's
page table.

This is currently only used by the OLPC XO.  Note that this particular
calling convention breaks PAE and PAT, and so cannot be used on newer
x86 hardware.

Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20100618174653.7755a39a@dev.queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-06-18 14:54:36 -07:00
H. Peter Anvin 05d0b0889c x86, vdso: Error out if the vdso contains external references
The vdso is a piece of userspace code which is supposed to be fully
self-contained.  Any external (undefined) reference is an error, and
should be caught at compile time.  This was giving us trouble when
compiling with -Os on gcc 4.5.0, for example (failed inline).

The need to do a buildtime check was pointed out by Andi Kleen.

Reported-by: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <tip-*@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-06-18 14:46:21 -07:00
Andi Kleen 124482935f x86: Fix vsyscall on gcc 4.5 with -Os
This fixes the -Os breaks with gcc 4.5 bug.  rdtsc_barrier needs to be
force inlined, otherwise user space will jump into kernel space and
kill init.

This also addresses http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44129
I believe.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <20100618210859.GA10913@basil.fritz.box>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org>
2010-06-18 14:16:31 -07:00
Jiri Slaby d7a0380dc3 x86-64, mm: Initialize VDSO earlier on 64 bits
When initrd is in use and a driver does request_module() in its
module_init (i.e. __initcall or device_initcall), a modprobe process
is created with VDSO mapping. But VDSO is inited even in __initcall,
i.e. on the same level (at the same time), so it may not be inited
yet (link order matters).

Move the VDSO initialization code earlier by switching to something
before rootfs_initcall where initrd is loaded as rootfs. Specifically
to subsys_initcall. Do it for standard 64-bit path (init_vdso_vars)
and for compat (sysenter_setup), just in case people have 32-bit
initrd and ia32 emulation built-in.

i386 (pure 32-bit) is not affected, since sysenter_setup() is called
from check_bugs()->identify_boot_cpu() in start_kernel() before
rest_init()->kernel_thread(kernel_init) where even kernel_init() calls
do_basic_setup()->do_initcalls().

What this patch fixes are early modprobe crashes such as:
Unpacking initramfs...
Freeing initrd memory: 9324k freed
modprobe[368]: segfault at 7fff4429c020 ip 00007fef397e160c \
    sp 00007fff442795c0 error 4 in ld-2.11.2.so[7fef397df000+1f000]

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
LKML-Reference: <1276720242-13365-1-git-send-email-jslaby@suse.cz>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-06-18 13:48:14 -07:00
Robert Schöne c882e0feb9 x86, perf: Add power_end event to process_*.c cpu_idle routine
Systems using the idle thread from process_32.c and process_64.c
do not generate power_end events which could be traced using
perf. This patch adds the event generation for such systems.

Signed-off-by: Robert Schoene <robert.schoene@tu-dresden.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1276515440.5441.45.camel@localhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-18 11:35:10 +02:00
Marcin Slusarz 8b8f79b927 x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages
After every iounmap mmiotrace has to free kmmio_fault_pages, but
it can't do it directly, so it defers freeing by RCU.

It usually works, but when mmiotraced code calls ioremap-iounmap
multiple times without sleeping between (so RCU won't kick in
and start freeing) it can be given the same virtual address, so
at every iounmap mmiotrace will schedule the same pages for
release. Obviously it will explode on second free.

Fix it by marking kmmio_fault_pages which are scheduled for
release and not adding them second time.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Tested-by: Marcin Kocielnicki <koriakin@0x04.net>
Tested-by: Shinpei KATO <shinpei@il.is.s.u-tokyo.ac.jp>
Acked-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Marcin Kocielnicki <koriakin@0x04.net>
Cc: nouveau@lists.freedesktop.org
Cc: <stable@kernel.org>
LKML-Reference: <20100613215654.GA3829@joi.lan>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-18 11:30:09 +02:00
Ingo Molnar 646b1db495 Merge commit 'v2.6.35-rc3' into perf/core
Merge reason: Go from -rc1 base to -rc3 base, merge in fixes.
2010-06-18 10:53:19 +02:00
Venkatesh Pallipadi 23016bf0d2 x86: Look for IA32_ENERGY_PERF_BIAS support
The new IA32_ENERGY_PERF_BIAS MSR allows system software to give
hardware a hint whether OS policy favors more power saving,
or more performance.  This allows the OS to have some influence
on internal hardware power/performance tradeoffs where the OS
has previously had no influence.

The support for this feature is indicated by CPUID.06H.ECX.bit3,
as documented in the Intel Architectures Software Developer's Manual.

This patch discovers support of this feature and displays it
as "epb" in /proc/cpuinfo.

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <alpine.LFD.2.00.1006032310160.6669@localhost.localdomain>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-06-16 13:37:32 -07:00
Jiri Kosina f1bbbb6912 Merge branch 'master' into for-next 2010-06-16 18:08:13 +02:00
Uwe Kleine-König 421f91d21a fix typos concerning "initiali[zs]e"
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-06-16 18:05:05 +02:00
Paul E. McKenney ec8c27e04f mce: convert to rcu_dereference_index_check()
The mce processing applies rcu_dereference_check() to integers used as
array indices.  This patch therefore moves mce to the new RCU API
rcu_dereference_index_check() that avoids the sparse processing that
would otherwise result in compiler errors.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2010-06-14 16:37:28 -07:00
Len Brown 42de5532f4 Merge branch 'bugzilla-13931-sleep-nvs' into release
Conflicts:
	drivers/acpi/sleep.c

Signed-off-by: Len Brown <len.brown@intel.com>
2010-06-12 01:15:40 -04:00
Linus Torvalds 7ae1277a52 Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM / x86: Save/restore MISC_ENABLE register
2010-06-11 14:19:45 -07:00
Linus Torvalds eda054770e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: clear bridge resource range if BIOS assigned bad one
  PCI: hotplug/cpqphp, fix NULL dereference
  Revert "PCI: create function symlinks in /sys/bus/pci/slots/N/"
  PCI: change resource collision messages from KERN_ERR to KERN_INFO
2010-06-11 14:15:44 -07:00
Venkatesh Pallipadi 6a4f3b5237 x86, pat: Proper init of memtype subtree_max_end
subtree_max_end that was recently added to struct memtype was not getting
properly initialized resulting in

WARNING: kmemcheck: Caught 64-bit read from uninitialized memory
in memtype_rb_augment_cb()
reported here
https://bugzilla.kernel.org/show_bug.cgi?id=16092

This change fixes the problem.

Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Tested-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <1276217101-11515-1-git-send-email-venki@google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
2010-06-11 14:12:22 -07:00
Yinghai Lu 837c4ef13c PCI: clear bridge resource range if BIOS assigned bad one
Yannick found that video does not work with 2.6.34.  The cause of this
bug was that the BIOS had assigned the wrong range to the PCI bridge
above the video device.  Before 2.6.34 the kernel would have shrunk
the size of the bridge window, but since
  d65245c PCI: don't shrink bridge resources
the kernel will avoid shrinking BIOS ranges.

So zero out the old range if we fail to claim it at boot time; this will
cause us to allocate a new range at startup, restoring the 2.6.34
behavior.

Fixes regression https://bugzilla.kernel.org/show_bug.cgi?id=16009.

Reported-by: Yannick <yannick.roehlly@free.fr>
Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-06-11 13:24:51 -07:00
Huang Ying a2d7b0d485 x86, mce: Use HW_ERR in MCE handler
Use HW_ERR printk prefix in MCE handler. To make it more explicit that
this is hardware error instead of software error.

Signed-off-by: Huang Ying <ying.huang@intel.com>
LKML-Reference: <1275978939.3444.668.camel@yhuang-dev.sh.intel.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-06-10 21:28:49 -07:00
Huang Ying 3c41758860 x86, mce: Fix MSR_IA32_MCI_CTL2 CMCI threshold setup
It is reported that CMCI is not raised when number of corrected error
reaches preset threshold. After inspection, it is found that
MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
fixed it.

Value of MCI_CTL2_CMCI_THRESHOLD_MASK is fixed according to x86_64
Software Developer's Manual too.

Reported-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
LKML-Reference: <1275977350.3444.660.camel@yhuang-dev.sh.intel.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-06-10 21:27:36 -07:00
Huang Ying 1f9a0bd498 x86, mce: Rename MSR_IA32_MCx_CTL2 value
Rename CMCI_EN to MCI_CTL2_CMCI_EN and CMCI_THRESHOLD_MASK to
MCI_CTL2_CMCI_THRESHOLD_MASK to make naming consistent.

Signed-off-by: Huang Ying <ying.huang@intel.com>
LKML-Reference: <1275977348.3444.659.camel@yhuang-dev.sh.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-06-10 21:27:26 -07:00
Andi Kleen cf3bdc29fc x86, setup: Set ax register in boot vga query
Catch missing conversion to the register structure "glove box" scheme.

Found by gcc 4.6's new warnings.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <20100610111040.F1781B1A2B@basil.firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-06-10 15:24:29 -07:00
Andi Kleen 23b764d056 percpu, x86: Avoid warnings of unused variables in per cpu
Avoid hundreds of warnings with a gcc 4.6 -Wall build.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-11 00:03:45 +02:00
Matthew Garrett dd4c4f17d7 suspend: Move NVS save/restore code to generic suspend functionality
Saving platform non-volatile state may be required for suspend to RAM as
well as hibernation. Move it to more generic code.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-06-10 11:02:34 -04:00
Stephane Eranian d11007703c perf_events: Fix Intel Westmere event constraints
Based on Intel Vol3b (March 2010), the event
SNOOPQ_REQUEST_OUTSTANDING is restricted to counters 0,1 so
update the event table for Intel Westmere accordingly.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: paulus@samba.org
Cc: davem@davemloft.net
Cc: fweisbec@gmail.com
Cc: perfmon2-devel@lists.sf.net
Cc: eranian@gmail.com
Cc: <stable@kernel.org> # .34.x
LKML-Reference: <4c10cb56.5120e30a.2eb4.ffffc3de@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-10 14:16:32 +02:00
Borislav Petkov 12d8a96128 x86, AMD: Extend support to future families
Extend support to future families, and in particular:

* extend direct mapping split of Tseg SMM area.
* extend K8 flavored alternatives (NOPS).
* rep movs* prefix is fast in ucode.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100602182921.GA21557@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-06-09 15:57:47 -07:00
Borislav Petkov 8cc1176e5d x86, cacheinfo: Carve out L3 cache slot accessors
This is in preparation for disabling L3 cache indices after having
received correctable ECCs in the L3 cache. Now we allow for initial
setting of a disabled index slot (write once) and deny writing new
indices to it after it has been disabled. Also, we deny using both slots
to disable one and the same index.

Userspace can restore the previously disabled indices by rewriting those
sysfs entries when booting.

Cleanup and reorganize code while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100602161840.GI18327@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-06-09 15:57:41 -07:00
Dan Carpenter d6d4d4205c x86, xsave: Cleanup return codes in check_for_xstate()
The places which call check_for_xstate() only care about zero or
non-zero so this patch doesn't change how the code runs, but it's a
cleanup.  The main reason for this patch is that I'm looking for places
which don't return -EFAULT for copy_from_user() failures.

Signed-off-by: Dan Carpenter <error27@gmail.com>
LKML-Reference: <20100603100746.GU5483@bicker>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
2010-06-09 15:57:36 -07:00
Eric W. Biederman a4384df3e2 x86, irq: Rename gsi_end gsi_top, and fix off by one errors
When I introduced the global variable gsi_end I thought gsi_end on
io_apics was one past the end of the gsi range for the io_apic.  After
it was pointed out the the range on io_apics was inclusive I changed
my global variable to match.  That was a big mistake.  Inclusive
semantics without a range start cannot describe the case when no gsi's
are allocated.  Describing the case where no gsi's are allocated is
important in sfi.c and mpparse.c so that we can assign gsi numbers
instead of blindly copying the gsi assignments the BIOS has done as we
do in the acpi case.

To keep from getting the global variable confused with the gsi range
end rename it gsi_top.

To allow describing the case where no gsi's are allocated have gsi_top
be one place the highest gsi number seen in the system.

This fixes an off by one bug in sfi.c:
Reported-by: jacob pan <jacob.jun.pan@linux.intel.com>

This fixes the same off by one bug in mpparse.c:

This fixes an off unreachable by one bug in acpi/boot.c:irq_to_gsi
Reported-by: Yinghai <yinghai.lu@oracle.com>

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <m17hm9jre7.fsf_-_@fess.ebiederm.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-06-09 13:34:06 -07:00
Ingo Molnar c726b61c6a Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core 2010-06-09 18:55:57 +02:00
Avi Kivity 69325a1225 KVM: MMU: Remove user access when allowing kernel access to gpte.w=0 page
If cr0.wp=0, we have to allow the guest kernel access to a page with pte.w=0.
We do that by setting spte.w=1, since the host cr0.wp must remain set so the
host can write protect pages.  Once we allow write access, we must remove
user access otherwise we mistakenly allow the user to write the page.

Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-06-09 18:48:37 +03:00
Marcelo Tosatti 3be2264be3 KVM: MMU: invalidate and flush on spte small->large page size change
Always invalidate spte and flush TLBs when changing page size, to make
sure different sized translations for the same address are never cached
in a CPU's TLB.

Currently the only case where this occurs is when a non-leaf spte pointer is
overwritten by a leaf, large spte entry. This can happen after dirty
logging is disabled on a memslot, for example.

Noticed by Andrea.

KVM-Stable-Tag
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-06-09 18:48:36 +03:00
Joerg Roedel 67ec660777 KVM: SVM: Implement workaround for Erratum 383
This patch implements a workaround for AMD erratum 383 into
KVM. Without this erratum fix it is possible for a guest to
kill the host machine. This patch implements the suggested
workaround for hypervisors which will be published by the
next revision guide update.

[jan: fix overflow warning on i386]
[xiao: fix unused variable warning]

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-06-09 18:47:20 +03:00
Joerg Roedel fe5913e4e1 KVM: SVM: Handle MCEs early in the vmexit process
This patch moves handling of the MC vmexits to an earlier
point in the vmexit. The handle_exit function is too late
because the vcpu might alreadry have changed its physical
cpu.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-06-09 18:39:10 +03:00
Oleg Nesterov 018378c55b x86: Unify save_stack_address() and save_stack_address_nosched()
Cleanup. Factor the common code in save_stack_address() and
save_stack_address_nosched().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <20100603193243.GA31534@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-06-09 17:32:19 +02:00
Oleg Nesterov 147ec4d236 x86: Make save_stack_address() !CONFIG_FRAME_POINTER friendly
If CONFIG_FRAME_POINTER=n, print_context_stack() shouldn't neglect the
non-reliable addresses on stack, this is all we have if dump_trace(bp)
is called with the wrong or zero bp.

For example, /proc/pid/stack doesn't work if CONFIG_FRAME_POINTER=n.

This patch obviously has no effect if CONFIG_FRAME_POINTER=y, otherwise
it reverts 1650743c "x86: don't save unreliable stack trace entries".

Also, remove the unnecessary type-cast.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20100603193239.GA31530@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-06-09 17:32:15 +02:00
Peter Zijlstra e78505958c perf: Convert perf_event to local_t
Since now all modification to event->count (and ->prev_count
and ->period_left) are local to a cpu, change then to local64_t so we
avoid the LOCK'ed ops.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-09 11:12:37 +02:00
Peter Zijlstra 1996bda2a4 arch: Implement local64_t
On 64bit, local_t is of size long, and thus we make local64_t an alias.
On 32bit, we fall back to atomic64_t. (architecture can provide optimized
32-bit version)

(This new facility is to be used by perf events optimizations.)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-arch@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-09 11:12:36 +02:00
Cyrill Gorcunov 68aa00ac0a perf, x86: Make a second write to performance counter if needed
On Netburst PMU we need a second write to a performance counter
due to cpu erratum.

A simple flag test instead of alternative instructions was choosen
because wrmsrl is already a macro and if virtualization is turned
on will need an additional wrapper call which is more expencise.

nb: we should propably switch to jump-labels as only this facility
reach the mainline.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100602212304.GC5264@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-09 11:12:35 +02:00
Peter Zijlstra 8d2cacbbb8 perf: Cleanup {start,commit,cancel}_txn details
Clarify some of the transactional group scheduling API details
and change it so that a successfull ->commit_txn also closes
the transaction.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1274803086.5882.1752.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-09 11:12:34 +02:00
Frederic Weisbecker b0f82b81fe perf: Drop the skip argument from perf_arch_fetch_regs_caller
Drop this argument now that we always want to rewind only to the
state of the first caller.
It means frame pointers are not necessary anymore to reliably get
the source of an event. But this also means we need this helper
to be a macro now, as an inline function is not an option since
we need to know when to provide a default implentation.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-06-08 23:31:27 +02:00
Frederic Weisbecker c9cf4dbb4d x86: Unify dumpstack.h and stacktrace.h
arch/x86/include/asm/stacktrace.h and arch/x86/kernel/dumpstack.h
declare headers of objects that deal with the same topic.
Actually most of the files that include stacktrace.h also include
dumpstack.h

Although dumpstack.h seems more reserved for internals of stack
traces, those are quite often needed to define specialized stack
trace operations. And perf event arch headers are going to need
access to such low level operations anyway. So don't continue to
bother with dumpstack.h as it's not anymore about isolated deep
internals.

v2: fix struct stack_frame definition conflict in sysprof

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Soeren Sandmann <sandmann@daimi.au.dk>
2010-06-08 23:29:52 +02:00
Cliff Wickman f6d8a56693 x86, UV: Modularize BAU send and wait
Streamline the large uv_flush_send_and_wait() function by use of
a couple of helper functions.

And remove some excess comments.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNy-0004ay-IH@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-08 21:13:48 +02:00
Cliff Wickman 450a007eeb x86, UV: BAU broadcast to the local hub
Make the Broadcast Assist Unit driver use the BAU for TLB
shootdowns of cpu's on the local uvhub.

It was previously thought that IPI might be faster to the cpu's
on the local hub.  But the IPI operation would have to follow
the completion of the BAU broadcast anyway.  So we broadcast to
the local uvhub in all cases except when the current cpu was the
only local cpu in the mask.

This simplifies uv_flush_send_and_wait() in that it returns
either all shootdowns complete, or none.

Adjust the statistics to account for shootdowns on the local
uvhub.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNy-0004aq-G7@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-08 21:13:48 +02:00
Cliff Wickman 7fba1bcd48 x86, UV: Correct BAU regular message type
The Broadcast Assist Unit messages have a regular or retry
message type. The regular type was not being set, but needs to
be, because the lack of a message type is sometimes used to
identify an unused entry in the message queue.
Also removing some excess comments.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNy-0004ak-Dy@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-08 21:13:47 +02:00
Cliff Wickman 90cc7d9449 x86, UV: Remove BAU check for stay-busy
Remove a faulty assumption that a long running BAU request has
encountered a hardware problem and will never finish.

Numalink congestion can make a request appear to have
encountered such a problem, but it is not safe to cancel the
request.  If such a cancel is done but a reply is later received
we can miss a TLB shootdown.

We depend upon the max_bau_concurrent 'throttle' to prevent the
stay-busy case from happening.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNy-0004ad-BV@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-08 21:13:47 +02:00
Cliff Wickman a8328ee58c x86, UV: Correct BAU discovery of hubs and sockets
Correct the initialization-time assumption of contigous blade
numbers and of sockets numbered from zero.

There may be hubs present with no cpu's enabled.
There may be disabled sockets such that the active socket is not
number zero.

And assign a 'socket master' by assuming that a socket is a
node. (it is not safe to extract socket number from an apicid)

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNy-0004aW-9S@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-08 21:13:46 +02:00
Cliff Wickman 39847e7f3c x86, UV: Correct BAU software acknowledge
Correct the acknowledgment and the reset of a BAU
software-acknowledged message.

A retry message should be testing only for timed-out resources
(mask << 8). (And we delete a log message that might cause
unnecessary concern) The acknowledge MMR is
|--timed-out--|---pending--|,  each is 8 bits.

The IPI-driven reset of software acknowledge resources frees
both timed out and pending resources.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNy-0004aP-7O@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-08 21:13:46 +02:00
Cliff Wickman 4faca15508 x86, UV: BAU structure rearranging
Move some structure definitions from the C code to the BAU
header file, and change the organization of that header file a
little.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNy-0004aI-54@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-08 21:13:45 +02:00
Cliff Wickman 712157aa70 x86, UV: Shorten access to BAU statistics structure
Use a pointer from the per-cpu BAU control structure to the
per-cpu BAU statistics structure.
We nearly always know the first before needing the second.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNy-0004aB-2k@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-08 21:13:45 +02:00
Cliff Wickman 50fb55acc5 x86, UV: Disable BAU on network congestion
The numalink network can become so congested that TLB shootdown
using the Broadcast Assist Unit becomes slower than using IPI's.

In that case, disable the use of the BAU for a period of time.
The period is tunable.  When the period expires the use of the
BAU is re-enabled. A count of these actions is added to the
statistics file.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNy-0004a4-0a@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-08 21:13:45 +02:00
Cliff Wickman e8e5e8a804 x86, UV: BAU tunables into a debugfs file
Make the Broadcast Assist Unit driver's nine tuning values variable by
making them accessible through a read/write debugfs file.

The file will normally be mounted as
/sys/kernel/debug/sgi_uv/bau_tunables. The tunables are kept in each
cpu's per-cpu BAU structure.

The patch also does a little name improvement, and corrects the reset of
two destination timeout counters.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNx-0004Zx-Uo@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-08 21:13:44 +02:00
Cliff Wickman 12a6611fa1 x86, UV: Calculate BAU destination timeout
Calculate the Broadcast Assist Unit's destination timeout period from the
values in the relevant MMR's.

Store it in each cpu's per-cpu BAU structure so that a destination
timeout can be differentiated from a 'plugged' situation in which all
software ack resources are already allocated and a timeout is pending.
That case returns an immediate destination error.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNx-0004Zq-RK@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-08 21:13:44 +02:00
Livio Soares e768aee89c perf, x86: Small fix to cpuid10_edx
Fixes to 'cpuid10_edx' to comply with Intel documentation.
According to the Intel Manual, Volume 2A, Table 3-12, the cpuid for
architecture performance monitoring returns, in EDX, two pieces of
information:

  1) Number of fixed-function counters (5 bits, not 4)
  2) Width of fixed-function counters (8 bits)

Signed-off-by: Livio Soares <livio@eecg.toronto.edu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-08 20:27:04 +02:00
Andres Salomon fc4ac7a5f5 x86: use __ASSEMBLY__ rather than __ASSEMBLER__
As Ingo pointed out in a separate patch, we should be using __ASSEMBLY__.
Make that the case in pgtable headers.

Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20100605114042.35ac69c1@dev.queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-06-07 17:27:11 -07:00
Ondrej Zary 85a0e75397 PM / x86: Save/restore MISC_ENABLE register
Save/restore MISC_ENABLE register on suspend/resume.
This fixes OOPS (invalid opcode) on resume from STR on Asus P4P800-VM,
which wakes up with MWAIT disabled.

Fixes https://bugzilla.kernel.org/show_bug.cgi?id=15385

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Tested-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-06-08 00:32:49 +02:00
Alex Nixon 08bbc9da92 xen: Add xen_create_contiguous_region
A memory region must be physically contiguous in order to be accessed
through DMA.  This patch adds xen_create_contiguous_region, which
ensures a region of contiguous virtual memory is also physically
contiguous.

Based on Stephen Tweedie's port of the 2.6.18-xen version.

Remove contiguous_bitmap[] as it's no longer needed.

Ported from linux-2.6.18-xen.hg 707:e410857fd83c

[ Impact: add Xen-internal API to make pages phys-contig ]

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-06-07 15:37:53 -04:00
Alex Nixon 19001c8c5b xen: Rename the balloon lock
* xen_create_contiguous_region needs access to the balloon lock to
  ensure memory doesn't change under its feet, so expose the balloon
  lock
* Change the name of the lock to xen_reservation_lock, to imply it's
  now less-specific usage.

[ Impact: cleanup ]

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-06-07 14:34:07 -04:00
Alex Nixon 7347b4082e xen: Allow unprivileged Xen domains to create iomap pages
PV DomU domains are allowed to map hardware MFNs for PCI passthrough,
but are not generally allowed to map raw machine pages.  In particular,
various pieces of code try to map DMI and ACPI tables in the ISA ROM
range.  We disallow _PAGE_IOMAP for those mappings, so that they are
redirected to a set of local zeroed pages we reserve for that purpose.

[ Impact: prevent passthrough of ISA space, as we only allow PCI ]

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-06-07 14:33:13 -04:00
Jeremy Fitzhardinge c0011dbfce xen: use _PAGE_IOMAP in ioremap to do machine mappings
In a Xen domain, ioremap operates on machine addresses, not
pseudo-physical addresses.  We use _PAGE_IOMAP to determine whether a
mapping is intended for machine addresses.

[ Impact: allow Xen domain to map real hardware ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-06-07 14:32:33 -04:00
Linus Torvalds 9a9620db07 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (83 commits)
  i7core_edac: Better describe the supported devices
  Add support for Westmere to i7core_edac driver
  i7core_edac: don't free on success
  i7core_edac: Add support for X5670
  Always call i7core_[ur]dimm_check_mc_ecc_err
  i7core_edac: fix memory leak of i7core_dev
  EDAC: add __init to i7core_xeon_pci_fixup
  i7core_edac: Fix wrong device id for channel 1 devices
  i7core: add support for Lynnfield alternate address
  i7core_edac: Add initial support for Lynnfield
  i7core_edac: do not export static functions
  edac: fix i7core build
  edac: i7core_edac produces undefined behaviour on 32bit
  i7core_edac: Use a more generic approach for probing PCI devices
  i7core_edac: PCI device is called NONCORE, instead of NOCORE
  i7core_edac: Fix ringbuffer maxsize
  i7core_edac: First store, then increment
  i7core_edac: Better parse "any" addrmask
  i7core_edac: Use a lockless ringbuffer
  edac: Create an unique instance for each kobj
  ...
2010-06-04 15:39:54 -07:00
Robert Richter d8a382d266 Merge remote branch 'tip/perf/urgent' into oprofile/urgent 2010-06-04 11:33:10 +02:00
Linus Torvalds 167b712904 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, smpboot: Fix cores per node printing on boot
  x86/amd-iommu: Fall back to GART if initialization fails
  x86/amd-iommu: Fix crash when request_mem_region fails
  x86/mm: Remove unused DBG() macro
  arch/x86/kernel: Add missing spin_unlock
2010-06-03 15:47:22 -07:00
Linus Torvalds b01b7dc283 Merge branch 'for-linus/bugfixes' of git://xenbits.xensource.com/people/ianc/linux-2.6
* 'for-linus/bugfixes' of git://xenbits.xensource.com/people/ianc/linux-2.6:
  xen: avoid allocation causing potential swap activity on the resume path
  xen: ensure timer tick is resumed even on CPU driving the resume
2010-06-03 15:46:09 -07:00
Linus Torvalds f150dba6d4 Merge branch 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix crash in swevents
  perf buildid-list: Fix --with-hits event processing
  perf scripts python: Give field dict to unhandled callback
  perf hist: fix objdump output parsing
  perf-record: Check correct pid when forking
  perf: Do the comm inheritance per thread in event__process_task
  perf: Use event__process_task from perf sched
  perf: Process comm events by tid
  blktrace: Fix new kernel-doc warnings
  perf_events: Fix unincremented buffer base on partial copy
  perf_events: Fix event scheduling issues introduced by transactional API
  perf_events, trace: Fix perf_trace_destroy(), mutex went missing
  perf_events, trace: Fix probe unregister race
  perf_events: Fix races in group composition
  perf_events: Fix races and clean up perf_event and perf_mmap_data interaction
2010-06-03 15:45:26 -07:00
Ian Campbell cd52e17ea8 xen: ensure timer tick is resumed even on CPU driving the resume
The core suspend/resume code is run from stop_machine on CPU0 but
parts of the suspend/resume machinery (including xen_arch_resume) are
run on whichever CPU happened to schedule the xenwatch kernel thread.

As part of the non-core resume code xen_arch_resume is called in order
to restart the timer tick on non-boot processors. The boot processor
itself is taken care of by core timekeeping code.

xen_arch_resume uses smp_call_function which does not call the given
function on the current processor. This means that we can end up with
one CPU not receiving timer ticks if the xenwatch thread happened to
be scheduled on CPU > 0.

Use on_each_cpu instead of smp_call_function to ensure the timer tick
is resumed everywhere.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Stable Kernel <stable@kernel.org> # .32.x
2010-06-03 09:34:04 +01:00
Borislav Petkov 4adc8b71cc x86, smpboot: Fix cores per node printing on boot
Percpu initialization happens now after booting the cores on the
machine and this causes them all to be displayed as belonging to
node 0:

Jun  8 05:57:21 kepek kernel: [    0.106999] Booting Node   0,
Processors  #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 Ok.

Use early_cpu_to_node() to get the correct node of each core
instead.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mike Travis <travis@sgi.com>
LKML-Reference: <20100601190455.GA14237@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-02 09:38:53 +02:00
Linus Torvalds 1f73897861 Merge branch 'for-35' of git://repo.or.cz/linux-kbuild
* 'for-35' of git://repo.or.cz/linux-kbuild: (81 commits)
  kbuild: Revert part of e8d400a to resolve a conflict
  kbuild: Fix checking of scm-identifier variable
  gconfig: add support to show hidden options that have prompts
  menuconfig: add support to show hidden options which have prompts
  gconfig: remove show_debug option
  gconfig: remove dbg_print_ptype() and dbg_print_stype()
  kconfig: fix zconfdump()
  kconfig: some small fixes
  add random binaries to .gitignore
  kbuild: Include gen_initramfs_list.sh and the file list in the .d file
  kconfig: recalc symbol value before showing search results
  .gitignore: ignore *.lzo files
  headerdep: perlcritic warning
  scripts/Makefile.lib: Align the output of LZO
  kbuild: Generate modules.builtin in make modules_install
  Revert "kbuild: specify absolute paths for cscope"
  kbuild: Do not unnecessarily regenerate modules.builtin
  headers_install: use local file handles
  headers_check: fix perl warnings
  export_report: fix perl warnings
  ...
2010-06-01 08:55:52 -07:00
Ingo Molnar c8fcb14fec Merge branch 'amd-iommu/2.6.35' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2010-06-01 11:45:45 +02:00
Joerg Roedel d7f0776975 x86/amd-iommu: Fall back to GART if initialization fails
This patch implements a fallback to the GART IOMMU if this
is possible and the AMD IOMMU initialization failed.
Otherwise the fallback would be nommu which is very
problematic on machines with more than 4GB of memory or
swiotlb which hurts io-performance.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-06-01 10:20:15 +02:00
Joerg Roedel e82752d8b5 x86/amd-iommu: Fix crash when request_mem_region fails
When request_mem_region fails the error path tries to
disable the IOMMUs. This accesses the mmio-region which was
not allocated leading to a kernel crash. This patch fixes
the issue.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-06-01 10:03:08 +02:00
Joerg Roedel 1d61e73ab4 Merge commit 'v2.6.35-rc1' into amd-iommu/2.6.35 2010-06-01 09:57:49 +02:00
Akinobu Mita e565813ab9 x86/mm: Remove unused DBG() macro
DBG() macro for CONFIG_DEBUG_PER_CPU_MAPS is unused.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
LKML-Reference: <1274706291-13554-1-git-send-email-akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-31 10:01:53 +02:00
Stephane Eranian 90151c35b1 perf_events: Fix event scheduling issues introduced by transactional API
The transactional API patch between the generic and model-specific
code introduced several important bugs with event scheduling, at
least on X86. If you had pinned events, e.g., watchdog,  and were
over-committing the PMU, you would get bogus counts. The bug was
showing up on Intel CPU because events would move around more
often that on AMD. But the problem also existed on AMD, though
harder to expose.

The issues were:

 - group_sched_in() was missing a cancel_txn() in the error path

 - cpuc->n_added was not properly maintained, leading to missing
   actions in hw_perf_enable(), i.e., n_running being 0. You cannot
   update n_added until you know the transaction has succeeded. In
   case of failed transaction n_added was not adjusted back.

 - in case of failed transactions, event_sched_out() was called
   and eventually invoked x86_disable_event() to touch the HW reg.
   But with transactions, on X86, event_sched_in() does not touch
   HW registers, it simply collects events into a list. Thus, you
   could end up calling x86_disable_event() on a counter which
   did not correspond to the current event when idx != -1.

The patch modifies the generic and X86 code to avoid all those problems.

First, we keep track of the number of events added last. In case the
transaction fails, we substract them from n_added. This approach is
necessary (as opposed to delaying updates to n_added) because not all
event updates use the transaction API, e.g., single events.

Second, we encapsulate the event_sched_in() and event_sched_out() in
group_sched_in() inside the transaction. That makes the operations
symmetrical and you can also detect that you are inside a transaction
and skip the HW reg access by checking cpuc->group_flag.

With this patch, you can now overcommit the PMU even with pinned
system-wide events present and still get valid counts.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1274796225.5882.1389.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-31 08:46:10 +02:00
Linus Torvalds 17d30ac077 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (47 commits)
  mfd: Rename twl5031 sih modules
  mfd: Storage class for timberdale should be before const qualifier
  mfd: Remove unneeded and dangerous clearing of clientdata
  mfd: New AB8500 driver
  gpio: Fix inverted rdc321x gpio data out registers
  mfd: Change rdc321x resources flags to IORESOURCE_IO
  mfd: Move pcf50633 irq related functions to its own file.
  mfd: Use threaded irq for pcf50633
  mfd: pcf50633-adc: Fix potential race in pcf50633_adc_sync_read
  mfd: Fix pcf50633 bitfield logic in interrupt handler
  gpio: rdc321x needs to select MFD_CORE
  mfd: Use menuconfig for quicker config editing
  ARM: AB3550 board configuration and irq for U300
  mfd: AB3550 core driver
  mfd: AB3100 register access change to abx500 API
  mfd: Renamed ab3100.h to abx500.h
  gpio: Add TC35892 GPIO driver
  mfd: Add Toshiba's TC35892 MFD core
  mfd: Delay to mask tsc irq in max8925
  mfd: Remove incorrect wm8350 kfree
  ...
2010-05-30 09:13:08 -07:00
Linus Torvalds 021fad8b70 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, cpufeature: Unbreak compile with gcc 3.x
  x86, pat: Fix memory leak in free_memtype
  x86, k8: Fix section mismatch for powernowk8_exit()
  lib/atomic64_test: fix missing include of linux/kernel.h
  x86: remove last traces of quicklist usage
  x86, setup: Phoenix BIOS fixup is needed on Dell Inspiron Mini 1012
  x86: "nosmp" command line option should force the system into UP mode
  arch/x86/pci: use kasprintf
  x86, apic: ack all pending irqs when crashed/on kexec
2010-05-30 09:06:13 -07:00
Linus Torvalds 35926ff5fb Revert "cpusets: randomize node rotor used in cpuset_mem_spread_node()"
This reverts commit 0ac0c0d0f8, which
caused cross-architecture build problems for all the wrong reasons.
IA64 already added its own version of __node_random(), but the fact is,
there is nothing architectural about the function, and the original
commit was just badly done. Revert it, since no fix is forthcoming.

Requested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-30 09:00:03 -07:00
Linus Torvalds e4f2e5eaac Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
  intel_idle: native hardware cpuidle driver for latest Intel processors
  ACPI: acpi_idle: touch TS_POLLING only in the non-MWAIT case
  acpi_pad: uses MONITOR/MWAIT, so it doesn't need to clear TS_POLLING
  sched: clarify commment for TS_POLLING
  ACPI: allow a native cpuidle driver to displace ACPI
  cpuidle: make cpuidle_curr_driver static
  cpuidle: add cpuidle_unregister_driver() error check
  cpuidle: fail to register if !CONFIG_CPU_IDLE
2010-05-28 16:14:17 -07:00
Linus Torvalds 9a90e09854 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (27 commits)
  ACPI: Don't let acpi_pad needlessly mark TSC unstable
  drivers/acpi/sleep.h: Checkpatch cleanup
  ACPI: Minor cleanup eliminating redundant PMTIMER_TICKS to NS conversion
  ACPI: delete unused c-state promotion/demotion data strucutures
  ACPI: video: fix acpi_backlight=video
  ACPI: EC: Use kmemdup
  drivers/acpi: use kasprintf
  ACPI, APEI, EINJ injection parameters support
  Add x64 support to debugfs
  ACPI, APEI, Use ERST for persistent storage of MCE
  ACPI, APEI, Error Record Serialization Table (ERST) support
  ACPI, APEI, Generic Hardware Error Source memory error support
  ACPI, APEI, UEFI Common Platform Error Record (CPER) header
  Unified UUID/GUID definition
  ACPI Hardware Error Device (PNP0C33) support
  ACPI, APEI, PCIE AER, use general HEST table parsing in AER firmware_first setup
  ACPI, APEI, Document for APEI
  ACPI, APEI, EINJ support
  ACPI, APEI, HEST table parsing
  ACPI, APEI, APEI supporting infrastructure
  ...
2010-05-28 14:42:18 -07:00
Len Brown d3b383338f Merge branch 'ht-delete-2.6.35' into release 2010-05-28 16:20:35 -04:00
Len Brown 91dd696439 Merge branch 'acpi_enable' into release 2010-05-28 16:17:27 -04:00
Len Brown dc1544ea5d Merge branch 'bjorn-pci-root-v4-2.6.35' into release 2010-05-28 16:17:16 -04:00
Len Brown e45b7fa230 sched: clarify commment for TS_POLLING
TS_POLLING set tells the scheduler an idle_task will poll
need_resched() to look for work.

TS_POLLING clear tells resched_task() and wake_up_idle_cpu()
that the remote CPU's idle_task is now sleeping in idle,
and thus requires a reschedule interrupt notice work.

Update the description of TS_POLLING to reflect how it works.
"idle task polling need_resched, skip sending interrupt"

Wordsmithing-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
2010-05-27 21:07:05 -04:00
Florian Fainelli f52e62dcc1 x86: remove rdc321x_defs.h
This file is replaced by a cleaner version with the adding of a MFD driver for
the southbridge.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2010-05-28 01:37:30 +02:00
Linus Torvalds c5617b200a Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits)
  tracing: Add __used annotation to event variable
  perf, trace: Fix !x86 build bug
  perf report: Support multiple events on the TUI
  perf annotate: Fix up usage of the build id cache
  x86/mmiotrace: Remove redundant instruction prefix checks
  perf annotate: Add TUI interface
  perf tui: Remove annotate from popup menu after failure
  perf report: Don't start the TUI if -D is used
  perf: Fix getline undeclared
  perf: Optimize perf_tp_event_match()
  perf: Remove more code from the fastpath
  perf: Optimize the !vmalloc backed buffer
  perf: Optimize perf_output_copy()
  perf: Fix wakeup storm for RO mmap()s
  perf-record: Share per-cpu buffers
  perf-record: Remove -M
  perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers
  perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events
  perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction
  perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig
  ...
2010-05-27 15:23:47 -07:00
H. Peter Anvin 1ba4f22c42 x86, cpufeature: Unbreak compile with gcc 3.x
gcc 3 is too braindamaged to be able to compile static_cpu_has() --
apparently it can't tell that a constant passed to an inline function
is still a constant -- so if we're using gcc 3, just use the dynamic
test.  This is bad for performance, but if you care about performance,
don't use an ancient, known-to-optimize-poorly compiler.

Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <4BF2FF82.7090005@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-27 12:02:00 -07:00
Lee Schermerhorn e534c7c5f8 numa: x86_64: use generic percpu var numa_node_id() implementation
x86 arch specific changes to use generic numa_node_id() based on generic
percpu variable infrastructure.  Back out x86's custom version of
numa_node_id()

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:57 -07:00
Lee Schermerhorn 7281201922 numa: add generic percpu var numa_node_id() implementation
Rework the generic version of the numa_node_id() function to use the new
generic percpu variable infrastructure.

Guard the new implementation with a new config option:

        CONFIG_USE_PERCPU_NUMA_NODE_ID.

Archs which support this new implemention will default this option to 'y'
when NUMA is configured.  This config option could be removed if/when all
archs switch over to the generic percpu implementation of numa_node_id().
Arch support involves:

  1) converting any existing per cpu variable implementations to use
     this implementation.  x86_64 is an instance of such an arch.
  2) archs that don't use a per cpu variable for numa_node_id() will
     need to initialize the new per cpu variable "numa_node" as cpus
     are brought on-line.  ia64 is an example.
  3) Defining USE_PERCPU_NUMA_NODE_ID in arch dependent Kconfig--e.g.,
     when NUMA is configured.  This is required because I have
     retained the old implementation by default to allow archs to
     be modified incrementally, as desired.

Subsequent patches will convert x86_64 and ia64 to use this implemenation.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:57 -07:00
FUJITA Tomonori 1ef04370d8 asm-generic: remove ARCH_HAS_SG_CHAIN in scatterlist.h
There are more architectures that don't support ARCH_HAS_SG_CHAIN than
those that support it.  This removes removes ARCH_HAS_SG_CHAIN in
asm-generic/scatterlist.h and lets arhictectures to define it.

It's clearer than defining ARCH_HAS_SG_CHAIN asm-generic/scatterlist.h and
undefing it in arhictectures that don't support it.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:54 -07:00
Andrew Morton 4a14d84ea2 x86_32: use asm-generic/scatterlist.h
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:54 -07:00
FUJITA Tomonori 18e98307de asm-generic: add NEED_SG_DMA_LENGTH to define sg_dma_len()
There are only two ways to define sg_dma_len(); use sg->dma_length or
sg->length.  This patch introduces NEED_SG_DMA_LENGTH that enables
architectures to choose sg->dma_length or sg->length.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:54 -07:00
FUJITA Tomonori de006a071c x86: remove unnecessary sync_single_range_* in swiotlb_dma_ops
sync_single_range_for_cpu and sync_single_range_for_device hooks in
swiotlb_dma_ops are unnecessary because sync_single_for_cpu and
sync_single_for_device are used there.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:52 -07:00
Akinobu Mita a94247e7fb x86: convert cpu notifier to return encapsulate errno value
By the previous modification, the cpu notifier can return encapsulate
errno value.  This converts the cpu notifiers for msr, cpuid, and
therm_throt.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:48 -07:00
Jack Steiner 0ac0c0d0f8 cpusets: randomize node rotor used in cpuset_mem_spread_node()
Some workloads that create a large number of small files tend to assign
too many pages to node 0 (multi-node systems).  Part of the reason is that
the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at
node 0 for newly created tasks.

This patch changes the rotor to be initialized to a random node number of
the cpuset.

[akpm@linux-foundation.org: fix layout]
[Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:44 -07:00
Julia Lawall 84fe6c19e4 arch/x86/kernel: Add missing spin_unlock
Add a spin_unlock missing on the error path.  The locks and unlocks are
balanced in other functions, so it seems that the same should be the case
here.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression E1;
@@

* spin_lock(E1,...);
  <+... when != E1
  if (...) {
    ... when != E1
*   return ...;
  }
  ...+>
* spin_unlock(E1,...);
// </smpl>

Cc: stable@kernel.org
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-05-27 12:40:11 +02:00
Xiaotian Feng 20413f2716 x86, pat: Fix memory leak in free_memtype
Reserve_memtype will allocate memory for new memtype, but
in free_memtype, after the memtype erased from rbtree, the
memory is not freed.

Changes since V1:
	make rbt_memtype_erase return erased memtype so that
	it can be freed in free_memtype.

[ hpa: not for -stable: 2.6.34 and earlier not affected ]

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
LKML-Reference: <1274838670-8731-1-git-send-email-dfeng@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Jack Steiner <steiner@sgi.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-26 11:26:04 -07:00
Linus Torvalds 13da9e200f Revert "endian: #define __BYTE_ORDER"
This reverts commit b3b77c8cae, which was
also totally broken (see commit 0d2daf5cc8 that reverted the crc32
version of it).  As reported by Stephen Rothwell, it causes problems on
big-endian machines:

> In file included from fs/jfs/jfs_types.h:33,
>                  from fs/jfs/jfs_incore.h:26,
>                  from fs/jfs/file.c:22:
> fs/jfs/endian24.h:36:101: warning: "__LITTLE_ENDIAN" is not defined

The kernel has never had that crazy "__BYTE_ORDER == __LITTLE_ENDIAN"
model.  It's not how we do things, and it isn't how we _should_ do
things.  So don't go there.

Requested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-26 08:30:15 -07:00
Borislav Petkov fe501f1e89 x86, k8: Fix section mismatch for powernowk8_exit()
Fix the following warning:

"WARNING: arch/x86/kernel/built-in.o(.exit.text+0x72):
Section mismatch in reference from the function powernowk8_exit() to the variable .cpuinit.data:cpb_nb

The function __exit powernowk8_exit() references a variable
__cpuinitdata cpb_nb. This is often seen when error handling in the exit
function uses functionality in the init path. The fix is often to remove
the __cpuinitdata annotation of cpb_nb so it may be used outside an init
section."

Cc: <stable@kernel.org>
Reported-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100525152858.GA24836@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-25 15:42:21 -07:00
Kay Sievers 578454ff7e driver core: add devname module aliases to allow module on-demand auto-loading
This adds:
  alias: devname:<name>
to some common kernel modules, which will allow the on-demand loading
of the kernel module when the device node is accessed.

Ideally all these modules would be compiled-in, but distros seems too
much in love with their modularization that we need to cover the common
cases with this new facility. It will allow us to remove a bunch of pretty
useless init scripts and modprobes from init scripts.

The static device node aliases will be carried in the module itself. The
program depmod will extract this information to a file in the module directory:
  $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname
  # Device nodes to trigger on-demand module loading.
  microcode cpu/microcode c10:184
  fuse fuse c10:229
  ppp_generic ppp c108:0
  tun net/tun c10:200
  dm_mod mapper/control c10:235

Udev will pick up the depmod created file on startup and create all the
static device nodes which the kernel modules specify, so that these modules
get automatically loaded when the device node is accessed:
  $ /sbin/udevd --debug
  ...
  static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
  static_dev_create_from_modules: mknod '/dev/fuse' c10:229
  static_dev_create_from_modules: mknod '/dev/ppp' c108:0
  static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
  static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
  udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
  udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666

A few device nodes are switched to statically allocated numbers, to allow
the static nodes to work. This might also useful for systems which still run
a plain static /dev, which is completely unsafe to use with any dynamic minor
numbers.

Note:
The devname aliases must be limited to the *common* and *single*instance*
device nodes, like the misc devices, and never be used for conceptually limited
systems like the loop devices, which should rather get fixed properly and get a
control node for losetup to talk to, instead of creating a random number of
device nodes in advance, regardless if they are ever used.

This facility is to hide the mess distros are creating with too modualized
kernels, and just to hide that these modules are not compiled-in, and not to
paper-over broken concepts. Thanks! :)

Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Ian Kent <raven@themaw.net>
Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-25 15:08:26 -07:00
Carsten Emde a321cedb12 drivers/hwmon/coretemp.c: get TjMax value from MSR
The MSR IA32_TEMPERATURE_TARGET contains the TjMax value in the newer
Intel processors.

Signed-off-by: Huaxu Wan <huaxu.wan@linux.intel.com>
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Yong Wang <yong.y.wang@linux.intel.com>
Cc: Rudolf Marek <r.marek@assembler.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:07 -07:00
Joakim Tjernlund b3b77c8cae endian: #define __BYTE_ORDER
Linux does not define __BYTE_ORDER in its endian header files which makes
some header files bend backwards to get at the current endian.  Lets
#define __BYTE_ORDER in big_endian.h/litte_endian.h to make it easier for
header files that are used in user space too.

In userspace the convention is that

  1. _both_ __LITTLE_ENDIAN and __BIG_ENDIAN are defined,
  2. you have to test for e.g. __BYTE_ORDER == __BIG_ENDIAN.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:02 -07:00
Peter Zijlstra 87f44bbc24 perf, trace: Fix !x86 build bug
Patch b7e2ecef92 (perf, trace: Optimize tracepoints by removing
IRQ-disable from perf/tracepoint interaction) made the
unfortunate mistake of assuming the world is x86 only, correct
this.

The problem was that perf_fetch_caller_regs() did
local_save_flags() into regs->flags, and I re-used that to
remove another local_save_flags(), forgetting !x86 doesn't have
regs->flags.

Do the reverse, remove the local_save_flags() from
perf_fetch_caller_regs() and let the ftrace site do the
local_save_flags() instead.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: acme@redhat.com
Cc: efault@gmx.de
Cc: fweisbec@gmail.com
Cc: rostedt@goodmis.org
LKML-Reference: <1274778175.5882.623.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-25 11:28:49 +02:00
Peter Zijlstra 48691ff86d x86: remove last traces of quicklist usage
We still have a stray quicklist header included even though we axed
quicklist usage quite a while back.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <201005241913.o4OJDJe9010881@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-24 13:33:31 -07:00
Gabor Gombas 3d6e77a3dd x86, setup: Phoenix BIOS fixup is needed on Dell Inspiron Mini 1012
The low-memory corruption checker triggers during suspend/resume, so we
need to reserve the low 64k.  Don't be fooled that the BIOS identifies
itself as "Dell Inc.", it's still Phoenix BIOS.

[ hpa: I think we blacklist almost every BIOS in existence.  We should
either change this to a whitelist or just make it unconditional. ]

Signed-off-by: Gabor Gombas <gombasg@digikabel.hu>
LKML-Reference: <201005241913.o4OJDIMM010877@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org>
2010-05-24 13:33:14 -07:00
Jan Beulich 5f2eb55026 x86: "nosmp" command line option should force the system into UP mode
Bits set in cpu_possible_mask prior to the execution of
prefill_possible_map() (i.e.  when parsing ACPI or MPS tables) would
prevent the SMP alternatives logic from switching to UP mode, plus
unnecessary setup of per-CPU data for CPUs that can never come online.

Additionally, without CONFIG_HOTPLUG_CPU disabled CPUs can never come
online, and hence setting cpu_possible_mask bits for them is again a
simple waste of resources.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <201005241913.o4OJDH3Z010874@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-24 13:31:52 -07:00
Julia Lawall b46fc5f235 arch/x86/pci: use kasprintf
kasprintf combines kmalloc and sprintf, and takes care of the size
calculation itself.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression a,flag;
expression list args;
statement S;
@@

  a =
-  \(kmalloc\|kzalloc\)(...,flag)
+  kasprintf(flag,args)
  <... when != a
  if (a == NULL || ...) S
  ...>
- sprintf(a,args);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
LKML-Reference: <201005241913.o4OJDG3R010871@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-24 13:31:45 -07:00
Kerstin Jonsson 8c3ba8d049 x86, apic: ack all pending irqs when crashed/on kexec
When the SMP kernel decides to crash_kexec() the local APICs may have
pending interrupts in their vector tables.

The setup routine for the local APIC has a deficient mechanism for
clearing these interrupts, it only handles interrupts that has already
been dispatched to the local core for servicing (the ISR register) safely,
it doesn't consider lower prioritized queued interrupts stored in the IRR
register.

If you have more than one pending interrupt within the same 32 bit word in
the LAPIC vector table registers you may find yourself entering the IO
APIC setup with pending interrupts left in the LAPIC.  This is a situation
for wich the IO APIC setup is not prepared.  Depending of what/which
interrupt vector/vectors are stuck in the APIC tables your system may show
various degrees of malfunctioning.  That was the reason why the
check_timer() failed in our system, the timer interrupts was blocked by
pending interrupts from the old kernel when routed trough the IO APIC.

Additional comment from Jiri Bohac:
==============
If this should go into stable release,
I'd add some kind of limit on the number of iterations, just to be safe from
hard to debug lock-ups:

+if (loops++  > MAX_LOOPS) {
+        printk("LAPIC pending clean-up")
+        break;
+}
 while (queued);

with MAX_LOOPS something like 1E9 this would leave plenty of time for the
pending IRQs to be cleared and would and still cause at most a second of delay
if the loop were to lock-up for whatever reason.

[trenn@suse.de:

V2: Use tsc if avail to bail out after 1 sec due to possible virtual
    apic_read calls which may take rather long (suggested by: Avi Kivity
    <avi@redhat.com>) If no tsc is available bail out quickly after
    cpu_khz, if we broke out too early and still have irqs pending (which
    should never happen?) we still get a WARN_ON...

V3: - Fixed indentation -> checkpatch clean
    - max_loops must be signed

V4: - Fix typo, mixed up tsc and ntsc in first rdtscll() call

V5: Adjust WARN_ON() condition to also catch error in cpu_has_tsc case]

Cc: <jbohac@novell.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Kerstin Jonsson <kerstin.jonsson@ericsson.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
LKML-Reference: <201005241913.o4OJDGWM010865@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-24 13:31:35 -07:00
Akinobu Mita 841bca1393 x86/mmiotrace: Remove redundant instruction prefix checks
Get rid of the duplicated entries in prefix_codes[]
to eliminate redundant checks by skip_prefix().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Pekka Paalanen <pq@iki.fi>
LKML-Reference: <1274140110-5841-1-git-send-email-akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-23 11:02:43 +02:00
Linus Torvalds 6109e2ce26 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (36 commits)
  PCI: hotplug: pciehp: Removed check for hotplug of display devices
  PCI: read memory ranges out of Broadcom CNB20LE host bridge
  PCI: Allow manual resource allocation for PCI hotplug bridges
  x86/PCI: make ACPI MCFG reserved error messages ACPI specific
  PCI hotplug: Use kmemdup
  PM/PCI: Update PCI power management documentation
  PCI: output FW warning in pci_read/write_vpd
  PCI: fix typos pci_device_dis/enable to pci_dis/enable_device in comments
  PCI quirks: disable msi on AMD rs4xx internal gfx bridges
  PCI: Disable MSI for MCP55 on P5N32-E SLI
  x86/PCI: irq and pci_ids patch for additional Intel Cougar Point DeviceIDs
  PCI: aerdrv: trivial cleanup for aerdrv_core.c
  PCI: aerdrv: trivial cleanup for aerdrv.c
  PCI: aerdrv: introduce default_downstream_reset_link
  PCI: aerdrv: rework find_aer_service
  PCI: aerdrv: remove is_downstream
  PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS
  PCI: aerdrv: redefine PCI_ERR_ROOT_*_SRC
  PCI: aerdrv: rework do_recovery
  PCI: aerdrv: rework get_e_source()
  ...
2010-05-21 18:58:52 -07:00
Linus Torvalds 98edb6ca41 Merge branch 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits)
  KVM: x86: Add missing locking to arch specific vcpu ioctls
  KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls
  KVM: MMU: Segregate shadow pages with different cr0.wp
  KVM: x86: Check LMA bit before set_efer
  KVM: Don't allow lmsw to clear cr0.pe
  KVM: Add cpuid.txt file
  KVM: x86: Tell the guest we'll warn it about tsc stability
  x86, paravirt: don't compute pvclock adjustments if we trust the tsc
  x86: KVM guest: Try using new kvm clock msrs
  KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
  KVM: x86: add new KVMCLOCK cpuid feature
  KVM: x86: change msr numbers for kvmclock
  x86, paravirt: Add a global synchronization point for pvclock
  x86, paravirt: Enable pvclock flags in vcpu_time_info structure
  KVM: x86: Inject #GP with the right rip on efer writes
  KVM: SVM: Don't allow nested guest to VMMCALL into host
  KVM: x86: Fix exception reinjection forced to true
  KVM: Fix wallclock version writing race
  KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots
  KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)
  ...
2010-05-21 17:16:21 -07:00
Linus Torvalds 27a3353a45 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: (32 commits)
  Move N014, N051 and CR620 dmi information to load scm dmi table
  drivers/platform/x86/eeepc-wmi.c: fix build warning
  X86 platfrom wmi: Add debug facility to dump WMI data in a readable way
  X86 platform wmi: Also log GUID string when an event happens and debug is set
  X86 platform wmi: Introduce debug param to log all WMI events
  Clean up all objects used by scm model when driver initial fail or exit
  msi-laptop: fix up some coding style issues found by checkpatch
  msi-laptop: Add i8042 filter to sync sw state with BIOS when function key pressed
  msi-laptop: Set rfkill init state when msi-laptop intiial
  msi-laptop: Add MSI CR620 notebook dmi information to scm models table
  msi-laptop: Add N014 N051 dmi information to scm models table
  drivers/platform/x86: Use kmemdup
  drivers/platform/x86: Use kzalloc
  drivers/platform/x86: Clarify the MRST IPC driver description slightly
  eeepc-wmi: depends on BACKLIGHT_CLASS_DEVICE
  IPC driver for Intel Mobile Internet Device (MID) platforms
  classmate-laptop: Add RFKILL support.
  thinkpad-acpi: document backlight level writeback at driver init
  thinkpad-acpi: clean up ACPI handles handling
  thinkpad-acpi: don't depend on led_path for led firmware type (v2)
  ...
2010-05-21 17:13:24 -07:00
Linus Torvalds 2a8ba8f032 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (46 commits)
  random: simplify fips mode
  crypto: authenc - Fix cryptlen calculation
  crypto: talitos - add support for sha224
  crypto: talitos - add hash algorithms
  crypto: talitos - second prepare step for adding ahash algorithms
  crypto: talitos - prepare for adding ahash algorithms
  crypto: n2 - Add Niagara2 crypto driver
  crypto: skcipher - Add ablkcipher_walk interfaces
  crypto: testmgr - Add testing for async hashing and update/final
  crypto: tcrypt - Add speed tests for async hashing
  crypto: scatterwalk - Fix scatterwalk_done() test
  crypto: hifn_795x - Rename ablkcipher_walk to hifn_cipher_walk
  padata: Use get_online_cpus/put_online_cpus in padata_free
  padata: Add some code comments
  padata: Flush the padata queues actively
  padata: Use a timer to handle remaining objects in the reorder queues
  crypto: shash - Remove usage of CRYPTO_MINALIGN
  crypto: mv_cesa - Use resource_size
  crypto: omap - OMAP macros corrected
  padata: Use get_online_cpus/put_online_cpus
  ...

Fix up conflicts in arch/arm/mach-omap2/devices.c
2010-05-21 14:46:51 -07:00
Ira W. Snyder 3f6ea84a30 PCI: read memory ranges out of Broadcom CNB20LE host bridge
Read the memory ranges behind the Broadcom CNB20LE host bridge out of the
hardware. This allows PCI hotplugging to work, since we know which memory
range to allocate PCI BAR's from.

The x86 PCI code automatically prefers the ACPI _CRS information when it is
available. In that case, this information is not used.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-05-21 14:43:46 -07:00
Linus Torvalds 59534f7298 Merge branch 'drm-for-2.6.35' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-for-2.6.35' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (207 commits)
  drm/radeon/kms/pm/r600: select the mid clock mode for single head low profile
  drm/radeon: fix power supply kconfig interaction.
  drm/radeon/kms: record object that have been list reserved
  drm/radeon: AGP memory is only I/O if the aperture can be mapped by the CPU.
  drm/radeon/kms: don't default display priority to high on rs4xx
  drm/edid: fix typo in 1600x1200@75 mode
  drm/nouveau: fix i2c-related init table handlers
  drm/nouveau: support init table i2c device identifier 0x81
  drm/nouveau: ensure we've parsed i2c table entry for INIT_*I2C* handlers
  drm/nouveau: display error message for any failed init table opcode
  drm/nouveau: fix init table handlers to return proper error codes
  drm/nv50: support fractional feedback divider on newer chips
  drm/nv50: fix monitor detection on certain chipsets
  drm/nv50: store full dcb i2c entry from vbios
  drm/nv50: fix suspend/resume with DP outputs
  drm/nv50: output calculated crtc pll when debugging on
  drm/nouveau: dump pll limits entries when debugging is on
  drm/nouveau: bios parser fixes for eDP boards
  drm/nouveau: fix a nouveau_bo dereference after it's been destroyed
  drm/nv40: remove some completed ctxprog TODOs
  ...
2010-05-21 11:14:52 -07:00
Jason Wessel 61eaf539b9 earlyprintk,vga,kdb: Fix \b and \r for earlyprintk=vga with kdb
Allow kdb to work properly with with earlyprintk=vga by interpreting
the backspace and carriage return output characters.  These
interpretation of these characters is used for simple line editing
provided in the kdb shell.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: x86@kernel.org
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:31 -05:00
Jason Wessel 0bb9fef913 x86,early dr regs,kgdb: Allow kernel debugger early dr register access
If the kernel debugger was configured, attached and started with
kgdbwait, the hardware breakpoint registers should get restored by the
kgdb code which is managing the dr registers.

CC: x86@kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:30 -05:00
Jason Wessel 031acd8c42 x86,kgdb: Implement early hardware breakpoint debugging
It is not possible to use the hw_breakpoint.c API prior to mm_init(),
but it is possible to use hardware breakpoints with the kernel
debugger.

Prior to smp_init() it is possible to simply write to the dr registers
of the boot cpu directly.  This can be used up until the
kgdb_arch_late() is invoked, at which point the standard hw_breakpoint.c
API will get used.

CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:30 -05:00
Jason Wessel 0b4b3827db x86, kgdb, init: Add early and late debug states
The kernel debugger can operate well before mm_init(), but the x86
hardware breakpoint code which uses the perf api requires that the
kernel allocators are initialized.

This means the kernel debug core needs to provide an optional arch
specific call back to allow the initialization functions to run after
the kernel has been further initialized.

The kdb shell already had a similar restriction with an early
initialization and late initialization.  The kdb_init() was moved into
the debug core's version of the late init which is called
dbg_late_init();

CC: kgdb-bugreport@lists.sourceforge.net
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:29 -05:00
Jan Kiszka 29c843912a x86, kgdb: early trap init for early debug
Allow the x86 arch to have early exception processing for the purpose
of debugging via the kgdb.

Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:29 -05:00
Jason Wessel f503b5ae53 x86,kgdb: Add low level debug hook
The only way the debugger can handle a trap in inside rcu_lock,
notify_die, or atomic_notifier_call_chain without a triple fault is
to have a low level "first opportunity handler" in the int3 exception
handler.

Generally this will be something the vast majority of folks will not
need, but for those who need it, it is added as a kernel .config
option called KGDB_LOW_LEVEL_TRAP.

CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: H. Peter Anvin <hpa@zytor.com>
CC: x86@kernel.org
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:25 -05:00
Jason Wessel 98ec1878ca kgdb: remove post_primary_code references
Remove all the references to the kgdb_post_primary_code.  This
function serves no useful purpose because you can obtain the same
information from the "struct kgdb_state *ks" from with in the
debugger, if for some reason you want the data.

Also remove the unintentional duplicate assignment for ks->ex_vector.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:25 -05:00
Jason Wessel dcc7871128 kgdb: core changes to support kdb
These are the minimum changes to the kgdb core in order to enable an
API to connect a new front end (kdb) to the debug core.

This patch introduces the dbg_kdb_mode variable controls where the
user level I/O is routed.  It will be routed to the gdbstub (kgdb) or
to the kdb front end which is a simple shell available over the kgdboc
connection.

You can switch back and forth between kdb or the gdb stub mode of
operation dynamically.  From gdb stub mode you can blindly type
"$3#33", or from the kdb mode you can enter "kgdb" to switch to the
gdb stub.

The logic in the debug core depends on kdb to look for the typical gdb
connection sequences and return immediately with KGDB_PASS_EVENT if a
gdb serial command sequence is detected.  That should allow a
reasonably seamless transition between kdb -> gdb without leaving the
kernel exception state.  The two gdb serial queries that kdb is
responsible for detecting are the "?" and "qSupported" packets.

CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Martin Hicks <mort@sgi.com>
2010-05-20 21:04:21 -05:00
Linus Torvalds 04afb40593 Merge branch 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (22 commits)
  ACPI: fix early DSDT dmi check warnings on ia64
  ACPICA: Update version to 20100428.
  ACPICA: Update/clarify some parameter names associated with acpi_handle
  ACPICA: Rename acpi_ex_system_do_suspend->acpi_ex_system_do_sleep
  ACPICA: Prevent possible allocation overrun during object copy
  ACPICA: Split large file, evgpeblk
  ACPICA: Add GPE support for dynamically loaded ACPI tables
  ACPICA: Clarify/rename some root table descriptor fields
  ACPICA: Update version to 20100331.
  ACPICA: Minimize the differences between linux GPE code and ACPICA code base
  ACPI: add boot option acpi=copy_dsdt to fix corrupt DSDT
  ACPICA: Update DSDT copy/detection.
  ACPICA: Add subsystem option to force copy of DSDT to local memory
  ACPICA: Add detection of corrupted/replaced DSDT
  ACPICA: Add write support for DataTable operation regions
  ACPICA: Fix for acpi_reallocate_root_table for incorrect root table copy
  ACPICA: Update comments/headers, no functional change
  ACPICA: Update version to 20100304
  ACPICA: Fix for possible fault in acpi_ex_release_mutex
  ACPICA: Standardize integer output for ACPICA warnings/errors
  ...
2010-05-20 09:45:38 -07:00
Linus Torvalds f39d01be4c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (44 commits)
  vlynq: make whole Kconfig-menu dependant on architecture
  add descriptive comment for TIF_MEMDIE task flag declaration.
  EEPROM: max6875: Header file cleanup
  EEPROM: 93cx6: Header file cleanup
  EEPROM: Header file cleanup
  agp: use NULL instead of 0 when pointer is needed
  rtc-v3020: make bitfield unsigned
  PCI: make bitfield unsigned
  jbd2: use NULL instead of 0 when pointer is needed
  cciss: fix shadows sparse warning
  doc: inode uses a mutex instead of a semaphore.
  uml: i386: Avoid redefinition of NR_syscalls
  fix "seperate" typos in comments
  cocbalt_lcdfb: correct sections
  doc: Change urls for sparse
  Powerpc: wii: Fix typo in comment
  i2o: cleanup some exit paths
  Documentation/: it's -> its where appropriate
  UML: Fix compiler warning due to missing task_struct declaration
  UML: add kernel.h include to signal.c
  ...
2010-05-20 09:20:59 -07:00
Ingo Molnar dfacc4d6c9 Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core 2010-05-20 14:38:55 +02:00
Huang Ying 482908b49e ACPI, APEI, Use ERST for persistent storage of MCE
Traditionally, fatal MCE will cause Linux print error log to console
then reboot. Because MCE registers will preserve their content after
warm reboot, the hardware error can be logged to disk or network after
reboot. But system may fail to warm reboot, then you may lose the
hardware error log. ERST can help here. Through saving the hardware
error log into flash via ERST before go panic, the hardware error log
can be gotten from the flash after system boot successful again.

The fatal MCE processing procedure with ERST involved is as follow:

- Hardware detect error, MCE raised
- MCE read MCE registers, check error severity (fatal), prepare error record
- Write MCE error record into flash via ERST
- Go panic, then trigger system reboot
- System reboot, /sbin/mcelog run, it reads /dev/mcelog to check flash
  for error record of previous boot via ERST, and output and clear
  them if available
- /sbin/mcelog logs error records into disk or network

ERST only accepts CPER record format, but there is no pre-defined CPER
section can accommodate all information in struct mce, so a customized
section type is defined to hold struct mce inside a CPER record as an
error section.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-05-19 22:41:40 -04:00
Huang Ying d334a49113 ACPI, APEI, Generic Hardware Error Source memory error support
Generic Hardware Error Source provides a way to report platform
hardware errors (such as that from chipset). It works in so called
"Firmware First" mode, that is, hardware errors are reported to
firmware firstly, then reported to Linux by firmware. This way, some
non-standard hardware error registers or non-standard hardware link
can be checked by firmware to produce more valuable hardware error
information for Linux.

Now, only SCI notification type and memory errors are supported. More
notification type and hardware error type will be added later. These
memory errors are reported to user space through /dev/mcelog via
faking a corrected Machine Check, so that the error memory page can be
offlined by /sbin/mcelog if the error count for one page is beyond the
threshold.

On some machines, Machine Check can not report physical address for
some corrected memory errors, but GHES can do that. So this simplified
GHES is implemented firstly.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-05-19 22:41:16 -04:00
Linus Torvalds 6fa0fddd5f Merge branch 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, hpet: Add reference to chipset erratum documentation for disable-hpet-msi-quirk
  x86, hpet: Restrict read back to affected ATI chipsets
2010-05-19 17:10:24 -07:00
Linus Torvalds 7d02093e29 Merge branch 'timers-for-linus-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  avr32: Fix typo in read_persistent_clock()
  sparc: Convert sparc to use read/update_persistent_clock
  cris: Convert cris to use read/update_persistent_clock
  m68k: Convert m68k to use read/update_persistent_clock
  m32r: Convert m32r to use read/update_peristent_clock
  blackfin: Convert blackfin to use read/update_persistent_clock
  ia64: Convert ia64 to use read/update_persistent_clock
  avr32: Convert avr32 to use read/update_persistent_clock
  h8300: Convert h8300 to use read/update_persistent_clock
  frv: Convert frv to use read/update_persistent_clock
  mn10300: Convert mn10300 to use read/update_persistent_clock
  alpha: Convert alpha to use read/update_persistent_clock
  xtensa: Fix unnecessary setting of xtime
  time: Clean up direct xtime usage in xen
2010-05-19 17:10:06 -07:00
H. Peter Anvin 14671386dc x86, mrst: make mrst_timer_options an enum
We have an enum mrst_timer_options, use it so that the kernel knows if
we're missing something from a switch statement or equivalent.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <1274295685-6774-4-git-send-email-jacob.jun.pan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
2010-05-19 14:37:40 -07:00
H. Peter Anvin a75af580bb x86, mrst: make mrst_identify_cpu() an inline returning enum
We have an enum, might as well use it.  While we're at it, make it an
inline... there is really no point in calling a function for this
stuff.

LKML-Reference: <1274295685-6774-3-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
2010-05-19 13:47:11 -07:00
Jacob Pan a875c01944 x86, mrst: add more timer config options
Always-on local APIC timer (ARAT) has been introduced to Medfield, along
with the platform APB timers we have more timer configuration options
between Moorestown and Medfield.

This patch adds run-time detection of avaiable timer features so that
we can treat Medfield as a variant of Moorestown and set up the optimal
timer options for each platform. i.e.

Medfield: per cpu always-on local APIC timer
Moorestown: per cpu APB timer

Manual override is possible via cmdline option x86_mrst_timer.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1274295685-6774-4-git-send-email-jacob.jun.pan@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-19 13:45:39 -07:00
Jacob Pan a0c173bd8a x86, mrst: add cpu type detection
Medfield is the follow-up of Moorestown, it is treated under the same
HW sub-architecture. However, we do need to know the CPU type in order
for some of the driver to act accordingly.
We also have different optimal clock configuration for each CPU type.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1274295685-6774-3-git-send-email-jacob.jun.pan@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-19 13:32:29 -07:00
Jacob Pan 1dedefd1a0 x86: detect scattered cpuid features earlier
Some extra CPU features such as ARAT is needed in early boot so
that x86_init function pointers can be set up properly.
http://lkml.org/lkml/2010/5/18/519
At start_kernel() level, this patch moves init_scattered_cpuid_features()
from check_bugs() to setup_arch() -> early_cpu_init() which is earlier than
platform specific x86_init layer setup. Suggested by HPA.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1274295685-6774-2-git-send-email-jacob.jun.pan@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-19 13:32:12 -07:00
Avi Kivity 8fbf065d62 KVM: x86: Add missing locking to arch specific vcpu ioctls
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 11:41:11 +03:00
Avi Kivity 3dbe141595 KVM: MMU: Segregate shadow pages with different cr0.wp
When cr0.wp=0, we may shadow a gpte having u/s=1 and r/w=0 with an spte
having u/s=0 and r/w=1.  This allows excessive access if the guest sets
cr0.wp=1 and accesses through this spte.

Fix by making cr0.wp part of the base role; we'll have different sptes for
the two cases and the problem disappears.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:41:09 +03:00
Sheng Yang a3d204e285 KVM: x86: Check LMA bit before set_efer
kvm_x86_ops->set_efer() would execute vcpu->arch.efer = efer, so the
checking of LMA bit didn't work.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:41:09 +03:00
Avi Kivity f78e917688 KVM: Don't allow lmsw to clear cr0.pe
The current lmsw implementation allows the guest to clear cr0.pe, contrary
to the manual, which breaks EMM386.EXE.

Fix by ORing the old cr0.pe with lmsw's operand.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:41:08 +03:00
Glauber Costa 371bcf646d KVM: x86: Tell the guest we'll warn it about tsc stability
This patch puts up the flag that tells the guest that we'll warn it
about the tsc being trustworthy or not. By now, we also say
it is not.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:41:06 +03:00
Glauber Costa 3a0d7256a6 x86, paravirt: don't compute pvclock adjustments if we trust the tsc
If the HV told us we can fully trust the TSC, skip any
correction

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:41:05 +03:00
Glauber Costa 838815a787 x86: KVM guest: Try using new kvm clock msrs
We now added a new set of clock-related msrs in replacement of the old
ones. In theory, we could just try to use them and get a return value
indicating they do not exist, due to our use of kvm_write_msr_save.

However, kvm clock registration happens very early, and if we ever
try to write to a non-existant MSR, we raise a lethal #GP, since our
idt handlers are not in place yet.

So this patch tests for a cpuid feature exported by the host to
decide which set of msrs are supported.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:41:04 +03:00
Glauber Costa 84478c829d KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
Right now, we were using individual KVM_CAP entities to communicate
userspace about which cpuids we support. This is suboptimal, since it
generates a delay between the feature arriving in the host, and
being available at the guest.

A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID.
This makes userspace automatically aware of what we provide. And if we
ever add a new cpuid bit in the future, we have to do that again,
which create some complexity and delay in feature adoption.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:41:03 +03:00
Glauber Costa 0e6ac58acb KVM: x86: add new KVMCLOCK cpuid feature
This cpuid, KVM_CPUID_CLOCKSOURCE2, will indicate to the guest
that kvmclock is available through a new set of MSRs. The old ones
are deprecated.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:41:02 +03:00
Glauber Costa 11c6bffa42 KVM: x86: change msr numbers for kvmclock
Avi pointed out a while ago that those MSRs falls into the pentium
PMU range. So the idea here is to add new ones, and after a while,
deprecate the old ones.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:41:01 +03:00
Glauber Costa 489fb490db x86, paravirt: Add a global synchronization point for pvclock
In recent stress tests, it was found that pvclock-based systems
could seriously warp in smp systems. Using ingo's time-warp-test.c,
I could trigger a scenario as bad as 1.5mi warps a minute in some systems.
(to be fair, it wasn't that bad in most of them). Investigating further, I
found out that such warps were caused by the very offset-based calculation
pvclock is based on.

This happens even on some machines that report constant_tsc in its tsc flags,
specially on multi-socket ones.

Two reads of the same kernel timestamp at approx the same time, will likely
have tsc timestamped in different occasions too. This means the delta we
calculate is unpredictable at best, and can probably be smaller in a cpu
that is legitimately reading clock in a forward ocasion.

Some adjustments on the host could make this window less likely to happen,
but still, it pretty much poses as an intrinsic problem of the mechanism.

A while ago, I though about using a shared variable anyway, to hold clock
last state, but gave up due to the high contention locking was likely
to introduce, possibly rendering the thing useless on big machines. I argue,
however, that locking is not necessary.

We do a read-and-return sequence in pvclock, and between read and return,
the global value can have changed. However, it can only have changed
by means of an addition of a positive value. So if we detected that our
clock timestamp is less than the current global, we know that we need to
return a higher one, even though it is not exactly the one we compared to.

OTOH, if we detect we're greater than the current time source, we atomically
replace the value with our new readings. This do causes contention on big
boxes (but big here means *BIG*), but it seems like a good trade off, since
it provide us with a time source guaranteed to be stable wrt time warps.

After this patch is applied, I don't see a single warp in time during 5 days
of execution, in any of the machines I saw them before.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Avi Kivity <avi@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:41:00 +03:00
Glauber Costa 424c32f1aa x86, paravirt: Enable pvclock flags in vcpu_time_info structure
This patch removes one padding byte and transform it into a flags
field. New versions of guests using pvclock will query these flags
upon each read.

Flags, however, will only be interpreted when the guest decides to.
It uses the pvclock_valid_flags function to signal that a specific
set of flags should be taken into consideration. Which flags are valid
are usually devised via HV negotiation.

Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:40:59 +03:00
Roedel, Joerg b69e8caef5 KVM: x86: Inject #GP with the right rip on efer writes
This patch fixes a bug in the KVM efer-msr write path. If a
guest writes to a reserved efer bit the set_efer function
injects the #GP directly. The architecture dependent wrmsr
function does not see this, assumes success and advances the
rip. This results in a #GP in the guest with the wrong rip.
This patch fixes this by reporting efer write errors back to
the architectural wrmsr function.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 11:36:39 +03:00
Joerg Roedel 0d945bd935 KVM: SVM: Don't allow nested guest to VMMCALL into host
This patch disables the possibility for a l2-guest to do a
VMMCALL directly into the host. This would happen if the
l1-hypervisor doesn't intercept VMMCALL and the l2-guest
executes this instruction.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 11:36:38 +03:00
Joerg Roedel 3f0fd2927b KVM: x86: Fix exception reinjection forced to true
The patch merged recently which allowed to mark an exception
as reinjected has a bug as it always marks the exception as
reinjected. This breaks nested-svm shadow-on-shadow
implementation.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 11:36:37 +03:00
Avi Kivity 9ed3c444ab KVM: Fix wallclock version writing race
Wallclock writing uses an unprotected global variable to hold the version;
this can cause one guest to interfere with another if both write their
wallclock at the same time.

Acked-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 11:36:36 +03:00
Avi Kivity 8facbbff07 KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots
On svm, kvm_read_pdptr() may require reading guest memory, which can sleep.

Push the spinlock into mmu_alloc_roots(), and only take it after we've read
the pdptr.

Tested-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 11:36:35 +03:00
Shane Wang cafd66595d KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)
Per document, for feature control MSR:

  Bit 1 enables VMXON in SMX operation. If the bit is clear, execution
        of VMXON in SMX operation causes a general-protection exception.
  Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution
        of VMXON outside SMX operation causes a general-protection exception.

This patch is to enable this kind of check with SMX for VMXON in KVM.

Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 11:36:34 +03:00
Marcelo Tosatti f1d86e469b KVM: x86: properly update ready_for_interrupt_injection
The recent changes to emulate string instructions without entering guest
mode exposed a bug where pending interrupts are not properly reflected
in ready_for_interrupt_injection.

The result is that userspace overwrites a previously queued interrupt,
when irqchip's are emulated in userspace.

Fix by always updating state before returning to userspace.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 11:36:33 +03:00
Avi Kivity 84ad33ef5d KVM: VMX: Atomically switch efer if EPT && !EFER.NX
When EPT is enabled, we cannot emulate EFER.NX=0 through the shadow page
tables.  This causes accesses through ptes with bit 63 set to succeed instead
of failing a reserved bit check.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 11:36:32 +03:00
Avi Kivity 61d2ef2ce3 KVM: VMX: Add facility to atomically switch MSRs on guest entry/exit
Some guest msr values cannot be used on the host (for example. EFER.NX=0),
so we need to switch them atomically during guest entry or exit.

Add a facility to program the vmx msr autoload registers accordingly.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 11:36:31 +03:00
Avi Kivity 5dfa3d170e KVM: VMX: Add definitions for guest and host EFER autoswitch vmcs entries
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 11:36:31 +03:00
Avi Kivity 19b95dba03 KVM: VMX: Add definition for msr autoload entry
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 11:36:30 +03:00
Avi Kivity 0ee75bead8 KVM: Let vcpu structure alignment be determined at runtime
vmx and svm vcpus have different contents and therefore may have different
alignmment requirements.  Let each specify its required alignment.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 11:36:29 +03:00
Xiao Guangrong 884a0ff0b6 KVM: MMU: cleanup invlpg code
Using is_last_spte() to cleanup invlpg code

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:36:28 +03:00
Xiao Guangrong 5e1b3ddbf2 KVM: MMU: move unsync/sync tracpoints to proper place
Move unsync/sync tracepoints to the proper place, it's good
for us to obtain unsync page live time

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:36:27 +03:00
Xiao Guangrong 85f2067c31 KVM: MMU: convert mmu tracepoints
Convert mmu tracepoints by using DECLARE_EVENT_CLASS

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:36:26 +03:00
Xiao Guangrong 22c9b2d166 KVM: MMU: fix for calculating gpa in invlpg code
If the guest is 32-bit, we should use 'quadrant' to adjust gpa
offset

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:36:25 +03:00
Gui Jianfeng d35b8dd935 KVM: Fix mmu shrinker error
kvm_mmu_remove_one_alloc_mmu_page() assumes kvm_mmu_zap_page() only reclaims
only one sp, but that's not the case. This will cause mmu shrinker returns
a wrong number. This patch fix the counting error.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:36:23 +03:00
Eric Northup 5a7388c2d2 KVM: MMU: fix hashing for TDP and non-paging modes
For TDP mode, avoid creating multiple page table roots for the single
guest-to-host physical address map by fixing the inputs used for the
shadow page table hash in mmu_alloc_roots().

Signed-off-by: Eric Northup <digitaleric@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:36:22 +03:00
Cyrill Gorcunov ce7f15452c perf, x86: P4 PMU -- add missing bit in CCCR mask
Should be there for the sake of RAW events.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100518212439.354345151@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-19 09:41:06 +02:00
Cyrill Gorcunov 9d36dfcf21 perf, x86: P4_pmu_schedule_events -- use smp_processor_id instead of raw_
This snippet somehow escaped the commit:

 | commit 137351e0fe
 | Author: Cyrill Gorcunov <gorcunov@openvz.org>
 | Date:   Sat May 8 15:25:52 2010 +0400
 |
 |    x86, perf: P4 PMU -- protect sensible procedures from preemption

so bring it eventually back. It helps to catch
preemption issue (if there will be, rule of thumb --
don't use raw_ if you can).

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100518212439.167259349@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-19 09:41:05 +02:00
Cyrill Gorcunov 623aab896e perf, x86: P4 PMU -- do a real check for ESCR address being in hash
To prevent from clashes in future code modifications
do a real check for ESCR address being in hash. At
moment the callers are known to pass sane values but
better to be on a safe side.

And comment fix.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100518212439.004503600@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-19 09:41:05 +02:00
Feng Tang a02ce953a1 x86/PCI: make ACPI MCFG reserved error messages ACPI specific
Both ACPI and SFI firmwares will have MCFG space, but the error message
isn't valid on SFI, so don't print the message in that case.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-05-18 15:03:27 -07:00
Linus Torvalds 537b60d178 Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, UV: uv_irq.c: Fix all sparse warnings
  x86, UV: Improve BAU performance and error recovery
  x86, UV: Delete unneeded boot messages
  x86, UV: Clean up UV headers for MMR definitions
2010-05-18 09:46:35 -07:00
Linus Torvalds 3ae684e1c4 Merge branch 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, tboot: Add support for S3 memory integrity protection
2010-05-18 09:28:24 -07:00
Linus Torvalds c4fd308ed6 Merge branch 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, pat: Update the page flags for memtype atomically instead of using memtype_lock
  x86, pat: In rbt_memtype_check_insert(), update new->type only if valid
  x86, pat: Migrate to rbtree only backend for pat memtype management
  x86, pat: Preparatory changes in pat.c for bigger rbtree change
  rbtree: Add support for augmented rbtrees
2010-05-18 09:28:04 -07:00
Linus Torvalds 96fbeb973a Merge branch 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mrst: add nop functions to x86_init mpparse functions
  x86, mrst, pci: return 0 for non-present pci bars
  x86: Avoid check hlt for newer cpus
2010-05-18 09:27:49 -07:00
Linus Torvalds 1f8caa986a Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-64: Combine SRAT regions when possible
2010-05-18 09:17:50 -07:00
Linus Torvalds d6f3875252 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/microcode: Use nonseekable_open()
  x86: Improve Intel microcode loader performance
2010-05-18 09:17:17 -07:00
Linus Torvalds cb41838bbc Merge branch 'core-hweight-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-hweight-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, hweight: Use a 32-bit popcnt for __arch_hweight32()
  arch, hweight: Fix compilation errors
  x86: Add optimized popcnt variants
  bitops: Optimize hweight() by making use of compile-time evaluation
2010-05-18 09:17:01 -07:00
Linus Torvalds 98f01720cb Merge branch 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, acpi/irq: Define gsi_end when X86_IO_APIC is undefined
  x86, irq: Kill io_apic_renumber_irq
  x86, acpi/irq: Handle isa irqs that are not identity mapped to gsi's.
  x86, ioapic: Simplify probe_nr_irqs_gsi.
  x86, ioapic: Optimize pin_2_irq
  x86, ioapic: Move nr_ioapic_registers calculation to mp_register_ioapic.
  x86, ioapic: In mpparse use mp_register_ioapic
  x86, ioapic: Teach mp_register_ioapic to compute a global gsi_end
  x86, ioapic: Fix the types of gsi values
  x86, ioapic: Fix io_apic_redir_entries to return the number of entries.
  x86, ioapic: Only export mp_find_ioapic and mp_find_ioapic_pin in io_apic.h
  x86, acpi/irq: Generalize mp_config_acpi_legacy_irqs
  x86, acpi/irq: Fix acpi_sci_ioapic_setup so it has both bus_irq and gsi
  x86, acpi/irq: pci device dev->irq is an isa irq not a gsi
  x86, acpi/irq: Teach acpi_get_override_irq to take a gsi not an isa_irq
  x86, acpi/irq: Introduce apci_isa_irq_to_gsi
2010-05-18 09:15:57 -07:00
Linus Torvalds 41d59102e1 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, fpu: Use static_cpu_has() to implement use_xsave()
  x86: Add new static_cpu_has() function using alternatives
  x86, fpu: Use the proper asm constraint in use_xsave()
  x86, fpu: Unbreak FPU emulation
  x86: Introduce 'struct fpu' and related API
  x86: Eliminate TS_XSAVE
  x86-32: Don't set ignore_fpu_irq in simd exception
  x86: Merge kernel_math_error() into math_error()
  x86: Merge simd_math_error() into math_error()
  x86-32: Rework cache flush denied handler

Fix trivial conflict in arch/x86/kernel/process.c
2010-05-18 08:58:16 -07:00
Linus Torvalds 07d77759c9 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, hypervisor: add missing <linux/module.h>
  Modify the VMware balloon driver for the new x86_hyper API
  x86, hypervisor: Export the x86_hyper* symbols
  x86: Clean up the hypervisor layer
  x86, HyperV: fix up the license to mshyperv.c
  x86: Detect running on a Microsoft HyperV system
  x86, cpu: Make APERF/MPERF a normal table-driven flag
  x86, k8: Fix build error when K8_NB is disabled
  x86, cacheinfo: Disable index in all four subcaches
  x86, cacheinfo: Make L3 cache info per node
  x86, cacheinfo: Reorganize AMD L3 cache structure
  x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
  x86, cacheinfo: Unify AMD L3 cache index disable checking
  cpufreq: Unify sysfs attribute definition macros
  powernow-k8: Fix frequency reporting
  x86, cpufreq: Add APERF/MPERF support for AMD processors
  x86: Unify APERF/MPERF support
  powernow-k8: Add core performance boost support
  x86, cpu: Add AMD core boosting feature flag to /proc/cpuinfo

Fix up trivial conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c and
drivers/cpufreq/cpufreq_ondemand.c
2010-05-18 08:49:13 -07:00
Linus Torvalds b7723f9d21 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Clean up arch/x86/Kconfig*
  x86-64: Don't export init_level4_pgt
2010-05-18 08:40:21 -07:00
Linus Torvalds 93c9d7f60c Merge branch 'x86-atomic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-atomic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix LOCK_PREFIX_HERE for uniprocessor build
  x86, atomic64: In selftest, distinguish x86-64 from 586+
  x86-32: Fix atomic64_inc_not_zero return value convention
  lib: Fix atomic64_inc_not_zero test
  lib: Fix atomic64_add_unless return value convention
  x86-32: Fix atomic64_add_unless return value convention
  lib: Fix atomic64_add_unless test
  x86: Implement atomic[64]_dec_if_positive()
  lib: Only test atomic64_dec_if_positive on archs having it
  x86-32: Rewrite 32-bit atomic64 functions in assembly
  lib: Add self-test for atomic64_t
  x86-32: Allow UP/SMP lock replacement in cmpxchg64
  x86: Add support for lock prefix in alternatives
2010-05-18 08:40:05 -07:00
Linus Torvalds 7421a10de7 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Use .cfi_sections for assembly code
  x86-64: Reduce SMP locks table size
  x86, asm: Introduce and use percpu_inc()
2010-05-18 08:35:37 -07:00
Linus Torvalds 4d7b4ac22f Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (311 commits)
  perf tools: Add mode to build without newt support
  perf symbols: symbol inconsistency message should be done only at verbose=1
  perf tui: Add explicit -lslang option
  perf options: Type check all the remaining OPT_ variants
  perf options: Type check OPT_BOOLEAN and fix the offenders
  perf options: Check v type in OPT_U?INTEGER
  perf options: Introduce OPT_UINTEGER
  perf tui: Add workaround for slang < 2.1.4
  perf record: Fix bug mismatch with -c option definition
  perf options: Introduce OPT_U64
  perf tui: Add help window to show key associations
  perf tui: Make <- exit menus too
  perf newt: Add single key shortcuts for zoom into DSO and threads
  perf newt: Exit browser unconditionally when CTRL+C, q or Q is pressed
  perf newt: Fix the 'A'/'a' shortcut for annotate
  perf newt: Make <- exit the ui_browser
  x86, perf: P4 PMU - fix counters management logic
  perf newt: Make <- zoom out filters
  perf report: Report number of events, not samples
  perf hist: Clarify events_stats fields usage
  ...

Fix up trivial conflicts in kernel/fork.c and tools/perf/builtin-record.c
2010-05-18 08:19:03 -07:00
Linus Torvalds 3aaf51ace5 Merge branch 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
  oprofile/x86: make AMD IBS hotplug capable
  oprofile/x86: notify cpus only when daemon is running
  oprofile/x86: reordering some functions
  oprofile/x86: stop disabled counters in nmi handler
  oprofile/x86: protect cpu hotplug sections
  oprofile/x86: remove CONFIG_SMP macros
  oprofile/x86: fix uninitialized counter usage during cpu hotplug
  oprofile/x86: remove duplicate IBS capability check
  oprofile/x86: move IBS code
  oprofile/x86: return -EBUSY if counters are already reserved
  oprofile/x86: moving shutdown functions
  oprofile/x86: reserve counter msrs pairwise
  oprofile/x86: rework error handler in nmi_setup()
  oprofile: update file list in MAINTAINERS file
  oprofile: protect from not being in an IRQ context
  oprofile: remove double ring buffering
  ring-buffer: Add lost event count to end of sub buffer
  tracing: Show the lost events in the trace_pipe output
  ring-buffer: Add place holder recording of dropped events
  tracing: Fix compile error in module tracepoints when MODULE_UNLOAD not set
  ...
2010-05-18 08:18:07 -07:00
Linus Torvalds 1014cfe2fb Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  lockdep: Reduce stack_trace usage
  lockdep: No need to disable preemption in debug atomic ops
  lockdep: Actually _dec_ in debug_atomic_dec
  lockdep: Provide off case for redundant_hardirqs_on increment
  lockdep: Simplify debug atomic ops
  lockdep: Fix redundant_hardirqs_on incremented with irqs enabled
  lockstat: Make lockstat counting per cpu
  i8253: Convert i8253_lock to raw_spinlock
2010-05-18 08:17:35 -07:00
Linus Torvalds 8123d8f17d Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/amd-iommu: Add amd_iommu=off command line option
  iommu-api: Remove iommu_{un}map_range functions
  x86/amd-iommu: Implement ->{un}map callbacks for iommu-api
  x86/amd-iommu: Make amd_iommu_iova_to_phys aware of multiple page sizes
  x86/amd-iommu: Make iommu_unmap_page and fetch_pte aware of page sizes
  x86/amd-iommu: Make iommu_map_page and alloc_pte aware of page sizes
  kvm: Change kvm_iommu_map_pages to map large pages
  VT-d: Change {un}map_range functions to implement {un}map interface
  iommu-api: Add ->{un}map callbacks to iommu_ops
  iommu-api: Add iommu_map and iommu_unmap functions
  iommu-api: Rename ->{un}map function pointers to ->{un}map_range
2010-05-18 07:22:37 -07:00
Cyrill Gorcunov ef4f30f54e perf, x86: P4 PMU -- fix typo in unflagged NMI handling
Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
LKML-Reference: <1274174954.22793.17.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-18 12:05:20 +02:00
Cyrill Gorcunov 0db1a7bc00 perf, x86: P4 PMU -- handle unflagged events
It might happen that an event can overflow without
the proper overflow flag set. Check the sign bit in
the raw counter value to solve this problem.

Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: fweisbec@gmail.com
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1274083984.6540.15.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-18 08:25:34 +02:00
H. Peter Anvin c59bd56882 x86, hweight: Use a 32-bit popcnt for __arch_hweight32()
Use a 32-bit popcnt instruction for __arch_hweight32(), even on
x86-64.  Even though the input register will *usually* be
zero-extended due to the standard operation of the hardware, it isn't
necessarily so if the input value was the result of truncating a
64-bit operation.

Note: the POPCNT32 variant used on x86-64 has a technically
unnecessary REX prefix to make it five bytes long, the same as a CALL
instruction, therefore avoiding an unnecessary NOP.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <alpine.LFD.2.00.1005171443060.4195@i5.linux-foundation.org>
2010-05-17 15:17:16 -07:00
Andreas Herrmann fec84e3307 x86, hpet: Add reference to chipset erratum documentation for disable-hpet-msi-quirk
(At the moment the "SB700 Family Product Errata" document is available
at http://support.amd.com/us/Embedded_TechDocs/46837.pdf)

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100517164324.GB10254@alberich.amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-17 10:04:43 -07:00
Sreedhara DS 9a58a33339 IPC driver for Intel Mobile Internet Device (MID) platforms
The IPC (inter processor communications) is used to provide the
communications between kernel and system control units on some embedded
Intel x86 platforms.

(Various bits of clean up and restructuring by Alan Cox)

Signed-off-by: Sreedhara DS <sreedhara.ds@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
2010-05-17 12:06:07 -04:00
Anton Blanchard f3d46f9d31 atomic_t: Cast to volatile when accessing atomic variables
In preparation for removing volatile from the atomic_t definition, this
patch adds a volatile cast to all the atomic read functions.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-17 07:57:27 -07:00
Gui Jianfeng df2fb6e710 KVM: MMU: fix sp->unsync type error in trace event definition
sp->unsync is bool now, so update trace event declaration.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:19:29 +03:00
Joerg Roedel ff47a49b23 KVM: SVM: Handle MCE intercepts always on host level
This patch prevents MCE intercepts from being propagated
into the L1 guest if they happened in an L2 guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:19:27 +03:00
Joerg Roedel ce7ddec4bb KVM: x86: Allow marking an exception as reinjected
This patch adds logic to kvm/x86 which allows to mark an
injected exception as reinjected. This allows to remove an
ugly hack from svm_complete_interrupts that prevented
exceptions from being reinjected at all in the nested case.
The hack was necessary because an reinjected exception into
the nested guest could cause a nested vmexit emulation. But
reinjected exceptions must not intercept. The downside of
the hack is that a exception that in injected could get
lost.
This patch fixes the problem and puts the code for it into
generic x86 files because. Nested-VMX will likely have the
same problem and could reuse the code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:19:26 +03:00
Joerg Roedel c2c63a4939 KVM: SVM: Report emulated SVM features to userspace
This patch implements the reporting of the emulated SVM
features to userspace instead of the real hardware
capabilities. Every real hardware capability needs emulation
in nested svm so the old behavior was broken.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:19:24 +03:00
Joerg Roedel d4330ef2fb KVM: x86: Add callback to let modules decide over some supported cpuid bits
This patch adds the get_supported_cpuid callback to
kvm_x86_ops. It will be used in do_cpuid_ent to delegate the
decission about some supported cpuid bits to the
architecture modules.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:19:23 +03:00
Joerg Roedel 228070b1b3 KVM: SVM: Propagate nested entry failure into guest hypervisor
This patch implements propagation of a failes guest vmrun
back into the guest instead of killing the whole guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:19:21 +03:00
Joerg Roedel 2be4fc7a02 KVM: SVM: Sync cr0 and cr3 to kvm state before nested handling
This patch syncs cr0 and cr3 from the vmcb to the kvm state
before nested intercept handling is done. This allows to
simplify the vmexit path.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:19:20 +03:00
Joerg Roedel 2041a06a50 KVM: SVM: Make sure rip is synced to vmcb before nested vmexit
This patch fixes a bug where a nested guest always went over
the same instruction because the rip was not advanced on a
nested vmexit.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:19:18 +03:00
Joerg Roedel 924584ccb0 KVM: SVM: Fix nested nmi handling
The patch introducing nested nmi handling had a bug. The
check does not belong to enable_nmi_window but must be in
nmi_allowed. This patch fixes this.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:19:17 +03:00
Avi Kivity 8d3b932309 Merge remote branch 'tip/perf/core'
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:19:15 +03:00
Lai Jiangshan cdbecfc398 KVM: VMX: free vpid when fail to create vcpu
Fix bug of the exception path, free allocated vpid when fail
to create vcpu.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:19:10 +03:00
Wei Yongjun 77a1a71570 KVM: MMU: cleanup for function unaccount_shadowed()
Since gfn is not changed in the for loop, we do not need to call
gfn_to_memslot_unaliased() under the loop, and it is safe to move
it out.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:12 +03:00
Gui Jianfeng 2a059bf444 KVM: Get rid of dead function gva_to_page()
Nobody use gva_to_page() anymore, get rid of it.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:10 +03:00
Gui Jianfeng b2fc15a5ef KVM: MMU: Remove unused varialbe in rmap_next()
Remove unused varialbe in rmap_next()

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:09 +03:00
Gui Jianfeng 814a59d207 KVM: MMU: Make use of is_large_pte() in walker
Make use of is_large_pte() instead of checking PT_PAGE_SIZE_MASK
bit directly.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:07 +03:00
Gui Jianfeng 51fb60d81b KVM: MMU: Move sync_page() first pte address calculation out of loop
Move first pte address calculation out of loop to save some cycles.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:06 +03:00
Avi Kivity 87bc3bf972 KVM: MMU: Drop cr4.pge from shadow page role
Since commit bf47a760f6, we no longer handle ptes with the global bit
set specially, so there is no reason to distinguish between shadow pages
created with cr4.gpe set and clear.

Such tracking is expensive when the guest toggles cr4.pge, so drop it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:03 +03:00
Lai Jiangshan 90d83dc3d4 KVM: use the correct RCU API for PROVE_RCU=y
The RCU/SRCU API have already changed for proving RCU usage.

I got the following dmesg when PROVE_RCU=y because we used incorrect API.
This patch coverts rcu_deference() to srcu_dereference() or family API.

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/8550:
 #0:  (&kvm->slots_lock){+.+.+.}, at: [<ffffffffa011a6ac>] kvm_set_memory_region+0x29/0x50 [kvm]
 #1:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: [<ffffffffa012262d>] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]

stack backtrace:
Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
Call Trace:
 [<ffffffff8106c59e>] lockdep_rcu_dereference+0xaa/0xb3
 [<ffffffffa012f6c1>] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
 [<ffffffffa012263e>] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
 [<ffffffffa011a5d7>] __kvm_set_memory_region+0x636/0x6e2 [kvm]
 [<ffffffffa011a6ba>] kvm_set_memory_region+0x37/0x50 [kvm]
 [<ffffffffa015e956>] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
 [<ffffffffa0126592>] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
 [<ffffffff810a8692>] ? unlock_page+0x27/0x2c
 [<ffffffff810bf879>] ? __do_fault+0x3a9/0x3e1
 [<ffffffffa011b12f>] kvm_vm_ioctl+0x364/0x38d [kvm]
 [<ffffffff81060cfa>] ? up_read+0x23/0x3d
 [<ffffffff810f3587>] vfs_ioctl+0x32/0xa6
 [<ffffffff810f3b19>] do_vfs_ioctl+0x495/0x4db
 [<ffffffff810e6b2f>] ? fget_light+0xc2/0x241
 [<ffffffff810e416c>] ? do_sys_open+0x104/0x116
 [<ffffffff81382d6d>] ? retint_swapgs+0xe/0x13
 [<ffffffff810f3ba6>] sys_ioctl+0x47/0x6a
 [<ffffffff810021db>] system_call_fastpath+0x16/0x1b

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:01 +03:00
Avi Kivity 9beeaa2d68 Merge branch 'perf'
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:58 +03:00
Xiao Guangrong 3246af0ece KVM: MMU: cleanup for hlist walk restart
Quote from Avi:

|Just change the assignment to a 'goto restart;' please,
|I don't like playing with list_for_each internals.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:56 +03:00
Gleb Natapov acb5451789 KVM: prevent spurious exit to userspace during task switch emulation.
If kvm_task_switch() fails code exits to userspace without specifying
exit reason, so the previous exit reason is reused by userspace. Fix
this by specifying exit reason correctly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:55 +03:00
Xiao Guangrong 6b18493d60 KVM: MMU: remove unused parameter in mmu_parent_walk()
'vcpu' is unused, remove it

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:53 +03:00
Xiao Guangrong 0571d366e0 KVM: MMU: reduce 'struct kvm_mmu_page' size
Define 'multimapped' as 'bool'.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:52 +03:00
Xiao Guangrong 1b8c7934a4 KVM: MMU: remove unused struct kvm_unsync_walk
Remove 'struct kvm_unsync_walk' since it's not used.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:50 +03:00
Gleb Natapov 19d0443726 KVM: fix emulator_task_switch() return value.
emulator_task_switch() should return -1 for failure and 0 for success to
the caller, just like x86_emulate_insn() does.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:49 +03:00
Avi Kivity 5b7e0102ae KVM: MMU: Replace role.glevels with role.cr4_pae
There is no real distinction between glevels=3 and glevels=4; both have
exactly the same format and the code is treated exactly the same way.  Drop
role.glevels and replace is with role.cr4_pae (which is meaningful).  This
simplifies the code a bit.

As a side effect, it allows sharing shadow page tables between pae and
longmode guest page tables at the same guest page.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:47 +03:00
Jan Kiszka e269fb2189 KVM: x86: Push potential exception error code on task switches
When a fault triggers a task switch, the error code, if existent, has to
be pushed on the new task's stack. Implement the missing bits.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:46 +03:00
Jan Kiszka 0760d44868 KVM: x86: Terminate early if task_switch_16/32 failed
Stop the switch immediately if task_switch_16/32 returned an error. Only
if that step succeeded, the switch should actually take place and update
any register states.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:44 +03:00
Gleb Natapov 8f6abd06f5 KVM: x86: get rid of mmu_only parameter in emulator_write_emulated()
We can call kvm_mmu_pte_write() directly from
emulator_cmpxchg_emulated() instead of passing mmu_only down to
emulator_write_emulated_onepage() and call it there.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:43 +03:00
Gleb Natapov 020df0794f KVM: move DR register access handling into generic code
Currently both SVM and VMX have their own DR handling code. Move it to
x86.c.

Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:39 +03:00
Andre Przywara 6bc31bdc55 KVM: SVM: implement NEXTRIPsave SVM feature
On SVM we set the instruction length of skipped instructions
to hard-coded, well known values, which could be wrong when (bogus,
but valid) prefixes (REX, segment override) are used.
Newer AMD processors (Fam10h 45nm and better, aka. PhenomII or
AthlonII) have an explicit NEXTRIP field in the VMCB containing the
desired information.
Since it is cheap to do so, we use this field to override the guessed
value on newer processors.
A fix for older CPUs would be rather expensive, as it would require
to fetch and partially decode the instruction. As the problem is not
a security issue and needs special, handcrafted code to trigger
(no compiler will ever generate such code), I omit a fix for older
CPUs.
If someone is interested, I have both a patch for these CPUs as well as
demo code triggering this issue: It segfaults under KVM, but runs
perfectly on native Linux.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:38 +03:00
Avi Kivity f7a711971e KVM: Fix MAXPHYADDR calculation when cpuid does not support it
MAXPHYADDR is derived from cpuid 0x80000008, but when that isn't present, we
get some random value.

Fix by checking first that cpuid 0x80000008 is supported.

Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:36 +03:00
Avi Kivity e46479f852 KVM: Trace emulated instructions
Log emulated instructions in ftrace, especially if they failed.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:35 +03:00
Avi Kivity 2fb53ad811 KVM: x86 emulator: Don't overwrite decode cache
Currently if we an instruction spans a page boundary, when we fetch the
second half we overwrite the first half.  This prevents us from tracing
the full instruction opcodes.

Fix by appending the second half to the first.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:33 +03:00
Xiao Guangrong 24222c2fec KVM: MMU: remove unnecessary NX check in walk_addr
After is_rsvd_bits_set() checks, EFER.NXE must be enabled if NX bit is seted

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:30 +03:00
Xiao Guangrong f84cbb0561 KVM: MMU: remove unused field
kvm_mmu_page.oos_link is not used, so remove it

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:29 +03:00
Xiao Guangrong 805d32dea4 KVM: MMU: cleanup/fix mmu audit code
This patch does:
- 'sp' parameter in inspect_spte_fn() is not used, so remove it
- fix 'kvm' and 'slots' is not defined in count_rmaps()
- fix a bug in inspect_spte_has_rmap()

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:27 +03:00
Avi Kivity 84b0c8c6a6 KVM: MMU: Disassociate direct maps from guest levels
Direct maps are linear translations for a section of memory, used for
real mode or with large pages.  As such, they are independent of the guest
levels.

Teach the mmu about this by making page->role.glevels = 0 for direct maps.
This allows direct maps to be shared among real mode and the various paging
modes.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:44 +03:00
Xiao Guangrong f815bce894 KVM: MMU: check reserved bits only if CR4.PSE=1 or CR4.PAE=1
- Check reserved bits only if CR4.PAE=1 or CR4.PSE=1 when guest #PF occurs
- Fix a typo in reset_rsvds_bits_mask()

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:42 +03:00
Marcelo Tosatti 041b1359a2 KVM: x86: document KVM_REQ_PENDING_TIMER usage
Document that KVM_REQ_PENDING_TIMER is implicitly used during guest
entry.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:40 +03:00
Gleb Natapov de3e6480f7 KVM: x86 emulator: fix unlocked CMPXCHG8B emulation
When CMPXCHG8B is executed without LOCK prefix it is racy. Preserve this
behaviour in emulator too.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:38 +03:00
Gleb Natapov 6550e1f165 KVM: x86 emulator: add decoding of CMPXCHG8B dst operand
Decode CMPXCHG8B destination operand in decoding stage. Fixes regression
introduced by "If LOCK prefix is used dest arg should be memory" commit.
This commit relies on dst operand be decoded at the beginning of an
instruction emulation.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:37 +03:00
Gleb Natapov 482ac18ae2 KVM: x86 emulator: commit rflags as part of registers commit
Make sure that rflags is committed only after successful instruction
emulation.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:35 +03:00
Jan Kiszka 9749a6c0f0 KVM: x86: Fix 32-bit build breakage due to typo
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:34 +03:00
Gleb Natapov 92bf9748b5 KVM: small kvm_arch_vcpu_ioctl_run() cleanup.
Unify all conditions that get us back into emulator after returning from
userspace.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:32 +03:00
Gleb Natapov 7b262e90fc KVM: x86 emulator: introduce pio in string read ahead.
To optimize "rep ins" instruction do IO in big chunks ahead of time
instead of doing it only when required during instruction emulation.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:31 +03:00
Gleb Natapov 5cd21917da KVM: x86 emulator: restart string instruction without going back to a guest.
Currently when string instruction is only partially complete we go back
to a guest mode, guest tries to reexecute instruction and exits again
and at this point emulation continues. Avoid all of this by restarting
instruction without going back to a guest mode, but return to a guest
mode each 1024 iterations to allow interrupt injection. Pending
exception causes immediate guest entry too.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:29 +03:00