Commit Graph

7208 Commits

Author SHA1 Message Date
Jeremy Fitzhardinge b40c757964 x86/32: no need to use set_pte_present in set_pte_vaddr
Impact: cleanup, remove last user of set_pte_present

set_pte_vaddr() is only used to install ptes in fixmaps, and
should never be used to overwrite a present mapping.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
LKML-Reference: <1237406613-2929-1-git-send-email-jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-19 14:04:18 +01:00
Ingo Molnar c58603e81b x86: mpparse: clean up code by introducing a few helper functions, fix
Impact: fix boot crash

This fixes commit a683027856.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1237403503.22438.21.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-19 08:52:13 +01:00
Lai Jiangshan e9d9df4473 ftrace: protect running nmi (V3)
When I review the sensitive code ftrace_nmi_enter(), I found
the atomic variable nmi_running does protect NMI VS do_ftrace_mod_code(),
but it can not protects NMI(entered nmi) VS NMI(ftrace_nmi_enter()).

cpu#1                   | cpu#2                 | cpu#3
ftrace_nmi_enter()      | do_ftrace_mod_code()  |
  not modify            |                       |
------------------------|-----------------------|--
executing               | set mod_code_write = 1|
executing             --|-----------------------|--------------------
executing               |                       | ftrace_nmi_enter()
executing               |                       |    do modify
------------------------|-----------------------|-----------------
ftrace_nmi_exit()       |                       |

cpu#3 may be being modified the code which is still being executed on cpu#1,
it will have undefined results and possibly take a GPF, this patch
prevents it occurred.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <49C0B411.30003@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-18 20:36:59 -04:00
H. Peter Anvin 5f64135612 x86, setup: fix the setting of 480-line VGA modes
Impact: fix rarely-used feature

The VGA Miscellaneous Output Register is read from address 0x3CC but
written to address 0x3C2.  This was missed when this code was
converted from assembly to C.  While we're at it, clean up the code by
making the overflow bits and the math used to set the bits explicit.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-03-18 16:54:05 -07:00
Jaswinder Singh Rajput a683027856 x86: mpparse: clean up code by introducing a few helper functions
Impact: cleanup

Refactor the MP-table parsing code via the introduction of the
following helper functions:

  skip_entry()
  smp_reserve_bootmem()
  check_irq_src()
  check_slot()

To simplify the code flow and to reduce the size of the
following oversized functions: smp_read_mpc(), smp_scan_config().

There should be no impact to functionality.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 17:15:05 +01:00
Rusty Russell c38da5692e x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus
Impact: cleanup, reduce memory usage for CONFIG_CPUMASK_OFFSTACK=y

Part of the "getting rid of obsolete cpumask_t" patch:

 1) Use cpumask_var_t: this is a pointer if CONFIG_CPUMASK_OFFSTACK=y
 2) Call alloc_cpumask_var() on first entry into enter_uniprocessor()
 3) Use modern cpumask_* functions.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Pekka Paalanen <pq@iki.fi>
LKML-Reference: <200903111633.55952.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 13:51:51 +01:00
Rusty Russell 89bd55d185 x86: cpumask: update 32-bit APM not to mug current->cpus_allowed
Impact: cleanup, avoid cpumask games

The APM code wants to run on CPU 0: we create an "on_cpu0" wrapper
which uses work_on_cpu() if we're not already on cpu 0.

This introduces a new failure mode: -ENOMEM, so we add an explicit
err arg and handle Linux-style errnos in apm_err().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <200903111631.29787.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 13:51:45 +01:00
Ingo Molnar 4bae196735 x86: microcode: cleanup
Impact: cleanup

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Peter Oruba <peter.oruba@amd.com>
LKML-Reference: <200903111632.37279.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 13:51:17 +01:00
Rusty Russell af5c820a31 x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c
Impact: don't play with current's cpumask

Straightforward indirection through work_on_cpu().  One change is
that the error code from microcode_update_cpu() is now actually
plumbed back to microcode_init_cpu(), so now we printk if it fails
on cpu hotplug.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Peter Oruba <peter.oruba@amd.com>
LKML-Reference: <200903111632.37279.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 13:50:47 +01:00
Jaswinder Singh Rajput cde5edbda8 x86: kprobes.c fix compilation warning
arch/x86/kernel/kprobes.c:196: warning: passing argument 1 of ‘search_exception_tables’ makes integer from pointer without a cast

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference:<49BED952.2050809@redhat.com>
LKML-Reference: <1237378065.13488.2.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 13:21:01 +01:00
Ingo Molnar 705bb9dc72 Merge branches 'x86/cleanups', 'x86/cpu', 'x86/debug', 'x86/mce2', 'x86/mm', 'x86/mtrr', 'x86/setup', 'x86/setup-memory', 'x86/urgent', 'x86/uv', 'x86/x2apic' and 'linus' into x86/core
Conflicts:
	arch/parisc/kernel/irq.c
2009-03-18 13:19:49 +01:00
Jaswinder Singh Rajput 4e16c88875 x86: cpu/mttr/cleanup.c fix compilation warning
arch/x86/kernel/cpu/mtrr/cleanup.c:197: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘long unsigned int’

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1237378015.13488.1.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 13:14:31 +01:00
Ingo Molnar 95f3c4ebff Merge branch 'dma-api/debug' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu 2009-03-18 10:37:48 +01:00
Ingo Molnar 04dfcfcb54 Merge branch 'linus' into core/iommu 2009-03-18 10:37:43 +01:00
Ingo Molnar 37ba317c9e Merge branches 'sched/cleanups' and 'linus' into sched/core 2009-03-18 09:57:02 +01:00
Rusty Russell 2c74d66624 x86, uv: fix cpumask iterator in uv_bau_init()
Impact: fix boot crash on UV systems

Commit 76ba0ecda0 "cpumask: use
cpumask_var_t in uv_flush_tlb_others" used cur_cpu as an iterator;
it was supposed to be zero for the code below it.

Reported-by: Cliff Wickman <cpw@sgi.com>
Original-From: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Mike Travis <travis@sgi.com>
Cc: steiner@sgi.com
Cc: <stable@kernel.org>
LKML-Reference: <200903180822.31196.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 09:47:54 +01:00
Rusty Russell 30e1e6d1af cpumask: fix CONFIG_CPUMASK_OFFSTACK=y cpu hotunplug crash
Impact: Fix cpu offline when CONFIG_MAXSMP=y

Changeset bc9b83dd1f "cpumask: convert
c1e_mask in arch/x86/kernel/process.c to cpumask_var_t" contained a
bug: c1e_mask is manipulated even if C1E isn't detected (and hence
not allocated).

This is simply fixed by checking for NULL (which gcc optimizes out
anyway of CONFIG_CPUMASK_OFFSTACK=n, since it knows ce1_mask can never
be NULL).

In addition, fix a leak where select_idle_routine re-allocates
(and re-clears) c1e_mask on every cpu init.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Mike Travis <travis@sgi.com>
LKML-Reference: <200903171450.34549.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 09:40:35 +01:00
Suresh Siddha ce4e240c27 x86: add x2apic_wrmsr_fence() to x2apic flush tlb paths
Impact: optimize APIC IPI related barriers

Uncached MMIO accesses for xapic are inherently serializing and hence
we don't need explicit barriers for xapic IPI paths.

x2apic MSR writes/reads don't have serializing semantics and hence need
a serializing instruction or mfence, to make all the previous memory
stores globally visisble before the x2apic msr write for IPI.

Add x2apic_wrmsr_fence() in flush tlb path to x2apic specific paths.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "steiner@sgi.com" <steiner@sgi.com>
Cc: Nick Piggin <npiggin@suse.de>
LKML-Reference: <1237313814.27006.203.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 09:36:14 +01:00
Andrew Morton a6b6a14e0c x86: use smp_call_function_single() in arch/x86/kernel/cpu/mcheck/mce_amd_64.c
Attempting to rid us of the problematic work_on_cpu().  Just use
smp_call_function_single() here.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <20090318042217.EF3F1DDF39@ozlabs.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 07:03:12 +01:00
Ingo Molnar 03418c7efa Merge branches 'tracing/ftrace' and 'linus' into tracing/core 2009-03-18 06:58:45 +01:00
Suresh Siddha 68a8ca593f x86: fix broken irq migration logic while cleaning up multiple vectors
Impact: fix spurious IRQs

During irq migration, we send a low priority interrupt to the previous
irq destination. This happens in non interrupt-remapping case after interrupt
starts arriving at new destination and in interrupt-remapping case after
modifying and flushing the interrupt-remapping table entry caches.

This low priority irq cleanup handler can cleanup multiple vectors, as
multiple irq's can be migrated at almost the same time. While
there will be multiple invocations of irq cleanup handler (one cleanup
IPI for each irq migration), first invocation of the cleanup handler
can potentially cleanup more than one vector (as the first invocation can
see the requests for more than vector cleanup). When we cleanup multiple
vectors during the first invocation of the smp_irq_move_cleanup_interrupt(),
other vectors that are to be cleanedup can still be pending in the local
cpu's IRR (as smp_irq_move_cleanup_interrupt() runs with interrupts disabled).

When we are ready to unhook a vector corresponding to an irq, check if that
vector is registered in the local cpu's IRR. If so skip that cleanup and
do a self IPI with the cleanup vector, so that we give a chance to
service the pending vector interrupt and then cleanup that vector
allocation once we execute the lowest priority handler.

This fixes spurious interrupts seen when migrating multiple vectors
at the same time.

[ This is apparently possible even on conventional xapic, although to
  the best of our knowledge it has never been seen.  The stable
  maintainers may wish to consider this one for -stable. ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: stable@kernel.org
2009-03-17 16:49:30 -07:00
Suresh Siddha 05c3dc2c4b x86, ioapic: Fix non atomic allocation with interrupts disabled
Impact: fix possible race

save_mask_IO_APIC_setup() was using non atomic memory allocation while getting
called with interrupts disabled. Fix this by splitting this into two different
function. Allocation part save_IO_APIC_setup() now happens before
disabling interrupts.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-03-17 15:45:29 -07:00
Suresh Siddha 29b61be65a x86, x2apic: cleanup ifdef CONFIG_INTR_REMAP in io_apic code
Impact: cleanup

Clean up #ifdefs and replace them with helper functions.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-03-17 15:45:07 -07:00
Suresh Siddha 0280f7c416 x86, x2apic: cleanup the IO-APIC level migration with interrupt-remapping
Impact: simplification

In the current code, for level triggered migration, we need to modify the
io-apic RTE with the update vector information, along with modifying interrupt
remapping table entry(IRTE) with vector and destination. This is to ensure that
remote IRR bit inthe IOAPIC RTE gets cleared when the cpu does EOI.

With this patch, for level triggered, we eliminate the io-apic RTE modification
(with the updated vector information), by using a virtual vector (io-apic pin
number).  Real vector that is used for interrupting cpu will be coming from
the interrupt-remapping table entry. Trigger mode in the IRTE will always be
edge, and the actual level or edge trigger will be setup in the IO-APIC RTE.
So a level triggered interrupt will appear as an edge to the local apic
cpu but still as level to the IO-APIC.

With this change, level irq migration can be done by simply modifying
the interrupt-remapping table entry with out changing the io-apic RTE.
And as the interrupt appears as edge at the cpu, in addition to do the
local apic EOI, we need to do IO-APIC directed EOI to clear the remote
IRR bit in  the IO-APIC RTE.

This simplies the irq migration in the presence of interrupt-remapping.

Idea-by: Rajesh Sankaran <rajesh.sankaran@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-03-17 15:44:27 -07:00
Suresh Siddha cf6567fe40 x86, x2apic: fix clear_local_APIC() in the presence of x2apic
Impact: cleanup, paranoia

We were not clearing the local APIC in clear_local_APIC() in the
presence of x2apic. Fix it.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-03-17 15:43:51 -07:00
Suresh Siddha 7c6d9f9785 x86, x2apic: use virtual wire A mode in disable_IO_APIC() with interrupt-remapping
Impact: make kexec work with x2apic

disable_IO_APIC() gets called during crashdump aswell, which configures the
IO-APIC/LAPIC so that legacy interrupts can be delivered for the kexec'd kernel.

In the presence of interrupt-remapping, we need to change the
interrupt-remapping configuration aswell as modifying IO-APIC for virtual wire
B mode.

To keep things simple during the crash, use virtual wire A mode
(for which we don't need to touch io-apic and interrupt-remapping tables).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-03-17 15:42:28 -07:00
Suresh Siddha 9d783ba042 x86, x2apic: enable fault handling for intr-remapping
Impact: interface augmentation (not yet used)

Enable fault handling flow for intr-remapping aswell. Fault handling
code now shared by both dma-remapping and intr-remapping.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-03-17 15:38:59 -07:00
H. Peter Anvin be721696ca x86, setup: move 32-bit code to .text32
Impact: cleanup

The setup code is mostly 16-bit code, but there is a small stub of
32-bit code at the end.  Move the 32-bit code to a separate segment,
.text32, to avoid scrambling the disassembly.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-03-17 15:26:06 -07:00
H. Peter Anvin 0a699af8e6 x86-32: move _end to a dummy section
Impact: build fix with CONFIG_RELOCATABLE

Move _end into a dummy section, so that relocs.c will know it is a
relocatable symbol.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
2009-03-17 14:16:02 -07:00
Jeremy Fitzhardinge 704439ddf9 x86/brk: put the brk reservations in their own section
Impact: disambiguate real .bss variables from .brk storage

Add a .brk section after the .bss section.  This has no effect
on the final vmlinux, but it more clearly distinguishes the space
taken by actual .bss symbols, and the variable space reserved
by .brk users.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-03-17 12:58:15 -07:00
Jeremy Fitzhardinge 0b1c723d0b x86/brk: make the brk reservation symbols inaccessible from C
Impact: bulletproofing, clarification

The brk reservation symbols are just there to document the amount
of space reserved by brk users in the final vmlinux file.  Their
addresses are irrelevent, and using their addresses will cause
certain havok.  Name them ".brk.NAME", which is a valid asm symbol
but C can't reference it; it also highlights their special
role in the symbol table.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-03-17 12:56:52 -07:00
H. Peter Anvin 60ac982139 x86-32: tighten the bound on additional memory to map
Impact: Tighten bound to avoid masking errors

The definition of MAPPING_BEYOND_END was excessive; this has a nasty
tendency to mask bugs.  We have learned over time that this kind of
bug hiding can cause some very strange errors.  Therefore, tighten the
bound to only need to map the actual kernel area.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2009-03-17 11:52:10 -07:00
Jeremy Fitzhardinge b8a22a6273 x86-32: remove ALLOCATOR_SLOP from head_32.S
Impact: cleanup

ALLOCATOR_SLOP is a vestigial remain from when we used the
bootmem allocator to allocate the kernel's linear memory mapping.
Now we directly reserve pages from the e820 mapping, and no
longer require secondary structures to keep track of allocated
pages.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-03-17 11:46:01 -07:00
Jeremy Fitzhardinge c090f532db x86-32: make sure we map enough to fit linear map pagetables
Impact: crash fix

head_32.S needs to map the kernel itself, and enough space so
that mm/init.c can allocate space from the e820 allocator
for the linear map of low memory.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-03-17 11:42:05 -07:00
Masami Hiramatsu 30390880de prevent boosting kprobes on exception address
Don't boost at the addresses which are listed on exception tables,
because major page fault will occur on those addresses.  In that case,
kprobes can not ensure that when instruction buffer can be freed since
some processes will sleep on the buffer.

kprobes-ia64 already has same check.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-17 09:11:48 -07:00
Linus Torvalds 9e8912e04e Fast TSC calibration: calculate proper frequency error bounds
In order for ntpd to correctly synchronize the clocks, the frequency of
the system clock must not be off by more than 500 ppm (or, put another
way, 1:2000), or ntpd will end up giving up on trying to synchronize
properly, and ends up reseting the clock in jumps instead.

The fast TSC PIT calibration sometimes failed this test - it was
assuming that the PIT reads always took about one microsecond each (2us
for the two reads to get a 16-bit timer), and that calibrating TSC to
the PIT over 15ms should thus be sufficient to get much closer than
500ppm (max 2us error on both sides giving 4us over 15ms: a 270 ppm
error value).

However, that assumption does not always hold: apparently some hardware
is either very much slower at reading the PIT registers, or there was
other noise causing at least one machine to get 700+ ppm errors.

So instead of using a fixed 15ms timing loop, this changes the fast PIT
calibration to read the TSC delta over the individual PIT timer reads,
and use the result to calculate the error bars on the PIT read timing
properly.  We then successfully calibrate the TSC only if the maximum
error bars fall below 500ppm.

In the process, we also relax the timing to allow up to 25ms for the
calibration, although it can happen much faster depending on hardware.

Reported-and-tested-by: Jesper Krogh <jesper@krogh.cc>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-17 08:13:17 -07:00
Linus Torvalds a6a80e1d8c Fix potential fast PIT TSC calibration startup glitch
During bootup, when we reprogram the PIT (programmable interval timer)
to start counting down from 0xffff in order to use it for the fast TSC
calibration, we should also make sure to delay a bit afterwards to allow
the PIT hardware to actually start counting with the new value.

That will happens at the next CLK pulse (1.193182 MHz), so the easiest
way to do that is to just wait at least one microsecond after
programming the new PIT counter value.  We do that by just reading the
counter value back once - which will take about 2us on PC hardware.

Reported-and-tested-by: john stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-17 07:58:26 -07:00
Joerg Roedel 86f3195293 dma-debug/x86: register pci bus for dma-debug leak detection
Impact: detect dma memory leaks for pci devices

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-03-17 12:56:49 +01:00
Joerg Roedel 2118d0c548 dma-debug: x86 architecture bindings
Impact: make use of DMA-API debugging code in x86

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-03-17 12:56:46 +01:00
Yinghai Lu f0348c438c x86: MTRR workaround for system with stange var MTRRs
Impact: don't trim e820 according to wrong mtrr

Ozan reports that his server emits strange warning.
it turns out the BIOS sets the MTRRs incorrectly.

Ignore those strange ranges, and don't trim e820,
just emit one warning about BIOS

Reported-by: Ozan Çağlayan <ozan@pardus.org.tr>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49BEE1E7.7020706@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-17 10:47:47 +01:00
Jeremy Fitzhardinge 42854dc0a6 x86, paravirt: prevent gcc from generating the wrong addressing mode
Impact: fix crash on VMI (VMware)

When we generate a call sequence for calling a paravirtualized
function, we presume that the generated code is "call *0xXXXXX",
which is a 6 byte opcode; this is larger than a normal
direct call, and so we can patch a direct call over it.

At the moment, however we give gcc enough rope to hang us by
putting the address in a register and generating a two byte
indirect-via-register call.  Prevent this by explicitly
dereferencing the function pointer and passing it into the
asm as a constant.

This prevents crashes in VMI, as it cannot handle unpatchable
callsites.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Alok Kataria <akataria@vmware.com>
LKML-Reference: <49BEEDC2.2070809@goop.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-16 18:36:31 -07:00
Thomas Gleixner 250981e6e1 x86: reduce preemption off section in exit thread
Impact: latency improvement

No need to keep preemption disabled over the kfree call.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-03-16 15:32:28 +01:00
Hidetoshi Seto 514ec49a5f x86, mce: remove incorrect __cpuinit for intel_init_cmci()
Impact: Bug fix on UP

Referring commit cc3ca22063,
Peter removed __cpuinit annotations for mce_cpu_features()
and its successor functions, which caused troubles on UP
configurations.

However the intel_init_cmci() was introduced after that and
it also has __cpuinit annotation even though it is called from
mce_cpu_features(). Remove the annotation from that function
too.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-16 09:15:32 +01:00
Akinobu Mita 0920dce7d5 x86, mm: remove unnecessary include file from iomap_32.c
asm/highmem.h inclusion is added to use kmap_atomic_prot_pfn()
by commit bb6d59ca92

Now kmap_atomic_prot_pfn is moved to iomap_32.c
by commit dd63fdcc63

So the asm/highmem.h inclusion in iomap_32.c is unnecessary now.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
LKML-Reference: <20090315151517.GA29074@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-15 20:05:08 +01:00
Yinghai Lu c61cf4cfe7 x86: print out more info in e820_update_range()
Impact: help debug e820 bugs

Try to print out more info, to catch wrong call parameters.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49BCB557.3030000@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-15 10:01:59 +01:00
Yinghai Lu 6d7942dc2a x86: fix 64k corruption-check
Impact: fix boot crash

Need to exit early if the addr is far above 64k.

The crash got exposed by:

  78a8b35: x86: make e820_update_range() handle small range update

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@kernel.org>
LKML-Reference: <49BC2279.2030101@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-15 07:03:15 +01:00
Yinghai Lu 2bd2753ff4 x86: put initial_pg_tables into .bss
Impact: makes vmlinux section information more useful

Don't use ram after _end blindly for pagetables. aka init pages is before _end
put those pg table into .bss

[Adapted to use brk segment - Jeremy]

v2: keep initial page table up to 512M only.
v4: put initial page tables just before _end

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-14 17:23:47 -07:00
Jeremy Fitzhardinge 796216a57f x86: allow extend_brk users to reserve brk space
Impact: new interface; remove hard-coded limit

Add RESERVE_BRK(name, size) macro to reserve space in the brk
area.  This should be a conservative (ie, larger) estimate of
how much space might possibly be required from the brk area.
Any unused space will be freed, so there's no real downside
on making the reservation too large (within limits).

The name should be unique within a given file, and somewhat
descriptive.

The C definition of RESERVE_BRK() ends up being more complex than
one would expect to work around a cluster of gcc infelicities:

  The first attempt was to simply try putting __section(.brk_reservation)
  on a variable.  This doesn't work because it ends up making it a
  @progbits section, which gets actual space allocated in the vmlinux
  executable.

  The second attempt was to emit the space into a section using asm,
  but gcc doesn't allow arguments to be passed to file-level asm()
  statements, making it hard to pass in the size.

  The final attempt is to wrap the asm() in a function to allow
  it to have arguments, and put the function itself into the
  .discard section, which vmlinux*.lds drops entirely from the
  emitted vmlinux.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-14 17:23:47 -07:00
Yinghai Lu 7543c1de84 x86-32: compute initial mapping size more accurately
Impact: simplification

We only need to map the kernel in head_32.S, not the whole of
lowmem.  We use 512MB as a reasonable (but arbitrary) limit on
the maximum size of the kernel image.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-14 17:23:47 -07:00
Jeremy Fitzhardinge 6de6cb442e x86: use brk allocation for DMI
Impact: use new interface instead of previous ad hoc implementation

Use extend_brk() to allocate memory for DMI rather than having an
ad-hoc allocator.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-14 17:23:47 -07:00
Jeremy Fitzhardinge ccf3fe02e3 x86-32: use brk segment for allocating initial kernel pagetable
Impact: use new interface instead of previous ad hoc implementation

Rather than having special purpose init_pg_table_start/end variables
to delimit the kernel pagetable built by head_32.S, just use the brk
mechanism to extend the bss for the new pagetable.

This patch removes init_pg_table_start/end and pg0, defines __brk_base
(which is page-aligned and immediately follows _end), initializes
the brk region to start there, and uses it for the 32-bit pagetable.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-14 17:23:47 -07:00
H. Peter Anvin 5368a2be34 x86: move brk initialization out of #ifdef CONFIG_BLK_DEV_INITRD
Impact: build fix

The brk initialization functions were incorrectly located inside
an #ifdef CONFIG_VLK_DEV_INITRD block, causing the obvious build failure in
minimal configurations.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-03-14 17:23:41 -07:00
Jeremy Fitzhardinge 93dbda7cbc x86: add brk allocation for very, very early allocations
Impact: new interface

Add a brk()-like allocator which effectively extends the bss in order
to allow very early code to do dynamic allocations.  This is better than
using statically allocated arrays for data in subsystems which may never
get used.

The space for brk allocations is in the bss ELF segment, so that the
space is mapped properly by the code which maps the kernel, and so
that bootloaders keep the space free rather than putting a ramdisk or
something into it.

The bss itself, delimited by __bss_stop, ends before the brk area
(__brk_base to __brk_limit).  The kernel text, data and bss is reserved
up to __bss_stop.

Any brk-allocated data is reserved separately just before the kernel
pagetable is built, as that code allocates from unreserved spaces
in the e820 map, potentially allocating from any unused brk memory.
Ultimately any unused memory in the brk area is used in the general
kernel memory pool.

Initially the brk space is set to 1MB, which is probably much larger
than any user needs (the largest current user is i386 head_32.S's code
to build the pagetables to map the kernel, which can get fairly large
with a big kernel image and no PSE support).  So long as the system
has sufficient memory for the bootloader to reserve the kernel+1MB brk,
there are no bad effects resulting from an over-large brk.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-14 15:37:14 -07:00
Jeremy Fitzhardinge b9719a4d9c x86: make section delimiter symbols part of their section
Impact: cleanup

Move the symbols delimiting a section part of the section
(section relative) rather than absolute.  This avoids any
unexpected gaps between the section-start symbol and the first
data in the section, which could be caused by implicit
alignment of the section data.  It also makes the general
form of vmlinux_64.lds.S consistent with vmlinux_32.lds.S.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-14 15:37:14 -07:00
Jaswinder Singh Rajput f4c3c4cdb1 x86: cpu_debug add support for various AMD CPUs
Impact: Added AMD CPUs support

Added flags for various AMD CPUs.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-14 18:07:58 +01:00
Sebastian Andrzej Siewior 48f4c485c2 x86/centaur: merge 32 & 64 bit version
there should be no difference, except:

 * the 64bit variant now also initializes the padlock unit.
 * ->c_early_init() is executed again from ->c_init()
 * the 64bit fixups made into 32bit path.

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: herbert@gondor.apana.org.au
LKML-Reference: <1237029843-28076-2-git-send-email-sebastian@breakpoint.cc>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-14 16:27:29 +01:00
Ingo Molnar 0ca0f16fd1 Merge branches 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/debug', 'x86/kconfig', 'x86/mm', 'x86/ptrace', 'x86/setup' and 'x86/urgent'; commit 'v2.6.29-rc8' into x86/core 2009-03-14 16:25:40 +01:00
Yinghai Lu d4c90e37a2 x86: print the continous part of fixed mtrrs together
Impact: print out fewer lines

 1. print continuous range with same type together
 2. change _INFO to _DEBUG

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49BACB61.8000302@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-14 12:27:06 +01:00
Yinghai Lu 63516ef6d6 x86: fix get_mtrr() warning about smp_processor_id() with CONFIG_PREEMPT=y
Impact: fix debug warning

Jaswinder noticed that there is a warning about smp_processor_id()
in get_mtrr().

Fix it by wrapping the printout into a get/put_cpu() pair.

Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49BAB7FF.4030107@kernel.org>
[ changed to get/put_cpu(), cleaned up surrounding code a it. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-14 12:27:06 +01:00
Yinghai Lu 78a8b35bc7 x86: make e820_update_range() handle small range update
Impact: enhance e820 code to handle more cases

Try to handle new range which could be covered by one entry.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: jbeulich@novell.com
LKML-Reference: <49B9F0C1.10402@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-14 12:20:07 +01:00
Ingo Molnar 0f3fa48a7e x86: cpu/common.c more cleanups
Complete/fix the cleanups of cpu/common.c:

 - fix ugly warning due to asm/topology.h -> linux/topology.h change
 - standardize the style across the file
 - simplify/refactor the code flow where possible

Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
LKML-Reference: <1237009789.4387.2.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-14 10:37:34 +01:00
Ingo Molnar c550033ced Merge branch 'core/percpu' into x86/core 2009-03-14 09:50:10 +01:00
Ingo Molnar 62395efdb0 Merge branch 'x86/asm' into tracing/syscalls
We need the wider TIF work-mask checks in entry_32.S.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-14 09:44:08 +01:00
Jaswinder Singh Rajput 88200bc28d x86: entry_32.S fix compile warnings - fix work mask bit width
Fix:

 arch/x86/kernel/entry_32.S:446: Warning: 00000000080001d1 shortened to 00000000000001d1
 arch/x86/kernel/entry_32.S:457: Warning: 000000000800feff shortened to 000000000000feff
 arch/x86/kernel/entry_32.S:527: Warning: 00000000080001d1 shortened to 00000000000001d1
 arch/x86/kernel/entry_32.S:541: Warning: 000000000800feff shortened to 000000000000feff
 arch/x86/kernel/entry_32.S:676: Warning: 0000000008000091 shortened to 0000000000000091

TIF_SYSCALL_FTRACE is 0x08000000 and until now we checked the
first 16 bits of the work mask - bit 27 falls outside of that.

Update the entry_32.S code to check the full 32-bit mask.

[ %cx => %ecx fix from Cyrill Gorcunov <gorcunov@gmail.com> ]

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "H. Peter Anvin" <hpa@kernel.org>
LKML-Reference: <1237012693.18733.3.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-14 09:42:51 +01:00
Jaswinder Singh Rajput 9766cdbcb2 x86: cpu/common.c cleanups
- fix various style problems
 - declare varibles before they get used
 - introduced clear_all_debug_regs
 - fix header files issues

LKML-Reference: <1237009789.4387.2.camel@localhost.localdomain>
Signed-off-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-14 08:59:50 +01:00
Ingo Molnar 0634023562 Merge branch 'x86/core' into x86/kconfig 2009-03-13 17:08:30 +01:00
Ingo Molnar ccd50dfd92 tracing/syscalls: support for syscalls tracing on x86, fix
Impact: build fix

 kernel/built-in.o: In function `ftrace_syscall_exit':
 (.text+0x76667): undefined reference to `syscall_nr_to_meta'

ftrace.o is built:

obj-$(CONFIG_DYNAMIC_FTRACE)    += ftrace.o
obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o

But now a CONFIG_FTRACE_SYSCALLS dependency is needed too.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <1236401580-5758-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 17:02:17 +01:00
Frederic Weisbecker f58ba10067 tracing/syscalls: support for syscalls tracing on x86
Extend x86 architecture syscall tracing support with syscall
metadata table details.

(The upcoming core syscall tracing modifications rely on this.)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236955332-10133-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 16:57:42 +01:00
Thomas Gleixner f9a36fa541 x86: disable __do_IRQ support
Impact: disable unused code

x86 is fully converted to flow handlers. No need to keep the
deprecated __do_IRQ() support active.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-03-13 16:46:37 +01:00
Rusty Russell 0b966252d9 cpumask: convert node_to_cpumask_map[] to cpumask_var_t
Impact: fix (CONFIG_MAXSMP=y only) boot crash

c032ef60d1 "cpumask: convert
node_to_cpumask_map[] to cpumask_var_t" didn't get this one
conversion.  There was a compile warning, but I missed it.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Mike Travis <travis@sgi.com>
LKML-Reference: <200903132342.42813.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 14:35:31 +01:00
Américo Wang 5a8ac9d28d x86: ptrace, bts: fix an unreachable statement
Commit c2724775ce put a statement
after return, which makes that statement unreachable.

Move that statement before return.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Markus Metzger <markus.t.metzger@intel.com>
LKML-Reference: <20090313075622.GB8933@hack>
Cc: <stable@kernel.org> # .29 only
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 10:27:57 +01:00
Andreas Herrmann 3ff42da504 x86: mtrr: don't modify RdDram/WrDram bits of fixed MTRRs
Impact: bug fix + BIOS workaround

BIOS is expected to clear the SYSCFG[MtrrFixDramModEn] on AMD CPUs
after fixed MTRRs are configured.

Some BIOSes do not clear SYSCFG[MtrrFixDramModEn] on BP (and on APs).

This can lead to obfuscation in Linux when this bit is not cleared on
BP but cleared on APs. A consequence of this is that the saved
fixed-MTRR state (from BP) differs from the fixed-MTRRs of APs --
because RdDram/WrDram bits are read as zero when
SYSCFG[MtrrFixDramModEn] is cleared -- and Linux tries to sync
fixed-MTRR state from BP to AP. This implies that Linux sets
SYSCFG[MtrrFixDramEn] and activates those bits.

More important is that (some) systems change these bits in SMM when
ACPI is enabled. Hence it is racy if Linux modifies RdMem/WrMem bits,
too.

(1) The patch modifies an old fix from Bernhard Kaindl to get
    suspend/resume working on some Acer Laptops. Bernhard's patch
    tried to sync RdMem/WrMem bits of fixed MTRR registers and that
    helped on those old Laptops. (Don't ask me why -- can't test it
    myself). But this old problem was not the motivation for the
    patch. (See http://lkml.org/lkml/2007/4/3/110)

(2) The more important effect is to fix issues on some more current systems.

    On those systems Linux panics or just freezes, see

    http://bugzilla.kernel.org/show_bug.cgi?id=11541
    (and also duplicates of this bug:
    http://bugzilla.kernel.org/show_bug.cgi?id=11737
    http://bugzilla.kernel.org/show_bug.cgi?id=11714)

    The affected systems boot only using acpi=ht, acpi=off or
    when the kernel is built with CONFIG_MTRR=n.

    The acpi options prevent full enablement of ACPI.  Obviously when
    ACPI is enabled the BIOS/SMM modfies RdMem/WrMem bits.  When
    CONFIG_MTRR=y Linux also accesses and modifies those bits when it
    needs to sync fixed-MTRRs across cores (Bernhard's fix, see (1)).
    How do you synchronize that? You can't. As a consequence Linux
    shouldn't touch those bits at all (Rationale are AMD's BKDGs which
    recommend to clear the bit that makes RdMem/WrMem accessible).
    This is the purpose of this patch. And (so far) this suffices to
    fix (1) and (2).

I suggest not to touch RdDram/WrDram bits of fixed-MTRRs and
SYSCFG[MtrrFixDramEn] and to clear SYSCFG[MtrrFixDramModEn] as
suggested by AMD K8, and AMD family 10h/11h BKDGs.
BIOS is expected to do this anyway. This should avoid that
Linux and SMM tread on each other's toes ...

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: trenn@suse.de
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090312163937.GH20716@alberich.amd.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 10:19:27 +01:00
Frederic Weisbecker 1b3fa2ce64 tracing/x86: basic implementation of syscall tracing for x86
Provide the x86 trace callbacks to trace syscalls.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <1236401580-5758-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 06:25:44 +01:00
Ingo Molnar 238a5b4bff Merge branch 'cpus4096' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-x86 into cpus4096 2009-03-13 05:54:55 +01:00
Ingo Molnar 17d85bc756 Merge commit 'v2.6.29-rc8' into cpus4096 2009-03-13 05:54:43 +01:00
Yinghai Lu 773e673de2 x86: fix e820_update_range()
Impact: fix left range size on head

| commit 5c0e6f035d
|    x86: fix code paths used by update_mptable
|    Impact: fix crashes under Xen due to unrobust e820 code

fixes one e820 bug, but introduces another bug.

Need to update size for left range at first in case it is header.

also add __e820_add_region take more parameter.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: jbeulich@novell.com
LKML-Reference: <49B9E286.502@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 05:38:29 +01:00
Rusty Russell 73e907de7d cpumask: remove x86 cpumask_t uses.
Impact: cleanup

We are removing cpumask_t in favour of struct cpumask: mainly as a
marker of what code is now CONFIG_CPUMASK_OFFSTACK-safe.

The only non-trivial change here is vector_allocation_domain():
explicitly clear the mask and set the first word, rather than using
assignment.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:57 +10:30
Rusty Russell 76ba0ecda0 cpumask: use cpumask_var_t in uv_flush_tlb_others.
Impact: remove cpumask_t, reduce per-cpu size for CONFIG_CPUMASK_OFFSTACK=y

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:57 +10:30
Rusty Russell 5c6cb5e2b1 cpumask: remove cpumask_t assignment from vector_allocation_domain()
Impact: cleanup

It's not legal to do assignments into cpumask_var_t; they will soon be of
variable length.

So explicitly clear the mask and set the first word, rather than using
assignment.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:56 +10:30
Rusty Russell 70ba2b6a70 cpumask: clean up summit's send_IPI functions
Impact: cleanup, remove cpumask from stack

summit_send_IPI_allbutself might as well call
default_send_IPI_mask_allbutself_logical().  Also change cpumask_t to
struct cpumask and &cpu_online_map to cpu_online_mask while here.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:55 +10:30
Rusty Russell 4f0628963c cpumask: use new cpumask functions throughout x86
Impact: cleanup

1) &cpu_online_map -> cpu_online_mask
2) first_cpu/next_cpu_nr -> cpumask_first/cpumask_next
3) cpu_*_map manipulation -> init_cpu_* / set_cpu_*

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:54 +10:30
Rusty Russell 3f76a183de x86: unify cpu_callin_mask/cpu_callout_mask/cpu_initialized_mask/cpu_sibling_setup_mask
Impact: cleanup

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:54 +10:30
Rusty Russell 155dd720d0 cpumask: convert struct cpuinfo_x86's llc_shared_map to cpumask_var_t
Impact: reduce kernel memory usage when CONFIG_CPUMASK_OFFSTACK=y

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:53 +10:30
Rusty Russell c032ef60d1 cpumask: convert node_to_cpumask_map[] to cpumask_var_t
Impact: reduce kernel memory usage when CONFIG_CPUMASK_OFFSTACK=y

Straightforward conversion: done for 32 and 64 bit kernels.
node_to_cpumask_map is now a cpumask_var_t array.

64-bit used to be a dynamic cpumask_t array, and 32-bit used to be a
static cpumask_t array.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:53 +10:30
Rusty Russell 71ee73e722 x86: unify 32 and 64-bit node_to_cpumask_map
Impact: cleanup

We take the 64-bit code and use it on 32-bit as well.  The new file
is called mm/numa.c.

In a minor cleanup, we use cpu_none_mask instead of declaring a local
cpu_mask_none.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:52 +10:30
Rusty Russell b9c4398ed4 cpumask: remove x86's node_to_cpumask now everyone uses cpumask_of_node
Impact: cleanup

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:52 +10:30
Rusty Russell b643decad6 x86: arch_send_call_function_ipi_mask
Impact: implement new API

We define arch_send_call_function_ipi_mask and generic kernel/smp.c
code creates arch_send_call_function_ipi() as a wrapper.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:51 +10:30
Rusty Russell 996867d096 cpumask: convert arch/x86/kernel/cpu/mcheck/mce_64.c
Impact: reduce kernel memory usage when CONFIG_CPUMASK_OFFSTACK=y

Simple conversion of mce_device_initialized to cpumask_var_t.  We don't
check the alloc_cpumask_var() return since it's boot-time only, and
the misc_register() in that same function isn't checked.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:51 +10:30
Rusty Russell 7ad728f981 cpumask: x86: convert cpu_sibling_map/cpu_core_map to cpumask_var_t
Impact: reduce per-cpu size for CONFIG_CPUMASK_OFFSTACK=y

In most places it's cleaner to use the accessors cpu_sibling_mask()
and cpu_core_mask() wrappers which already exist.

I couldn't avoid cleaning up the access in oprofile, either.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:50 +10:30
Rusty Russell fcef8576d8 cpumask: convert arch/x86/kernel/nmi.c's backtrace_mask to a cpumask_var_t
Impact: cleanup, reduce memory usage for CONFIG_CPUMASK_OFFSTACK=y

I *think* every path calls check_nmi_watchdog before using the
watchdog, so that's the right place for the initialization.

If that's wrong, we'll get a nice NULL-deref with
CONFIG_CPUMASK_OFFSTACK=y, and have uncovered another bug.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:49 +10:30
Rusty Russell bc9b83dd1f cpumask: convert c1e_mask in arch/x86/kernel/process.c to cpumask_var_t.
Impact: reduce kernel size when CONFIG_CPUMASK_OFFSTACK=y

Simple conversion.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:49 +10:30
Rusty Russell d3d2e7f243 cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: x86
Impact: cleanup

There were replaced by topology_core_cpumask and topology_thread_cpumask.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:48 +10:30
Rusty Russell 23c5c9c662 cpumask: remove cpu_coregroup_map: x86
Impact: cleanup

cpu_coregroup_mask is the New Hotness.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:48 +10:30
Rusty Russell cb3d560f36 cpumask: remove the now-obsoleted pcibus_to_cpumask(): x86
Impact: reduce stack usage for large NR_CPUS

cpumask_of_pcibus() is the new version.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:47 +10:30
Rusty Russell 101aaca1f3 cpumask: remove dangerous CPU_MASK_ALL_PTR, &CPU_MASK_ALL.: x86
Impact: cleanup

(Thanks to Al Viro for reminding me of this, via Ingo)

CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so:

	#define CPU_MASK_ALL (cpumask_t) { { ... } }

Taking the address of such a temporary is questionable at best,
unfortunately 321a8e9d (cpumask: add CPU_MASK_ALL_PTR macro) added
CPU_MASK_ALL_PTR:

	#define CPU_MASK_ALL_PTR (&CPU_MASK_ALL)

Which formalizes this practice.  One day gcc could bite us over this
usage (though we seem to have gotten away with it so far).

So replace everywhere which used &CPU_MASK_ALL or CPU_MASK_ALL_PTR
with the modern "cpu_all_mask" (a real const struct cpumask *), and remove
CPU_MASK_ALL_PTR altogether.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mike Travis <travis@sgi.com>
2009-03-13 14:49:47 +10:30
Ingo Molnar f6411fe7e0 Merge branches 'sched/clock', 'sched/urgent' and 'linus' into sched/core 2009-03-13 04:50:44 +01:00
Pallipadi, Venkatesh 4bb9c5c021 VM, x86, PAT: Change is_linear_pfn_mapping to not use vm_pgoff
Impact: fix false positive PAT warnings - also fix VirtalBox hang

Use of vma->vm_pgoff to identify the pfnmaps that are fully
mapped at mmap time is broken. vm_pgoff is set by generic mmap
code even for cases where drivers are setting up the mappings
at the fault time.

The problem was originally reported here:

 http://marc.info/?l=linux-kernel&m=123383810628583&w=2

Change is_linear_pfn_mapping logic to overload VM_INSERTPAGE
flag along with VM_PFNMAP to mean full PFNMAP setup at mmap
time.

Problem also tracked at:

 http://bugzilla.kernel.org/show_bug.cgi?id=12800

Reported-by: Thomas Hellstrom <thellstrom@vmware.com>
Tested-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha>@intel.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: "ebiederm@xmission.com" <ebiederm@xmission.com>
Cc: <stable@kernel.org> # only for 2.6.29.1, not .28
LKML-Reference: <20090313004527.GA7176@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 04:28:50 +01:00
Jaswinder Singh Rajput 91219bcbdc x86: cpu_debug add write support for MSRs
Supported write flag for registers.
currently write is enabled only for PMC MSR.

[root@ht]# cat /sys/kernel/debug/x86/cpu/cpu1/pmc/0x300/value
0x0

[root@ht]# echo 1234 > /sys/kernel/debug/x86/cpu/cpu1/pmc/0x300/value
[root@ht]# cat /sys/kernel/debug/x86/cpu/cpu1/pmc/0x300/value
0x4d2

[root@ht]# echo 0x1234 > /sys/kernel/debug/x86/cpu/cpu1/pmc/0x300/value
[root@ht]# cat /sys/kernel/debug/x86/cpu/cpu1/pmc/0x300/value
0x1234

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 03:02:45 +01:00
Yinghai Lu 0d890355bf x86: separate mtrr cleanup/mtrr_e820 trim to separate file
Impact: cleanup

mtrr main.c is too big, seperate mtrr cleanup and mtrr e820 trim
code to another file.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49B87C7B.80809@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 02:52:19 +01:00
Yinghai Lu c1ab7e93c6 x86: print out mtrr_range_state when user specify size
Impact: print more debug info

Keep it consistent with autodetect version.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49B87C0A.4010105@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 02:52:18 +01:00
Yinghai Lu 8ad9790588 x86: more MTRR debug printouts
Impact: improve MTRR debugging messages

There's still inefficiencies suspected with the MTRR sanitizing
code, so make sure we get all the info we need from a dmesg.

- Remove unneeded mtrr_show

 (It will only printout one time by first cpu, so it is no big deal.)

- Also print out directly from get_mtrr, because it doesn't update mtrr_state.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49B9BA5A.40108@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 02:52:18 +01:00
Jan Beulich 698609bdcd x86: create a non-zero sized bm_pte only when needed
Impact: kernel image size reduction

Since in most configurations the pmd page needed maps the same range of
virtual addresses which is also mapped by the earlier inserted one for
covering FIX_DBGP_BASE, that page (and its insertion in the page
tables) can be avoided altogether by detecting the condition at compile
time.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B91826.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 02:37:20 +01:00
Jan Beulich 5c0e6f035d x86: fix code paths used by update_mptable
Impact: fix crashes under Xen due to unrobust e820 code

find_e820_area_size() must return a properly distinguishable and
out-of-bounds value when it fails, and -1UL does not meet that
criteria on i386/PAE. Additionally, callers of the function must
check against that value.

early_reserve_e820() should be prepared for the region found to be
outside of the addressable range on 32-bits.

e820_update_range_map() should not blindly update e820, but should do
all it work on the map it got a pointer passed for (which in 50% of the
cases is &e820_saved). It must also not call e820_add_region(), as that
again acts on e820 unconditionally.

The issues were found when trying to make this option work in our Xen
kernel (i.e. where some of the silent assumptions made in the code
would not hold).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B9171B.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 02:37:19 +01:00
Jan Beulich 82034d6f59 x86: clean up output resulting from update_mptable option
Impact: cleanup

Without apic=verbose, using the update_mptable option would result in
garbled and confusing output due to the inconsistent use of printk() vs
apic_printk().

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B914B6.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 02:37:19 +01:00
Jan Beulich 9a50156a1c x86: properly __init-annotate recent early_printk additions
Impact: cleanup, save memory

Don't keep code resident that's only needed during startup.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B91103.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 02:37:18 +01:00
Jan Beulich dc9dd5cc85 x86: move save_mr() into .meminit.text
Impact: cleanup, save memory

The function is only being called from boot or memory hotplug paths.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B910B6.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 02:37:18 +01:00
Jan Beulich 13c6c53282 x86, 32-bit: also use cpuinfo_x86's x86_{phys,virt}_bits members
Impact: 32/64-bit consolidation

In a first step, this allows fixing phys_addr_valid() for PAE (which
until now reported all addresses to be valid). Subsequently, this will
also allow simplifying some MTRR handling code.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B9101E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 02:37:17 +01:00
Jan Beulich 46d50c98d9 x86, 32-bit: also limit NODES_HIGH_SHIFT here
Impact: configuration bug fix

Just like for x86-64, the range of widths valid for NODE_SHIFT is not
unbounded. The upper bound 64-bit uses is definitely also an upper
bound for 32-bit.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B90F12.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 02:37:17 +01:00
Ingo Molnar dd63fdcc63 x86: unify kmap_atomic_pfn() and iomap_atomic_prot_pfn(), fix
Impact: build fix

Move kmap_atomic_prot_pfn() to iomap_32.c. It is used on all 32-bit
kernels, while highmem_32.c is only built on highmem kernels.

( Note: the debug_kmap_atomic_prot() check is removed for now, that
  problem is handled via another patch. )

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
LKML-Reference: <20090311143317.GA22244@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 03:24:53 +01:00
Jan Beulich 7a81d9a7da x86: smarten /proc/interrupts output
Impact: change /proc/interrupts output ABI

With the number of interrupts on large systems growing, assumptions on
the width an interrupt number requires when converted to a decimal
string turn invalid. Therefore, calculate the maximum number of digits
dynamically.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B911EB.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 02:36:52 +01:00
Ingo Molnar 480c93df5b Merge branch 'core/locking' into tracing/ftrace 2009-03-13 01:33:21 +01:00
H. Peter Anvin 16a6791934 x86: use targets in the boot Makefile instead of CLEAN_FILES
Impact: cleanup

Instead of using CLEAN_FILES in arch/x86/Makefile, add generated files
to targets in arch/x86/boot/Makefile, so they will get naturally
cleaned up by "make clean".

Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-12 13:43:14 -07:00
H. Peter Anvin f9c5107c2b x86: remove additional vestiges of the zImage/bzImage split
Impact: cleanup

Remove targets that were used for zImage only, and Makefile
infrastructure that was there to support the zImage/bzImage split.

Reported-by: Paul Bolle <pebolle@tiscali.nl>
LKML-Reference: <1236879901.24144.26.camel@test.thuisdomein>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-12 12:50:33 -07:00
Jan Beulich 02dde8b45c x86: move various CPU initialization objects into .cpuinit.rodata
Impact: debuggability and micro-optimization

Putting whatever is possible into the (final) .rodata section increases
the likelihood of catching memory corruption bugs early, and reduces
false cache line sharing.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B90961.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 13:13:07 +01:00
Jan Beulich 821508d4ef x86: move a few device initialization objects into .devinit.rodata
Impact: debuggability and micro-optimization

Putting whatever is possible into the (final) .rodata section increases
the likelihood of catching memory corruption bugs early, and reduces
false cache line sharing.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B909A5.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 13:12:19 +01:00
Jan Beulich 6a5c05f002 x86: fix HYPERVISOR_update_descriptor()
Impact: fix potential oops during app-initiated LDT manipulation

The underlying hypercall has differing argument requirements on 32-
and 64-bit.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
LKML-Reference: <49B9061E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 12:56:21 +01:00
Ingo Molnar f3b6eaf014 x86: memcpy, clean up
Impact: cleanup

Make this file more readable by bringing it more in line
with the usual kernel style.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 12:21:17 +01:00
Jan Beulich dd1ef4ec47 x86-64: remove unnecessary spill/reload of rbx from memcpy
Impact: micro-optimization

This should slightly improve its performance.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B8F641.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 12:04:47 +01:00
Jan Beulich c2810188c1 x86-64: move save_paranoid into .kprobes.text
Impact: mark save_paranoid as non-kprobe-able code

This appears to be necessary as the function gets called from
kprobes-unsafe exception handling stubs (i.e. which themselves
live in .kprobes.text).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B8F44F.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 11:57:46 +01:00
Jan Beulich 9fa7266c2f x86: remove leftover unwind annotations
Impact: cleanup

These got left in needlessly when ret_from_fork got simplified.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B8F355.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 11:50:39 +01:00
Ingo Molnar a98fe7f342 Merge branches 'x86/asm', 'x86/debug', 'x86/mm', 'x86/setup', 'x86/urgent' and 'linus' into x86/core 2009-03-12 11:50:15 +01:00
Stuart Bennett afcfe024ae x86: mmiotrace: quieten spurious warning message
This message was being incorrectly emitted when using gdb,
so compile it out by default for now; there will be a
better fix in v2.6.30.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stuart Bennett <stuart@freedesktop.org>
Acked-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-11 21:41:58 +01:00
H. Peter Anvin 5e47c478b0 x86: remove zImage support
Impact: obsolete feature removal

The zImage kernel format has been functionally unused for a very long
time.  It is just barely possible to build a modern kernel that still
fits within the zImage size limit, but it is highly unlikely that
anyone ever uses it.  Furthermore, although it is still supported by
most bootloaders, it has been at best poorly tested (or not tested at
all); some bootloaders are even known to not support zImage at all and
not having even noticed.

Also remove some really obsolete constants that no longer have any
meaning.

LKML-Reference: <49B703D4.1000008@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-11 11:00:00 -07:00
Ingo Molnar 211b3d03c7 x86: work around Fedora-11 x86-32 kernel failures on Intel Atom CPUs
Impact: work around boot crash

Work around Intel Atom erratum AAH41 (probabilistically) - it's triggering
in the field.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-11 18:22:03 +01:00
Akinobu Mita 12074fa107 x86: debug check for kmap_atomic_pfn and iomap_atomic_prot_pfn()
It may be useful for kmap_atomic_pfn() and iomap_atomic_prot_pfn()
to check invalid kmap usage as well as kmap_atomic.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
LKML-Reference: <20090311143449.GB22244@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-11 15:47:46 +01:00
Akinobu Mita bb6d59ca92 x86: unify kmap_atomic_pfn() and iomap_atomic_prot_pfn()
kmap_atomic_pfn() and iomap_atomic_prot_pfn() are almost same
except pgprot. This patch removes the code duplication for these
two functions.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
LKML-Reference: <20090311143317.GA22244@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-11 15:47:46 +01:00
Jaswinder Singh Rajput 8229d75438 x86: cpu architecture debug code, build fix, cleanup
move store_ldt outside the CONFIG_PARAVIRT section and
also clean up the code a bit.

Signed-off-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-11 14:52:03 +01:00
Cyrill Gorcunov bb7f5f6c26 x86: shrink __ALIGN and __ALIGN_STR definitions
Impact: cleanup

1) .p2align 4 and .align 16 are the same meaning
   (until a.out format for i386 is used which is
    not our case for CONFIG_X86_ALIGNMENT_16 anyway)

2) having 15 as max allowed bytes to be skipped
   does not make sense on modulo 16

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090309171951.GE9945@localhost>
[ small cleanup, use __stringify(), etc. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-11 12:39:28 +01:00
Ingo Molnar d95c357812 Merge branch 'x86/core' into cpus4096 2009-03-11 10:49:34 +01:00
Ingo Molnar 78b020d035 Merge branches 'x86/cleanups', 'x86/kexec', 'x86/mce2' and 'linus' into x86/core 2009-03-11 10:49:15 +01:00
Ingo Molnar 65a37b29a8 Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu 2009-03-11 10:30:23 +01:00
Ingo Molnar 1d8ce7bc4d Merge branch 'linus' into core/percpu
Conflicts:
	arch/x86/include/asm/fixmap_64.h
2009-03-11 10:29:28 +01:00
Thomas Gleixner bf5172d07a x86: convert obsolete irq_desc_t typedef to struct irq_desc
Impact: cleanup

Convert the last remaining users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-03-11 09:49:01 +01:00
Benjamin Herrenschmidt e14eee56c2 Merge commit 'origin/master' into next 2009-03-11 17:10:07 +11:00
KOSAKI Motohiro 5490fa9673 x86, mce: use round_jiffies() instead round_jiffies_relative()
Impact: saving power _very_ little

round_jiffies() round up absolute jiffies to full second.
round_jiffies_relative() round up relative jiffies to full second.

The "t->expires" is absolute jiffies. Then, round_jiffies() should be
used instead round_jiffies_relative().

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-10 22:33:06 -07:00
Huang Ying fee7b0d84c x86, kexec: x86_64: add kexec jump support for x86_64
Impact: New major feature

This patch add kexec jump support for x86_64. More information about
kexec jump can be found in corresponding x86_32 support patch.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-10 18:13:25 -07:00
Huang Ying 5359454701 x86, kexec: x86_64: add identity map for pages at image->start
Impact: Fix corner case that cannot yet occur

image->start may be outside of 0 ~ max_pfn, for example when jumping
back to original kernel from kexeced kenrel. This patch add identity
map for pages at image->start.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-10 18:13:25 -07:00
Huang Ying fef3a7a174 x86, kexec: fix kexec x86 coding style
Impact: Cleanup

Fix some coding style issue for kexec x86.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-10 18:13:25 -07:00
Ingo Molnar 4dd163a051 Merge branches 'tracing/ftrace', 'tracing/textedit' and 'linus' into tracing/core 2009-03-10 22:54:23 +01:00
Linus Torvalds 1abaf3326b Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86 mmiotrace: fix remove_kmmio_fault_pages()
2009-03-10 12:03:30 -07:00
Ingo Molnar f24ade3a33 x86, sched_clock(): mark variables read-mostly
Impact: micro-optimization

There's a number of variables in the sched_clock() path that are
in .data/.bss - but not marked __read_mostly. This creates the
danger of accidental false cacheline sharing with some other,
write-often variable.

So mark them __read_mostly.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-10 19:02:30 +01:00
Jaswinder Singh Rajput 9b779edf4b x86: cpu architecture debug code
Introduce:

 cat /sys/kernel/debug/x86/cpu/*

for Intel and AMD processors to view / debug the state of each CPU.

By using this we can debug whole range of registers and other
cpu information for debugging purpose and monitor how things
are changing.

This can be useful for developers as well as for users.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
LKML-Reference: <1236701373.3387.4.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-10 18:39:45 +01:00
Ingo Molnar 8c54436ae9 Merge branches 'sched/cleanups' and 'linus' into sched/core 2009-03-10 16:34:43 +01:00
Masami Hiramatsu 7cf4942704 x86: expand irq-off region in text_poke()
Expand irq-off region to cover fixmap using code and cache synchronizing.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <49B54688.8090403@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-10 16:24:23 +01:00
Ingo Molnar 8293dd6f86 Merge branch 'x86/core' into tracing/ftrace
Semantic merge:

  kernel/trace/trace_functions_graph.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-10 10:17:48 +01:00
Ingo Molnar 12e87e36e0 Merge branches 'tracing/doc', 'tracing/ftrace', 'tracing/printk' and 'linus' into tracing/core 2009-03-10 09:56:25 +01:00
Stoyan Gaydarov 8c5dfd2551 x86: BUG to BUG_ON changes
Impact: cleanup

Signed-off-by: Stoyan Gaydarov <stoyboyker@gmail.com>
LKML-Reference: <1236661850-8237-8-git-send-email-stoyboyker@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-10 09:55:18 +01:00
Ingo Molnar 467c88fee5 Merge branches 'x86/apic', 'x86/asm', 'x86/fixmap', 'x86/memtest', 'x86/mm', 'x86/urgent', 'linus' and 'core/percpu' into x86/core 2009-03-10 09:26:38 +01:00
Tejun Heo 66c3a75772 percpu: generalize embedding first chunk setup helper
Impact: code reorganization

Separate out embedding first chunk setup helper from x86 embedding
first chunk allocator and put it in mm/percpu.c.  This will be used by
the default percpu first chunk allocator and possibly by other archs.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-10 16:27:48 +09:00
Tejun Heo 6074d5b0a3 percpu: more flexibility for @dyn_size of pcpu_setup_first_chunk()
Impact: cleanup, more flexibility for first chunk init

Non-negative @dyn_size used to be allowed iff @unit_size wasn't auto.
This restriction stemmed from implementation detail and made things a
bit less intuitive.  This patch allows @dyn_size to be specified
regardless of @unit_size and swaps the positions of @dyn_size and
@unit_size so that the parameter order makes more sense (static,
reserved and dyn sizes followed by enclosing unit_size).

While at it, add @unit_size >= PCPU_MIN_UNIT_SIZE sanity check.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-10 16:27:48 +09:00
Tejun Heo e01009833e percpu: make x86 addr <-> pcpu ptr conversion macros generic
Impact: generic addr <-> pcpu ptr conversion macros

There's nothing arch specific about x86 __addr_to_pcpu_ptr() and
__pcpu_ptr_to_addr().  With proper __per_cpu_load and __per_cpu_start
defined, they'll do the right thing regardless of actual layout.

Move these macros from arch/x86/include/asm/percpu.h to mm/percpu.c
and allow archs to override it as necessary.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-10 16:27:48 +09:00
Linus Torvalds 99adcd9d67 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Add p4-clockmod sysfs-ui removal to feature-removal schedule.
  Revert "[CPUFREQ] Disable sysfs ui for p4-clockmod."
2009-03-09 13:23:59 -07:00
Dave Jones 129f8ae9b1 Revert "[CPUFREQ] Disable sysfs ui for p4-clockmod."
This reverts commit e088e4c9cd.

Removing the sysfs interface for p4-clockmod was flagged as a
regression in bug 12826.

Course of action:
 - Find out the remaining causes of overheating, and fix them
   if possible. ACPI should be doing the right thing automatically.
   If it isn't, we need to fix that.
 - mark p4-clockmod ui as deprecated
 - try again with the removal in six months.

It's not really feasible to printk about the deprecation, because
it needs to happen at all the sysfs entry points, which means adding
a lot of strcmp("p4-clockmod".. calls to the core, which.. bleuch.

Signed-off-by: Dave Jones <davej@redhat.com>
2009-03-09 15:07:33 -04:00
Rusty Russell 6db6a5f3ae lguest: fix for CONFIG_SPARSE_IRQ=y
Impact: remove lots of lguest boot WARN_ON() when CONFIG_SPARSE_IRQ=y

We now need to call irq_to_desc_alloc_cpu() before
set_irq_chip_and_handler_name(), but we can't do that from init_IRQ (no
kmalloc available).

So do it as we use interrupts instead.  Also means we only alloc for
irqs we use, which was the intent of CONFIG_SPARSE_IRQ anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
2009-03-09 10:06:29 +10:30
Rusty Russell cbd88c8e6f lguest: fix crash 'unhandled trap 13 at <native_read_msr_safe>'
Impact: fix lguest boot crash on modern Intel machines

The code in early_init_intel does:

	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
		u64 misc_enable;

		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);

And that rdmsr faults (not allowed from non-0 PL).  We can get around
this by mugging the family ID part of the cpuid.  5 seems like a good
number.

Of course, this is a hack (how very lguest!).  We could just indicate
that we don't support MSRs, or implement lguest_rdmst.

Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Patrick McHardy <kaber@trash.net>
2009-03-09 10:06:28 +10:30
Jeremy Fitzhardinge 0feca851c1 x86-32: make sure virt_addr_valid() returns false for fixmap addresses
I found that virt_addr_valid() was returning true for fixmap addresses.

I'm not sure whether pfn_valid() is supposed to include this test,
but there's no harm in being explicit.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49B166D6.2080505@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-08 20:03:52 +01:00
Stuart Bennett d0fc63f7bd x86 mmiotrace: fix remove_kmmio_fault_pages()
Impact: fix race+crash in mmiotrace

The list manipulation in remove_kmmio_fault_pages() was broken. If more
than one consecutive kmmio_fault_page was re-added during the grace
period between unregister_kmmio_probe() and remove_kmmio_fault_pages(),
the list manipulation failed to remove pages from the release list.

After a second grace period the pages get into rcu_free_kmmio_fault_pages()
and raise a BUG_ON() kernel crash.

The list manipulation is fixed to properly remove pages from the release
list.

This bug has been present from the very beginning of mmiotrace in the
mainline kernel. It was introduced in 0fd0e3da ("x86: mmiotrace full
patch, preview 1");

An urgent fix for Linus. Tested by Stuart (on 32-bit) and Pekka
(on amd and intel 64-bit systems, nouveau and nvidia proprietary).

Signed-off-by: Stuart Bennett <stuart@freedesktop.org>
Signed-off-by: Pekka Paalanen <pq@iki.fi>
LKML-Reference: <20090308202135.34933feb@daedalus.pq.iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-08 19:51:23 +01:00
Yinghai Lu e954ef20c2 x86: fix warning about nodeid
Impact: cleanup

Ingo found there warning about nodeid with some configs.

try to use for_each_online_node for non numa too. in that case
nodeid will be 0.

also move out boundary checking from setup_node_bootmem(), so
non-numa config will not check it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49B03069.80001@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-08 19:34:17 +01:00
Wang Chen 8827247ffc x86: don't define __this_fixmap_does_not_exist()
Impact: improve out-of-range fixmap index debugging

Commit "1b42f51630c7eebce6fb780b480731eb81afd325"
defined the __this_fixmap_does_not_exist() function
with a WARN_ON(1) in it.

This causes the linker to not report an error when
__this_fixmap_does_not_exist() is called with a
non-constant parameter.

Ingo defined __this_fixmap_does_not_exist() because he
wanted to get virt addresses of fix memory of nest level
by non-constant index.

But we can fix this and still keep the link-time check:

We can get the four slot virt addresses on link time and
store them to array slot_virt[].

Then we can then refer the slot_virt with non-constant index,
in the ioremap-leak detection code.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
LKML-Reference: <49B2075B.4070509@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-08 17:07:47 +01:00
Yinghai Lu 1f442d70c8 x86: remove smp_apply_quirks()/smp_checks()
Impact: cleanup and code size reduction on 64-bit

This code is only applied to Intel Pentium and AMD K7 32-bit cpus.

Move those checks to intel_init()/amd_init() for 32-bit
so 64-bit will not build this code.

Also change to use cpu_index check to see if we need to emit warning.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49B377D2.8030108@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-08 16:22:56 +01:00
Cliff Wickman 3a450de136 x86: UV: remove uv_flush_tlb_others() WARN_ON
In uv_flush_tlb_others() (arch/x86/kernel/tlb_uv.c),
the "WARN_ON(!in_atomic())" fails if CONFIG_PREEMPT is not enabled.

And CONFIG_PREEMPT is not enabled by default in the distribution that
most UV owners will use.

We could #ifdef CONFIG_PREEMPT the warning, but that is not good form.
And there seems to be no suitable fix to in_atomic() when CONFIG_PREMPT
is not on.

As Ingo commented:

  > and we have no proper primitive to test for atomicity. (mainly
  > because we dont know about atomicity on a non-preempt kernel)

So we drop the WARN_ON.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-08 11:17:15 +01:00
Cyrill Gorcunov 7ab152470e x86: linkage.h - guard assembler specifics by __ASSEMBLY__
Stephen Rothwell reported:

|Today's linux-next build (x86_64 allmodconfig) produced this warning:
|
|In file included from drivers/char/epca.c:49:
|drivers/char/digiFep1.h:7:1: warning: "GLOBAL" redefined
|In file included from include/linux/linkage.h:5,
|                 from include/linux/kernel.h:11,
|                 from arch/x86/include/asm/system.h:10,
|                 from arch/x86/include/asm/processor.h:17,
|                 from include/linux/prefetch.h:14,
|                 from include/linux/list.h:6,
|                 from include/linux/module.h:9,
|                 from drivers/char/epca.c:29:
|arch/x86/include/asm/linkage.h:55:1: warning: this is the location of the previous definition
|
|Probably introduced by commit 95695547a7
|("x86: asm linkage - introduce GLOBAL macro") from the x86 tree.

Any assembler specific snippets being placed in headers
are to be protected by __ASSEMBLY__. Fixed.

Also move __ALIGN definition under the same protection as well.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090306160833.GB7420@localhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 17:14:12 +01:00
Masami Hiramatsu 78ff7fae04 x86: implement atomic text_poke() via fixmap
Use fixmaps instead of vmap/vunmap in text_poke() for avoiding
page allocation and delayed unmapping.

At the result of above change, text_poke() becomes atomic and can be called
from stop_machine() etc.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
LKML-Reference: <49B14352.2040705@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 16:49:01 +01:00
Masami Hiramatsu 3945dab45a tracing, Text Edit Lock - SMP alternatives support
Use the mutual exclusion provided by the text edit lock in alternatives code.
Since alternative_smp_* will be called from module init code, etc,
we'd better protect it from other subsystems.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <49B14332.9030109@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 16:49:00 +01:00
Ingo Molnar f0ef039851 Merge branch 'x86/core' into tracing/textedit
Conflicts:
	arch/x86/Kconfig
	block/blktrace.c
	kernel/irq/handle.c

Semantic conflict:
	kernel/trace/blktrace.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 16:45:01 +01:00
Markus Metzger 73bf1b62f5 x86, pebs: correct qualifier passed to ds_write_config() from ds_request_pebs()
ds_write_config() can write the BTS as well as the PEBS part of
the DS config. ds_request_pebs() passes the wrong qualifier, which
results in the wrong configuration to be written.

Reported-by: Stephane Eranian <eranian@googlemail.com>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
LKML-Reference: <20090305085721.A22550@sedona.ch.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 16:13:15 +01:00
Markus Metzger 9ca0791dca x86, bts: remove bad warning
In case a ptraced task is reaped (while the tracer is still attached),
ds_exit_thread() is called before ptrace_exit(). The latter will
release the bts_tracer and remove the thread's ds_ctx.
The former will WARN() if the context is not NULL.

Oleg Nesterov submitted patches that move ptrace_exit() before
exit_thread() and thus reverse the order of the above calls.

Remove the bad warning. I will add it again when Oleg's changes are in.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
LKML-Reference: <20090305084954.A22000@sedona.ch.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 16:13:15 +01:00
Pekka Enberg 5dd61dfabc x86: rename do_not_nx to disable_nx in mm/init_64.c
As a preparational step for unifying noexec handling on 32-bit and 64-bit,
rename the do_not_nx variable to disable_nx on 64-bit.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236265497.31324.11.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 15:25:52 +01:00
Pekka Enberg c77a3b59c6 x86: fix uninitialized variable in init_memory_mapping()
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236265466.31324.9.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 15:25:52 +01:00
Yinghai Lu d1a8e77920 x86: make "memtest" like "memtest=17"
Impact: make boot command line "memtest" do one loop by default

So don't need to guess many patterns in one loop.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49B10532.3020105@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 12:16:43 +01:00
Tejun Heo 6b19b0c240 x86, percpu: setup reserved percpu area for x86_64
Impact: fix relocation overflow during module load

x86_64 uses 32bit relocations for symbol access and static percpu
symbols whether in core or modules must be inside 2GB of the percpu
segement base which the dynamic percpu allocator doesn't guarantee.
This patch makes x86_64 reserve PERCPU_MODULE_RESERVE bytes in the
first chunk so that module percpu areas are always allocated from the
first chunk which is always inside the relocatable range.

This problem exists for any percpu allocator but is easily triggered
when using the embedding allocator because the second chunk is located
beyond 2GB on it.

This patch also changes the meaning of PERCPU_DYNAMIC_RESERVE such
that it only indicates the size of the area to reserve for dynamic
allocation as static and dynamic areas can be separate.  New
PERCPU_DYNAMIC_RESERVED is increased by 4k for both 32 and 64bits as
the reserved area separation eats away some allocatable space and
having slightly more headroom (currently between 4 and 8k after
minimal boot sans module area) makes sense for common case
performance.

x86_32 can address anywhere from anywhere and doesn't need reserving.

Mike Galbraith first reported the problem first and bisected it to the
embedding percpu allocator commit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mike Galbraith <efault@gmx.de>
Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
2009-03-06 14:33:59 +09:00
Tejun Heo edcb463997 percpu, module: implement reserved allocation and use it for module percpu variables
Impact: add reserved allocation functionality and use it for module
	percpu variables

This patch implements reserved allocation from the first chunk.  When
setting up the first chunk, arch can ask to set aside certain number
of bytes right after the core static area which is available only
through a separate reserved allocator.  This will be used primarily
for module static percpu variables on architectures with limited
relocation range to ensure that the module perpcu symbols are inside
the relocatable range.

If reserved area is requested, the first chunk becomes reserved and
isn't available for regular allocation.  If the first chunk also
includes piggy-back dynamic allocation area, a separate chunk mapping
the same region is created to serve dynamic allocation.  The first one
is called static first chunk and the second dynamic first chunk.
Although they share the page map, their different area map
initializations guarantee they serve disjoint areas according to their
purposes.

If arch doesn't setup reserved area, reserved allocation is handled
like any other allocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-06 14:33:59 +09:00
Tejun Heo 9a4f8a878b x86: make embedding percpu allocator return excessive free space
Impact: reduce unnecessary memory usage on certain configurations

Embedding percpu allocator allocates unit_size *
smp_num_possible_cpus() bytes consecutively and use it for the first
chunk.  However, if the static area is small, this can result in
excessive prellocated free space in the first chunk due to
PCPU_MIN_UNIT_SIZE restriction.

This patch makes embedding percpu allocator preallocate only what's
necessary as described by PERPCU_DYNAMIC_RESERVE and return the
leftover to the bootmem allocator.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-06 14:33:59 +09:00
Tejun Heo cafe8816b2 percpu: use negative for auto for pcpu_setup_first_chunk() arguments
Impact: argument semantic cleanup

In pcpu_setup_first_chunk(), zero @unit_size and @dyn_size meant
auto-sizing.  It's okay for @unit_size as 0 doesn't make sense but 0
dynamic reserve size is valid.  Alos, if arch @dyn_size is calculated
from other parameters, it might end up passing in 0 @dyn_size and
malfunction when the size is automatically adjusted.

This patch makes both @unit_size and @dyn_size ssize_t and use -1 for
auto sizing.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-06 14:33:59 +09:00
Ingo Molnar 31bbed527e Merge branch 'x86/uv' into x86/core 2009-03-05 21:49:47 +01:00
Ingo Molnar 28e93a005b Merge branch 'x86/mm' into x86/core 2009-03-05 21:49:35 +01:00
Ingo Molnar caab36b593 Merge branch 'x86/mce2' into x86/core 2009-03-05 21:49:25 +01:00
Ingo Molnar a1413c89ae Merge branch 'x86/urgent' into x86/core
Conflicts:
	arch/x86/include/asm/fixmap_64.h
Semantic merge:
	arch/x86/include/asm/fixmap.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 21:48:50 +01:00
Ingo Molnar b2b352590d x86: UV, SGI RTC: add generic system vector, build fix on UP
Make ack_APIC_irq() build on !SMP && !APIC too.

Cc: Dimitri Sivanich <sivanich@sgi.com>
LKML-Reference: <20090304185605.GA24419@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 15:15:56 +01:00
Jeremy Fitzhardinge ed26dbe5ae x86: pre-initialize boot_cpu_data.x86_phys_bits to avoid system_state tests
Impact: cleanup, micro-optimization

Pre-initialize boot_cpu_data.x86_phys_bits to a reasonable default
to remove the use of system_state tests in __virt_addr_valid()
and __phys_addr().

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:53:43 +01:00
Jeremy Fitzhardinge dc16ecf7fd x86-32: use specific __vmalloc_start_set flag in __virt_addr_valid
Rather than relying on the ever-unreliable system_state,
add a specific __vmalloc_start_set flag to indicate whether
the vmalloc area has meaningful boundaries yet, and use that
in x86-32's __phys_addr and __virt_addr_valid.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:53:10 +01:00
Jeremy Fitzhardinge a964e33c5d x86: clean up old gcc warnings
gcc 3.2.2 reports:

In file included from /usr/src/all/linux-next/arch/x86/include/asm/page.h:8,
                 from /usr/src/all/linux-next/arch/x86/include/asm/processor.h:18,
                 from /usr/src/all/linux-next/arch/x86/include/asm/atomic_32.h:6,
                 from /usr/src/all/linux-next/arch/x86/include/asm/atomic.h:2,
                 from include/linux/crypto.h:20,
                 from arch/x86/kernel/asm-offsets_32.c:7,
                 from arch/x86/kernel/asm-offsets.c:2:
/usr/src/all/linux-next/arch/x86/include/asm/page_types.h:54: warning: parameter has incomplete type
/usr/src/all/linux-next/arch/x86/include/asm/page_types.h:56: warning: parameter has incomplete type
In file included from /usr/src/all/linux-next/arch/x86/include/asm/page.h:8,
                 from /usr/src/all/linux-next/arch/x86/include/asm/processor.h:18,
                 from include/linux/prefetch.h:14,
                 from include/linux/list.h:6,
                 from include/linux/module.h:9,
                 from init/main.c:13:
/usr/src/all/linux-next/arch/x86/include/asm/page_types.h:54: warning: parameter has incomplete type
/usr/src/all/linux-next/arch/x86/include/asm/page_types.h:56: warning: parameter has incomplete type

This is a bogus warning, but moving the pat-related functions
into asm/pat.h and including asm/pgtable_types.h should fix it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:50:55 +01:00
Ingo Molnar 62436fe9ee x86: move init_memory_mapping() to common mm/init.c, build fix on 32-bit PAE
Impact: build fix

Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-14-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:39:03 +01:00
Pekka Enberg 4fcb208391 x86: move function and variable declarations to asm/init.h
Impact: cleanup

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-17-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:18 +01:00
Pekka Enberg e53fb04fce x86: unify kernel_physical_mapping_init() function signatures
Impact: cleanup

In preparation for moving the function declaration to a header file,
unify 32-bit and 64-bit signatures.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-16-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:18 +01:00
Pekka Enberg 298af9d89f x86: fix up some bad global variable names in mm/init.c
Impact: cleanup

The table_start, table_end, and table_top are too generic for global
namespace so rename them to be more specific.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-15-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:17 +01:00
Pekka Enberg f765090a26 x86: move init_memory_mapping() to common mm/init.c
Impact: cleanup

This patch moves the init_memory_mapping() function to common mm/init.c.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-14-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:17 +01:00
Pekka Enberg 0c0f756fd6 x86: add stub init_gbpages() for 32-bit init_memory_mapping()
Impact: cleanup

This patch adds an empty static inline init_gbpages() for the 32-bit
version of init_memory_mapping() making both versions identical.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-13-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:16 +01:00
Pekka Enberg b47e3418c5 x86: ifdef 32-bit and 64-bit NR_RANGE_MR for save_mr() unification
Impact: cleanup

As a trivial preparation for moving common code to arc/x86/mm/init.c,
ifdef the 32-bit and 64-bit versions of NR_RANGE_MR.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-12-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:16 +01:00
Pekka Enberg c338d6f60f x86: ifdef 32-bit and 64-bit pfn setup in init_memory_mapping()
Impact: cleanup

To reduce the diff between the 32-bit and 64-bit versions of
init_memory_mapping(), ifdef configuration specific pfn setup
code in the function.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-11-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:15 +01:00
Pekka Enberg 01ced9ec14 x86: ifdef 32-bit and 64-bit setup in init_memory_mapping()
Impact: cleanup

To reduce the diff between the 32-bit and 64-bit versions of
init_memory_mapping(), ifdef configuration specific setup code
in the function.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-10-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:15 +01:00
Pekka Enberg d58e854e36 x86: add table start and end sanity checks to 32-bit init_memory_mapping()
Impact: cleanup

This patch adds a sanity check to the 32-bit version of
init_memory_mapping() to reduce the diff to the 64-bit version.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-9-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:14 +01:00
Pekka Enberg cbba65796d x86: unify kernel_physical_mapping_init() call in init_memory_mapping()
Impact: cleanup

The 64-bit version of init_memory_mapping() uses the last mapped
address returned from kernel_physical_mapping_init() whereas the
32-bit version doesn't. This patch adds relevant ifdefs to both
versions of the function to reduce the diff between them.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-8-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:14 +01:00
Pekka Enberg c464573cb3 x86: rename after_init_bootmem to after_bootmem in mm/init_32.c
Impact: cleanup

This patch renames after_init_bootmem to after_bootmem in
mm/init_32.c to reduce the diff to the 64-bit version of of
init_memory_mapping().

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-7-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:13 +01:00
Pekka Enberg 96083ca11b x86: remove unnecessary save_mr() sanity check
Impact: cleanup

The save_mr() function already checks that start_pfn is less than
end_pfn so we can remove the unnecessary check which reduces the
diff between the 32-bit and the 64-bit versions of init_memory_mapping().

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-6-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:13 +01:00
Pekka Enberg 54e63f3a42 x86: ifdef 32-bit specific setup in init_memory_mapping()
Impact: cleanup

Enabling NX, PSE, and PGE are only required on 32-bit so ifdef them
in both versions of the function.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-5-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:12 +01:00
Pekka Enberg e7179853e7 x86: move pgd_base out of init_memory_mapping()
Impact: cleanup

This patch moves pgd_base out of init_memory_mapping() to reduce
the diff between the 32-bit version and the 64-bit version of the
function.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-4-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:12 +01:00
Pekka Enberg 49a2bf7303 x86: find_early_table_space() unification
Impact: cleanup

There are some minor differences between the 32-bit and 64-bit
find_early_table_space() functions. This patch wraps those
differences under CONFIG_X86_32 to make the function identical
on both configurations.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-3-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:11 +01:00
Pekka Enberg 4bbd4fa038 x86: add gbpages support to 32-bit init_memory_mapping()
Impact: cleanup

To reduce the diff between the 32-bit and 64-bit versions of
init_memory_mapping(), add gbpages support to the 32-bit version.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-2-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:11 +01:00
Pekka Enberg c3f5d2d8b5 x86: init_memory_mapping() trivial cleanups
Impact: cleanup

To reduce the diff between the 32-bit and 64-bit versions of
init_memory_mapping(), fix up all trivial issues.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-1-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:10 +01:00
Ingo Molnar 0bd5c4f7c8 Merge branch 'iommu/fixes-2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu 2009-03-05 12:47:34 +01:00
Ingo Molnar 7df4edb07c Merge branch 'linus' into core/iommu 2009-03-05 12:47:28 +01:00
Frederic Weisbecker 0012693ad4 tracing/function-graph-tracer: use the more lightweight local clock
Impact: decrease hangs risks with the graph tracer on slow systems

Since the function graph tracer can spend too much time on timer
interrupts, it's better now to use the more lightweight local
clock. Anyway, the function graph traces are more reliable on a
per cpu trace.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <49af243d.06e9300a.53ad.ffff840c@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 12:14:41 +01:00
FUJITA Tomonori 5f812de63c AMD IOMMU: remove unnecessary ifdef
We try to avoid this type of ifdef and we can safely remove this
ifdef.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-03-05 12:09:41 +01:00
Ingo Molnar 49d2d266ad Merge commit 'v2.6.29-rc7' into sched/core 2009-03-05 11:59:10 +01:00
Dimitri Sivanich 1400b3faab x86: UV, SGI RTC: fix uv_time.c for UP
Fix non-smp build of uv_time.c.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
LKML-Reference: <20090304220246.GC6288@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 11:27:49 +01:00
David S. Miller 508827ff0a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/tokenring/tmspci.c
	drivers/net/ucc_geth_mii.c
2009-03-05 02:06:47 -08:00
Dave Jones 36e8abf3ed [CPUFREQ] Prevent p4-clockmod from auto-binding to the ondemand governor.
The latency of p4-clockmod sucks so hard that scaling on a regular
basis with ondemand is a really bad idea.

Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-03-05 00:16:26 -05:00
Yinghai Lu fc5efe3941 x86: fix bootmem cross node for 32bit numa, cleanup
Impact: clean up

Simplify the code, reuse some lines.
Remove min_low_pfn reference, it is always 0

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49AEE2C4.2030602@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 22:09:59 +01:00
Pekka Enberg 731ddea636 x86: move free_initrd_mem() to common mm/init.c
Impact: cleanup

The function is identical on 32-bit and 64-bit configurations so move it to the
common mm/init.c file.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236158020.29024.28.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 20:59:26 +01:00
Leann Ogasawara dd4124a8a0 x86: add Dell XPS710 reboot quirk
Dell XPS710 will hang on reboot.  This is resolved by adding a quirk to
set bios reboot.

Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Cc: "manoj.iyer" <manoj.iyer@canonical.com>
Cc: <stable@kernel.org>
LKML-Reference: <1236196380.3231.89.camel@emiko>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 20:56:15 +01:00
Yinghai Lu f62432395e x86: reserve exact size of mptable
Impact: save a bit of RAM

Get the exact size for the reserve_bootmem() call.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49AE4922.605@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 20:55:04 +01:00
Yinghai Lu 8d4dd919b4 x86: ioremap mptable
Impact: fix boot with mptable above max_low_mapped

Try to use early_ioremap() to map MPC to make sure it works even it is
at the end of ram.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49AE4901.3090801@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-and-tested-by: Kevin O'Connor <kevin@koconnor.net>
2009-03-04 20:55:04 +01:00
Yinghai Lu b68adb16f2 x86: make 32-bit init_memory_mapping range change more like 64-bit
Impact: cleanup

make code more readable and more like 64-bit

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49AE48B4.8010907@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 20:55:03 +01:00
Yinghai Lu a71edd1f46 x86: fix bootmem cross node for 32bit numa
Impact: fix panic on system 2g x4 sockets

Found one system with 4 sockets and every sockets has 2g can not boot
with numa32 because boot mem is crossing nodes.

So try to have numa version of setup_bootmem_allocator().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49AE485B.8000902@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 20:55:03 +01:00
Daniel Glöckner ab9e18587f x86, math-emu: fix init_fpu for task != current
Impact: fix math-emu related crash while using GDB/ptrace

init_fpu() calls finit to initialize a task's xstate, while finit always
works on the current task. If we use PTRACE_GETFPREGS on another
process and both processes did not already use floating point, we get
a null pointer exception in finit.

This patch creates a new function finit_task that takes a task_struct
parameter. finit becomes a wrapper that simply calls finit_task with
current. On the plus side this avoids many calls to get_current which
would each resolve to an inline assembler mov instruction.

An empty finit_task has been added to i387.h to avoid linker errors in
case the compiler still emits the call in init_fpu when
CONFIG_MATH_EMULATION is not defined.

The declaration of finit in i387.h has been removed as the remaining
code using this function gets its prototype from fpu_proto.h.

Signed-off-by: Daniel Glöckner <dg@emlix.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Bill Metzenthen <billm@melbpc.org.au>
LKML-Reference: <E1Lew31-0004il-Fg@mailer.emlix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 20:33:16 +01:00
Dimitri Sivanich 5ab5ab3449 x86: UV, SGI RTC: add UV RTC clocksource/clockevents
This patch provides a high resolution clock/timer source using the
SGI UV system-wide synchronized RTC clock/timer hardware.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
LKML-Reference: <20090304185918.GC24419@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 20:25:38 +01:00
Dimitri Sivanich 8661984f62 x86: UV, SGI RTC: loop through installed UV blades
Add macro to loop through each possible blade.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
LKML-Reference: <20090304185719.GB24419@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 20:25:37 +01:00
Dimitri Sivanich acaabe795a x86: UV, SGI RTC: add generic system vector
This patch allocates a system interrupt vector for various platform
specific uses.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
LKML-Reference: <20090304185605.GA24419@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 20:25:37 +01:00
Ingo Molnar 6d2e91bf80 Merge branch 'x86/urgent' into x86/mm
Conflicts:
	arch/x86/include/asm/fixmap_64.h
Semantic merge:
	arch/x86/include/asm/fixmap.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 20:20:10 +01:00
Huang Ying dd39ecf522 x86: EFI: Back efi_ioremap with init_memory_mapping instead of FIX_MAP
Impact: Fix boot failure on EFI system with large runtime memory range

Brian Maly reported that some EFI system with large runtime memory
range can not boot. Because the FIX_MAP used to map runtime memory
range is smaller than run time memory range.

This patch fixes this issue by re-implement efi_ioremap() with
init_memory_mapping().

Reported-and-tested-by: Brian Maly <bmaly@redhat.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Brian Maly <bmaly@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236135513.6204.306.camel@yhuang-dev.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 19:20:16 +01:00
Pekka Enberg 6298e719cf x86: set_highmem_pages_init() cleanup, #2
Impact: cleanup

The zones are set up at this stage so there's a highmem zone
available even for the UMA case.

The only difference there is that for machines that have
CONFIG_HIGHMEM enabled but don't have any highmem available,
->zone_start_pfn is zero whereas highstart_pfn is non-zero).

The field is left zeroed because of the !size test in
free_area_init_core() but shouldn't be a problem because
add_highpages_with_active_regions() handles empty ranges just
fine.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Mel Gorman <mel@csn.ul.ie>
LKML-Reference: <1236154567.29024.23.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 19:00:51 +01:00
Brian Maly ff0c087490 x86: fix DMI on EFI
Impact: reactivate DMI quirks on EFI hardware

DMI tables are loaded by EFI, so the dmi calls must happen after
efi_init() and not before.

Currently Apple hardware uses DMI to determine the framebuffer mappings
for efifb. Without DMI working you also have no video on MacBook Pro.

This patch resolves the DMI issue for EFI hardware (DMI is now properly
detected at boot), and additionally efifb now loads on Apple hardware
(i.e. video works).

Signed-off-by: Brian Maly <bmaly@redhat>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: ying.huang@intel.com
LKML-Reference: <49ADEDA3.1030406@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 arch/x86/kernel/setup.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
2009-03-04 18:55:56 +01:00
Ingo Molnar 73af76dfd1 x86, mce: fix build failure in arch/x86/kernel/cpu/mcheck/threshold.c
Impact: build fix

The APIC code rewrite in the x86 tree broke the x86/mce branch:

 arch/x86/kernel/cpu/mcheck/threshold.c: In function ‘mce_threshold_interrupt’:
 arch/x86/kernel/cpu/mcheck/threshold.c:24: error: implicit declaration of function ‘ack_APIC_irq’

Also tidy up the file a bit while at it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 11:48:28 +01:00
Pekka Enberg 540aca06b7 x86: move devmem_is_allowed() to common mm/init.c
Impact: cleanup

The function is identical on 32-bit and 64-bit configurations so move
it to the common mm/init.c file.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236160001.29024.29.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 11:40:04 +01:00
Ingo Molnar a1be621dfa Merge branch 'tracing/ftrace'; commit 'v2.6.29-rc7' into tracing/core 2009-03-04 11:14:47 +01:00
H. Peter Anvin 2e22ea7cea Merge branch 'x86/core' into x86/mce2 2009-03-03 21:05:42 -08:00
Jeremy Fitzhardinge f254f3909e x86: un-__init fill_pud/pmd/pte
They are used by __set_fixmap->set_pte_vaddr_pud, which can
be used by arch_setup_additional_pages(), and so is used
after init.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 02:29:36 +01:00
Jeremy Fitzhardinge 4e8304758c x86: remove vestigial fix_ioremap prototypes
The function seems to have disappeared at some point, leaving
some vestigial prototypes behind...

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 02:29:32 +01:00
Ingo Molnar 91d75e209b Merge branch 'x86/core' into core/percpu 2009-03-04 02:29:19 +01:00
Ingo Molnar 8b0e5860cb Merge branches 'x86/apic', 'x86/cpu', 'x86/fixmap', 'x86/mm', 'x86/sched', 'x86/setup-lzma', 'x86/signal' and 'x86/urgent' into x86/core 2009-03-04 02:22:31 +01:00
Linus Torvalds 3024e4a997 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: oprofile: don't set counter width from cpuid on Core2
  x86: fix init_memory_mapping() to handle small ranges
2009-03-03 14:32:55 -08:00
Linus Torvalds f2a4165526 Merge branch 'tracing/mmiotrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing/mmiotrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86 mmiotrace: fix race with release_kmmio_fault_page()
  x86 mmiotrace: improve handling of secondary faults
  x86 mmiotrace: split set_page_presence()
  x86 mmiotrace: fix save/restore page table state
  x86 mmiotrace: WARN_ONCE if dis/arming a page fails
  x86: add far read test to testmmiotrace
  x86: count errors in testmmiotrace.ko
2009-03-03 14:32:37 -08:00
Ingo Molnar 03787ceed8 x86: set_highmem_pages_init() cleanup, fix !CONFIG_NUMA && CONFIG_HIGHMEM=y
Impact: build fix

 arch/x86/mm/highmem_32.c:187: error: static declaration of 'set_highmem_pages_init' follows non-static declaration
 arch/x86/include/asm/numa_32.h:8: error: previous declaration of 'set_highmem_pages_init' was here

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236082212.2675.24.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 15:32:24 +01:00
Pekka Enberg 867c5b5292 x86: set_highmem_pages_init() cleanup
Impact: cleanup

This patch moves set_highmem_pages_init() to arch/x86/mm/highmem_32.c.

The declaration of the function is kept in asm/numa_32.h because
asm/highmem.h is included only if CONFIG_HIGHMEM is enabled so we
can't put the empty static inline function there.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236082212.2675.24.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 13:13:15 +01:00
Pekka Enberg e5b2bb5527 x86: unify free_init_pages() and free_initmem()
Impact: unification

This patch introduces a common arch/x86/mm/init.c and moves the identical
free_init_pages() and free_initmem() functions to the file.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236078906.2675.18.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 12:21:18 +01:00
Pekka Enberg e087edd8c0 x86: make sure initmem is writable on 64-bit
Impact: unification

This patch ports commit 3c1df68b84 ("x86: make
sure initmem is writable") to the 64-bit version to unify implementations of
free_init_pages().

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <1236078904.2675.17.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 12:21:18 +01:00
Pekka Enberg 05f209e7b9 x86: add sanity checks to init_32.c
Impact: unification

This patch adds sanity checks that are already in init_64.c to init_32.c.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236078902.2675.16.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 12:21:17 +01:00
Pekka Enberg fd578f9c0a x86: use roundup() instead of PAGE_ALIGN() in find_early_table_space()
Impact: cleanup

This patch changes find_early_table_space() to use roundup() for rounding up
tables to page size to unify the common parts of the 32-bit and 64-bit
implementations.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236077705.2675.6.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 12:07:00 +01:00
Pekka Enberg 2b688dfd0a x86: move __VMALLOC_RESERVE to pgtable_32.c
Impact: cleanup

The __VMALLOC_RESERVE global variable is not used in init_32.c. Move that to
pgtable_32.c to reduce the diff between init_32.c and init_64.c.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236077704.2675.4.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 12:06:59 +01:00
Tim Blechmann 780eef9492 x86: oprofile: don't set counter width from cpuid on Core2
Impact: fix stuck NMIs and non-working oprofile on certain CPUs

Resetting the counter width of the performance counters on Intel's
Core2 CPUs, breaks the delivery of NMIs, when running in x86_64 mode.

This should fix bug #12395:

  http://bugzilla.kernel.org/show_bug.cgi?id=12395

Signed-off-by: Tim Blechmann <tim@klingt.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <20090303100412.GC10085@erda.amd.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 12:04:22 +01:00
Hiroshi Shimamoto 2505170211 x86, signals: fix xine & firefox bustage
Impact: fix bad frame in rt_sigreturn on 64-bit

After commit 97286a2b64 some applications
fail to return from signal handler:

[  145.150133] firefox[3250] bad frame in rt_sigreturn frame:00007f902b44eb28 ip:352e80b307 sp:7f902b44ef70 orax:ffffffffffffffff in libpthread-2.9.so[352e800000+17000]
[  665.519017] firefox[5420] bad frame in rt_sigreturn frame:00007faa8deaeb28 ip:352e80b307 sp:7faa8deaef70 orax:ffffffffffffffff in libpthread-2.9.so[352e800000+17000]

The root cause is forgetting to keep 64 byte aligned value of
fpstate for next stack pointer calculation.

Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
LKML-Reference: <49AC85C1.7060600@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 09:03:12 +01:00
Yinghai Lu 0fc59d3a01 x86: fix init_memory_mapping() to handle small ranges
Impact: fix failed EFI bootup in certain circumstances

Ying Huang found init_memory_mapping() has problem with small ranges
less than 2M when he tried to direct map the EFI runtime code out of
max_low_pfn_mapped.

It turns out we never considered that case and didn't check the range...

Reported-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Maly <bmaly@redhat.com>
LKML-Reference: <49ACDDED.1060508@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 08:50:22 +01:00
Linus Torvalds 2d44947a56 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  fix warning in io_mapping_map_wc()
  x86: i915 needs pgprot_writecombine() and is_io_mapping_possible()
2009-03-02 15:47:01 -08:00
Roland McGrath 5b1017404a x86-64: seccomp: fix 32/64 syscall hole
On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call.  A 64-bit process make a 32-bit system call with int $0x80.

In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
the wrong system call number table.  The fix is simple: test TS_COMPAT
instead of TIF_IA32.  Here is an example exploit:

	/* test case for seccomp circumvention on x86-64

	   There are two failure modes: compile with -m64 or compile with -m32.

	   The -m64 case is the worst one, because it does "chmod 777 ." (could
	   be any chmod call).  The -m32 case demonstrates it was able to do
	   stat(), which can glean information but not harm anything directly.

	   A buggy kernel will let the test do something, print, and exit 1; a
	   fixed kernel will make it exit with SIGKILL before it does anything.
	*/

	#define _GNU_SOURCE
	#include <assert.h>
	#include <inttypes.h>
	#include <stdio.h>
	#include <linux/prctl.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <asm/unistd.h>

	int
	main (int argc, char **argv)
	{
	  char buf[100];
	  static const char dot[] = ".";
	  long ret;
	  unsigned st[24];

	  if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
	    perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");

	#ifdef __x86_64__
	  assert ((uintptr_t) dot < (1UL << 32));
	  asm ("int $0x80 # %0 <- %1(%2 %3)"
	       : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
	  ret = snprintf (buf, sizeof buf,
			  "result %ld (check mode on .!)\n", ret);
	#elif defined __i386__
	  asm (".code32\n"
	       "pushl %%cs\n"
	       "pushl $2f\n"
	       "ljmpl $0x33, $1f\n"
	       ".code64\n"
	       "1: syscall # %0 <- %1(%2 %3)\n"
	       "lretl\n"
	       ".code32\n"
	       "2:"
	       : "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
	  if (ret == 0)
	    ret = snprintf (buf, sizeof buf,
			    "stat . -> st_uid=%u\n", st[7]);
	  else
	    ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
	#else
	# error "not this one"
	#endif

	  write (1, buf, ret);

	  syscall (__NR_exit, 1);
	  return 2;
	}

Signed-off-by: Roland McGrath <roland@redhat.com>
[ I don't know if anybody actually uses seccomp, but it's enabled in
  at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-02 15:41:30 -08:00
Roland McGrath ccbe495caa x86-64: syscall-audit: fix 32/64 syscall hole
On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call.  A 64-bit process make a 32-bit system call with int $0x80.

In both these cases, audit_syscall_entry() will use the wrong system
call number table and the wrong system call argument registers.  This
could be used to circumvent a syscall audit configuration that filters
based on the syscall numbers or argument details.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-02 15:41:30 -08:00
Ingo Molnar fdfa66ab45 Merge branches 'tracing/ftrace', 'tracing/mmiotrace' and 'linus' into tracing/core 2009-03-02 22:37:35 +01:00
Jeremy Fitzhardinge 9976b39b50 xen: deal with virtually mapped percpu data
The virtually mapped percpu space causes us two problems:

 - for hypercalls which take an mfn, we need to do a full pagetable
   walk to convert the percpu va into an mfn, and

 - when a hypercall requires a page to be mapped RO via all its aliases,
   we need to make sure its RO in both the percpu mapping and in the
   linear mapping

This primarily affects the gdt and the vcpu info structure.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 12:58:19 +01:00
Jeremy Fitzhardinge 2fb6b2a048 x86: add forward decl for tss_struct
Its the correct thing to do before using the struct in a prototype.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 12:07:49 +01:00
Jeremy Fitzhardinge 389d1fb11e x86: unify chunks of kernel/process*.c
With x86-32 and -64 using the same mechanism for managing the
tss io permissions bitmap, large chunks of process*.c are
trivially unifyable, including:

 - exit_thread
 - flush_thread
 - __switch_to_xtra (along with tsc enable/disable)

and as bonus pickups:

 - sys_fork
 - sys_vfork

(Note: asmlinkage expands to empty on x86-64)

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 12:07:48 +01:00
Jeremy Fitzhardinge db949bba3c x86-32: use non-lazy io bitmap context switching
Impact: remove 32-bit optimization to prepare unification

x86-32 and -64 differ in the way they context-switch tasks
with io permission bitmaps.  x86-64 simply copies the next
tasks io bitmap into place (if any) on context switch.  x86-32
invalidates the bitmap on context switch, so that the next
IO instruction will fault; at that point it installs the
appropriate IO bitmap.

This makes context switching IO-bitmap-using tasks a bit more
less expensive, at the cost of making the next IO instruction
slower due to the extra fault.  This tradeoff only makes sense
if IO-bitmap-using processes are relatively common, but they
don't actually use IO instructions very often.

However, in a typical desktop system, the only process likely
to be using IO bitmaps is the X server, and nothing at all on
a server.  Therefore the lazy context switch doesn't really win
all that much, and its just a gratuitious difference from
64-bit code.

This patch removes the lazy context switch, with a view to
unifying this code in a later change.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 12:07:48 +01:00
Ingo Molnar 5512b3ece0 Merge branches 'sched/clock', 'sched/urgent' and 'linus' into sched/core 2009-03-02 12:02:36 +01:00
Jiri Slaby b6122b3843 x86_32: apic/numaq_32, fix section mismatch
Remove __cpuinitdata section placement for translation_table
structure, since it is referenced from a functions within .text.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
2009-03-02 12:00:25 +01:00
Jiri Slaby 2fcb1f1f38 x86_32: apic/summit_32, fix section mismatch
Remove __init section placement for some functions/data, so that
we don't get section mismatch warnings.

Also make inline function instead of empty setup_summit macro.

[v2]
One of them was not caught by
DEBUG_SECTION_MISMATCH=y
magic. Fix it.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
2009-03-02 12:00:25 +01:00
Jiri Slaby 871d78c6d9 x86_32: apic/es7000_32, fix section mismatch
Remove __init section placement for some functions, so that we don't
get section mismatch warnings.

[v2]:
2 of them were not caught by
DEBUG_SECTION_MISMATCH=y
magic. Fix it.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
2009-03-02 12:00:24 +01:00
Jiri Slaby fae176d6e0 x86_32: apic/summit_32, fix cpu_mask_to_apicid
Perform same-cluster checking even for masks with all (nr_cpu_ids)
bits set and report correct apicid on success instead.

While at it, convert it to for_each_cpu and newer cpumask api.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 11:20:34 +01:00
Jiri Slaby 0edc0b324a x86_32: apic/es7000_32, fix cpu_mask_to_apicid
Perform same-cluster checking even for masks with all (nr_cpu_ids)
bits set and report BAD_APICID on failure.

While at it, convert it to for_each_cpu.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 11:20:33 +01:00
Jiri Slaby c2b20cbd05 x86_32: apic/es7000_32, cpu_mask_to_apicid cleanup
Remove es7000_cpu_mask_to_apicid_cluster completely, because it's
almost the same as es7000_cpu_mask_to_apicid except 2 code paths.
One of them is about to be removed soon, the another should be
BAD_APICID (it's a fail path).

The _cluster one was not invoked on apic->cpu_mask_to_apicid_and
anyway, since there was no _cluster_and variant.

Also use newer cpumask functions.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 11:20:33 +01:00
Jiri Slaby 9694cd6c17 x86_32: apic/bigsmp_32, de-inline functions
The ones which go only into struct apic are de-inlined
by compiler anyway, so remove the inline specifier from them.

Afterwards, remove bigsmp_setup_portio_remap completely as it
is unused.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 11:20:32 +01:00
Ingo Molnar f180053694 x86, mm: dont use non-temporal stores in pagecache accesses
Impact: standardize IO on cached ops

On modern CPUs it is almost always a bad idea to use non-temporal stores,
as the regression in this commit has shown it:

  30d697f: x86: fix performance regression in write() syscall

The kernel simply has no good information about whether using non-temporal
stores is a good idea or not - and trying to add heuristics only increases
complexity and inserts fragility.

The regression on cached write()s took very long to be found - over two
years. So dont take any chances and let the hardware decide how it makes
use of its caches.

The only exception is drivers/gpu/drm/i915/i915_gem.c: there were we are
absolutely sure that another entity (the GPU) will pick up the dirty
data immediately and that the CPU will not touch that data before the
GPU will.

Also, keep the _nocache() primitives to make it easier for people to
experiment with these details. There may be more clear-cut cases where
non-cached copies can be used, outside of filemap.c.

Cc: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 11:06:49 +01:00
Pekka Paalanen 340430c572 x86 mmiotrace: fix race with release_kmmio_fault_page()
There was a theoretical possibility to a race between arming a page in
post_kmmio_handler() and disarming the page in
release_kmmio_fault_page():

cpu0                             cpu1
------------------------------------------------------------------
mmiotrace shutdown
enter release_kmmio_fault_page
                                 fault on the page
                                 disarm the page
disarm the page
                                 handle the MMIO access
                                 re-arm the page
put the page on release list
remove_kmmio_fault_pages()
                                 fault on the page
                                 page not known to mmiotrace
                                 fall back to do_page_fault()
                                 *KABOOM*

(This scenario also shows the double disarm case which is allowed.)

Fixed by acquiring kmmio_lock in post_kmmio_handler() and checking
if the page is being released from mmiotrace.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:37 +01:00
Stuart Bennett 3e39aa156a x86 mmiotrace: improve handling of secondary faults
Upgrade some kmmio.c debug messages to warnings.
Allow secondary faults on probed pages to fall through, and only log
secondary faults that are not due to non-present pages.

Patch edited by Pekka Paalanen.

Signed-off-by: Stuart Bennett <stuart@freedesktop.org>
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:37 +01:00
Pekka Paalanen 0b700a6a25 x86 mmiotrace: split set_page_presence()
From 36772dcb6ffbbb68254cbfc379a103acd2fbfefc Mon Sep 17 00:00:00 2001
From: Pekka Paalanen <pq@iki.fi>
Date: Sat, 28 Feb 2009 21:34:59 +0200

Split set_page_presence() in kmmio.c into two more functions set_pmd_presence()
and set_pte_presence(). Purely code reorganization, no functional changes.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:36 +01:00
Pekka Paalanen 5359b585fb x86 mmiotrace: fix save/restore page table state
From baa99e2b32449ec7bf147c234adfa444caecac8a Mon Sep 17 00:00:00 2001
From: Pekka Paalanen <pq@iki.fi>
Date: Sun, 22 Feb 2009 20:02:43 +0200

Blindly setting _PAGE_PRESENT in disarm_kmmio_fault_page() overlooks the
possibility, that the page was not present when it was armed.

Make arm_kmmio_fault_page() store the previous page presence in struct
kmmio_fault_page and use it on disarm.

This patch was originally written by Stuart Bennett, but Pekka Paalanen
rewrote it a little different.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:36 +01:00
Stuart Bennett e9d54cae8f x86 mmiotrace: WARN_ONCE if dis/arming a page fails
Print a full warning once, if arming or disarming a page fails.

Also, if initial arming fails, do not handle the page further. This
avoids the possibility of a page failing to arm and then later claiming
to have handled any fault on that page.

WARN_ONCE added by Pekka Paalanen.

Signed-off-by: Stuart Bennett <stuart@freedesktop.org>
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:35 +01:00
Pekka Paalanen 5ff93697fc x86: add far read test to testmmiotrace
Apparently pages far into an ioremapped region might not actually be
mapped during ioremap(). Add an optional read test to try to trigger a
multiply faulting MMIO access. Also add more messages to the kernel log
to help debugging.

This patch is based on a patch suggested by
Stuart Bennett <stuart@freedesktop.org>
who discovered bugs in mmiotrace related to normal kernel space faults.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:35 +01:00
Pekka Paalanen fab852aaf7 x86: count errors in testmmiotrace.ko
Check the read values against the written values in the MMIO read/write
test. This test shows if the given MMIO test area really works as
memory, which is a prerequisite for a successful mmiotrace test.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:34 +01:00
David S. Miller aa4abc9bcc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-tx.c
	net/8021q/vlan_core.c
	net/core/dev.c
2009-03-01 21:35:16 -08:00
Ingo Molnar 55f2b78995 Merge branch 'x86/urgent' into x86/pat 2009-03-01 12:47:58 +01:00
Ingo Molnar f5c1aa1537 Revert "gpu/drm, x86, PAT: PAT support for io_mapping_*"
This reverts commit 17581ad812.

Sitsofe Wheeler reported that /dev/dri/card0 is MIA on his EeePC 900
and bisected it to this commit.

Graphics card is an i915 in an EeePC 900:

 00:02.0 VGA compatible controller [0300]:
   Intel Corporation Mobile 915GM/GMS/910GML
     Express Graphics Controller [8086:2592] (rev 04)

( Most likely the ioremap() of the driver failed and hence the card
  did not initialize. )

Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Bisected-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-01 12:47:49 +01:00
Tejun Heo d0c4f57027 bootmem, x86: further fixes for arch-specific bootmem wrapping
Impact: fix new breakages introduced by previous fix

Commit c132937556 tried to clean up
bootmem arch wrapper but it wasn't quite correct.  Before the commit,
the followings were broken.

* Low level interface functions prefixed with __ ignored arch
  preference.

* reserve_bootmem(...) can't be mapped into
  reserve_bootmem_node(NODE_DATA(0)->bdata, ...) because the node is
  not preference here.  The region specified MUST fall into the
  specified region; otherwise, it will panic.

After the commit,

* If allocation fails for the arch preferred node, it should fallback
  to whatever is available.  Instead, it simply failed allocation.

There are too many internal details to allow generic wrapping and
still keep things simple for archs.  Plus, all that arch wants is a
way to prefer certain node over another.

This patch drops the generic wrapping around alloc_bootmem_core() and
add alloc_bootmem_core() instead.  If necessary, arch can define
bootmem_arch_referred_node() macro or function which takes all
allocation information and returns the preferred node.  bootmem
generic code will always try the preferred node first and then
fallback to other nodes as usual.

Breakages noted and changes reviewed by Johannes Weiner.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
2009-03-01 16:06:56 +09:00
Jaswinder Singh Rajput 327f4387e3 x86: remove double copy of show_cpuinfo_core for 32 and 64 bit
Impact: unification

show_cpuinfo_core is identical for 32 and 64 bit and can be unified,
and CONFIG_X86_HT inherently depends on CONFIG_X86_SMP.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-28 19:26:33 -08:00
Ingo Molnar 92b9af9e4f x86: i915 needs pgprot_writecombine() and is_io_mapping_possible()
Impact: build fix

Theodore Ts reported that the i915 driver needs these symbols:

 ERROR: "pgprot_writecombine" [drivers/gpu/drm/i915/i915.ko] undefined!
 ERROR: "is_io_mapping_possible" [drivers/gpu/drm/i915/i915.ko] undefined!

Reported-by: Theodore Ts'o <tytso@mit.edu> wrote:
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-28 14:22:44 +01:00
Hiroshi Shimamoto 1fae0279ce x86: signal: introduce helper align_sigframe()
Impact: cleanup

Introduce helper align_sigframe() to align stack pointer for signal frame.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-28 09:17:31 +01:00
Hiroshi Shimamoto 75779f0526 x86: signal: unify get_sigframe()
Impact: cleanup

Unify get_sigframe().

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-28 09:17:30 +01:00
Hiroshi Shimamoto 36a4526583 x86: signal: use 16 bytes boundary for rt_sigframe
Impact: cleanup

Supporting xsave/xrestore introduces 64 bytes boundary for save_i387_xstate().
16 bytes boundary is OK for rt_sigframe.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-28 09:17:30 +01:00
Hiroshi Shimamoto 97286a2b64 x86: signal: intrroduce get_sigframe() and replace get_sigstack()
Impact: cleanup

Introduce get_sigframe() like 32-bit to replace get_sigstack().
Move the i387 stuff into get_sigframe().

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-28 09:17:29 +01:00
Hiroshi Shimamoto 144b0712dd x86: signal: add __user annotation
Impact: cleanup

Add missing __user annotation to the parameter of get_sigframe().
Also change cast type to void __user * of *fpstate.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-28 09:17:29 +01:00
Gustavo F. Padovan c577b098f9 x86, fixmap: unify fixmap.h
Impact: unification

This patch unify fixmap_32.h and fixmap_64.h into fixmap.h.
Things that we can't merge now are using CONFIG_X86_{32,64}
(e.g.:vsyscall and EFI)

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:48 -08:00
Gustavo F. Padovan c78f322ce8 x86, fixmap: prepare fixmap_32.h for unification
Impact: cleanup

Just prepare fixmap for later mechanic unification.
No real modification on code.

   text    data     bss     dec     hex filename
3831152  353188  372736 4557076  458914 vmlinux-32.after
3831152  353188  372736 4557076  458914 vmlinux-32.before

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:48 -08:00
Gustavo F. Padovan e365bcd921 x86, fixmap: prepare fixmap_64.h for unification
Impact: cleanup

Just prepare fixmap for later mechanic unification.
No real modification on code.

  text    data     bss     dec     hex filename
4312362  527192  421924 5261478  5048a6 vmlinux-64.after
4312362  527192  421924 5261478  5048a6 vmlinux-64.before

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:48 -08:00
Gustavo F. Padovan 5f403fa9de x86, fixmap: add CONFIG_EFI
Impact: new fixmap allocation

FIX_EFI_IO_MAP_FIRST_PAGE is used only when EFI is enabled.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:47 -08:00
Gustavo F. Padovan 2ae38daf25 x86, fixmap: add CONFIG_X86_{LOCAL,IO}_APIC
Impact: New fixmap allocations

Add CONFIG_X86_{LOCAL,IO}_APIC to enum fixed_address.
FIX_APIC_BASE is used only when CONFIG_X86_LOCAL_APIC is
enabled and FIX_IO_APIC_BASE_* are used only when
CONFIG_X86_IO_APIC is enabled.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:47 -08:00
Gustavo F. Padovan fd862dde18 x86, fixmap: define reserve_top_address for x86_64
Impact: new interface (not yet use)

Define reserve_top_address for x86_64; only for later x86 integration.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:47 -08:00
Gustavo F. Padovan ab93e3c45d x86, fixmap: define FIXADDR_BOOT_* and redefine FIX_ADDR_SIZE
Impact: new interface, not yet used

Now, with these macros, x86_64 code can know where start the
permanent and non-permanent fixed mapped address.
This patch make these macros equal fixmap_32.h for future
x86 integration.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:47 -08:00
Gustavo F. Padovan d09375a9ec x86, fixmap: rename __FIXADDR_SIZE and __FIXADDR_BOOT_SIZE
Impact: rename

Rename __FIXADDR_SIZE to FIXADDR_SIZE
and __FIXADDR_BOOT_SIZE to FIXADDR_BOOT_SIZE.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:47 -08:00
Ingo Molnar 1b49061d40 Merge branch 'sched/clock' into tracing/ftrace
Conflicts:
	kernel/sched_clock.c
2009-02-27 08:35:19 +01:00
Ingo Molnar ba1d755a36 fix warning in arch/x86/kernel/cpu/intel_cacheinfo.c
fix this warning:

  arch/x86/kernel/cpu/intel_cacheinfo.c:139: warning: ‘k8_nb_id’ defined but not used
  arch/x86/kernel/cpu/intel_cacheinfo.c:527: warning: ‘free_cache_attributes’ defined but not used
  arch/x86/kernel/cpu/intel_cacheinfo.c:538: warning: ‘detect_cache_attributes’ defined but not used

Unused variables in the !CONFIG_SYSCTL case.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 22:39:12 +01:00
Ingo Molnar 83ce400928 x86: set X86_FEATURE_TSC_RELIABLE
If the TSC is constant and non-stop, also set it reliable.

(We will turn this off in DMI quirks for multi-chassis systems)

The performance number on a 16-way Nehalem system running
32 tasks that context-switch between each other is significant:

   sched_clock_stable=0		sched_clock_stable=1
   ....................         ....................
   22.456925 million/sec        24.306972 million/sec   [+8.2%]

lmbench's "lat_ctx -s 0 2" goes from 0.63 microseconds to
0.59 microseconds - a 6.7% increase in context-switching
performance.

Perfstat of 1 million pipe context switches between two tasks:

 Performance counter stats for './pipe-test-1m':

       [before]           [after]
   ............      ............
   37621.421089      36436.848378    task clock ticks     (msecs)

              0                 0    CPU migrations       (events)
        2000274           2000189    context switches     (events)
            194               193    pagefaults           (events)
     8433799643        8171016416    CPU cycles           (events) -3.21%
     8370133368        8180999694    instructions         (events) -2.31%
        4158565           3895941    cache references     (events) -6.74%
          44312             46264    cache misses         (events)

    2349.287976       2279.362465    wall-time            (msecs)  -3.06%

The speedup comes straight from the reduction in the instruction
count. sched_clock_cpu() got simpler and the whole workload thus
executes faster.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 21:20:25 +01:00
Kyle McMartin f6be37fdc6 x86: enable DMAR by default
Now that the obvious bugs have been worked out, specifically
the iwlagn issue, and the write buffer errata, DMAR should be safe
to turn back on by default. (We've had it on since those patches were
first written a few weeks ago, without any noticeable bug reports
(most have been due to the dma-api debug patchset.))

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 20:59:47 +01:00
Ingo Molnar 3b900d4419 x86: fix !ACPI build for es7000_32.c
arch/x86/kernel/apic/es7000_32.c:702: error: 'es7000_acpi_madt_oem_check_cluster' undeclared here (not in a function)

Provide a es7000_acpi_madt_oem_check_cluster() definition in the !ACPI
case too.

Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 14:35:56 +01:00
Ingo Molnar 0b1da1c8fc x86: apic: simplify secondary CPU wakeup methods, fix
Impact: build fix

init_deasserted is only available on SMP. Make the secondary-wakeup
function conditional on SMP.

Also clean up the file some.

Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 14:11:06 +01:00
Ingo Molnar 1f5bcabf1b x86: apic: simplify secondary CPU wakeup methods
Impact: cleanup

- rename apic->wakeup_cpu  to apic->wakeup_secondary_cpu, to
  make it apparent that this is an SMP-only method

- handle NULL ->wakeup_secondary_cpus to mean the default INIT
  wakeup sequence - this allows simplification of the APIC
  driver templates.

Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 13:58:26 +01:00
Ingo Molnar 0917c01f8e x86: remove update_apic from x86_quirks, fix
Impact: build fix

wakeup_secondary_cpu_via_init(), the default platform method for
booting a secondary CPU, is always used on UP due to probe_32.c,
if CONFIG_X86_LOCAL_APIC is enabled but SMP is off.

So provide a UP wrapper inline as well.

Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 12:49:34 +01:00
Yinghai Lu 129d8bc828 x86: don't compile vsmp_64 for 32bit
Impact: cleanup

that is only needed when CONFIG_X86_VSMP is defined with 64bit
also remove dead code about PCI, because CONFIG_X86_VSMP depends on PCI

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 06:40:06 +01:00
Yinghai Lu 2b6163bf57 x86: remove update_apic from x86_quirks
Impact: cleanup

x86_quirks->update_apic() calling looks crazy. so try to remove it:

 1. every apic take wakeup_cpu member directly
 2. separate es7000_apic to es7000_apic_cluster
 3. use uv_wakeup_cpu directly

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 06:32:25 +01:00
Ingo Molnar ecc25fbd6b Merge branches 'x86/apic', 'x86/defconfig', 'x86/memtest', 'x86/mm' and 'linus' into x86/core 2009-02-26 06:31:32 +01:00
Ingo Molnar 801c0be814 Merge branches 'x86/urgent' and 'x86/pat' into x86/core
Conflicts:
	arch/x86/include/asm/pat.h
2009-02-26 06:31:23 +01:00
Ingo Molnar 13b2eda64d Merge branch 'x86/urgent' into x86/core
Conflicts:
	arch/x86/mach-voyager/voyager_smp.c
2009-02-26 06:30:42 +01:00
Ingo Molnar 13093cb0e5 gpu/drm, x86, PAT: PAT support for io_mapping_*, export symbols for modules
Impact: build fix

 ERROR: "reserve_io_memtype_wc" [drivers/gpu/drm/i915/i915.ko] undefined!
 ERROR: "free_io_memtype" [drivers/gpu/drm/i915/i915.ko] undefined!

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 03:43:53 +01:00
Jeremy Fitzhardinge 55d8085671 xen: disable interrupts early, as start_kernel expects
This avoids a lockdep warning from:
	if (DEBUG_LOCKS_WARN_ON(unlikely(!early_boot_irqs_enabled)))
		return;
in trace_hardirqs_on_caller();

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:51:57 +01:00
Ingo Molnar 2e31add2a7 Merge branch 'x86/urgent' into x86/pat 2009-02-25 16:40:10 +01:00
Peter Zijlstra 34754b69a6 x86: make vmap yell louder when it is used under irqs_disabled()
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 16:38:34 +01:00
Venkatesh Pallipadi 17581ad812 gpu/drm, x86, PAT: PAT support for io_mapping_*
Make io_mapping_create_wc and io_mapping_free go through PAT to make sure
that there are no memory type aliases.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 13:09:52 +01:00
Venkatesh Pallipadi 4ab0d47d0a gpu/drm, x86, PAT: io_mapping_create_wc and resource_size_t
io_mapping_create_wc should take a resource_size_t parameter in place of
unsigned long. With unsigned long, there will be no way to map greater than 4GB
address in i386/32 bit.

On x86, greater than 4GB addresses cannot be mapped on i386 without PAE. Return
error for such a case.

Patch also adds a structure for io_mapping, that saves the base, size and
type on HAVE_ATOMIC_IOMAP archs, that can be used to verify the offset on
io_mapping_map calls.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 13:09:51 +01:00
Venkatesh Pallipadi 7880f74645 gpu/drm, x86, PAT: routine to keep identity map in sync
Add a function to check and keep identity maps in sync, when changing
any memory type. One of the follow on patches will also use this
routine.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 13:09:51 +01:00
Andreas Herrmann 63823126c2 x86: memtest: add additional (regular) test patterns
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:47 +01:00
Andreas Herrmann bfb4dc0da4 x86: memtest: wipe out test pattern from memory
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:46 +01:00
Andreas Herrmann 570c9e69aa x86: memtest: adapt log messages
- print test pattern instead of pattern number,
- show pattern as stored in memory,
- use proper priority flags,
- consistent use of u64 throughout the code

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:46 +01:00
Andreas Herrmann 7dad169e57 x86: memtest: cleanup memtest function
Impact: code cleanup

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:45 +01:00
Andreas Herrmann 6d74171bf7 x86: memtest: introduce array to select memtest patterns
Impact: code cleanup

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:45 +01:00
Andreas Herrmann 40823f737e x86: memtest: reuse test patterns when memtest parameter exceeds number of available patterns
Impact: fix unexpected behaviour when pattern number is out of range

Current implementation provides 4 patterns for memtest. The code doesn't
check whether the memtest parameter value exceeds the maximum pattern number.

Instead the memtest code pretends to test with non-existing patterns, e.g.
when booting with memtest=10 I've observed the following

  ...
  early_memtest: pattern num 10
  0000001000 - 0000006000 pattern 0
  ...
  0000001000 - 0000006000 pattern 1
  ...
  0000001000 - 0000006000 pattern 2
  ...
  0000001000 - 0000006000 pattern 3
  ...
  0000001000 - 0000006000 pattern 4
  ...
  0000001000 - 0000006000 pattern 5
  ...
  0000001000 - 0000006000 pattern 6
  ...
  0000001000 - 0000006000 pattern 7
  ...
  0000001000 - 0000006000 pattern 8
  ...
  0000001000 - 0000006000 pattern 9
  ...

But in fact Linux didn't test anything for patterns > 4 as the default
case in memtest() is to leave the function.

I suggest to use the memtest parameter as the number of tests to be
performed and to re-iterate over all existing patterns.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:44 +01:00
Ingo Molnar 95108fa34a x86: usercopy: check for total size when deciding non-temporal cutoff
Impact: make more types of copies non-temporal

This change makes the following simple fix:

  30d697f: x86: fix performance regression in write() syscall

A bit more sophisticated: we check the 'total' number of bytes
written to decide whether to copy in a cached or a non-temporal
way.

This will for example cause the tail (modulo 4096 bytes) chunk
of a large write() to be non-temporal too - not just the page-sized
chunks.

Cc: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 10:20:05 +01:00
Ingo Molnar 3255aa2eb6 x86, mm: pass in 'total' to __copy_from_user_*nocache()
Impact: cleanup, enable future change

Add a 'total bytes copied' parameter to __copy_from_user_*nocache(),
and update all the callsites.

The parameter is not used yet - architecture code can use it to
more intelligently decide whether the copy should be cached or
non-temporal.

Cc: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 10:20:03 +01:00
Ingo Molnar 95f66b3770 Merge branch 'x86/asm' into x86/mm 2009-02-25 08:27:46 +01:00
Matthew Garrett eb3092cee7 [CPUFREQ] Make cpufreq-nforce2 less obnoxious
Not owning an nforce2 is a sign of good taste, not an error.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:32 -05:00
Matthias-Christian Ott 199785eac8 [CPUFREQ] p4-clockmod reports wrong frequency.
http://bugzilla.kernel.org/show_bug.cgi?id=10968

[ Updated for current tree, and fixed compile failure
  when p4-clockmod was built modular -- davej]

From: Matthias-Christian Ott <ott@mirix.org>
Signed-off-by: Dominik Brodowski <linux@brodo.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:32 -05:00
Dave Jones 0cb8bc2560 [CPUFREQ] powernow-k8: Use a common exit path.
a0abd520fd introduced a slew of
extra kfree/return -ENODEV pairs. This replaces them all
with gotos.

Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:32 -05:00
Matthew Garrett de3ed81d74 [CPUFREQ] Change link order of x86 cpufreq modules
Change the link order of the cpufreq modules to ensure that they're
probed in the preferred order when statically linked in.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:32 -05:00
Dave Jones 91420220d2 [CPUFREQ] Use swap() in longhaul.c
Remove hand-coded implementation of swap()

Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:31 -05:00
Dave Jones 3a58df35a6 [CPUFREQ] checkpatch cleanups for acpi-cpufreq
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:31 -05:00
Thomas Renninger 79cc56af9f [CPUFREQ] powernow-k8: Only print error message once, not per core.
This is the typical message you get if you plug in a CPU
which is newer than your BIOS. It's annoying seeing this
message for each core.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:31 -05:00
Thomas Renninger 57f4fa6991 [CPUFREQ] powernow-k8: Always compile powernow-k8 driver with ACPI support
powernow-k8 driver should always try to get cpufreq info from ACPI.
Otherwise it will not be able to detect the transition latency correctly
which results in ondemand governor taking a wrong sampling rate which will
then result in sever performance loss.

Let the user not shoot himself in the foot and always compile in ACPI
support for powernow-k8.

This also fixes a wrong message if ACPI_PROCESSOR is compiled as a module and
#ifndef CONFIG_ACPI_PROCESSOR
path is chosen.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:31 -05:00
Dave Jones 0e64a0c982 [CPUFREQ] checkpatch cleanups for powernow-k8
This driver has so many long function names, and deep nested if's
The remaining warnings will need some code restructuring to clean up.

Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:30 -05:00
Dave Jones b9e7638a30 [CPUFREQ] checkpatch cleanups for powernow-k7
The asm/timer.h warning can be ignored, it's needed for
recalibrate_cpu_khz()

Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:30 -05:00
Dave Jones bbfebd6655 [CPUFREQ] checkpatch cleanups for speedstep related drivers.
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:30 -05:00
Dave Jones 6072ace436 [CPUFREQ] checkpatch cleanups for sc520
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:29 -05:00
Dave Jones 14a6650f13 [CPUFREQ] checkpatch cleanups for powernow-k6
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:29 -05:00
Dave Jones 48ee923a66 [CPUFREQ] checkpatch cleanups for longrun
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:29 -05:00
Dave Jones ac617bd0f7 [CPUFREQ] checkpatch cleanups for longhaul
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:29 -05:00
Dave Jones 00f6a235bf [CPUFREQ] checkpatch cleanups for gx-suspmod
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:29 -05:00
Dave Jones c9b8c87152 [CPUFREQ] checkpatch cleanups for e_powersaver
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:28 -05:00
Dave Jones 04cd1a99dc [CPUFREQ] checkpatch cleanups for elanfreq
The remaining warning about the simple_strtoul conversion
to strict_strtoul seems kind of pointless to me.

Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:28 -05:00
Dave Jones 20174b65d9 [CPUFREQ] nforce2: Use driver prefix, not cpufreq prefix.
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:28 -05:00
Dave Jones b5c9166662 [CPUFREQ] checkpatch cleanups for cpufreq-nforce2
Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:28 -05:00
Dave Jones fff78ad5ce [CPUFREQ] Stupidly trivial CodingStyle fix
GNU indent complains about this being ambiguous, because it's dumb.
One of my automated tests relies on the output of indent, so this shuts
it up.

Signed-off-by: Dave Jones <davej@redhat.com>
2009-02-24 22:47:28 -05:00
Tejun Heo d325100504 x86: convert cacheflush macros inline functions
Impact: cleanup

Unused macro parameters cause spurious unused variable warnings.
Convert all cacheflush macros to inline functions to avoid the
warnings and achieve better type checking.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-25 11:06:51 +09:00
Tejun Heo 24ff954233 x86, percpu: fix minor bugs in setup_percpu.c
Recent changes in setup_percpu.c made a now meaningless DBG()
statement fail to compile and introduced a
comparison-of-different-types warning.  Fix them.

Compile failure is reported by Ingo Molnar.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 10:38:10 +09:00
H. Peter Anvin 638bee71c8 Merge branch 'x86/core' into x86/mce2 2009-02-24 16:11:51 -08:00
H. Peter Anvin 2aaa822984 Merge branch 'x86/defconfig' into x86/mce2
Conflicts (resolved):
	arch/x86/configs/x86_64_defconfig

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 16:01:17 -08:00
H. Peter Anvin 15d4fcd615 x86, mce: enable machine checks in 32-bit defconfig
Impact: defconfig change

Enable MCE in the 32-bit defconfig.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 15:52:58 -08:00
Andi Kleen 1250fbed14 x86, mce: enable machine checks in 64-bit defconfig
Impact: defconfig change

Enable MCE in the 64-bit defconfig.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-24 15:52:21 -08:00
Yinghai Lu 46cb27f516 x86: check range in reserve_early()
Impact: cleanup

one 32-bit system reports:

BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001c000000 (usable)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
DMI 2.0 present.
last_pfn = 0x1c000 max_arch_pfn = 0x100000
kernel direct mapping tables up to 1c000000 @ 7000-c000
..
RAMDISK: 1bc69000 - 1bfef4fa
..
0MB HIGHMEM available.
448MB LOWMEM available.
  mapped low ram: 0 - 1c000000
  low ram: 00000000 - 1c000000
  bootmap 00002000 - 00005800
(9 early reservations) ==> bootmem [0000000000 - 001c000000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000001000 - 0000002000]    EX TRAMPOLINE ==> [0000001000 - 0000002000]
  #2 [0000006000 - 0000007000]       TRAMPOLINE ==> [0000006000 - 0000007000]
  #3 [0000400000 - 00009ed14c]    TEXT DATA BSS ==> [0000400000 - 00009ed14c]
  #4 [001bc69000 - 001bfef4fa]          RAMDISK ==> [001bc69000 - 001bfef4fa]
  #5 [00009ee000 - 00009f2000]    INIT_PG_TABLE ==> [00009ee000 - 00009f2000]
  #6 [000009f400 - 0000100000]    BIOS reserved ==> [000009f400 - 0000100000]
  #7 [0000007000 - 0000007000]          PGTABLE
  #8 [0000002000 - 0000006000]          BOOTMAP ==> [0000002000 - 0000006000]

Notice the strange blank PGTABLE entry.

The reason is init_pg_table is big enough, and zero range is called
with init_memory_mapping/reserve_early().

So try to check the range in reserve_early()

v2: fix the reversed compare

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: nickpiggin@yahoo.com.au
Cc: ink@jurassic.park.msu.ru
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 22:43:15 +01:00
Andi Kleen be71b8553d x86, mce, cmci: recheck CMCI banks after APIC has been enabled on CPU #0
Impact: Fix marginal race condition

One the first CPU the machine checks are enabled early before
the local APIC is enabled. This could in theory lead
to some lost CMCI events very early during boot because
CMCIs cannot be delivered with disabled LAPIC.

The poller also doesn't recover from this because it doesn't
check CMCI banks.

Add an explicit CMCI banks check after the LAPIC is enabled.
This is only done for CPU #0, the other CPUs only initialize
machine checks after the LAPIC is on.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:41:01 -08:00
Andi Kleen 5ca8681ca1 x86, mce, cmci: disable CMCI on rebooting
Impact: Avoids confusing other OSes.

Disable the CMCI vector on reboot to avoid confusing other OS.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:41:01 -08:00
H. Peter Anvin df20e2eb3e x86, mce, cmci: remove incorrect __cpuinit/__cpuexit annotations
Impact: Bug fix on UP

The MCE code is reinitialized from resume, so we can't use
__cpuinit/__cpuexit for most of the code.  Remove those annotations
for anything downstream of mce_init().

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:41:01 -08:00
Andi Kleen 88ccbedd9c x86, mce, cmci: add CMCI support
Impact: Major new feature

Intel CMCI (Corrected Machine Check Interrupt) is a new
feature on Nehalem CPUs. It allows the CPU to trigger
interrupts on corrected events, which allows faster
reaction to them instead of with the traditional
polling timer.

Also use CMCI to discover shared banks. Machine check banks
can be shared by CPU threads or even cores. Using the CMCI enable
bit it is possible to detect the fact that another CPU already
saw a specific bank. Use this to assign shared banks only
to one CPU to avoid reporting duplicated events.

On CPU hot unplug bank sharing is re discovered. This is done
using a thread that cycles through all the CPUs.

To avoid races between the poller and CMCI we only poll
for banks that are not CMCI capable and only check CMCI
owned banks on a interrupt.

The shared banks ownership information is currently only used for
CMCI interrupts, not polled banks.

The sharing discovery code follows the algorithm recommended in the
IA32 SDM Vol3a 14.5.2.1

The CMCI interrupt handler just calls the machine check poller to
pick up the machine check event that caused the interrupt.

I decided not to implement a separate threshold event like
the AMD version has, because the threshold is always one currently
and adding another event didn't seem to add any value.

Some code inspired by Yunhong Jiang's Xen implementation,
which was in term inspired by a earlier CMCI implementation
by me.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:41:00 -08:00
Andi Kleen 03195c6b40 x86, mce, cmci: define MSR names and fields for new CMCI registers
Impact: New register definitions only

CMCI means support for raising an interrupt on a corrected machine
check event instead of having to poll for it. It's a new feature in
Intel Nehalem CPUs available on some machine check banks.

For details see the IA32 SDM Vol3a 14.5

Define the registers for it as a preparation for further patches.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:41:00 -08:00
Andi Kleen ee031c31d6 x86, mce, cmci: use polled banks bitmap in machine check poller
Define a per cpu bitmap that contains the banks polled by the machine
check poller. This is needed for the CMCI code in the next patches
to be able to disable polling on specific banks.

The bank by default contains all banks, so there is no behaviour
change. Only future code will remove some banks from the polling
set.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:26:05 -08:00
Andi Kleen 8457c84d68 x86, mce: replace machine check events logged interval with ratelimit
Impact: behavior change, use common code

Use a standard leaky bucket ratelimit for the machine check
warning print interval instead of waiting every check_interval.
Also decrease the limit to twice per minute.
This interacts better with threshold interrupts because
they can happen more often than check_interval.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:25:53 -08:00
Andi Kleen f9695df42c x86, mce, cmci: avoid potential reentry of threshold interrupt
Impact: minor bugfix

The threshold handler on AMD (and soon on Intel) could be theoretically
reentered by the hardware. This could lead to corrupted events
because the machine check poll code assumes it is not reentered.

Move the APIC ACK to the end of the interrupt handler to let
the hardware avoid that.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:24:42 -08:00
Andi Kleen b276268631 x86, mce, cmci: factor out threshold interrupt handler
Impact: cleanup; preparation for feature

The mce_amd_64 code has an own private MC threshold vector with an own
interrupt handler. Since Intel needs a similar handler
it makes sense to share the vector because both can not
be active at the same time.

I factored the common APIC handler code into a separate file which can
be used by both the Intel or AMD MC code.

This is needed for the next patch which adds an Intel specific
CMCI handler.

This patch should be a nop for AMD, it just moves some code
around.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:24:42 -08:00
Andi Kleen 41fdff322e x86, mce, cmci: export MAX_NR_BANKS
Impact: Cleanup (code movement)

Move MAX_NR_BANKS into mce.h because it's needed there
for followup patches.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:24:42 -08:00
Jiri Slaby b5f26d0556 x86_32: summit_32, de-inline functions
The ones which go only into struct genapic are de-inlined
by compiler anyway, so remove the inline specifier from them.

Afterwards, remove summit_setup_portio_remap completely as it
is unused.

Remove inline also from summit_cpu_mask_to_apicid, since it's
not worth it (it is used in struct genapic too).

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 22:07:51 +01:00
Jiri Slaby 10b614eaa8 x86_32: summit_32, use BAD_APICID
Use BAD_APICID instead of 0xFF constants in summit_cpu_mask_to_apicid.

Also remove bogus comments about what we actually return.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 22:07:51 +01:00
Ingo Molnar 0edcf8d692 Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu
Conflicts:
	arch/x86/include/asm/pgtable.h
2009-02-24 21:52:45 +01:00
Ingo Molnar a852cbfaaf Merge branches 'x86/acpi', 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/mm', 'x86/signal' and 'x86/urgent'; commit 'v2.6.29-rc6' into x86/core 2009-02-24 21:50:43 +01:00
James Bottomley ddf9499b3d x86, Voyager: fix compile by lifting the degeneracy of phys_cpu_present_map
This was changed to a physmap_t giving a clashing symbol redefinition,
but actually using a physmap_t consumes rather a lot of space on x86,
so stick with a private copy renamed with a voyager_ prefix and made
static.  Nothing outside of the Voyager code uses it, anyway.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 12:50:11 -08:00
Markus Metzger 499aa86dcb x86, ptrace: remove CONFIG guards around declarations
Remove unnecessary CONFIG guards around type declarations and macro
definitions.

Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Cc: markus.t.metzger@gmail.com
Cc: roland@redhat.com
Cc: eranian@googlemail.com
Cc: oleg@redhat.com
Cc: juan.villacis@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 18:23:35 +01:00
Ingo Molnar a7f4463e03 Merge branch 'tracing/ftrace'; commit 'v2.6.29-rc6' into tracing/core 2009-02-24 18:22:39 +01:00
Cyrill Gorcunov 9f331119a4 x86: efi_stub_32,64 - add missing ENDPROCs
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 18:08:40 +01:00
Cyrill Gorcunov bc8b2b9258 x86: head_64.S - use GLOBAL macro
Impact: cleanup

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 18:08:40 +01:00
Cyrill Gorcunov b3baaa138c x86: entry_64.S - add missing ENDPROC
native_usergs_sysret64 is described as

	extern void native_usergs_sysret64(void)

so lets add ENDPROC here

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 18:08:39 +01:00
Cyrill Gorcunov 57e372932c x86: invalid_vm86_irq -- use predefined macros
Impact: cleanup

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 18:08:39 +01:00
Cyrill Gorcunov 5e112ae23b x86: head_64.S - use IDT_ENTRIES instead of hardcoded number
Impact: cleanup

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 18:08:38 +01:00
Cyrill Gorcunov 2a0b100111 x86: head_64.S - remove useless balign
Impact: cleanup

NEXT_PAGE already has 'balign' so no
need to keep this redundant one.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 18:08:38 +01:00
Salman Qazi 30d697fa3a x86: fix performance regression in write() syscall
While the introduction of __copy_from_user_nocache (see commit:
0812a579c9) may have been an improvement
for sufficiently large writes, there is evidence to show that it is
deterimental for small writes.  Unixbench's fstime test gives the
following results for 256 byte writes with MAX_BLOCK of 2000:

    2.6.29-rc6 ( 5 samples, each in KB/sec ):
    283750, 295200, 294500, 293000, 293300

    2.6.29-rc6 + this patch (5 samples, each in KB/sec):
    313050, 3106750, 293350, 306300, 307900

    2.6.18
    395700, 342000, 399100, 366050, 359850

    See w_test() in src/fstime.c in unixbench version 4.1.0.  Basically, the above test
    consists of counting how much we can write in this manner:

    alarm(10);
    while (!sigalarm) {
            for (f_blocks = 0; f_blocks < 2000; ++f_blocks) {
                   write(f, buf, 256);
            }
            lseek(f, 0L, 0);
    }

Note, there are other components to the write syscall regression
that are not addressed here.

Signed-off-by: Salman Qazi <sqazi@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 17:16:36 +01:00
David S. Miller e70049b9e7 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2009-02-24 03:50:29 -08:00
Tejun Heo 8ac8375714 x86: add remapping percpu first chunk allocator
Impact: add better first percpu allocation for NUMA

On NUMA, embedding allocator can't be used as different units can't be
made to fall in the correct NUMA nodes.  To use large page mapping,
each unit needs to be remapped.  However, percpu areas are usually
much smaller than large page size and unused space hurts a lot as the
number of cpus grow.  This allocator remaps large pages for each chunk
but gives back unused part to the bootmem allocator making the large
pages mapped twice.

This adds slightly to the TLB pressure but is much better than using
4k mappings while still being NUMA-friendly.

Ingo suggested that this would be the correct approach for NUMA.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
2009-02-24 11:57:22 +09:00
Tejun Heo 89c9215165 x86: add embedding percpu first chunk allocator
Impact: add better first percpu allocation for !NUMA

On !NUMA, we can simply allocate contiguous memory and use it for the
first chunk without mapping it into vmalloc area.  As the memory area
is covered by the large page physical memory mapping, it allows the
dynamic perpcu allocator to not add any TLB overhead for the static
percpu area and whatever falls into the first chunk and the
implementation is very simple too.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-24 11:57:21 +09:00
Tejun Heo 5f5d8405d1 x86: separate out setup_pcpu_4k() from setup_per_cpu_areas()
Impact: modularize percpu first chunk allocation

x86 is gonna have a few different strategies for the first chunk
allocation.  Modularize it by separating out the current allocation
mechanism into pcpu_alloc_bootmem() and setup_pcpu_4k().

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-24 11:57:21 +09:00
Tejun Heo 8d408b4be3 percpu: give more latitude to arch specific first chunk initialization
Impact: more latitude for first percpu chunk allocation

The first percpu chunk serves the kernel static percpu area and may or
may not contain extra room for further dynamic allocation.
Initialization of the first chunk needs to be done before normal
memory allocation service is up, so it has its own init path -
pcpu_setup_static().

It seems archs need more latitude while initializing the first chunk
for example to take advantage of large page mapping.  This patch makes
the following changes to allow this.

* Define PERCPU_DYNAMIC_RESERVE to give arch hint about how much space
  to reserve in the first chunk for further dynamic allocation.

* Rename pcpu_setup_static() to pcpu_setup_first_chunk().

* Make pcpu_setup_first_chunk() much more flexible by fetching page
  pointer by callback and adding optional @unit_size, @free_size and
  @base_addr arguments which allow archs to selectively part of chunk
  initialization to their likings.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-24 11:57:21 +09:00
Tejun Heo 458a3e644c x86: update populate_extra_pte() and add populate_extra_pmd()
Impact: minor change to populate_extra_pte() and addition of pmd flavor

Update populate_extra_pte() to return pointer to the pte_t for the
specified address and add populate_extra_pmd() which only populates
till the pmd and returns pointer to the pmd entry for the address.

For 64bit, pud/pmd/pte fill functions are separated out from
set_pte_vaddr[_pud]() and used for set_pte_vaddr[_pud]() and
populate_extra_{pte|pmd}().

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-24 11:57:21 +09:00
Tejun Heo c132937556 bootmem: clean up arch-specific bootmem wrapping
Impact: cleaner and consistent bootmem wrapping

By setting CONFIG_HAVE_ARCH_BOOTMEM_NODE, archs can define
arch-specific wrappers for bootmem allocation.  However, this is done
a bit strangely in that only the high level convenience macros can be
changed while lower level, but still exported, interface functions
can't be wrapped.  This not only is messy but also leads to strange
situation where alloc_bootmem() does what the arch wants it to do but
the equivalent __alloc_bootmem() call doesn't although they should be
able to be used interchangeably.

This patch updates bootmem such that archs can override / wrap the
backend function - alloc_bootmem_core() instead of the highlevel
interface functions to allow simpler and consistent wrapping.  Also,
HAVE_ARCH_BOOTMEM_NODE is renamed to HAVE_ARCH_BOOTMEM.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@saeurebad.de>
2009-02-24 11:57:20 +09:00
H. Peter Anvin dc731ca609 Merge branch 'x86/urgent' into x86/mce2 2009-02-23 14:05:56 -08:00
H. Peter Anvin ec5b3d3243 x86, mce: remove invalid __cpuinit/__cpuexit annotations
Impact: Bug fix when CPU hotplug is disabled

Correct the following broken __cpuinit/__cpuexit annotations:

- mce_cpu_features() is called from mce_resume(), and so cannot be
  __cpuinit.
- mce_disable_cpu() and mce_reenable_cpu() are called from
  mce_cpu_callback(), and so cannot be __cpuexit().

Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-23 14:01:04 -08:00
Stas Sergeev bda3a89745 x86: minor cleanup in the espfix code
Impact: Cleanup

Checkin be44d2aabc eliminates the use of
a 16-bit stack for espfix.  However, at least one instruction remained
that only operated on the low 16 bits of %esp.

This is not a bug per se because the kernel stack is always an aligned
4K or 8K block.  Therefore it cannot cross 64K boundaries; this code,
in fact, relies strictly on that fact.

However, it's a lot cleaner (and, for that matter, smaller) to operate
on the entire 32-bit register.

Signed-off-by: Stas Sergeev <stsp@aknet.ru>
CC: Zachary Amsden <zach@vmware.com>
CC: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-23 11:34:04 -08:00
Yinghai Lu ecda06289f x86: check mptable physptr with max_low_pfn on 32bit
Impact: fix early crash on LinuxBIOS systems

Kevin O'Connor reported that Coreboot aka LinuxBIOS tries to put
mptable somewhere very high, well above max_low_pfn (below which
BIOSes generally put the mptable), causing a panic.

The BIOS will probably be changed to be compatible with older
Linus versions, but nevertheless the MP-spec does not forbid
an MP-table in arbitrary system RAM, so make sure it all
works even if the table is in an unexpected place.

Check physptr with max_low_pfn * PAGE_SIZE.

Reported-by: Kevin O'Connor <kevin@koconnor.net>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Stefan Reinauer <stepan@coresystems.de>
Cc: coreboot@coreboot.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-23 07:41:31 +01:00
Benjamin Herrenschmidt 35f88e6b06 Merge commit 'ftrace/function-graph' into next 2009-02-23 10:47:23 +11:00
Ingo Molnar 8e6dafd6c7 x86: refactor x86_quirks support
Impact: cleanup

Make x86_quirks support more transparent. The highlevel
methods are now named:

  extern void x86_quirk_pre_intr_init(void);
  extern void x86_quirk_intr_init(void);

  extern void x86_quirk_trap_init(void);

  extern void x86_quirk_pre_time_init(void);
  extern void x86_quirk_time_init(void);

This makes it clear that if some platform extension has to
do something here that it is considered ... weird, and is
discouraged.

Also remove arch_hooks.h and move it into setup.h (and other
header files where appropriate).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-23 00:08:11 +01:00
Ingo Molnar d85a881d78 x86: remove various unused subarch hooks
Impact: remove dead code

Remove:

 - pre_setup_arch_hook()
 - mca_nmi_hook()

If needed they can be added back via an x86_quirk handler.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-23 00:06:49 +01:00
Ingo Molnar 965c7ecaf2 x86: remove the Voyager 32-bit subarch
Impact: remove unused/broken code

The Voyager subarch last built successfully on the v2.6.26 kernel
and has been stale since then and does not build on the v2.6.27,
v2.6.28 and v2.6.29-rc5 kernels.

No actual users beyond the maintainer reported this breakage.
Patches were sent and most of the fixes were accepted but the
discussion around how to do a few remaining issues cleanly
fizzled out with no resolution and the code remained broken.

In the v2.6.30 x86 tree development cycle 32-bit subarch support
has been reworked and removed - and the Voyager code, beyond the
build problems already known, needs serious and significant
changes and probably a rewrite to support it.

CONFIG_X86_VOYAGER has been marked BROKEN then. The maintainer has
been notified but no patches have been sent so far to fix it.

While all other subarchs have been converted to the new scheme,
voyager is still broken. We'd prefer to receive patches which
clean up the current situation in a constructive way, but even in
case of removal there is no obstacle to add that support back
after the issues have been sorted out in a mutually acceptable
fashion.

So remove this inactive code for now.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-23 00:54:01 +01:00
Ravikiran G Thirumalai 8425091ff8 x86: improve the help text of X86_EXTENDED_PLATFORM
Change the CONFIG_X86_EXTENDED_PLATFORM help text to display the
32bit/64bit extended platform list. This is as suggested by Ingo.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: shai@scalex86.org
Cc: "Benzi Galili (Benzi@ScaleMP.com)" <benzi@scalemp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 20:21:31 +01:00
Ingo Molnar fc6fc7f1b1 Merge branch 'linus' into x86/apic
Conflicts:
	arch/x86/mach-default/setup.c

Semantic conflict resolution:
	arch/x86/kernel/setup.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 20:05:19 +01:00
Rafael J. Wysocki 770824bdc4 PM: Split up sysdev_[suspend|resume] from device_power_[down|up]
Move the sysdev_suspend/resume from the callee to the callers, with
no real change in semantics, so that we can rework the disabling of
interrupts during suspend/hibernation.

This is based on an earlier patch from Linus.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-22 10:33:44 -08:00
Linus Torvalds 936577c61d x86: Add IRQF_TIMER to legacy x86 timer interrupt descriptors
Right now nobody cares, but the suspend/resume code will eventually want
to suspend device interrupts without suspending the timer, and will
depend on this flag to know.

The modern x86 timer infrastructure uses the local APIC timers and never
shows up as a device interrupt at all, so it isn't affected and doesn't
need any of this.

Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-22 10:27:49 -08:00
Suresh Siddha ef1f87aa7b x86: select x2apic ops in early apic probe only if x2apic mode is enabled
If BIOS hands over the control to OS in legacy xapic mode, select
legacy xapic related ops in the early apic probe and shift to x2apic
ops later in the boot sequence, only after enabling x2apic mode.

If BIOS hands over the control in x2apic mode, select x2apic related
ops in the early apic probe.

This fixes the early boot panic, where we were selecting x2apic ops,
while the cpu is still in legacy xapic mode.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 18:20:50 +01:00
Ingo Molnar c478f87869 Merge branch 'tip/x86/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace
Conflicts:
	include/linux/ftrace.h
	kernel/trace/ftrace.c
2009-02-22 18:12:01 +01:00
Andreas Herrmann c23e253e67 x86: hpet: stop HPET_COUNTER when programming periodic mode
Impact: fix system hang on some systems operating with HZ_1000

On a system that stalled with HZ_1000, the first value written to
T0_CMP (when the main counter was not stopped) did not trigger an
interrupt. Instead after the main counter wrapped around (after
several minutes) an interrupt was triggered and afterwards the
periodic interrupt took effect.

This can be fixed by implementing HPET spec recommendation for
programming the periodic mode (i.e. stopping the main counter).

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Mark Hounschell <markh@compro.net>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 18:01:18 +01:00
Andreas Herrmann 8d6f0c8214 x86: hpet: provide separate functions to stop and start the counter
By splitting up existing hpet_start_counter function.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Mark Hounschell <markh@compro.net>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 18:01:17 +01:00
Andreas Herrmann b98103a559 x86: hpet: print HPET registers during setup (if hpet=verbose is used)
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Mark Hounschell <markh@compro.net>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 18:01:14 +01:00
Ingo Molnar 2702e0a46c Merge branch 'linus' into timers/hpet 2009-02-22 17:59:49 +01:00
Hiroshi Shimamoto a967bb3fbe x86: ia32_signal: introduce {get|set}_user_seg()
Impact: cleanup

Introduce {get|set}_user_seg() and loadsegment_xx() macros to make code clean.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 17:54:47 +01:00
Hiroshi Shimamoto 8801ead40c x86: ia32_signal: introduce GET_SEG() macro
Impact: cleanup

introduce GET_SEG() macro like arch/x86/kernel/signal.c.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 17:54:47 +01:00
Hiroshi Shimamoto a47e3ec197 x86: ia32_signal: remove unused debug code
Impact: cleanup

DEBUG_SIG will not be used.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 17:54:46 +01:00
Ingo Molnar b319eed0aa x86, mm: fault.c, simplify kmmio_fault(), cleanup
Clarify the kmmio_fault() comment.

Acked-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 10:24:18 +01:00
Hannes Eder 2366c298b5 x86: numa_32.c: fix sparse warning: Using plain integer as NULL pointer
Fix this sparse warning:

  arch/x86/mm/numa_32.c:197:24: warning: Using plain integer as NULL pointer

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 09:27:12 +01:00
Hannes Eder fc6fcdfbb8 x86: kexec/i386: fix sparse warnings: Using plain integer as NULL pointer
Fix these sparse warnings:

  arch/x86/kernel/machine_kexec_32.c:124:22: warning: Using plain integer as NULL pointer
  arch/x86/kernel/traps.c:950:24: warning: Using plain integer as NULL pointer

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 09:27:11 +01:00
Jiri Slaby 6defa2fe20 x86_64: Fix S3 fail path
As acpi_enter_sleep_state can fail, take this into account in
do_suspend_lowlevel and don't return to the do_suspend_lowlevel's
caller. This would break (currently) fpu status and preempt count.

Technically, this means use `call' instead of `jmp' and `jmp' to
the `resume_point' after the `call' (i.e. if
acpi_enter_sleep_state returns=fails). `resume_point' will handle
the restore of fpu and preempt count gracefully.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-02-21 21:58:18 -05:00
Jiri Slaby e6bd6760c9 x86_64: acpi/wakeup_64 cleanup
- remove %ds re-set, it's already set in wakeup_long64
- remove double labels and alignment (ENTRY already adds both)
- use meaningful resume point labelname
- skip alignment while jumping from wakeup_long64 to the resume point
- remove .size, .type and unused labels
[v2]
- added ENDPROCs

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-02-21 21:58:18 -05:00
H. Peter Anvin cc3ca22063 x86, mce: remove incorrect __cpuinit for mce_cpu_features()
Impact: Bug fix on UP

Checkin 6ec68bff3c81e776a455f6aca95c8c5f1d630198:
    x86, mce: reinitialize per cpu features on resume

introduced a call to mce_cpu_features() in the resume path, in order
for the MCE machinery to get properly reinitialized after a resume.
However, this function (and its successors) was flagged __cpuinit,
which becomes __init on UP configurations (on SMP suspend/resume
requires CPU hotplug and so this would not be seen.)

Remove the offending __cpuinit annotations for mce_cpu_features() and
its successor functions.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-20 23:40:40 -08:00
Ingo Molnar f8eeb2e6be x86, mm: fault.c, update copyrights
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 23:13:36 +01:00
Ingo Molnar cd1b68f08f x86, mm: fault.c, give another attempt at prefetch handing before SIGBUS
Impact: extend prefetch handling on 64-bit

Currently there's an extra is_prefetch() check done in do_sigbus(),
which we only do on 32 bits.

This is a last-ditch check before we terminate a task, so it's worth
giving prefetch instructions another chance - should none of our
existing quirks have caught a prefetch instruction related spurious
fault.

The only risk is if a prefetch causes a real sigbus, in that case
we'll not OOM but try another fault. But this code has been on
32-bit for a long time, so it should be fine in practice.

So do this on 64-bit too - and thus remove one more #ifdef.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:46 +01:00
Ingo Molnar 7c178a26d3 x86, mm: fault.c, remove #ifdef from fault_in_kernel_space()
Impact: cleanup

Removal of an #ifdef in fault_in_kernel_space(), by making
use of the new TASK_SIZE_MAX symbol which is now available
on 32-bit too.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:45 +01:00
Ingo Molnar d951734654 x86, mm: rename TASK_SIZE64 => TASK_SIZE_MAX
Impact: cleanup

Rename TASK_SIZE64 to TASK_SIZE_MAX, and provide the
define on 32-bit too. (mapped to TASK_SIZE)

This allows 32-bit code to make use of the (former-) TASK_SIZE64
symbol as well, in a clean way.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:44 +01:00
Ingo Molnar c3731c6866 x86, mm: fault.c, remove #ifdef from do_page_fault()
Impact: cleanup

do_page_fault() has this ugly #ifdef in its prototype:

  #ifdef CONFIG_X86_64
  asmlinkage
  #endif
  void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)

Replace it with 'dotraplinkage' which maps to exactly the above
construct: nothing on 32-bit and asmlinkage on 64-bit.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:44 +01:00
Ingo Molnar 1cc99544dd x86, mm: fault.c, unify oops handling
Impact: add oops-recursion check to 32-bit

Unify the oops state-machine, to the 64-bit version. It is
slightly more careful in that it does a recursion check
in oops_begin(), and is thus more likely to show the relevant
oops.

It also means that 32-bit will print one more line at the
end of pagefault triggered oopses:

 	printk(KERN_EMERG "CR2: %016lx\n", address);

Which is generally good information to be seen in partial-dump
digital-camera jpegs ;-)

The downside is the somewhat more complex critical path. Both
variants have been tested well meanwhile by kernel developers
crashing their boxes so i dont think this is a practical worry.

This removes 3 ugly #ifdefs from no_context() and makes the
function a lot nicer read.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:44 +01:00
Ingo Molnar 8f7661496c x86, mm: fault.c, unify oops printing
Impact: refine/extend page fault related oops printing on 64-bit

 - honor the pause_on_oops logic on 64-bit too
 - print out NX fault warnings on 64-bit as well
 - factor out the NX fault message to make it git-greppable and readable

Note that this means that we do the PF_INSTR check on 32-bit non-PAE
as well where it should not occur ... normally. Cannot hurt.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:43 +01:00
Ingo Molnar f2f13a8535 x86, mm: fault.c, reorder functions
Impact: cleanup

Avoid a couple more #ifdefs by moving fundamentally non-unifiable
functions into a single #ifdef 32-bit / #else / #endif block in
fault.c: vmalloc*(), dump_pagetable(), check_vm8086_mode().

No code changed:

   text	   data	    bss	    dec	    hex	filename
   4618	     32	     24	   4674	   1242	fault.o.before
   4618	     32	     24	   4674	   1242	fault.o.after

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:43 +01:00
Ingo Molnar b18018126f x86, mm, kprobes: fault.c, simplify notify_page_fault()
Impact: cleanup

Remove an #ifdef from notify_page_fault(). The function still
compiles to nothing in the !CONFIG_KPROBES case.

Introduce kprobes_built_in() and kprobe_fault_handler() helpers
to allow this - they returns 0 if !CONFIG_KPROBES.

No code changed:

   text	   data	    bss	    dec	    hex	filename
   4618	     32	     24	   4674	   1242	fault.o.before
   4618	     32	     24	   4674	   1242	fault.o.after

Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:42 +01:00
Ingo Molnar b814d41f09 x86, mm: fault.c, simplify kmmio_fault()
Impact: cleanup

Remove an #ifdef from kmmio_fault() - we can do this by
providing default implementations for is_kmmio_active()
and kmmio_handler(). The compiler optimizes it all away
in the !CONFIG_MMIOTRACE case.

Also, while at it, clean up mmiotrace.h a bit:

 - standard header guards
 - standard vertical spaces for structure definitions

No code changed (both with mmiotrace on and off in the config):

   text	   data	    bss	    dec	    hex	filename
   2947	     12	     12	   2971	    b9b	fault.o.before
   2947	     12	     12	   2971	    b9b	fault.o.after

Cc: Pekka Paalanen <pq@iki.fi>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:42 +01:00
Ingo Molnar 121d5d0a7e x86, mm: fault.c, enable PF_RSVD checks on 32-bit too
Impact: improve page fault handling robustness

The 'PF_RSVD' flag (bit 3) of the page-fault error_code is a
relatively recent addition to x86 CPUs, so the 32-bit do_fault()
implementation never had it. This flag gets set when the CPU
detects nonzero values in any reserved bits of the page directory
entries.

Extend the existing 64-bit check for PF_RSVD in do_page_fault()
to 32-bit too. If we detect such a fault then we print a more
informative oops and the pagetables.

This unifies the code some more, removes an ugly #ifdef and improves
the 32-bit page fault code robustness a bit. It slightly increases
the 32-bit kernel text size.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:41 +01:00
Ingo Molnar 8c938f9fae x86, mm: fault.c, factor out the vm86 fault check
Impact: cleanup

Instead of an ugly, open-coded, #ifdef-ed vm86 related legacy check
in do_page_fault(), put it into the check_v8086_mode() helper
function and merge it with an existing #ifdef.

Also, simplify the code flow a tiny bit in the helper.

No code changed:

arch/x86/mm/fault.o:

   text	   data	    bss	    dec	    hex	filename
   2711	     12	     12	   2735	    aaf	fault.o.before
   2711	     12	     12	   2735	    aaf	fault.o.after

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:41 +01:00
Ingo Molnar 107a03678c x86, mm: fault.c, refactor/simplify the is_prefetch() code
Impact: no functionality changed

Factor out the opcode checker into a helper inline.

The code got a tiny bit smaller:

   text	   data	    bss	    dec	    hex	filename
   4632	     32	     24	   4688	   1250	fault.o.before
   4618	     32	     24	   4674	   1242	fault.o.after

And it got cleaner / easier to review as well.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:40 +01:00
Ingo Molnar 2d4a71676f x86, mm: fault.c cleanup
Impact: cleanup, no code changed

Clean up various small details, which can be correctness checked
automatically:

 - tidy up the include file section
 - eliminate unnecessary includes
 - introduce show_signal_msg() to clean up code flow
 - standardize the code flow
 - standardize comments and other style details
 - more cleanups, pointed out by checkpatch

No code changed on either 32-bit nor 64-bit:

arch/x86/mm/fault.o:

   text	   data	    bss	    dec	    hex	filename
   4632	     32	     24	   4688	   1250	fault.o.before
   4632	     32	     24	   4688	   1250	fault.o.after

the md5 changed due to a change in a single instruction:

   2e8a8241e7f0d69706776a5a26c90bc0  fault.o.before.asm
   c5c3d36e725586eb74f0e10692f0193e  fault.o.after.asm

Because a __LINE__ reference in a WARN_ONCE() has changed.

On 32-bit a few stack offsets changed - no code size difference
nor any functionality difference.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:39 +01:00
Steven Rostedt 90c7ac49aa ftrace: immediately stop code modification if failure is detected
Impact: fix to prevent NMI lockup

If the page fault handler produces a WARN_ON in the modifying of
text, and the system is setup to have a high frequency of NMIs,
we can lock up the system on a failure to modify code.

The modifying of code with NMIs allows all NMIs to modify the code
if it is about to run. This prevents a modifier on one CPU from
modifying code running in NMI context on another CPU. The modifying
is done through stop_machine, so only NMIs must be considered.

But if the write causes the page fault handler to produce a warning,
the print can slow it down enough that as soon as it is done
it will take another NMI before going back to the process context.
The new NMI will perform the write again causing another print and
this will hang the box.

This patch turns off the writing as soon as a failure is detected
and does not wait for it to be turned off by the process context.
This will keep NMIs from getting stuck in this back and forth
of print outs.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-20 14:30:18 -05:00
Steven Rostedt 1623963097 ftrace, x86: make kernel text writable only for conversions
Impact: keep kernel text read only

Because dynamic ftrace converts the calls to mcount into and out of
nops at run time, we needed to always keep the kernel text writable.

But this defeats the point of CONFIG_DEBUG_RODATA. This patch converts
the kernel code to writable before ftrace modifies the text, and converts
it back to read only afterward.

The kernel text is converted to read/write, stop_machine is called to
modify the code, then the kernel text is converted back to read only.

The original version used SYSTEM_STATE to determine when it was OK
or not to change the code to rw or ro. Andrew Morton pointed out that
using SYSTEM_STATE is a bad idea since there is no guarantee to what
its state will actually be.

Instead, I moved the check into the set_kernel_text_* functions
themselves, and use a local variable to determine when it is
OK to change the kernel text RW permissions.

[ Update: Ingo Molnar suggested moving the prototypes to cacheflush.h ]

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-20 14:30:06 -05:00
Alok Kataria fdb17aeb28 x86, vmi: TSC going backwards check in vmi clocksource, cleanup
clean up vmi_read_cycles to use max()

Reported-b: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Zach Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 19:31:03 +01:00
Ingo Molnar c9e1585b1b Merge branch 'tip/x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into x86/mm 2009-02-20 18:51:43 +01:00
Ingo Molnar 7a5714e018 x86, pat: add large-PAT check to split_large_page()
Impact: future-proof the split_large_page() function

Linus noticed that split_large_page() is not safe wrt. the
PAT bit: it is bit 12 on the 1GB and 2MB page table level
(_PAGE_BIT_PAT_LARGE), and it is bit 7 on the 4K page
table level (_PAGE_BIT_PAT).

Currently it is not a problem because we never set
_PAGE_BIT_PAT_LARGE on any of the large-page mappings - but
should this happen in the future the split_large_page() would
silently lift bit 12 into the lowlevel 4K pte and would start
corrupting the physical page frame offset. Not fun.

So add a debug warning, to make sure if something ever sets
the PAT bit then this function gets updated too.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 17:48:49 +01:00
Steven Rostedt 3c3e5694ad x86: check PMD in spurious_fault handler
Impact: fix to prevent hard lockup on bad PMD permissions

If the PMD does not have the correct permissions for a page access,
but the PTE does, the spurious fault handler will mistake the fault
as a lazy TLB transaction. This will result in an infinite loop of:

 fault -> spurious_fault check (pass) -> return to code -> fault

This patch adds a check and a warn on if the PTE passes the permissions
but the PMD does not.

[ Updated: Ingo Molnar suggested using WARN_ONCE with some text ]

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-20 11:44:47 -05:00
Ingo Molnar 609162850d Merge branches 'x86/asm', 'x86/cleanups' and 'x86/headers' into x86/core 2009-02-20 17:40:50 +01:00
Ingo Molnar 3b6f7b9beb Merge branch 'x86/urgent' into x86/core 2009-02-20 17:40:43 +01:00
Vegard Nossum ecab22aa6d x86: use symbolic constants for MSR_IA32_MISC_ENABLE bits
Impact: Cleanup. No functional changes.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 12:07:43 +01:00
Ingo Molnar 64b36ca7f4 Merge branches 'tracing/function-graph-tracer' and 'linus' into tracing/core 2009-02-20 11:35:57 +01:00
Ingo Molnar 07a66d7c53 x86: use the right protections for split-up pagetables
Steven Rostedt found a bug in where in his modified kernel
ftrace was unable to modify the kernel text, due to the PMD
itself having been marked read-only as well in
split_large_page().

The fix, suggested by Linus, is to not try to 'clone' the
reference protection of a huge-page, but to use the standard
(and permissive) page protection bits of KERNPG_TABLE.

The 'cloning' makes sense for the ptes but it's a confused and
incorrect concept at the page table level - because the
pagetable entry is a set of all ptes and hence cannot
'clone' any single protection attribute - the ptes can be any
mixture of protections.

With the permissive KERNPG_TABLE, even if the pte protections
get changed after this point (due to ftrace doing code-patching
or other similar activities like kprobes), the resulting combined
protections will still be correct and the pte's restrictive
(or permissive) protections will control it.

Also update the comment.

This bug was there for a long time but has not caused visible
problems before as it needs a rather large read-only area to
trigger. Steve possibly hacked his kernel with some really
large arrays or so. Anyway, the bug is definitely worth fixing.

[ Huang Ying also experienced problems in this area when writing
  the EFI code, but the real bug in split_large_page() was not
  realized back then. ]

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Huang Ying <ying.huang@intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 08:35:03 +01:00
Tejun Heo 11124411aa x86: convert to the new dynamic percpu allocator
Impact: use new dynamic allocator, unified access to static/dynamic
        percpu memory

Convert to the new dynamic percpu allocator.

* implement populate_extra_pte() for both 32 and 64
* update setup_per_cpu_areas() to use pcpu_setup_static()
* define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr()
* define config HAVE_DYNAMIC_PER_CPU_AREA

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-20 16:29:09 +09:00
Rusty Russell b36128c830 alloc_percpu: change percpu_ptr to per_cpu_ptr
Impact: cleanup

There are two allocated per-cpu accessor macros with almost identical
spelling.  The original and far more popular is per_cpu_ptr (44
files), so change over the other 4 files.

tj: kill percpu_ptr() and update UP too

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: mingo@redhat.com
Cc: lenb@kernel.org
Cc: cpufreq@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-20 16:29:08 +09:00
Lai Jiangshan 42f8faecf7 x86: use percpu data for 4k hardirq and softirq stacks
Impact: economize memory for large NR_CPUS

percpu data is setup earlier than irq, we can use percpu data
to economize memory.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-20 16:26:10 +09:00
Alok N Kataria 48ffc70b67 x86, vmi: TSC going backwards check in vmi clocksource
Impact: fix time warps under vmware

Similar to the check for TSC going backwards in the TSC clocksource,
we also need this check for VMI clocksource.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
2009-02-20 07:53:08 +01:00
H. Peter Anvin f6d1826dfa x86, mce: use %ll instead of %L for 64-bit numbers
Impact: Cleanup

The standard spelling of a printf pattern for long long is "ll", not
"L", which is for long double.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-19 15:44:58 -08:00
Andi Kleen b79109c3bb x86, mce: separate correct machine check poller and fatal exception handler
Impact: cleanup, performance enhancement

The machine check poller is diverging more and more from the fatal
exception handler. Instead of adding more special cases separate the code
paths completely. The corrected poll path is actually quite simple,
and this doesn't result in much code duplication.

This makes both handlers much easier to read and results in
cleaner code flow.  The exception handler now only needs to care
about uncorrected errors, which also simplifies the handling of multiple
errors. The corrected poller also now always runs in standard interrupt
context and does not need to do anything special to handle NMI context.

Minor behaviour changes:
- MCG status is now not cleared on polling.
- Only the banks which had corrected errors get cleared on polling
- The exception handler only clears banks with errors now

v2: Forward port to new patch order. Add "uc" argument.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-19 14:52:20 -08:00
Andi Kleen b5f2fa4ea0 x86, mce: factor out duplicated struct mce setup into one function
Impact: cleanup

This merely factors out duplicated code to set up
the initial struct mce state into a single function.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-19 14:51:39 -08:00
Andi Kleen 0d7482e3d7 x86, mce: implement dynamic machine check banks support
Impact: cleanup; making code future proof; memory saving on small systems

This patch replaces the hardcoded max number of machine check banks with 
dynamic allocation depending on what the CPU reports. The sysfs
data structures and the banks array are dynamically allocated.

There is still a hard bank limit (128) because the mcelog protocol uses
banks >= 128 as pseudo banks to escape other events. But we expect
that 128 banks is beyond any reasonable CPU for now.

This supersedes an earlier patch by Venki, but it solves the problem
more completely by making the limit fully dynamic (up to the 128
boundary).

This saves some memory on machines with less than 6 banks because
they won't need sysdevs for unused ones and also allows to 
use sysfs to control these banks on possible future CPUs with
more than 6 banks.

This is an updated patch addressing Venki's comments.  I also added in
another patch from Thomas which fixed the error allocation path (that
patch was previously separated)

Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-19 14:50:58 -08:00
Andi Kleen e35849e910 x86, mce: enable machine checks in 64-bit defconfig
Impact: Low priority fix

The 32-bit defconfig already had it enabled. And it's a pretty
fundamental feature, so better enable it on 64 bits too.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-19 14:48:55 -08:00
Ingo Molnar e9ce0c37c2 Merge branch 'x86/untangle2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into x86/headers 2009-02-19 18:15:01 +01:00
Linus Torvalds bcf8951fc2 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mce: fix ifdef for 64bit thermal apic vector clear on shutdown
  x86, mce: use force_sig_info to kill process in machine check
  x86, mce: reinitialize per cpu features on resume
  x86, rcu: fix strange load average and ksoftirqd behavior
2009-02-19 09:14:35 -08:00
Cyrill Gorcunov cb425afd21 x86: compressed head_32 - use ENTRY,ENDPROC macros
Impact: clenaup

Linker script will put startup_32 at predefined
address so using startup_32 will not bloat the
code size.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:13:01 +01:00
Cyrill Gorcunov 2d4eeecb98 x86: compressed head_64 - use ENTRY,ENDPROC macros
Impact: clenaup

Linker script will put startup_32 at predefined
address so using ENTRY will not bloat the code
size.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:13:01 +01:00
Cyrill Gorcunov 324bda9e47 x86: pmjump - use GLOBAL,ENDPROC macros
Impact: cleanup

We are in setup stage so we use GLOBAL
instead of ENTRY and do not increase code
size.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:13:00 +01:00
Cyrill Gorcunov 2f79555097 x86: copy.S - use GLOBAL,ENDPROC macros
Impact: cleanup

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:13:00 +01:00
Cyrill Gorcunov 1b25f3b4e1 x86: linkage - get rid of _X86 macros
Impact: cleanup

There was an attempt to bring build-time checking for
missed ENTRY_X86/END_X86 and KPROBE... pairs. Using
them will add messy in code. Get just rid of them.
This commit could be easily restored if the need appear
in future.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:12:59 +01:00
Cyrill Gorcunov 95695547a7 x86: asm linkage - introduce GLOBAL macro
If the code is time critical and this entry is called
from other places we use ENTRY to have it globally defined
and especially aligned.

Contrary we have some snippets which are size
critical. So we use plane ".globl name; name:"
directive. Introduce GLOBAL macro for this.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:12:59 +01:00
Hiroshi Shimamoto 71d8f9784a x86: syscalls.h: remove asmlinkage from declaration of sys_rt_sigreturn()
Impact: cleanup

asmlinkage for sys_rt_sigreturn() no longer exists in arch/x86/kernel/signal.c.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 12:18:54 +01:00
Ingo Molnar 4cd0332db7 Merge branch 'mainline/function-graph' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/function-graph-tracer 2009-02-19 12:13:33 +01:00
Jaswinder Singh Rajput de5483029b x86: include/asm/processor.h remove double declaration of print_cpu_info
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 10:12:18 +01:00
Ingo Molnar 72c26c9a26 Merge branch 'linus' into tracing/blktrace
Conflicts:
	block/blktrace.c

Semantic merge:
	kernel/trace/blktrace.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 09:00:35 +01:00
KAMEZAWA Hiroyuki f2dbcfa738 mm: clean up for early_pfn_to_nid()
What's happening is that the assertion in mm/page_alloc.c:move_freepages()
is triggering:

	BUG_ON(page_zone(start_page) != page_zone(end_page));

Once I knew this is what was happening, I added some annotations:

	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
		printk(KERN_ERR "move_freepages: Bogus zones: "
		       "start_page[%p] end_page[%p] zone[%p]\n",
		       start_page, end_page, zone);
		printk(KERN_ERR "move_freepages: "
		       "start_zone[%p] end_zone[%p]\n",
		       page_zone(start_page), page_zone(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
		       page_to_pfn(start_page), page_to_pfn(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_nid[%d] end_nid[%d]\n",
		       page_to_nid(start_page), page_to_nid(end_page));
 ...

And here's what I got:

	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
	move_freepages: start_nid[1] end_nid[0]

My memory layout on this box is:

[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x00000000 -> 0x0081ff5d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[8] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x00020000
[    0.000000]     1: 0x00800000 -> 0x0081f7ff
[    0.000000]     1: 0x0081f800 -> 0x0081fe50
[    0.000000]     1: 0x0081fed1 -> 0x0081fed8
[    0.000000]     1: 0x0081feda -> 0x0081fedb
[    0.000000]     1: 0x0081fedd -> 0x0081fee5
[    0.000000]     1: 0x0081fee7 -> 0x0081ff51
[    0.000000]     1: 0x0081ff59 -> 0x0081ff5d

So it's a block move in that 0x81f600-->0x81f7ff region which triggers
the problem.

This patch:

Declaration of early_pfn_to_nid() is scattered over per-arch include
files, and it seems it's complicated to know when the declaration is used.
 I think it makes fix-for-memmap-init not easy.

This patch moves all declaration to include/linux/mm.h

After this,
  if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use static definition in include/linux/mm.h
  else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use generic definition in mm/page_alloc.c
  else
     -> per-arch back end function will be called.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: David Miller <davem@davemlloft.net>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:55 -08:00
Steven Rostedt 712406a6bf tracing/function-graph-tracer: make arch generic push pop functions
There is nothing really arch specific of the push and pop functions
used by the function graph tracer. This patch moves them to generic
code.

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-18 13:43:04 -05:00
Huang Ying 54b6a1bd53 crypto: aes-ni - Add support to Intel AES-NI instructions for x86_64 platform
Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
instructions that are going to be introduced in the next generation of
Intel processor, as of 2009. These instructions enable fast and secure
data encryption and decryption, using the Advanced Encryption Standard
(AES), defined by FIPS Publication number 197.  The architecture
introduces six instructions that offer full hardware support for
AES. Four of them support high performance data encryption and
decryption, and the other two instructions support the AES key
expansion procedure.

The white paper can be downloaded from:

http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf

AES may be used in soft_irq context, but MMX/SSE context can not be
touched safely in soft_irq context. So in_interrupt() is checked, if
in IRQ or soft_irq context, the general x86_64 implementation are used
instead.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2009-02-18 16:48:06 +08:00