Commit Graph

1707 Commits

Author SHA1 Message Date
Jeremy Fitzhardinge d75cd22fdd x86/paravirt: split sysret and sysexit
Don't conflate sysret and sysexit; they're different instructions with
different semantics, and may be in use at the same time (at least
within the same kernel, depending on whether its an Intel or AMD
system).

sysexit - just return to userspace, does no register restoration of
    any kind; must explicitly atomically enable interrupts.

sysret - reloads flags from r11, so no need to explicitly enable
    interrupts on 64-bit, responsible for restoring usermode %gs

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citirx.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:13:15 +02:00
Eduardo Habkost 0814e0bace x86, 64-bit: split set_pte_vaddr()
We will need to set a pte on l3_user_pgt. Extract set_pte_vaddr_pud()
from set_pte_vaddr(), that will accept the l3 page table as parameter.

This change should be a no-op for existing code.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:11:09 +02:00
Jeremy Fitzhardinge 4f30cb0262 x86, 64-bit: PSE no longer a hard requirement
Because Xen doesn't support PSE mappings in guests, all code which
assumed the presence of PSE has been changed to fall back to smaller
mappings if necessary.  As a result, PSE is optional rather than
required (though still used whereever possible).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:11:08 +02:00
Jeremy Fitzhardinge f97013fd8f x86, 64-bit: split x86_64_start_kernel
Split x86_64_start_kernel() into two pieces:

   The first essentially cleans up after head_64.S.  It clears the
   bss, zaps low identity mappings, sets up some early exception
   handlers.

   The second part preserves the boot data, reserves the kernel's
   text/data/bss, pagetables and ramdisk, and then starts the kernel
   proper.

This split is so that Xen can call the second part to do the set up it
needs done.  It doesn't need any of the first part setups, because it
doesn't boot via head_64.S, and its redundant or actively damaging.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:11:06 +02:00
Jeremy Fitzhardinge 408011759c x86, 64-bit: add FIX_PARAVIRT_BOOTMAP fixmap slot
This matches 32 bit.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:11:05 +02:00
Eduardo Habkost a6523748bd paravirt/x86, 64-bit: move __PAGE_OFFSET to leave a space for hypervisor
Set __PAGE_OFFSET to the most negative possible address +
16*PGDIR_SIZE.  The gap is to allow a space for a hypervisor to fit.
The gap is more or less arbitrary, but it's what Xen needs.

When booting native, kernel/head_64.S has a set of compile-time
generated pagetables used at boot time.  This patch removes their
absolutely hard-coded layout, and makes it parameterised on
__PAGE_OFFSET (and __START_KERNEL_map).

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:11:04 +02:00
Jeremy Fitzhardinge 491eccb721 x86/paravirt: define PARA_INDIRECT for indirect asm calls
On 32-bit it's best to use a %cs: prefix to access memory where the
other segments may not bet set up properly yet.  On 64-bit it's best
to use a rip-relative addressing mode.  Define PARA_INDIRECT() to
abstract this and generate the proper addressing mode in each case.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:11:04 +02:00
Jeremy Fitzhardinge 97349135fe x86/paravirt: add debugging for missing operations
Rather than just jumping to 0 when there's a missing operation, raise a BUG.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:11:03 +02:00
Jeremy Fitzhardinge eba0045ff8 x86/paravirt: add a pgd_alloc/free hooks
Add hooks which are called at pgd_alloc/free time.  The pgd_alloc hook
may return an error code, which if non-zero, causes the pgd allocation
to be failed.  The hooks may be used to allocate/free auxillary
per-pgd information.

also fix:

> * Ingo Molnar <mingo@elte.hu> wrote:
>
>  include/asm/pgalloc.h: In function ‘paravirt_pgd_free':
>  include/asm/pgalloc.h:14: error: parameter name omitted
>  arch/x86/kernel/entry_64.S: In file included from
>  arch/x86/kernel/traps_64.c:51:include/asm/pgalloc.h: In function ‘paravirt_pgd_free':
>  include/asm/pgalloc.h:14: error: parameter name omitted

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:11:01 +02:00
Jeremy Fitzhardinge 15878c0b21 x86, 64-bit: add sync_cmpxchg
Add sync_cmpxchg to match 32-bit's sync_cmpxchg.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:58 +02:00
Jeremy Fitzhardinge 102e3b8d3f x86, 64-bit: add prototype for x86_64_start_kernel()
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:58 +02:00
Yinghai Lu 29f784e369 x86: change some functions in setup.c to static
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:54 +02:00
Alok Kataria 3381959da5 x86: cleanup e820_setup_gap(), add e820_search_gap(), v2
This is a preparatory patch for the next patch in series.
Moves some code from e820_setup_gap to a new function e820_search_gap.
This patch is a part of a bug fix where we walk the ACPI table to calculate
a gap for PCI optional devices.

v1->v2: Patch on top of tip/master.
	Fixes a bug introduced in the last patch about the typeof "last".
	Also the new function e820_search_gap now returns if we found a gap in
	e820_map.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: lenb@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:39 +02:00
Yinghai Lu c987d12f84 x86: remove end_pfn in 64bit
and use max_pfn directly.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:38 +02:00
Yinghai Lu f47f9d538e x86: numa 32 using apicid_2_node to get node for logical_apicid
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:37 +02:00
Yinghai Lu 3a58a2a6c8 x86: introduce init_memory_mapping for 32bit #3
move kva related early backto initmem_init for numa32

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:33 +02:00
Yinghai Lu cfb0e53b05 x86: introduce init_memory_mapping for 32bit #2
moving relocate_initrd early

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:32 +02:00
Yinghai Lu 4e29684c40 x86: introduce init_memory_mapping for 32bit #1
... so can we use mem below max_low_pfn earlier.

this allows us to move several functions more early instead of waiting
to after paging_init.

That includes moving relocate_initrd() earlier in the bootup, and kva
related early setup done in initmem_init. (in followup patches)

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:32 +02:00
Jeremy Fitzhardinge c3c2fee384 x86: unify mmu_context.h
Some amount of asm-x86/mmu_context.h can be unified, including
activate_mm paravirt hook.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:31 +02:00
Jeremy Fitzhardinge fb15a9b304 x86: unify pgd_index
pgd_index is common for 32 and 64-bit, so move it to a common place.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:30 +02:00
Eduardo Habkost e7a9b0b3c3 x86, 64-bit: use __pgd() on mk_kernel_pgd()
Use __pgd() on mk_kernel_pgd()

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:29 +02:00
Jeremy Fitzhardinge 43adfc26de x86, 64-bit: add gate_offset() and gate_segment() macros
For calculating the offset from struct gate_struct fields.

[ gate_offset and gate_segment were broken for 32-bit. ]

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:28 +02:00
Jeremy Fitzhardinge 4583ed514e x86, 64-bit: unify early_ioremap
The 32-bit early_ioremap will work equally well for 64-bit, so just use it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:28 +02:00
Jeremy Fitzhardinge af2b1c609f x86: add memory barriers to wrmsr
wrmsr is a special instruction which can have arbitrary system-wide
effects.  We don't want the compiler to reorder it with respect to
memory operations, so make it a memory barrier.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:24 +02:00
Jeremy Fitzhardinge d338c73c39 x86: add memory clobber to save/loadsegment
Add "memory" clobbers to savesegment and loadsegment, since they can
affect memory accesses and we never want the compiler to reorder them
with respect to memory references.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:23 +02:00
Jeremy Fitzhardinge bea41808ef x86: asm-x86/pgtable.h: fix compiler warning
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:23 +02:00
Cyrill Gorcunov 4de0043617 x86: nmi_watchdog - introduce nmi_watchdog_active() helper
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: macro@linux-mips.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:51:42 +02:00
Cyrill Gorcunov c376d45432 x86: nmi_watchdog - use NMI_NONE by default
There is no need to keep NMI_DISABLED definition and use it
for nmi_watchdog by default. Here is the point why:

- IO-APIC and APIC chips are programmed for nmi_watchdog support at very
  early stage of kernel booting and not having nmi_watchdog specified as
  boot option lead only to nmi_watchdog becomes to NMI_NONE anyway
- enable nmi_watchdog thru /proc/sys/kernel/nmi if it was not specified at
  boot is not possible too (even having this sysfs entry)

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: macro@linux-mips.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:51:41 +02:00
Paul Jackson c4ba1320b7 x86 boot: allow overlapping early reserve memory ranges
Add support for overlapping early memory reservations.

In general, they still can't overlap, and will panic
with "Overlapping early reservations" if they do overlap.

But if a memory range is reserved with the new call:
    reserve_early_overlap_ok()
rather than with the usual call:
    reserve_early()
then subsequent early reservations are allowed to overlap.

This new reserve_early_overlap_ok() call is only used in one
place so far, which is the "BIOS reserved" reservation for the
the EBDA region, which out of Paranoia reserves more than what
the BIOS might have specified, and which thus might overlap with
another legitimate early memory reservation (such as, perhaps,
the EFI memmap.)

Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: "Yinghai Lu" <yhlu.kernel@gmail.com>
Cc: "Jack Steiner" <steiner@sgi.com>
Cc: "Mike Travis" <travis@sgi.com>
Cc: "Huang
Cc: Ying" <ying.huang@intel.com>
Cc: "Andi Kleen" <andi@firstfloor.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:51:26 +02:00
Yinghai Lu 11cd0bc140 x86: move some func calling from setup_arch to paging_init
those function depend on paging setup pgtable, so they could access
the ram in bootmem region but just get mapped.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:24 +02:00
Yinghai Lu 2ec65f8b89 x86: clean up using max_low_pfn on 32-bit
so that max_low_pfn is not changed after it is set.
so we can move that early and out of initmem_init.

could call find_low_pfn_range just after max_pfn is set.

also could move reserve_initrd out of setup_bootmem_allocator

so 32bit is more like 64bit.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:20 +02:00
Yinghai Lu 225c37d71b x86: introduce reserve_initrd
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:16 +02:00
Yinghai Lu b2ac82a090 x86: introduce initmem_init for 32 bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:15 +02:00
Yinghai Lu 1f75d7e32e x86: introduce initmem_init for 64 bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:14 +02:00
Yinghai Lu ce97c40e28 x86: move reserve_standard_io_resource to setup.c
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:12 +02:00
Yinghai Lu a9c1182fbd x86: seperate probe_roms into another file
it is only needed for 32bit

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:05 +02:00
Yinghai Lu 7a1fd9866c x86: add e820_remove_range
... so could add real hole in e820

agp check is using request_mem_region, and could fail if e820 is reserved...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:37 +02:00
Yinghai Lu 9a25034759 x86: change identify_cpu to static
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:35 +02:00
Yinghai Lu f580366f77 x86: seperate funcs from setup_64 to cpu common_64.c
Signed-off-by: Yinghai Lu <yhlu.kernel@mail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:34 +02:00
Yinghai Lu 3c999f1426 x86: check command line when CONFIG_X86_MPPARSE is not set, v2
if acpi=off, acpi=noirq and pci=noacpi, we need to disable apic.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:31 +02:00
Glauber Costa 1481a3dd42 x86: move cpu_exit_clear to process_32.c
Take it out of smpboot.c, and move it to process_32.c, closer
to its only user.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:24 +02:00
Glauber Costa 3fde690011 x86: change __setup_vector_irq with setup_vector_irq
We create a version of it for i386, and then take the CONFIG_X86_64
ifdef out of the game. We could create a __setup_vector_irq for i386,
but it would incur in an unnecessary lock taking. Moreover, it is better
practice to only export setup_vector_irq anyway.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:21 +02:00
Glauber Costa a939098afc x86: move x86_64 gdt closer to i386
i386 and x86_64 used two different schemes for maintaining the gdt.
With this patch, x86_64 initial gdt table is defined in a .c file,
same way as i386 is now. Also, we call it "gdt_page", and the descriptor,
"early_gdt_descr". This way we achieve common naming, which can allow for
more code integration.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:16 +02:00
Bernhard Walle 1ecd27657b x86: unify crashkernel reservation for 32 and 64 bit
This patch moves the reserve_crashkernel() to setup.c and removes the
architecture-specific version. Both versions were more or less the same.

I tested it on both x86-64 and i386, with CONFIG_KEXEC on and off (so
that it compiles).

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: yhlu.kernel@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:45:44 +02:00
Ingo Molnar 6236af82d8 Merge branch 'x86/fixmap' into x86/devel
Conflicts:

	arch/x86/mm/init_64.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:24:29 +02:00
Ingo Molnar e3ae0acf59 Merge branch 'x86/uv' into x86/devel 2008-07-08 12:24:13 +02:00
Jack Steiner ab9c0bb8a8 x86: increase MAX_APICS for very large x86-64 configs
Increase the maximum number of apics when running very large
configurations. This patch has no affect on most systems.

The patch has no effect on any 32-bit kernel. It adds ~4k to the size
of 64-bit kernels but only if NR_CPUS > 255.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:23:29 +02:00
Jack Steiner b6df1b8bc1 x86: fix stack overflow for large values of MAX_APICS
physid_mask_of_physid() causes a huge stack (12k) to be created if the
number of APICS is large. Replace physid_mask_of_physid() with a
new function that does not create large stacks. This is a problem only
on large x86_64 systems.

this paves the way to increase MAX_APICS.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: linux-mm@kvack.org
Cc: mingo@elte.hu
Cc: tglx@linutronix.de
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:23:28 +02:00
Ingo Molnar dc163a41ff SGI UV: TLB shootdown using broadcast assist unit
TLB shootdown for SGI UV.

v5: 6/12 corrections/improvements per Ingo's second review

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:23:25 +02:00
Cliff Wickman b194b12050 SGI UV: TLB shootdown using broadcast assist unit, cleanups
TLB shootdown for SGI UV.

v1: 6/2 original
v2: 6/3 corrections/improvements per Ingo's review
v3: 6/4 split atomic operations off to a separate patch (Jeremy's review)
v4: 6/12 include <mach_apic.h> rather than <asm/mach-bigsmp/mach_apic.h>
         (fixes a !SMP build problem that Ingo found)
         fix the index on uv_table_bases[blade]

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:23:24 +02:00
Cliff Wickman 73e991f45f x86 atomic operations: atomic_or_long() atomic_inc_short()
Provide atomic operations for increment of a 16-bit integer and
logical OR into a 64-bit integer.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:23:23 +02:00
Cliff Wickman 1812924bb1 x86, SGI UV: TLB shootdown using broadcast assist unit
TLB shootdown for SGI UV.

Depends on patch (in tip/x86/irq):
   x86-update-macros-used-by-uv-platform.patch   Jack Steiner May 29

This patch provides the ability to flush TLB's in cpu's that are not on
the local node.  The hardware mechanism for distributing the flush
messages is the UV's "broadcast assist unit".

The hook to intercept TLB shootdown requests is a 2-line change to
native_flush_tlb_others() (arch/x86/kernel/tlb_64.c).

This code has been tested on a hardware simulator. The real hardware
is not yet available.

The shootdown statistics are provided through /proc/sgi_uv/ptc_statistics.
The use of /sys was considered, but would have required the use of
many /sys files.  The debugfs was also considered, but these statistics
should be available on an ongoing basis, not just for debugging.

Issues to be fixed later:
- The IRQ for the messaging interrupt is currently hardcoded as 200
  (see UV_BAU_MESSAGE).  It should be dynamically assigned in the future.
- The use of appropriate udelay()'s is untested, as they are a problem
  in the simulator.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:23:22 +02:00
Ingo Molnar d98b940ab2 Merge branch 'linus' into x86/irq 2008-07-08 12:23:00 +02:00
Ingo Molnar 4b62ac9a2b Merge branch 'x86/nmi' into x86/devel
Conflicts:

	arch/x86/kernel/nmi.c
	arch/x86/kernel/nmi_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:17:08 +02:00
Ingo Molnar 2b4fa851b2 Merge branch 'x86/numa' into x86/devel
Conflicts:

	arch/x86/Kconfig
	arch/x86/kernel/e820.c
	arch/x86/kernel/efi_64.c
	arch/x86/kernel/mpparse.c
	arch/x86/kernel/setup.c
	arch/x86/kernel/setup_32.c
	arch/x86/mm/init_64.c
	include/asm-x86/proto.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:59:23 +02:00
Bernhard Walle 3fd052b1b4 x86: add flags parameter to reserve_bootmem_generic()
This patch adds a 'flags' parameter to reserve_bootmem_generic() like it
already has been added in reserve_bootmem() with commit
72a7fe3967.

It also changes all users to use BOOTMEM_DEFAULT, which doesn't effectively
change the behaviour. Since the change is x86-specific, I don't think it's
necessary to add a new API for migration. There are only 4 users of that
function.

The change is necessary for the next patch, using reserve_bootmem_generic()
for crashkernel reservation.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:49:49 +02:00
Mike Travis 3461b0af02 x86: remove static boot_cpu_pda array v2
* Remove the boot_cpu_pda array and pointer table from the data section.
    Allocate the pointer table and array during init.  do_boot_cpu()
    will reallocate the pda in node local memory and if the cpu is being
    brought up before the bootmem array is released (after_bootmem = 0),
    then it will free the initial pda.  This will happen for all cpus
    present at system startup.

    This removes 512k + 32k bytes from the data section.

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 11:31:25 +02:00
Mike Travis 9f248bde9d x86: remove the static 256k node_to_cpumask_map
* Consolidate node_to_cpumask operations and remove the 256k
    byte node_to_cpumask_map.  This is done by allocating the
    node_to_cpumask_map array after the number of possible nodes
    (nr_node_ids) is known.

  * Debug printouts when CONFIG_DEBUG_PER_CPU_MAPS is active have
    been increased.  It now shows faults when calling node_to_cpumask()
    and node_to_cpumask_ptr().

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 11:31:24 +02:00
Mike Travis 7891a24e1e x86: restore pda nodenumber field
* Restore the nodenumber field in the x86_64 pda.  This field is slightly
    different than the x86_cpu_to_node_map mainly because it's a static
    indication of which node the cpu is on while the cpu to node map is a
    dyanamic mapping that may get reset if the cpu goes offline.  This also
    simplifies the numa_node_id() macro.

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 11:31:23 +02:00
Mike Travis 23ca4bba3e x86: cleanup early per cpu variables/accesses v4
* Introduce a new PER_CPU macro called "EARLY_PER_CPU".  This is
    used by some per_cpu variables that are initialized and accessed
    before there are per_cpu areas allocated.

    ["Early" in respect to per_cpu variables is "earlier than the per_cpu
    areas have been setup".]

    This patchset adds these new macros:

	DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)
	EXPORT_EARLY_PER_CPU_SYMBOL(_name)
	DECLARE_EARLY_PER_CPU(_type, _name)

	early_per_cpu_ptr(_name)
	early_per_cpu_map(_name, _idx)
	early_per_cpu(_name, _cpu)

    The DEFINE macro defines the per_cpu variable as well as the early
    map and pointer.  It also initializes the per_cpu variable and map
    elements to "_initvalue".  The early_* macros provide access to
    the initial map (usually setup during system init) and the early
    pointer.  This pointer is initialized to point to the early map
    but is then NULL'ed when the actual per_cpu areas are setup.  After
    that the per_cpu variable is the correct access to the variable.

    The early_per_cpu() macro is not very efficient but does show how to
    access the variable if you have a function that can be called both
    "early" and "late".  It tests the early ptr to be NULL, and if not
    then it's still valid.  Otherwise, the per_cpu variable is used
    instead:

	#define early_per_cpu(_name, _cpu) 			\
		(early_per_cpu_ptr(_name) ?			\
			early_per_cpu_ptr(_name)[_cpu] :	\
			per_cpu(_name, _cpu))

    A better method is to actually check the pointer manually.  In the
    case below, numa_set_node can be called both "early" and "late":

	void __cpuinit numa_set_node(int cpu, int node)
	{
	    int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);

	    if (cpu_to_node_map)
		    cpu_to_node_map[cpu] = node;
	    else
		    per_cpu(x86_cpu_to_node_map, cpu) = node;
	}

  * Add a flag "arch_provides_topology_pointers" that indicates pointers
    to topology cpumask_t maps are available.  Otherwise, use the function
    returning the cpumask_t value.  This is useful if cpumask_t set size
    is very large to avoid copying data on to/off of the stack.

  * The coverage of CONFIG_DEBUG_PER_CPU_MAPS has been increased while
    the non-debug case has been optimized a bit.

  * Remove an unreferenced compiler warning in drivers/base/topology.c

  * Clean up #ifdef in setup.c

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 11:31:20 +02:00
Ingo Molnar 3de352bbd8 Merge branch 'x86/mpparse' into x86/devel
Conflicts:

	arch/x86/Kconfig
	arch/x86/kernel/io_apic_32.c
	arch/x86/kernel/setup_64.c
	arch/x86/mm/init_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:14:58 +02:00
Yinghai Lu fcfa146e41 x86: update mptable fix with no ioapic v2
if the system doesn't have ioapic, we don't need to store entries for mptable
update

also let mp_config_acpi_gsi not call func in mpparse
so later could decouple mpparse with acpi more easily

Reported-by: Daniel Exner <dex@dragonslave.de>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Daniel Exner <dex@dragonslave.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:39:07 +02:00
Yinghai Lu 95a71a45c2 x86: cleanup machine_specific_memory_setup, v2
1. let 64bit support 88 and e801 too
2. introduce default_machine_specific_memory_setup, and reuse it
   for voyager

v2: fix 64 bit compiling

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:39:01 +02:00
Yinghai Lu 66a6f8d539 x86: remove unused file after numaq etc depends on genericarch
we don't need those mach_mpspec.h files now.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:38:54 +02:00
Yinghai Lu 1c6e55032e x86: use acpi_numa_init to parse on 32-bit numa
seperate SRAT finding and parsing from get_memcfg_from_srat,
and let getmemcfg_from_srat only handle array from previous step.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:38:47 +02:00
Yinghai Lu 064d25f120 x86: merge setup_memory_map with e820
... and kill e820_32/64.c and e820_32/64.h

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:38:25 +02:00
Yinghai Lu cc9f7a0ccf x86: kill bad_ppro
so don't punish all other cpus without that problem when init highmem

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:38:19 +02:00
Yinghai Lu 41c094fd3c x86: move e820_resource_resources to e820.c
and make 32-bit resource registration more like 64 bit.

also move probe_roms back to setup_32.c

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:38:14 +02:00
Huang, Ying 8c5beb50d3 x86 boot: pass E820 memory map entries more than 128 via linked list of setup data
Because of the size limits of struct boot_params (zero page), the
maximum number of E820 memory map entries can be passed to kernel is
128. As pointed by Paul Jackson, there is some machine produced by SGI
with so many nodes that the number of E820 memory map entries is more
than 128. To enabling Linux kernel on these system, a new setup data
type named SETUP_E820_EXT is defined to pass additional memory map
entries to Linux kernel.

This patch is based on x86/auto-latest branch of git-x86 tree and has
been tested on x86_64 and i386 platform.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:37:39 +02:00
Yinghai Lu b5bc6c0e55 x86, mm: use add_highpages_with_active_regions() for high pages init v2
use early_node_map to init high pages, so we can remove page_is_ram() and
page_is_reserved_early() in the big loop with add_one_highpage

also remove page_is_reserved_early(), it is not needed anymore.

v2: fix the build of other platforms

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:37:25 +02:00
Yinghai Lu d0be6bdea1 x86: rename two e820 related functions
rename update_memory_range to e820_update_range
rename add_memory_region to e820_add_region

to make it more clear that they are about e820 map operations.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:37:01 +02:00
Yinghai Lu d867e5310b x86: keep MP_intsrc_info untouched if we do not update mptable
Daniel Exner reported IO-APIC enumeration breakage in linux-next.

Alexey Starikovskiy found out that it might be related to
commit 2944e16b25 "x86: update mptable".

use enable_update_mptable to decide if need check before add mp_irqs array.

Reported-by: Daniel Exner <webmaster@dragonslave.de>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:36:40 +02:00
Yinghai Lu d2dbf34332 x86: clean up reserve_bootmem_generic() and port it to 32-bit
1. add reserve_bootmem_generic for 32bit
2. change len to unsigned long
3. make early_res_to_bootmem to use it

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:36:17 +02:00
Yinghai Lu ab4a465e96 x86: e820 merge parsing of the mem=/memmap= boot parameters
since we now have 32-bit support for e820_register_active_regions(),
we can merge the parsing of the mem=/memmap= boot parameters.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:35:38 +02:00
Bernhard Walle 8b2ef1d728 x86: add flags parameter to reserve_bootmem_generic()
This patch adds a 'flags' parameter to reserve_bootmem_generic() like it
already has been added in reserve_bootmem() with commit
72a7fe3967.

It also changes all users to use BOOTMEM_DEFAULT, which doesn't effectively
change the behaviour. Since the change is x86-specific, I don't think it's
necessary to add a new API for migration. There are only 4 users of that
function.

The change is necessary for the next patch, using reserve_bootmem_generic()
for crashkernel reservation.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:34:54 +02:00
Ingo Molnar 896395c290 Merge branch 'linus' into tmp.x86.mpparse.new 2008-07-08 10:32:56 +02:00
Ingo Molnar 1b8ba39a3f Merge branch 'x86/irq' into x86/devel
Conflicts:

	arch/x86/kernel/i8259.c
	arch/x86/kernel/irqinit_64.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:53:57 +02:00
Ingo Molnar 58cf35228f Merge branches 'x86/mmio', 'x86/delay', 'x86/idle', 'x86/oprofile', 'x86/debug', 'x86/ptrace' and 'x86/amd-iommu' into x86/devel 2008-07-08 09:46:15 +02:00
Ingo Molnar 3c1ca43faf Merge branch 'x86/setup' into x86/devel 2008-07-08 09:43:01 +02:00
Ingo Molnar 6924d1ab8b Merge branches 'x86/numa-fixes', 'x86/apic', 'x86/apm', 'x86/bitops', 'x86/build', 'x86/cleanups', 'x86/cpa', 'x86/cpu', 'x86/defconfig', 'x86/gart', 'x86/i8259', 'x86/intel', 'x86/irqstats', 'x86/kconfig', 'x86/ldt', 'x86/mce', 'x86/memtest', 'x86/pat', 'x86/ptemask', 'x86/resumetrace', 'x86/threadinfo', 'x86/timers', 'x86/vdso' and 'x86/xen' into x86/devel 2008-07-08 09:16:56 +02:00
Cyrill Gorcunov d3f020d2f9 x86, io-apic: define names for redirection table entry fields
Each I/O APIC redirection table entry has a number of fields.
Define names for them to eliminate reference by hard coded
numbers.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:21 +02:00
Maciej W. Rozycki d788bada2f x86: APIC/SMP: Downgrade the NMI watchdog for "noapic"
If configured to use the I/O APIC, the NMI watchdog is deemed to fail if
the chip has been deactivated as a result of "noapic".  Downgrade to the
local APIC watchdog similarly to what is done for the UP case.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:20 +02:00
Maciej W. Rozycki 148b508309 x86: NMI watchdog: Downgrade helper
A downgrade helper for the NMI watchdog to be used in all places where
the I/O APIC watchdog may have been requested, but the I/O APIC is found
not to be there or meant to be left disabled.  This is so that the
reconfiguration is cosistent and defined in a single place only.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:16 +02:00
Thomas Gleixner 0715650958 x86: move pci_routirq declaration to pci.h
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:13:08 +02:00
Maciej W. Rozycki 35542c5ebc x86: I/O APIC: clean up the 8259A on a NMI watchdog failure
There is no point in keeping the 8259A enabled if the I/O APIC NMI
watchdog has failed and the 8259A is not used to pass through regular
timer interrupts.  This fixes problems with some systems where some logic
gets confused.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:12:59 +02:00
Maciej W. Rozycki ecd29476ae x86: I/O APIC: remove parameters to fiddle with the 8259A
Remove the "disable_8254_timer" and "enable_8254_timer" kernel
parameters.  Now that AEOI acknowledgements are no longer needed for
correct timer operation, the 8259A can be kept disabled unconditionally
unless interrupts, either timer or watchdog ones, are actually passed
through it.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:12:54 +02:00
Thomas Gleixner 65280e613f x86: janitor CPA statistics patch
1) Remove __meminit from update_pages_count. It is used inside
split_pages()

2) Make the code depend on PROC_FS. Doing statistics for nothing is
useless and not adding useless code is nice to the Linux tiny folks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 08:12:05 +02:00
Andi Kleen ce0c0e50f9 x86, generic: CPA add statistics about state of direct mapping v4
Add information about the mapping state of the direct mapping to
/proc/meminfo. I chose /proc/meminfo because that is where all the other
memory statistics are too and it is a generally useful metric even
outside debugging situations. A lot of split kernel pages means the
kernel will run slower.

This way we can see how many large pages are really used for it and how
many are split.

Useful for general insight into the kernel.

v2: Add hotplug locking to 64bit to plug a very obscure theoretical race.
    32bit doesn't need it because it doesn't support hotadd for lowmem.
    Fix some typos
v3: Rename dpages_cnt
    Add CONFIG ifdef for count update as requested by tglx
    Expand description
v4: Fix stupid bugs added in v3
    Move update_page_count to pageattr.c

Signed-off-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 08:11:45 +02:00
Ingo Molnar 93022136ff Merge commit 'v2.6.26-rc9' into x86/cpu 2008-07-08 07:47:47 +02:00
Robert Richter 3a27dd1ce5 x86: Move PCI IO ECS code to x86/pci
"Form follows function". Code is now where it belongs to.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 07:47:39 +02:00
Thomas Gleixner aa276e1caf x86, clockevents: add C1E aware idle function
C1E on AMD machines is like C3 but without control from the OS. Up to
now we disabled the local apic timer for those machines as it stops
when the CPU goes into C1E. This excludes those machines from high
resolution timers / dynamic ticks, which hurts especially X2 based
laptops.

The current boot time C1E detection has another, more serious flaw
as well: some BIOSes do not enable C1E until the ACPI processor module
is loaded. This causes systems to stop working after that point.

To work nicely with C1E enabled machines we use a separate idle
function, which checks on idle entry whether C1E was enabled in the
Interrupt Pending Message MSR. This allows us to do timer broadcasting
for C1E and covers the late enablement of C1E as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 07:47:18 +02:00
Ingo Molnar d763d5edf9 Merge branch 'linus' into tracing/mmiotrace 2008-07-07 08:07:35 +02:00
Ingo Molnar 68083e05d7 Merge commit 'v2.6.26-rc9' into cpus4096 2008-07-06 14:23:39 +02:00
Anthony Liguori ca3739327b x86: KVM guest: Add memory clobber to hypercalls
Hypercalls can modify arbitrary regions of memory.  Make sure to indicate this
in the clobber list.  This fixes a hang when using KVM_GUEST kernel built with
GCC 4.3.0.

This was originally spotted and analyzed by Marcelo.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-06 11:05:18 +03:00
Richard Kennedy 84e65b0a84 x86: cacheline_align tss_struct
The manual padding to align on cacheline size only worked in 32 bit
In 64 bit the structure was not aligned and contained wasted space.

use the compiler ____cachline_aligned to save space & properly align
this structure.

x86_64_default size goes from 9136 -> 8960
x86_64_AMD     size goes from 9136 -> 8896

built & running on 2.6.26-rc8.

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-04 16:47:19 +02:00
Joerg Roedel 999ba417cc x86, AMD IOMMU: flush domain TLB when there is more than one page to flush
This patch changes the domain TLB flushing behavior of the driver. When there
is more than one page to flush it flushes the whole domain TLB instead of every
single page. So we send only a single command to the IOMMU in every case which
is faster to execute.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-04 11:44:40 +02:00
Linus Torvalds bbad5d4750 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ptrace GET/SET FPXREGS broken
  x86: fix cpu hotplug crash
  x86: section/warning fixes
  x86: shift bits the right way in native_read_tscp
2008-06-30 08:56:57 -07:00
Dmitry Baryshkov 323ec001c6 x86: use generic per-device dma coherent allocator
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-30 12:51:07 +02:00
Matti Linnanvuori ac2564c445 x86: add compilation checks to pci_unmap_*() macros
Add compilation checks to pci_unmap_ macros.

Signed-off-by: Matti Linnanvuori <mattilinnanvuori@yahoo.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-30 12:22:01 +02:00
Ingo Molnar 92af4e2902 x86, AMD IOMMU, build fix #2
fix:

 arch/x86/kernel/amd_iommu.c: In function ‘amd_iommu_init_dma_ops':
 arch/x86/kernel/amd_iommu.c:940: error: lvalue required as left operand of assignment
 arch/x86/kernel/amd_iommu.c:941: error: lvalue required as left operand of assignment

due to !CONFIG_GART_IOMMU.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:52:34 +02:00
Joerg Roedel c6da992e16 x86, AMD IOMMU: add amd_iommu.h to export functions to the generic x86 dma code
This patch adds the amd_iommu.h file which will be included in the generic
code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:21 +02:00
Joerg Roedel 8d283c35a2 x86, AMD IOMMU: add header file for driver data structures and defines
This patch adds a header file local to the AMD IOMMU driver with constants and
data structures needed in the code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 10:12:07 +02:00
Max Asbock 41aefdcc98 x86: shift bits the right way in native_read_tscp
native_read_tscp shifts the bits in the high order value in the
wrong direction, the attached patch fixes that.

Signed-off-by: Max Asbock <masbock@linux.vnet.ibm.com>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-26 14:49:17 +02:00
Jens Axboe 3b16cf8748 x86: convert to generic helpers for IPI function calls
This converts x86, x86-64, and xen to use the new helpers for
smp_call_function() and friends, and adds support for
smp_call_function_single().

Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:21:54 +02:00
Jeremy Fitzhardinge 08b882c627 paravirt: add hooks for ptep_modify_prot_start/commit
This patch adds paravirt-ops hooks in pv_mmu_ops for ptep_modify_prot_start and
ptep_modify_prot_commit.  This allows the hypervisor-specific backends to
implement these in some more efficient way.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-25 15:16:00 +02:00
Ingo Molnar 8700600a74 Merge branch 'linus' into x86/nmi 2008-06-25 12:31:28 +02:00
Ingo Molnar 0ed368c71a Merge branch 'linus' into x86/kconfig 2008-06-25 12:30:54 +02:00
Ingo Molnar cbd6712406 Merge branch 'linus' into x86/irq 2008-06-25 12:30:49 +02:00
Ingo Molnar 48cf937f48 Merge branch 'linus' into x86/i8259 2008-06-25 12:30:33 +02:00
Ingo Molnar 037a6079eb Merge branch 'linus' into x86/gart 2008-06-25 12:30:26 +02:00
Ingo Molnar 8b7ef4ec5b Merge branch 'linus' into x86/fixmap 2008-06-25 12:30:21 +02:00
Ingo Molnar 1262b0088f Merge branch 'linus' into x86/cleanups 2008-06-25 12:29:32 +02:00
Ingo Molnar 97e6722b8d Merge branch 'linus' into tracing/ftrace 2008-06-25 12:27:56 +02:00
Ingo Molnar d02859ecb3 Merge commit 'v2.6.26-rc8' into x86/xen
Conflicts:

	arch/x86/xen/enlighten.c
	arch/x86/xen/mmu.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-25 12:16:51 +02:00
Linus Torvalds 919c0d14ae Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: Remove now unused structs from kvm_para.h
  x86: KVM guest: Use the paravirt clocksource structs and functions
  KVM: Make kvm host use the paravirt clocksource structs
  x86: Make xen use the paravirt clocksource structs and functions
  x86: Add structs and functions for paravirt clocksource
  KVM: VMX: Fix host msr corruption with preemption enabled
  KVM: ioapic: fix lost interrupt when changing a device's irq
  KVM: MMU: Fix oops on guest userspace access to guest pagetable
  KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)
  KVM: MMU: Fix rmap_write_protect() hugepage iteration bug
  KVM: close timer injection race window in __vcpu_run
  KVM: Fix race between timer migration and vcpu migration
2008-06-24 18:09:06 -07:00
Gerd Hoffmann 6b1ed90865 KVM: Remove now unused structs from kvm_para.h
The kvm_* structs are obsoleted by the pvclock_* ones.
Now all users have been switched over and the old structs
can be dropped.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 21:02:33 +03:00
Gerd Hoffmann 50d0a0f987 KVM: Make kvm host use the paravirt clocksource structs
This patch updates the kvm host code to use the pvclock structs.
It also makes the paravirt clock compatible with Xen.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 21:02:32 +03:00
Gerd Hoffmann 7af192c954 x86: Add structs and functions for paravirt clocksource
This patch adds structs for the paravirt clocksource ABI
used by both xen and kvm (pvclock-abi.h).

It also adds some helper functions to read system time and
wall clock time from a paravirtual clocksource (pvclock.[ch]).
They are based on the xen code.  They are enabled using
CONFIG_PARAVIRT_CLOCK.

Subsequent patches of this series will put the code in use.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 21:02:31 +03:00
Jeremy Fitzhardinge 2849914393 xen: remove support for non-PAE 32-bit
Non-PAE operation has been deprecated in Xen for a while, and is
rarely tested or used.  xen-unstable has now officially dropped
non-PAE support.  Since Xen/pvops' non-PAE support has also been
broken for a while, we may as well completely drop it altogether.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 17:00:55 +02:00
Robert Richter b664d6bbee x86: add X86_FEATURE_IBS cpu feature
This adds IBS to the cpu feature flags allowing Perfmon and OProfile
to use cpu_has().

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:59:40 +02:00
Andreas Herrmann bcc643dc28 x86: introduce macro to check whether an address range is in the ISA range
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:05:48 +02:00
Abhishek Sagar 395a59d0f8 ftrace: store mcount address in rec->ip
Record the address of the mcount call-site. Currently all archs except sparc64
record the address of the instruction following the mcount call-site. Some
general cleanups are entailed. Storing mcount addresses in rec->ip enables
looking them up in the kprobe hash table later on to check if they're kprobe'd.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: davem@davemloft.net
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-23 22:10:56 +02:00
Ingo Molnar 009b9fc98d Merge branch 'linus' into x86/threadinfo 2008-06-23 11:53:03 +02:00
Ingo Molnar f34bfb1bee Merge branch 'linus' into tracing/ftrace 2008-06-23 11:11:42 +02:00
Ingo Molnar 437a0a54ee x86, bitops: make constant-bit set/clear_bit ops faster, gcc workaround
Jeremy Fitzhardinge reported this compiler bug:

Suggestion from Linus: add "r" to the input constraint of the
set_bit()/clear_bit()'s constant 'nr' branch:

Blows up on "gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)":

 CC      init/main.o
include2/asm/bitops.h: In function `start_kernel':
include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
include2/asm/bitops.h:59: error: impossible constraint in `asm'
include2/asm/bitops.h:59: error: impossible constraint in `asm'
include2/asm/bitops.h:59: error: impossible constraint in `asm'

Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-21 07:57:24 +02:00
Jeremy Fitzhardinge aeaaa59c7e x86/paravirt/xen: add set_fixmap pv_mmu_ops
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 15:09:56 +02:00
Jeremy Fitzhardinge d494a96125 x86: implement set_pte_vaddr
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 15:09:54 +02:00
Jeremy Fitzhardinge 7c7e6e07e2 x86: unify __set_fixmap
In both cases, I went with the 32-bit behaviour.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 15:09:51 +02:00
Jeremy Fitzhardinge 944256e00a x86: unify asm-x86/fixmap*.h
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 15:09:49 +02:00
Ingo Molnar 7dbceaf9bb x86, bitops: make constant-bit set/clear_bit ops faster, adapt, clean up
fix integration bug introduced by "x86: bitops take an unsigned long *"
which turned "(void *) + x" into "(long *) + x".

small cleanups to make it more apparent which value get propagated where.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 08:08:49 +02:00
Jan Beulich 5f0120b578 x86-64: remove unnecessary ptregs call stubs
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: "Andi Kleen" <andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 14:25:11 +02:00
Jordan Crouse ffe6e1da86 x86, geode: add a VSA2 ID for General Software
General Software writes their own VSA2 module for their version
of the Geode BIOS, which returns a different ID then the standard
VSA2.  This was causing the framebuffer driver to break for most
GSW boards.

Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Cc: tglx@linutronix.de
Cc: linux-geode@lists.infradead.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 14:19:03 +02:00
Linus Torvalds 1a750e0cd7 x86, bitops: make constant-bit set/clear_bit ops faster
On Wed, 18 Jun 2008, Linus Torvalds wrote:
>
> And yes, the "lock andl" should be noticeably faster than the xchgl.

I dunno. Here's a untested (!!) patch that turns constant-bit
set/clear_bit ops into byte mask ops (lock orb/andb).

It's not exactly pretty. The reason for using the byte versions is that a
locked op is serialized in the memory pipeline anyway, so there are no
forwarding issues (that could slow down things when we access things with
different sizes), and the byte ops are a lot smaller than 32-bit and
particularly 64-bit ops (big constants, and the 64-bit ops need the REX
prefix byte too).

[ Side note: I wonder if we should turn the "test_bit()" C version into a
  "char *" version too.. It could actually help with alias analysis, since
  char pointers can alias anything. So it might be the RightThing(tm) to
  do for multiple reasons. I dunno. It's a separate issue. ]

It does actually shrink the kernel image a bit (a couple of hundred bytes
on the text segment for my everything-compiled-in image), and while it's
totally untested the (admittedly few) code generation points I looked at
seemed sane. And "lock orb" should be noticeably faster than "lock bts".

If somebody wants to play with it, go wild. I didn't do "change_bit()",
because nobody sane uses that thing anyway. I guarantee nothing. And if it
breaks, nobody saw me do anything.  You can't prove this email wasn't sent
by somebody who is good at forging smtp.

This does require a gcc that is recent enough for "__builtin_constant_p()"
to work in an inline function, but I suspect our kernel requirements are
already higher than that. And if you do have an old gcc that is supported,
the worst that would happen is that the optimization doesn't trigger.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 13:45:51 +02:00
Jeremy Fitzhardinge ad524d46f3 x86: set PAE PHYSICAL_MASK_SHIFT to 44 bits.
When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
potentially have the same number of physical address bits as the
64-bit host ("Enhanced Legacy PAE Paging").  This means, in theory,
we could have up to 52 bits of physical address in a pte.

The 32-bit kernel uses a 32-bit unsigned long to represent a pfn.
This means that it can only represent physical addresses up to 32+12=44
bits wide.  Rather than widening pfns everywhere, just set 2^44 as the
Linux x86_32-PAE architectural limit for physical address size.

This is a bugfix for two cases:
1. running a 32-bit PAE kernel on a machine with
  more than 64GB RAM.
2. running a 32-bit PAE Xen guest on a host machine with
  more than 64GB RAM

In both cases, a pte could need to have more than 36 bits of physical,
and masking it to 36-bits will cause fairly severe havoc.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 10:08:44 +02:00
Vegard Nossum 0db125c467 x86: more header fixes
Summary: Add missing include guards for some x86 headers.

This has only had the most rudimentary testing, but is hopefully obviously
correct.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-18 12:27:03 +02:00
Jeremy Fitzhardinge e6e07d8a2d x86: make asm/asm.h work for asm code.
This is useful for unifying some pieces of asm code.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-06-16 23:31:20 -07:00
Ingo Molnar 7aaaec38fc Merge branch 'linus' into x86/kconfig 2008-06-16 11:28:04 +02:00
Ingo Molnar c54f9da1c8 Merge branch 'linus' into x86/irqstats 2008-06-16 11:27:53 +02:00
Ingo Molnar d939d2851f Merge branch 'linus' into x86/irq 2008-06-16 11:27:45 +02:00
Ingo Molnar 688d22e23a Merge branch 'linus' into x86/xen 2008-06-16 11:21:27 +02:00
Ingo Molnar 3557b18fcb Merge branch 'linus' into x86/ptemask 2008-06-16 11:20:37 +02:00
Ingo Molnar faeca31d06 Merge branch 'linus' into x86/pat 2008-06-16 11:20:28 +02:00
Ingo Molnar 1791a78c0b Merge branch 'linus' into x86/cleanups 2008-06-16 11:17:50 +02:00
Ingo Molnar e765ee90da Merge branch 'linus' into tracing/ftrace 2008-06-16 11:15:58 +02:00
Ingo Molnar 28638ea4f8 Merge branch 'linus' into x86/nmi
Conflicts:

	arch/x86/kernel/nmi_32.c
2008-06-16 10:17:15 +02:00
Rafael J. Wysocki 6703f6d10d x86, gart: add resume handling
If GART IOMMU is used on an AMD64 system, the northbridge registers
related to it should be restored during resume so that memory is not
corrupted.  Make gart_resume() handle that as appropriate.

Ref. http://lkml.org/lkml/2008/5/25/96 and the following thread.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 14:11:25 +02:00
Ingo Molnar bb6dfb32f9 Merge branch 'linus' into x86/gart 2008-06-12 11:27:22 +02:00
Rafael J. Wysocki bf07dc8649 x86: remove obsolete PM definitions from NMI header
Remove obsolete and no longer used PM-related definitions from
include/asm-x86/nmi.h.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 11:14:58 +02:00
Andreas Herrmann 499f8f84b8 x86: rename pat_wc_enabled to pat_enabled
BTW, what does pat_wc_enabled stand for? Does it mean
"write-combining"?

Currently it is used to globally switch on or off PAT support.
Thus I renamed it to pat_enabled.
I think this increases readability (and hope that I didn't miss
something).

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:27 +02:00
Andreas Herrmann cd7a4e936d x86: PAT: fixed checkpatch errors (and whitespaces)
x86: PAT: fixed checkpatch errors (and whitespaces)

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:24 +02:00
Yinghai Lu e3f2baebf4 PCI/x86: early dump pci conf space v2
Allows us to dump PCI space before any kernel changes have been made.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-10 10:59:52 -07:00
Yinghai Lu e7891c733f PCI/x86: write_pci_config_byte fix offset
also add write_pci_config_16

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-10 10:59:51 -07:00
Thomas Gleixner aa83f3f2cf x86: cleanup C1E enabled detection
Rename the "MSR_K8_ENABLE_C1E" MSR to INT_PENDING_MSG, which is the
name in the data sheet as well. Move the C1E mask to the header file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:07 +02:00
Robert Richter 9e26d84273 fix build bug in "x86: add PCI extended config space access for AMD Barcelona"
Also much less code now.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:32:53 +02:00
Jeremy Fitzhardinge ce8e37cdbd x86: set PAE PHYSICAL_MASK_SHIFT to 44 bits.
When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
potentially have the same number of physical address bits as the
64-bit host ("Enhanced Legacy PAE Paging").  This means, in theory,
we could have up to 52 bits of physical address in a pte.

The 32-bit kernel uses a 32-bit unsigned long to represent a pfn.
This means that it can only represent physical addresses up to 32+12=44
bits wide.  Rather than widening pfns everywhere, just set 2^44 as the
Linux x86_32-PAE architectural limit for physical address size.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:31:20 +02:00
Ingo Molnar af1cf204ba x86, mpparse: build fix
fix:

  LD      .tmp_vmlinux1
  arch/x86/kernel/built-in.o: In function `setup_arch':
  : undefined reference to `early_reserve_e820_mpc_new'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:35:12 +02:00
Yinghai Lu d49c428840 x86: make generic arch support NUMAQ
... so it could fall back to normal numa and we'd reduce the impact of the
NUMAQ subarch.

NUMAQ depends on GENERICARCH
also decouple genericarch numa from acpi.
also make it fall back to bigsmp if apicid > 8.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:34:42 +02:00
Yinghai Lu e0da336468 x86: introduce max_physical_apicid for bigsmp switching
a multi-socket test-system with 3 or 4 ioapics, when 4 dualcore cpus or
2 quadcore cpus installed, needs to switch to bigsmp or physflat.

CPU apic id is [4,11] instead of [0,7], and we need to check max apic
id instead of cpu numbers.

also add check for 32 bit when acpi is not compiled in or acpi=off.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:32:09 +02:00
Cyrill Gorcunov 3ed3f06295 x86: nmi - consolidate nmi_watchdog_default for 32bit mode
64bit mode bootstrap code does set nmi_watchdog to NMI_NONE
by default and doing the same on 32bit mode is safe too.
Such an action saves us from several #ifdef.

Btw, my previous commit

commit 19ec673ced
Author: Cyrill Gorcunov <gorcunov@gmail.com>
Date:   Wed May 28 23:00:47 2008 +0400

    x86: nmi - fix incorrect NMI watchdog used by default

did not fix the problem completely, moreover it
introduced additional bug - nmi_watchdog would be
set to either NMI_LOCAL_APIC or NMI_IO_APIC
_regardless_ to boot option if being enabled thru
/proc/sys/kernel/nmi_watchdog. Sorry for that.
Fix it too.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: macro@linux-mips.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:13:59 +02:00
Huang, Ying c45a707dbe x86: linked list of setup_data for i386
This patch adds linked list of struct setup_data supported for i386.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying 0c51a965ed x86: extract common part of head32.c and head64.c into head.c
This patch extracts the common part of head32.c and head64.c into head.c.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying ecacf09f7d x86: reserve EFI memory map with reserve_early
This patch reserves the EFI memory map with reserve_early(). Because EFI
memory map is allocated by bootloader, if it is not reserved by
reserved_early(), it may be overwritten through address returned by
find_e820_area().

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying d0ec2c6f2c x86: reserve highmem pages via reserve_early
This patch makes early reserved highmem pages become reserved
pages. This can be used for highmem pages allocated by bootloader such
as EFI memory map, linked list of setup_data, etc.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Suresh Siddha e8a496ac8c x86: fix broken math-emu with lazy allocation of fpu area
Fix the math emulation that got broken with the recent lazy allocation of FPU
area. init_fpu() need to be added for the math-emulation path aswell
for the FPU area allocation.

math emulation enabled kernel booted fine with this, in the presence
of "no387 nofxsr" boot param.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: hpa@zytor.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:46 +02:00
Yinghai Lu 7b2a0a6c48 x86: make 32-bit use e820_register_active_regions()
this way 32-bit is more similar to 64-bit, and smarter e820 and numa.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:58 +02:00
Yinghai Lu ee0c80fadf x86: move e820_register_active() to e820.c
to prepare 32-bit to use it.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:57 +02:00
Yinghai Lu ab530e1f78 x86: early check if a system is numaq
so we could fall back to one node numa.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:06 +02:00
Ingo Molnar 835fc943f3 x86: mp build fix
fix:

 drivers/built-in.o: In function `acpi_pci_irq_enable':
 : undefined reference to `mp_config_acpi_gsi'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 14:44:42 +02:00
Yinghai Lu 2944e16b25 x86: update mptable
make mptable to be consistent with acpi routing, so we could:

1. kexec kernel with acpi=off
2. work around BIOSes where acpi routing is working, but mptable is
   not right, so can use kernel/kexec to start other OSes that don't have
   good acpi support.

command line: update_mptable

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:27 +02:00
Yinghai Lu 0596152388 x86, 32-bit: change propagate_e820_map() back to find_max_pfn()
we don't need to call memory_present that early.
numa and sparse will call memory_present later and might
even fail, it will call memory_present for the full range.

also for sparse it will call alloc_bootmem ... before we set up bootmem.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:25 +02:00
Yinghai Lu ba924c81dd x86, numa, 32-bit: increase max_elements to 1024
so every element will represent 64M instead of 256M.

AMD opteron could have HW memory hole remapping, so could have
[0, 8g + 64M) on node0. Reduce element size to 64M to keep that on node 0

Later we need to use find_e820_area() to allocate memory_node_map like
on 64-bit. But need to move memory_present out of populate_mem_map...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:24 +02:00
Ingo Molnar 4c1cbafb88 x86 mpparse: build fix
fix this build bug:

 drivers/acpi/pci_irq.c: In function 'acpi_pci_irq_enable':
 drivers/acpi/pci_irq.c:574: error: implicit declaration of function 'mp_config_acpi_gsi'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 09:40:57 +02:00
Jack Steiner 9f5314fb4d x86, uv: update macros used by UV platform
Update the UV address macros to better describe the
fields of UV physical addresses. Improve comments
in the header files. Add additional MMR definitions.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 12:56:00 +02:00
Yinghai Lu fb093eab6d x86: remove duplicated e820 func in setup.h
we already have them in e820.h

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 12:51:03 +02:00
Vegard Nossum 6330a30a76 x86: break mutual header inclusion
This breaks up the mutual inclusion between headers ptrace.h and vm86.h
by moving some small part of vm86.h which is needed by ptrace.h into
processor-flags.h.

We also try to move #include lines to the top.

This has been compile tested on x86_32 and x86_64 defconfig, and run
through 'make headers_check'.

Cc: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 12:48:23 +02:00
Vegard Nossum 83bea8e1fa x86: fix incomplete include guard in include/asm-x86/seccomp_32.h
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 12:45:28 +02:00
Linus Torvalds c1f64a5800 x86: MMIO and gcc re-ordering issue
On Tue, 27 May 2008, Linus Torvalds wrote:
>
> Expecting people to fix up all drivers is simply not going to happen. And
> serializing things shouldn't be *that* expensive. People who cannot take
> the expense can continue to use the magic __raw_writel() etc stuff.

Of course, for non-x86, you kind of have to expect drivers to be
well-behaved, so non-x86 can probably avoid this simply because there are
less relevant drivers involved.

Here's a UNTESTED patch for x86 that may or may not compile and work, and
which serializes (on a compiler level) the IO accesses against regular
memory accesses.

__read[bwlq]()/__write[bwlq]() are not serialized with a :"memory"
barrier, although since they still use "asm volatile" I suspect that i
practice they are probably serial too. Did not look very closely at any
generated code (only did a trivial test to see that the code looks
*roughly* correct).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 12:29:31 +02:00
Ingo Molnar 1a5726528a fix build bug in "x86: add PCI extended config space access for AMD Barcelona" 2008-06-02 12:26:21 +02:00
Pavel Machek c46e62f735 i8259: fix final ugliness
Introduce IRQx_VECTOR on 32-bit, so that #ifdef noise is kept
down. There should be no object code change.

[ mingo@elte.hu: merged to x86/irq not x86/i8259 due to x86/irq having
  restructured the vector code into asm-x86/irq_vectors.h, which this
  patch touches. ]

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 11:55:52 +02:00
Ingo Molnar 6fc92866a4 fix build bug in "x86: add PCI extended config space access for AMD Barcelona"
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 11:51:35 +02:00
Robert Richter 831d991821 x86: add PCI extended config space access for AMD Barcelona
This patch implements PCI extended configuration space access for
AMD's Barcelona CPUs. It extends the method using CF8/CFC IO
addresses. An x86 capability bit has been introduced that is set for
CPUs supporting PCI extended config space accesses.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 11:51:19 +02:00
Yinghai Lu a5481280b2 x86: extend e820 early_res support 32bit -fix #5
reserve early numa kva, so it will not clash with new RAMDISK

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:53 +02:00
Yinghai Lu f0d43100f1 x86: extend e820 early_res support 32bit -fix #3
introduce init_pg_table_start, so xen PV could specify the value.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:47 +02:00
Kristian Høgsberg 3b6b9293d0 x86: Honor 'quiet' command line option in real mode boot decompressor.
This patch lets the early real mode code look for the 'quiet' option
on the kernel command line and pass a loadflag to the decompressor.
When this flag is set, we suppress the "Decompressing Linux... Parsing
ELF... done." messages.

Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-05-30 17:00:47 -07:00
Thomas Gleixner d364319b98 x86: move mmconfig declarations to header
arch/x86/kernel/mmconf-fam10h_64.c is missing the prototypes, which
are decalred in arch/x86/kernel/setup_64.c. Move the prototypes and
the inline stubs to the appropriate header file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-30 15:45:17 -07:00
Ingo Molnar ba3a597423 - fix typo in include/asm-x86/nmi.h 2008-05-30 14:55:47 +02:00
Cyrill Gorcunov 19ec673ced x86: nmi - fix incorrect NMI watchdog used by default
The commit

	commit 4b82b27770
	Author: Cyrill Gorcunov <gorcunov@gmail.com>
	Date:   Sat May 24 19:36:35 2008 +0400

set nmi_watchdog to NMI_IO_APIC as by default. This causes hangs on some
machines with buggy watchdogs. Fix it - i.e. restore old behaviour.

Thanks to Sitsofe Wheeler and Adrian Bunk for catching the problem
and Maciej W. Rozycki for explanation what is going on there.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
CC: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-28 21:04:53 +02:00
Jeremy Fitzhardinge 8006ec3e91 xen: add configurable max domain size
Add a config option to set the max size of a Xen domain.  This is used
to scale the size of the physical-to-machine array; it ends up using
around 1 page/GByte, so there's no reason to be very restrictive.

For a 32-bit guest, the default value of 8GB is probably sufficient;
there's not much point in giving a 32-bit machine much more memory
than that.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge d451bb7aa8 xen: make phys_to_machine structure dynamic
We now support the use of memory hotplug, so the physical to machine
page mapping structure must be dynamic.  This is implemented as a
two-level radix tree structure, which allows us to efficiently
incrementally allocate memory for the p2m table as new pages are
added.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge a15af1c9ea x86/paravirt: add pte_flags to just get pte flags
Add pte_flags() to extract the flags from a pte.  This is a special
case of pte_val() which is only guaranteed to return the pte's flags
correctly; the page number may be corrupted or missing.

The intent is to allow paravirt implementations to return pte flags
without having to do any translation of the page number (most notably,
Xen).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:36 +02:00
Jeremy Fitzhardinge 349c709f42 xen: use new sched_op
Use the new sched_op hypercall, mainly because xenner doesn't support
the old one.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:35 +02:00
Jeremy Fitzhardinge 7b1333aa4c xen: use hypercall rather than clts
Xen will trap and emulate clts, but its better to use a hypercall.
Also, xenner doesn't handle clts.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:35 +02:00
Jeremy Fitzhardinge 4e09e21ccb x86: use symbolic constant in stts()
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:04:29 +02:00
Jeremy Fitzhardinge 4226ab93d8 x86: use pteval_t for _PAGE_FOO
Rather than making _PAGE_* constants signed, and then relying on
sign-extension to make sure that masks derived from them are wide
enough, just explicitly type them pteval_t.  This guarantees that they
and any derived values are the right size for the current pte format.

The reliance on sign extension is fragile, and invokes some very
subtle corners of the C type system.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:01:20 +02:00
H. Peter Anvin 1a20d3ecf5 x86: string_32.h: workaround for broken gcc 4.0
gcc 4.0 fails to allocate %eax for the pattern operand in the rep
store instructions used by memset; force it to do so by declaring a
register variable.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-05-26 13:36:53 -07:00
Sven Wegener 4b6011bc6e x86: Remove obsolete LOCK macro from include/asm-x86/atomic_64.h
Commit d167a518 "[PATCH] x86_64: x86_64 version of the smp alternative patch."
has left the LOCK macro in include/asm-x86_64/atomic.h, which is now
include/asm-x86/atomic_64.h. Its scope should be local to the file, other
architectures don't provide it, I couldn't find an in-tree user of it and
allyesconfig, allmodconfig and allnoconfig build fine without it, so this patch
removes it.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 22:42:25 +02:00
Cyrill Gorcunov 4b82b27770 x86: nmi_32.c - add nmi_watchdog_default helper
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 22:32:51 +02:00
Cyrill Gorcunov ddca03c98a x86: nmi - unify die_nmi() interface
By slightly changing 32bit mode die_nmi() we may unify the
interface and make it common for both (32/64bit) modes

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 22:32:50 +02:00
Fernando Luis Vazquez Cao 2e4db9d2c5 x86: remove duplicate declaration of unknown_nmi_panic
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 17:16:50 +02:00
Alexey Starikovskiy ce6444d39f x86: mp_bus_id_to_pci_bus is not needed 2008-05-25 12:01:25 +02:00
Yinghai Lu bf62f3981c x86: move e820_mark_nosave_regions to e820.c
and make e820_mark_nosave_regions to take limit_pfn to use max_low_pfn
for 32bit and end_pfn for 64bit

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 11:35:53 +02:00
Alexey Starikovskiy 2fddb6e28e x86: make config_irqsrc not MPspec specific
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:13 +02:00
Alexey Starikovskiy ec2cd0a22e x86: make struct config_ioapic not MPspec specific
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:13 +02:00
Alexey Starikovskiy 5f8951487d x86: make mp_ioapic_routing definition local
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:12 +02:00
Alexey Starikovskiy 32c5061265 x86: move es7000_plat out of mpparse.c
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:12 +02:00
Yinghai Lu a4c81cf684 x86: extend e820 ealy_res support 32bit
move early_res related from e820_64.c to e820.c
make edba detection to be done in head32.c
remove smp_alloc_memory, because we have fixed trampoline address now.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

 arch/x86/kernel/e820.c              |  214 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/e820_64.c           |  196 --------------------------------
 arch/x86/kernel/head32.c            |   76 ++++++++++++
 arch/x86/kernel/setup_32.c          |  109 +++---------------
 arch/x86/kernel/smpboot.c           |   17 --
 arch/x86/kernel/trampoline.c        |    2
 arch/x86/mach-voyager/voyager_smp.c |    9 -
 include/asm-x86/e820.h              |    6 +
 include/asm-x86/e820_64.h           |    9 -
 include/asm-x86/smp.h               |    1
 arch/x86/kernel/e820.c              |  214 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/e820_64.c           |  196 --------------------------------
 arch/x86/kernel/head32.c            |   76 ++++++++++++
 arch/x86/kernel/setup_32.c          |  109 +++---------------
 arch/x86/kernel/smpboot.c           |   17 --
 arch/x86/kernel/trampoline.c        |    2
 arch/x86/mach-voyager/voyager_smp.c |    9 -
 include/asm-x86/e820.h              |    6 +
 include/asm-x86/e820_64.h           |    9 -
 include/asm-x86/smp.h               |    1
 arch/x86/kernel/e820.c              |  214 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/e820_64.c           |  196 --------------------------------
 arch/x86/kernel/head32.c            |   76 ++++++++++++
 arch/x86/kernel/setup_32.c          |  109 +++---------------
 arch/x86/kernel/smpboot.c           |   17 --
 arch/x86/kernel/trampoline.c        |    2
 arch/x86/mach-voyager/voyager_smp.c |    9 -
 include/asm-x86/e820.h              |    6 +
 include/asm-x86/e820_64.h           |    9 -
 include/asm-x86/smp.h               |    1
 10 files changed, 320 insertions(+), 319 deletions(-)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:11 +02:00
Paul Jackson 6e9bcc796b x86 boot: change sanitize_e820_map parameter from byte to int to allow bigger memory maps
The map size counter passed into, and back out of, sanitize_e820_map(),
was an eight bit type (char or u8), as derived from its origins in
legacy BIOS E820 structures.  This patch changes that type to an 'int',
to allow this sanitize routine to also be used on larger maps (larger
than the 256 count that fits in a char).  The legacy BIOS E820 interface
of course does not change; that remains at 8 bits for this count, holding
up to E820MAX == 128 entries.  But the kernel internals can handle more
when those additional memory map entries are passed from the BIOS via
EFI interfaces.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:11 +02:00
Paul Jackson 028b785888 x86 boot: extend some internal memory map arrays to handle larger EFI input
Extend internal boot time memory tables to allow for up to
three entries per node, which may be larger than the 128 E820MAX
entries handled by the legacy BIOS E820 interface.  The EFI
interface, if present, is capable of passing memory map
entries for these larger node counts.

This patch requires an earlier patch that rewrote code depending
on these array sizes from using E820MAX explicitly to size loops,
to instead using ARRAY_SIZE() of the applicable array.

Another patch following this one will provide the code to pick
up additional memory entries passed via the EFI interface from
the BIOS and insert them in the following, now enlarged, arrays.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:11 +02:00
Paul Jackson c3965bd151 x86 boot: proper use of ARRAY_SIZE instead of repeated E820MAX constant
This patch is motivated by a subsequent patch which will allow for more
memory map entries on EFI supported systems than can be passed via the x86
legacy BIOS E820 interface.  The legacy interface is limited to E820MAX ==
128 memory entries, and that "E820MAX" manifest constant was used as the
size for several arrays and loops over those arrays.

The primary change in this patch is to change code loop sizes over those
arrays from using the constant E820MAX, to using the ARRAY_SIZE() macro
evaluated for the array being looped.  That way, a subsequent patch can
change the size of some of these arrays, without breaking this code.

This patch also adds a parameter to the sanitize_e820_map() routine,
which had an implicit size for the array passed it of E820MAX entries.
This new parameter explicitly passes the size of said array.  Once again,
this will allow a subsequent patch to change that array size for some
calls to sanitize_e820_map() without breaking the code.

As part of enhancing the sanitize_e820_map() interface this way, I further
combined the unnecessarily distinct x86_32 and x86_64 declarations for
this routine into a single, commonly used, declaration.

This patch in itself should make no difference to the resulting kernel
binary.

[ mingo@elte.hu: merged to -tip ]

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:11 +02:00
Paul Jackson e3f8ba81fd x86 boot: include missing smp.h header
The patch:
    x86: convert cpu_to_apicid to be a per cpu variable
introduced a dependency of ipi.h on smp.h in x86
builds with an allnoconfig.  Including smp.h in ipi.h
fixes the build error:
    In file included from arch/x86/kernel/traps_64.c:48:
    include/asm/ipi.h: In function 'send_IPI_mask_sequence':
    include/asm/ipi.h:114: error: 'per_cpu__x86_cpu_to_apicid' undeclared (first use in this function)

Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:10 +02:00
Yinghai Lu b79cd8f126 x86: make e820.c to have common functions
remove the duplicated copy of these functions.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:10 +02:00
Yinghai Lu 42651f1582 x86: fix trimming e820 with MTRR holes.
converting MTRR layout from continous to discrete, some time could run out of
MTRRs. So add gran_sizek to prevent that by dumpping small RAM piece less than
gran_sizek.

previous trimming only can handle highest_pfn from mtrr to end_pfn from e820.
when have more than 4g RAM installed, there will be holes below 4g. so need to
check ram below 4g is coverred well.

need to be applied after
	[PATCH] x86: mtrr cleanup for converting continuous to discrete layout v7

Signed-off-by: Yinghai Lu <yinghai.lu@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:09 +02:00
Alexander van Heukelum 0dbfafa5fc x86: move i386 memory setup code to e820_32.c
The x86_64 code has centralized the memory setup code in
e820_64.c. This patch copies that approach to i386:

- early_param("mem", ...) parsing is moved from
setup_32.c to e820_32.c.

- setup_memory_map() and finish_e820_parsing() are
factored out from setup_arch(), and declarations
are added to e820_32.h.

- print_memory_map() is made static and removed from
e820_32.h.

- user_defined_memmap is marked as __initdata.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:09 +02:00
Joe Perches 78d64fc21d x86: include/asm-x86/string_32.h - style only
Looked at this file because of __memcpy warnings.
Thought it could use a style/checkpatch cleanup.

No change in vmlinux.

[tglx: fixed the remaining issues ]

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:36 +02:00
Christoph Lameter f0766440dd x86: unify current.h
Simply stitch these together. There are just two definitions that are shared
but the file is resonably small and putting these things together shows that
further unifications requires a unification of the per cpu / pda handling
between both arches.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:35 +02:00
Andrew Morton 23eb271b91 x86: setup_force_cpu_cap(): don't do clear_bit(non-unsigned-long)
Another hack to make proper prototyping of x86 bitops viable.

Cc: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:34 +02:00
Fernando Luis Vazquez Cao 2237ce2057 x86: cleanup, remove duplicate declaration of unknown_nmi_panic
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:33 +02:00
Cyrill Gorcunov 9831bfb201 x86 - hide X86_VM_MASK from userland programs v3
X86_VM_MASK is kernel specific flags so hide it from userland programs.

It should be defined *before* ptrace.h inclusion because of circular
link between these files

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:31 +02:00
Jan Beulich ebdd561a19 x86: constify data in reboot.c
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:30 +02:00
Jan Beulich 4c8ab98249 i386: move FIX_ACPI_* into non-permanent range
.. as they are used at early boot time only.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:30 +02:00
Thomas Gleixner d1097635de x86: move mmconfig declarations to header
arch/x86/kernel/mmconf-fam10h_64.c is missing the prototypes, which
are decalred in arch/x86/kernel/setup_64.c. Move the prototypes and
the inline stubs to the appropriate header file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:27 +02:00
Thomas Gleixner 635ee41838 x86: create prototype for (un)map_devmem
Global functions need a prototype. Add it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:25 +02:00
Andrew Morton 5136dea573 x86: bitops take an unsigned long *
All (or most) other architectures do this.  So should x86.  Fix.

Cc: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:51:31 +02:00
Jan Beulich a2eddfa959 x86: make /proc/stat account for all interrupts
LAPIC interrupts, which don't go through the generic interrupt handling
code, aren't accounted for in /proc/stat. Hence this patch adds a
mechanism architectures can use to accordingly adjust the statistics.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 07:11:49 +02:00
Jan Beulich 63687a528c x86: move tracedata to RODATA
.. allowing it to be write-protected just as other read-only data
under CONFIG_DEBUG_RODATA.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 07:09:47 +02:00
Thomas Gleixner 3711ccb07b x86: fixup the fallout of the bitops changes
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 18:16:12 +02:00
Pavel Machek 21fd5132b2 x86: automatical unification of i8259.c
Make conversion of i8259 very mechanical -- i8259 was generated by
 diff -D, with too different parts left in i8259_32 and
i8259_64.c. Only "by hand" changes were removal of #ifdef from middle
of the comment (prevented compilation) and removal of one static to
allow splitting into files.

Of course, it will need some cleanups now, and those will follow.

Signed-of-by: Pavel Machek <pavel@suse.cz>
2008-05-24 16:44:26 +02:00
Pavel Machek 40bd217400 x86: automatical unification of i8259.c
Make conversion of i8259 very mechanical -- i8259 was generated by
2008-05-24 15:57:09 +02:00
Pekka Paalanen 0fd0e3da45 x86: mmiotrace full patch, preview 1
kmmio.c handles the list of mmio probes with callbacks, list of traced
pages, and attaching into the page fault handler and die notifier. It
arms, traps and disarms the given pages, this is the core of mmiotrace.

mmio-mod.c is a user interface, hooking into ioremap functions and
registering the mmio probes. It also decodes the required information
from trapped mmio accesses via the pre and post callbacks in each probe.
Currently, hooking into ioremap functions works by redefining the symbols
of the target (binary) kernel module, so that it calls the traced
versions of the functions.

The most notable changes done since the last discussion are:
- kmmio.c is a built-in, not part of the module
- direct call from fault.c to kmmio.c, removing all dynamic hooks
- prepare for unregistering probes at any time
- make kmmio re-initializable and accessible to more than one user
- rewrite kmmio locking to remove all spinlocks from page fault path

Can I abuse call_rcu() like I do in kmmio.c:unregister_kmmio_probe()
or is there a better way?

The function called via call_rcu() itself calls call_rcu() again,
will this work or break? There I need a second grace period for RCU
after the first grace period for page faults.

Mmiotrace itself (mmio-mod.c) is still a module, I am going to attack
that next. At some point I will start looking into how to make mmiotrace
a tracer component of ftrace (thanks for the hint, Ingo). Ftrace should
make the user space part of mmiotracing as simple as
'cat /debug/trace/mmio > dump.txt'.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:22:12 +02:00
Pekka Paalanen 10c43d2eb5 x86: explicit call to mmiotrace in do_page_fault()
The custom page fault handler list is replaced with a single function
pointer. All related functions and variables are renamed for
mmiotrace.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: pq@iki.fi
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:21:55 +02:00
Pekka Paalanen 86069782d6 x86: add a list for custom page fault handlers.
Provides kernel modules a way to register custom page fault handlers.
On every page fault this will call a list of registered functions. The
functions may handle the fault and force do_page_fault() to return
immediately.

This functionality is similar to the now removed page fault notifiers.
Custom page fault handlers are used by debugging and reverse engineering
tools. Mmiotrace is one such tool and a patch to add it into the tree
will follow.

The custom page fault handlers are called earlier in do_page_fault()
than the page fault notifiers were.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 21:16:38 +02:00
Steven Rostedt dfa60aba04 ftrace: use nops instead of jmp
This patch patches the call to mcount with nops instead
of a jmp over the mcount call.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 20:33:28 +02:00
Steven Rostedt 81d68a96a3 ftrace: trace irq disabled critical timings
This patch adds latency tracing for critical timings
(how long interrupts are disabled for).

 "irqsoff" is added to /debugfs/tracing/available_tracers

Note:
  tracing_max_latency
    also holds the max latency for irqsoff (in usecs).
   (default to large number so one must start latency tracing)

  tracing_thresh
    threshold (in usecs) to always print out if irqs off
    is detected to be longer than stated here.
    If irq_thresh is non-zero, then max_irq_latency
    is ignored.

Here's an example of a trace with ftrace_enabled = 0

=======
preemption latency trace v1.1.5 on 2.6.24-rc7
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--------------------------------------------------------------------
 latency: 100 us, #3/3, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
    -----------------
    | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
    -----------------
 => started at: _spin_lock_irqsave+0x2a/0xb7
 => ended at:   _spin_unlock_irqrestore+0x32/0x5f

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
 swapper-0     1d.s3    0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
 swapper-0     1d.s3  100us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
 swapper-0     1d.s3  100us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)

vim:ft=help
=======

And this is a trace with ftrace_enabled == 1

=======
preemption latency trace v1.1.5 on 2.6.24-rc7
--------------------------------------------------------------------
 latency: 102 us, #12/12, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
    -----------------
    | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
    -----------------
 => started at: _spin_lock_irqsave+0x2a/0xb7
 => ended at:   _spin_unlock_irqrestore+0x32/0x5f

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
 swapper-0     1dNs3    0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
 swapper-0     1dNs3   46us : e1000_read_phy_reg+0x16/0x225 [e1000] (e1000_update_stats+0x5e2/0x64c [e1000])
 swapper-0     1dNs3   46us : e1000_swfw_sync_acquire+0x10/0x99 [e1000] (e1000_read_phy_reg+0x49/0x225 [e1000])
 swapper-0     1dNs3   46us : e1000_get_hw_eeprom_semaphore+0x12/0xa6 [e1000] (e1000_swfw_sync_acquire+0x36/0x99 [e1000])
 swapper-0     1dNs3   47us : __const_udelay+0x9/0x47 (e1000_read_phy_reg+0x116/0x225 [e1000])
 swapper-0     1dNs3   47us+: __delay+0x9/0x50 (__const_udelay+0x45/0x47)
 swapper-0     1dNs3   97us : preempt_schedule+0xc/0x84 (__delay+0x4e/0x50)
 swapper-0     1dNs3   98us : e1000_swfw_sync_release+0xc/0x55 [e1000] (e1000_read_phy_reg+0x211/0x225 [e1000])
 swapper-0     1dNs3   99us+: e1000_put_hw_eeprom_semaphore+0x9/0x35 [e1000] (e1000_swfw_sync_release+0x50/0x55 [e1000])
 swapper-0     1dNs3  101us : _spin_unlock_irqrestore+0xe/0x5f (e1000_update_stats+0x641/0x64c [e1000])
 swapper-0     1dNs3  102us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
 swapper-0     1dNs3  102us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)

vim:ft=help
=======

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 20:32:46 +02:00
Steven Rostedt 23adec554a x86: add notrace annotations to vsyscall.
Add the notrace annotations to the vsyscall functions - there we are
not in kernel context yet, so the tracer function cannot (and must not)
be called.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 20:31:39 +02:00
Mike Travis 334ef7a7ab x86: use performance variant for_each_cpu_mask_nr
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate

Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

commit 2d474871e2fb092eb46a0930aba5442e10eb96cc
Author: Mike Travis <travis@sgi.com>
Date:   Mon May 12 21:21:13 2008 +0200
2008-05-23 18:35:12 +02:00
Ingo Molnar b1979a5fda x86: prevent PGE flush from interruption/preemption
CR4 manipulation is not protected against interrupts and preemption,
but KVM uses smp_function_call to manipulate the X86_CR4_VMXE bit
either from the CPU hotplug code or from the kvm_init call.

We need to protect the CR4 manipulation from both interrupts and
preemption.

Original bug report: http://lkml.org/lkml/2008/5/7/48
Bugzilla entry: http://bugzilla.kernel.org/show_bug.cgi?id=10642

This is not a regression from 2.6.25, it's a long standing and hard to
trigger bug.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 18:16:15 +02:00
Jeremy Fitzhardinge 3843fc2575 xen: remove support for non-PAE 32-bit
Non-PAE operation has been deprecated in Xen for a while, and is
rarely tested or used.  xen-unstable has now officially dropped
non-PAE support.  Since Xen/pvops' non-PAE support has also been
broken for a while, we may as well completely drop it altogether.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-22 18:42:49 +02:00
Pavel Machek 0abbc78a01 x86, aperture_64: use symbolic constants
Factor-out common aperture_valid code.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-22 11:35:14 +02:00
Hugh Dickins a8375bd81c x86: strengthen 64-bit p?d_bad()
The x86_64 pgd_bad(), pud_bad(), pmd_bad() inlines have differed from
their x86_32 counterparts in a couple of ways: they've been unnecessarily
weak (e.g. letting 0 or 1 count as good), and were typed as unsigned long.
Strengthen them and return int.

The PAE pmd_bad was too weak before, allowing any junk in the upper half;
but got strengthened by the patch correcting its ~PAGE_MASK to ~PTE_MASK.
The PAE pud_bad already said ~PTE_MASK; and since it folds into pgd_bad,
and we don't set the protection bits at that level, it'll do as is.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-20 08:14:45 -07:00
Jeremy Fitzhardinge cbb3077cbe xen: use PTE_MASK in pte_mfn()
Use PTE_MASK to extract mfn from pte.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tested-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-20 07:51:21 -07:00
Jeremy Fitzhardinge ba23cef5c2 x86: use PTE_MASK rather than ad-hoc mask
Use ~PTE_MASK to extract the non-pfn parts of the pte (ie, the pte
flags), rather than constructing an ad-hoc mask.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tested-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-20 07:51:21 -07:00
Jeremy Fitzhardinge 86aaf4fd4e x86: clarify use of _PAGE_CHG_MASK
_PAGE_CHG_MASK is defined as the set of bits not updated by
pte_modify(); specifically, the pfn itself, and the Accessed and Dirty
bits (which are updated by hardware).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tested-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-20 07:51:21 -07:00
Jeremy Fitzhardinge 7f84133af6 x86: use PTE_MASK in pgtable_32.h
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-20 07:51:21 -07:00
Jeremy Fitzhardinge a4d6886270 x86: use PTE_MASK in 32-bit PAE
Use PTE_MASK in 3-level pagetables (ie, 32-bit PAE).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tested-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-20 07:51:21 -07:00
Jeremy Fitzhardinge c57c05d003 x86: rearrange __(VIRTUAL|PHYSICAL)_MASK
Put the definitions of __(VIRTUAL|PHYSICAL)_MASK before their uses.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tested-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-20 07:51:20 -07:00
Jeremy Fitzhardinge 1bb271db63 x86: fix warning on 32-bit non-PAE
Fix the warning:

include2/asm/pgtable.h: In function `pte_modify':
include2/asm/pgtable.h:290: warning: left shift count >= width of type

On 32-bit PAE the virtual and physical addresses are both 32-bits,
so it ends up evaluating 1<<32.  Do the shift as a 64-bit shift then
cast to the appropriate size.  This should all be done at compile time,
and so have no effect on generated code.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tested-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-20 07:51:20 -07:00
Jeremy Fitzhardinge 2bd3a99c9d x86: define PTE_MASK in a universally useful way
Define PTE_MASK so that it contains a meaningful value for all x86
pagetable configurations.  Previously it was defined as a "long" which
means that it was too short to cover a 32-bit PAE pte entry.

It is now defined as a pteval_t, which is an integer type long enough
to contain a full pte (or pmd, pud, pgd).

This fixes an Xorg crash on 32-bit x86 with PAE due to corruption of the
NX bit in mprotect due to the incorrect type/value of PTE_MASK reported
by Hugh Dickins:

  "Yes, thanks Jeremy: I've checked that each stage builds and runs X on
   my boxes here, x86_32 and x86_32+PAE and x86_64.  (So even 1/8 is
   enough to fix the PAT pte_modify issue, though 2/8 then fixes
   compiler warnings.)"

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tested-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-20 07:51:20 -07:00
Avi Kivity 107d6d2efa KVM: x86 emulator: fix writes to registers with modrm encodings
A register destination encoded with a mod=3 encoding left dst.ptr NULL.
Normally we don't trap writes to registers, but in the case of smsw, we do.

Fix by pointing dst.ptr at the destination register.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-18 14:34:14 +03:00
Thomas Gleixner 538f0fd0f2 Merge branch 'linus' into x86/gart 2008-05-17 17:12:24 +02:00
Ingo Molnar dedd4915af bitops: fix build in struct thread_info
we can move flags from u32 to natural size - assembly code uses
offsetof.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-17 14:39:31 +02:00
Venki Pallipadi 1c12c4cf94 mprotect: prevent alteration of the PAT bits
There is a defect in mprotect, which lets the user change the page cache
type bits by-passing the kernel reserve_memtype and free_memtype
wrappers.  Fix the problem by not letting mprotect change the PAT bits.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-14 19:11:15 -07:00
Pavel Machek 3bb6fbf996 x86 gart: factor out common code
Cleanup gart handling on amd64 a bit: move common code into
enable_gart_translation , and use symbolic register names where
appropriate.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:10 +02:00
Pavel Machek aa134f1b09 x86: iommu: use symbolic constants, not hardcoded numbers
Move symbolic constants into gart.h, and use them instead of hardcoded
constant.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:10 +02:00
Jan Beulich f8096f92b8 x86: separate cmpxchg8b checking from PAE checking
.. allowing the former to be use in non-PAE kernels, too.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:09 +02:00
Alan Mayer 305b92a232 x86: change FIRST_SYSTEM_VECTOR to a variable
The SGI UV system needs several more system vectors than a vanilla
x86_64 system.  Rather than burden the other archs with extra system
vectors that they don't use, change FIRST_SYSTEM_VECTOR to a variable,
so that it can be dynamic.

Signed-off-by: Alan Mayer <ajm@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:06 +02:00
Thomas Gleixner 1a331957ef x86: move eisa_set_level_irq declaration to header
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:06 +02:00
Thomas Gleixner 88a83350bc x86: declare setup_apic_routing
Global functions need a prototype.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:06 +02:00
Thomas Gleixner 22067d4501 x86: unify irq.h
Not much difference in those files.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:05 +02:00
Thomas Gleixner 22dc12d1f6 x86: unify hwirq.h
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:05 +02:00
Thomas Gleixner 97e7b6f54c x86: unify apic interrupt function declarations
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:05 +02:00
Thomas Gleixner 0bc471d930 x86: move BUILD_IRQ macro magic to i8259_64.c
i8259_64.c is the only place which uses those macros.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:05 +02:00
Thomas Gleixner 9b7dc567d0 x86: unify interrupt vector defines
The interrupt vector defines are copied 4 times around with minimal
differences. Move them all into asm-x86/irq_vectors.h

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:05 +02:00
Thomas Gleixner 2e0884362d x86: move common declarations to hw_irq.h
Move the common declarations from hw_irq_32/64 into hw_irq.h

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:05 +02:00
Alan Mayer 6859a84029 x86: resize NR_IRQS for large machines
On machines with very large numbers of cpus, tables that are dimensioned
by NR_IRQS get very large, especially the irq_desc table.  They are also
very sparsely used.  When the cpu count is > MAX_IO_APICS, use MAX_IO_APICS
to set NR_IRQS, otherwise use NR_CPUS.

Signed-off-by: Alan Mayer <ajm@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:05 +02:00
Ingo Molnar 8a6c160a2a x86: redo thread_info.h change
redo Roland's "signals: x86 TS_RESTORE_SIGMASK" ontop of the unified
thread_info.h file.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:03 +02:00
Christoph Lameter b84200b3a0 x86: thread_info: merge thread_info allocation
Make them similar so that both use THREAD_ORDER and THREAD_FLAGS and have a
THREAD_SIZE definition that is setup in asm/page_xx.h

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:03 +02:00
Christoph Lameter 00c1bb133c x86: thread_info: merge tif masks
The TIF masks are basically the same. x86_32 also has _TIF_SYSCALL_EMU which is
zero for the 64 bit case. The tif masks become the same.

x86_64 has an additional _TIF_DONOTIFY_MASK. Does not hurt for the 32 bit case
since it is only used in x86_64 arch code.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:03 +02:00
Christoph Lameter e57549b017 x86: thread_info: merge TIF_ flags.
Both TIF lists are essentially the same. x86_32 also has TIF_SYSCALL_EMU which
must be undefined for the 64 bit case.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:03 +02:00
Christoph Lameter 24e2de6e28 x86: thread_info: PREEMPT_ACTIVE
Same for both 32 and 64 bit so simply put it in to the common area.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:03 +02:00
Christoph Lameter 3351cc03c0 x86: threadinfo: merge INIT_THREAD_INFO
Both definitions are the same. So move to common x86 area.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:03 +02:00
Christoph Lameter 006c484bb3 x86: common thread_info definitions
Merge the thread_info definition into one structure definition for both arches.

The __u32 is equal to unsigned long for 32 bit.

sysenter_return is used both for the IA32 emulation for 64 and x86_32.
Avoid complicated #ifdef by simply always including it.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:03 +02:00
Christoph Lameter f2ea3b1d4d x86: threadinfo: merge thread sync state definitions
Merge both. x86_64 has an additional TS_COMPAT that is harmless
for 32 bit.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:02 +02:00
Christoph Lameter 12a638e13c x86: threadinfo: common include files
Move shared includes to a common area in thread_info.h

Adds asm/types.h for x86_64 and linux/compiler.h for x86_32. Not needed but
we can avoid some ifdeffing and it simplifies later joining.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:02 +02:00
Christoph Lameter 2052e8d40a x86: merge thread_info.h
Simple merge of both thread_info_32.h and thread_info_64.h into thread_info.h.

Comments for #ifndef __ASM_THREAD_INFO_H and #ifdef __KERNEL__
are the same.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:02 +02:00
Ingo Molnar 1c7d06d419 revert: thread_info.h change
temporarily revert parts of "signals: x86 TS_RESTORE_SIGMASK".

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:02 +02:00
Linus Torvalds 3e1b83ab39 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
  x86: rdc: leds build/config fix
  x86: sysfs cpu?/topology is empty in 2.6.25 (32-bit Intel system)
  x86: revert commit 709f744 ("x86: bitops asm constraint fixes")
  x86: restrict keyboard io ports reservation to make ipmi driver work
  x86: fix fpu restore from sig return
  x86: remove spew print out about bus to node mapping
  x86: revert printk format warning change which is for linux-next
  x86: cleanup PAT cpu validation
  x86: geode: define geode_has_vsa2() even if CONFIG_MGEODE_LX is not set
  x86: GEODE: cache results from geode_has_vsa2() and uninline
  x86: revert geode config dependency
2008-05-10 21:10:48 -07:00
Linus Torvalds 39f004ba27 Make <asm-x86/spinlock.h> use ACCESS_ONCE()
..instead of cooking up its own uglier local version of it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-10 19:52:43 -07:00
Vaidyanathan Srinivasan 5c3a121d52 x86: sysfs cpu?/topology is empty in 2.6.25 (32-bit Intel system)
System topology on intel based system needs to be exported
for non-numa case as well.

All parts of asm-i386/topology.h has come under
#ifdef CONFIG_NUMA after the merge to asm-x86/topology.h

/sys/devices/system/cpu/cpu?/topology/* is populated based on
ENABLE_TOPO_DEFINES

The sysfs cpu topology is not being populated on my dual socket
dual core xeon 5160 processor based (x86 32 bit) system.

CONFIG_NUMA is not set in my case yet the topology is relevant
and useful.

irqbalance daemon application depends on topology to build the
cpus and package list and it fails on Fedora9 beta since the
sysfs topology was not being populated in the 2.6.25 kernel.

I am not sure if it was intentional to not define ENABLE_TOPO_DEFINES
for non-numa systems.

This fix has been tested on the above mentioned dual core, dual socket
system.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2008-05-10 19:31:45 +02:00
Simon Holm Thøgersen eb2b4e682a x86: revert commit 709f744 ("x86: bitops asm constraint fixes")
709f744 causes my computer to freeze during the start up of X and my
login manger (GDM). It gets to the point where it has shown the default
X mouse cursor logo (a big X / cross) and does not respond to anything
from that point on.

This worked fine before 709f744, and it works fine with 709f744
reverted on top of Linus' current tree (f74d505). The revert had
conflicts, as far as I can tell due to white space changes. The diff I
ended up with is below.

It is 100% reproducible.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-10 19:31:45 +02:00
Suresh Siddha fd3c3ed5d1 x86: fix fpu restore from sig return
If the task never used fpu, initialize the fpu before restoring the FP
state from the signal handler context. This will allocate the fpu
state, if the task never needed it before.

Reported-and-bisected-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Cc: Frederik Deweerdt <deweerdt@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-10 19:31:45 +02:00
Thomas Gleixner 8d4a430085 x86: cleanup PAT cpu validation
Move the scattered checks for PAT support to a single function. Its
moved to addon_cpuid_features.c as this file is shared between 32 and
64 bit.

Remove the manipulation of the PAT feature bit and just disable PAT in
the PAT layer, based on the PAT bit provided by the CPU and the
current CPU version/model white list.

Change the boot CPU check so it works on Voyager somewhere in the
future as well :) Also panic, when a secondary has PAT disabled but
the primary one has alrady switched to PAT. We have no way to undo
that.

The white list is kept for now to ensure that we can rely on known to
work CPU types and concentrate on the software induced problems
instead of fighthing CPU erratas and subtle wreckage caused by not yet
verified CPUs. Once the PAT code has stabilized enough, we can remove
the white list and open the can of worms.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-08 15:43:51 +02:00
Andres Salomon cb3f43b22b x86: geode: define geode_has_vsa2() even if CONFIG_MGEODE_LX is not set
We want drivers to be able to use geode_has_vsa2 without having to worry
about what model geode is being compiled for.  This patch ensures that
geode_has_vsa2 is always defined.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-08 15:43:50 +02:00
Andres Salomon 547acec7ec x86: GEODE: cache results from geode_has_vsa2() and uninline
This moves geode_has_vsa2 into a .c file, caches the result we get from
the VSA virtual registers, and causes the function to no longer be inline.

[akpm@linux-foundation.org: cleanup]

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-08 15:43:50 +02:00
Hugh Dickins aeed5fce37 x86: fix PAE pmd_bad bootup warning
Fix warning from pmd_bad() at bootup on a HIGHMEM64G HIGHPTE x86_32.

That came from 9fc34113f6 x86: debug pmd_bad();
but we understand now that the typecasting was wrong for PAE in the previous
version: pagetable pages above 4GB looked bad and stopped Arjan from booting.

And revert that cded932b75 x86: fix pmd_bad
and pud_bad to support huge pages.  It was the wrong way round: we shouldn't
weaken every pmd_bad and pud_bad check to let huge pages slip through - in
part they check that we _don't_ have a huge page where it's not expected.

Put the x86 pmd_bad() and pud_bad() definitions back to what they have long
been: they can be improved (x86_32 should use PTE_MASK, to stop PAE thinking
junk in the upper word is good; and x86_64 should follow x86_32's stricter
comparison, to stop thinking any subset of required bits is good); but that
should be a later patch.

Fix Hans' good observation that follow_page() will never find pmd_huge()
because that would have already failed the pmd_bad test: test pmd_huge in
between the pmd_none and pmd_bad tests.  Tighten x86's pmd_huge() check?
No, once it's a hugepage entry, it can get quite far from a good pmd: for
example, PROT_NONE leaves it with only ACCESSED of the KERN_PGTABLE bits.

However... though follow_page() contains this and another test for huge
pages, so it's nice to keep it working on them, where does it actually get
called on a huge page?  get_user_pages() checks is_vm_hugetlb_page(vma) to
to call alternative hugetlb processing, as does unmap_vmas() and others.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Earlier-version-tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Chua <jeff.chua.linux@gmail.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-06 13:08:58 -07:00
Linus Torvalds 45ea2103d8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes:
  x86: fix setup printk format warning
  x86: olpc build fix
  x86: video/fbdev.c: add MODULE_LICENSE
  x86: fix up bootparam.h for userspace inclusion
  x86: relocs ELF handling - use SELFMAG instead of numeric constant
  x86: vdso ELF handling - use SELFMAG instead of numeric constant
  x86: remove dell reboot dmi quirk board name match
  x86: es7000 build fix
  x86: make additional_cpus static
  x86: make start_secondary() static
  kbuild, suspend, x86: fix rebuild of wakeup.bin
  uml: fix gcc problem
  x86: undo visws/numaq build changes
2008-05-04 17:11:43 -07:00
Rusty Russell afaafe50ee x86: fix up bootparam.h for userspace inclusion
commit 8b664aa66e (x86, boot: add linked
list of struct setup_data) put a new struct in bootparam.h, but didn't
use the userspace-safe types.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Huang Ying <ying.huang@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-04 20:04:45 +02:00
Andrea Arcangeli bc1a34f1bf KVM: avoid fx_init() schedule in atomic
This make sure not to schedule in atomic during fx_init. I also
changed the name of fpu_init to fx_finit to avoid duplicating the name
with fpu_init that is already used in the kernel, this makes grep
simpler if nothing else.

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:48 +03:00
Sheng Yang 1439442c7b KVM: VMX: Enable EPT feature for KVM
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:42 +03:00
Sheng Yang b7ebfb0509 KVM: VMX: Prepare an identity page table for EPT in real mode
[aliguory: plug leak]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:41 +03:00
Sheng Yang 7b52345e2c KVM: MMU: Add EPT support
Enable kvm_set_spte() to generate EPT entries.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:38 +03:00
Sheng Yang 67253af52e KVM: Add kvm_x86_ops get_tdp_level()
The function get_tdp_level() provided the number of tdp level for EPT and
NPT rather than the NPT specific macro.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:34 +03:00
H. Peter Anvin edfa5cfa3d x86: types: use <asm-generic/int-*.h> for the x86 architecture
This modifies <asm-x86/types.h> to use the <asm-generic/int-*.h>
generic include files.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
2008-05-02 16:18:42 -07:00
Jean Delvare ef3fb66ced dmi: clean-up dmi helper declarations
The declaration of dmi helper functions is a bit messy and inconsistent at the
moment:

* On ia64 they are declared in <asm/io.h>.
* On x86-64 they are declared in <asm/dmi.h>.
* On i386 they are declared both in <asm/io.h> and <asm/dmi.h>.

Fix the header files so that the dmi helper functions are consistently
defined in <asm/dmi.h>.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:04:01 -07:00
Roman Zippel f8bd2258e2 remove div_long_long_rem
x86 is the only arch right now, which provides an optimized for
div_long_long_rem and it has the downside that one has to be very careful that
the divide doesn't overflow.

The API is a little akward, as the arguments for the unsigned divide are
signed.  The signed version also doesn't handle a negative divisor and
produces worse code on 64bit archs.

There is little incentive to keep this API alive, so this converts the few
users to the new API.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:58 -07:00
Roman Zippel 6f6d6a1a6a rename div64_64 to div64_u64
Rename div64_64 to div64_u64 to make it consistent with the other divide
functions, so it clearly includes the type of the divide.  Move its definition
to math64.h as currently no architecture overrides the generic implementation.
 They can still override it of course, but the duplicated declarations are
avoided.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:58 -07:00
Roman Zippel 2418f4f28f introduce explicit signed/unsigned 64bit divide
The current do_div doesn't explicitly say that it's unsigned and the signed
counterpart is missing, which is e.g.  needed when dealing with time values.

This introduces 64bit signed/unsigned divide functions which also attempts to
cleanup the somewhat awkward calling API, which often requires the use of
temporary variables for the dividend.  To avoid the need for temporary
variables everywhere for the remainder, each divide variant also provides a
version which doesn't return the remainder.

Each architecture can now provide optimized versions of these function,
otherwise generic fallback implementations will be used.

As an example I provided an alternative for the current x86 divide, which
avoids the asm casts and using an union allows gcc to generate better code.
It also avoids the upper divde in a few more cases, where the result is known
(i.e.  upper quotient is zero).

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:58 -07:00
Hugh Dickins 5f464707c8 x86: fix HT cpu booting on 32-bit
Since recent smpboot 32/64-bit merge, my dual Xeon with HT has been
booting only 2 of its 4 cpus (when running an i386 kernel; but x86_64
is okay).  J.A. Magallón reports the same.

 native_cpu_up: bad cpu 2
 native_cpu_up: bad cpu 3

The mach-default cpu_present_to_apicid() was just returning cpu number
(2, 3) instead of apicid (6, 7): looks like we now need the x86_64 code
even for the i386 case.

Comparing with other versions of cpu_present_to_apicid(), it seems a
good idea to include an NR_CPUS test too, since cpu_present() doesn't
include that; but that wasn't a problem here, and may no problem at all.

Prior to that smpboot merge, my Xeon booted the two HT siblings on one
physical first, then the two siblings on the other physical after - when
i386, but alternated them when x86_64.  Since the merge, the x86_64
sequence is unchanged, but the i386 sequence is now like x86_64.

I prefer this consistency, and I prefer the new sequence: booting with
maxcpus=2 then uses the independent physicals without HT sharing.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Ingo Molnar 3e8f7e35f3 x86 VISWS: build fix
the 'reboot_force' flag is a notion that non-PC subarchitectures do
not have.

also, unify the X86_BIOS_REBOOT option between 32-bit and 64-bit
and get rid of a few unnecessary Kconfig and Makefile complications
that way.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Adrian Bunk 9cbfe20068 x86: remove Xgt_desc_struct
The comment says it should have been removed in 2.6.25.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Jeff Dike 730f412c08 asm-*/futex.h should include linux/uaccess.h
Lots of asm-*/futex.h call pagefault_enable and pagefault_disable, which
are declared in linux/uaccess.h, without including linux/uaccess.h.

They all include asm/uaccess.h, so this patch replaces asm/uaccess.h
with linux/uaccess.h.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:52 -07:00
Roland McGrath 5a8da0ea82 signals: x86 TS_RESTORE_SIGMASK
Replace TIF_RESTORE_SIGMASK with TS_RESTORE_SIGMASK and define our own
set_restore_sigmask() function.  This saves the costly SMP-safe set_bit
operation, which we do not need for the sigmask flag since TIF_SIGPENDING
always has to be set too.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Alex Chiang fe086a7bea [IA64] Provide ACPI fixup for /proc/cpuinfo/physical_id
Legacy HP ia64 platforms currently cannot provide
/proc/cpuinfo/physical_id due to legacy SAL/PAL implementations.
However, that physical topology information can be obtained
via ACPI.

Provide an interface that gives ACPI one last chance to provide
physical_id for these legacy platforms. This logic only comes
into play iff:

- ACPI actually provides slot information for the CPU
- we lack a valid socket_id

Otherwise, we don't do anything.

Since x86 uses the ACPI processor driver as well, we provide a nop
stub function for arch_fix_phys_package_id() in asm-x86/topology.h

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-04-29 15:05:29 -07:00
Linus Torvalds 1f43c53930 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes:
  x86: fix PCI MSI breaks when booting with nosmp
  x86: vget_cycles() __always_inline
  x86: add more boot protocol documentation
  bootprotocol: cleanup
  x86: fix warning in "x86: clean up vSMP detection"
  x86: !x & y typo in mtrr code
2008-04-29 09:03:19 -07:00
Linus Torvalds 5f78e4d339 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-pci
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-pci:
  x86: add pci=check_enable_amd_mmconf and dmi check
  x86: work around io allocation overlap of HT links
  acpi: get boot_cpu_id as early for k8_scan_nodes
  x86_64: don't need set default res if only have one root bus
  x86: double check the multi root bus with fam10h mmconf
  x86: multi pci root bus with different io resource range, on 64-bit
  x86: use bus conf in NB conf fun1 to get bus range on, on 64-bit
  x86: get mp_bus_to_node early
  x86 pci: remove checking type for mmconfig probe
  x86: remove unneeded check in mmconf reject
  driver core: try parent numa_node at first before using default
  x86: seperate mmconf for fam10h out from setup_64.c
  x86: if acpi=off, force setting the mmconf for fam10h
  x86_64: check MSR to get MMCONFIG for AMD Family 10h
  x86_64: check and enable MMCONFIG for AMD Family 10h
  x86_64: set cfg_size for AMD Family 10h in case MMCONFIG
  x86: mmconf enable mcfg early
  x86: clear pci_mmcfg_virt when mmcfg get rejected
  x86: validate against acpi motherboard resources

Fixed up fairly trivial conflicts in arch/x86/pci/{init.c,pci.h} due to
OLPC support manually.
2008-04-29 08:26:51 -07:00
Harvey Harrison 6510d41954 kernel: Move arches to use common unaligned access
Unaligned access is ok for the following arches:
cris, m68k, mn10300, powerpc, s390, x86

Arches that use the memmove implementation for native endian, and
the byteshifting for the opposite endianness.
h8300, m32r, xtensa

Packed struct for native endian, byteshifting for other endian:
alpha, blackfin, ia64, parisc, sparc, sparc64, mips, sh

m86knommu is generic_be for Coldfire, otherwise unaligned access is ok.

frv, arm chooses endianness based on compiler settings, uses the byteshifting
versions.  Remove the unaligned trap handler from frv as it is now unused.

v850 is le, uses the byteshifting versions for both be and le.

Remove the now unused asm-generic implementation.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:27 -07:00
Andres Salomon 3ef0e1f8ca x86: olpc: add One Laptop Per Child architecture support
This adds support for OLPC XO hardware.  Open Firmware on XOs don't contain
the VSA, so it is necessary to emulate the PCI BARs in the kernel.  This also
adds functionality for running EC commands, and a CONFIG_OLPC.

A number of OLPC drivers depend upon CONFIG_OLPC.

olpc_ec_timeout is a hack to work around Embedded Controller bugs.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: geode_has_vsa build fix]
[akpm@linux-foundation.org: olpc_register_battery_callback doesn't exist]
Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:07 -07:00
Adrian Bunk 7d195a5409 proper extern for late_time_init
Add a proper extern for late_time_init in include/linux/init.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:03 -07:00
Hugh Dickins 9752082560 x86: vget_cycles() __always_inline
Mark vget_cycles() as __always_inline, so gcc is never tempted to make
the vsyscall vread_tsc() dive into kernel text, with resulting SIGSEGV.

This was a self-inflicted wound: I've not seen that happen with unhacked
sources; but for debug reasons I'd changed my x86/Makefile to compile
no-unit-at-a-time, and that in conjunction with OPTIMIZE_INLINING=y
ended up with vget_cycles() in kernel text.  Perhaps it can happen
in other ways: safer to use __always_inline.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-29 13:45:24 +02:00
Andres Salomon fd96795630 gxfb/lxfb: detect framebuffer size using an MSR if VSA2 isn't available
If there's no VSA2 (ie, if we're using tinybios or OpenFirmware), use the
GLIU's P2D Range Offset Descriptor to determine how much memory we have
available for the framebuffer.

Originally based on a patch by Jordan Crouse.  Tested with OpenFirmware;
Pascal informs me that tinybios has a stub that fills in P2D_RO0.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:40 -07:00
Andres Salomon 61a517a063 gxfb/lxfb: use VSA definitions when fetching framebuffer size
..Rather than using magic constants.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:40 -07:00
Andres Salomon e9338364e6 x86: GEODE: add Virtual Systems Architecture detection
This is generic VSA2 detection.  It's used by OLPC to determine whether or not
the BIOS contains VSA2, but since other BIOSes are coming out that don't use
the VSA (ie, tinybios), it might end up being useful for others.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:35 -07:00
Andres Salomon 32bf87e369 x86: geode: MSR cleanup
This cleans up a few MSR-using drivers in the following manner:
  - Ensures MSRs are all defined in asm/geode.h, rather than in misc
    places
  - Makes the naming consistent; cs553[56] ones begin with MSR_,
    GX-specific ones start with MSR_GX_, and LX-specific ones start
    with MSR_LX_.  Also, make the names match the data sheet.
  - Use MSR names rather than numbers in source code
  - Document the fact that the LX's MSR_PADSEL has the wrong value
    in the data sheet.  That's, uh, good to note.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:35 -07:00
Gerald Schaefer 7f2e9525ba hugetlbfs: common code update for s390
Huge ptes have a special type on s390 and cannot be handled with the standard
pte functions in certain cases, e.g.  because of a different location of the
invalid bit.  This patch adds some new architecture- specific functions to
hugetlb common code, as a prerequisite for the s390 large page support.

This won't affect other architectures in functionality, but I need to add some
new dummy inline functions to the headers.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:25 -07:00
Gerald Schaefer 8fe627ec5b hugetlbfs: add missing TLB flush to hugetlb_cow()
A cow break on a hugetlbfs page with page_count > 1 will set a new pte with
set_huge_pte_at(), w/o any tlb flush operation.  The old pte will remain in
the tlb and subsequent write access to the page will result in a page fault
loop, for as long as it may take until the tlb is flushed from somewhere else.
 This patch introduces an architecture-specific huge_ptep_clear_flush()
function, which is called before the the set_huge_pte_at() in hugetlb_cow().

ATTENTION: This is just a nop on all architectures for now, the s390
implementation will come with our large page patch later.  Other architectures
should define their own huge_ptep_clear_flush() if needed.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:25 -07:00
Gerald Schaefer 6d779079bf hugetlbfs: architecture header cleanup
This patch moves all architecture functions for hugetlb to architecture header
files (include/asm-foo/hugetlb.h) and converts all macros to inline functions.
 It also removes (!) ARCH_HAS_HUGEPAGE_ONLY_RANGE,
ARCH_HAS_HUGETLB_FREE_PGD_RANGE, ARCH_HAS_PREPARE_HUGEPAGE_RANGE,
ARCH_HAS_SETCLEAR_HUGE_PTE and ARCH_HAS_HUGETLB_PREFAULT_HOOK.

Getting rid of the ARCH_HAS_xxx #ifdef and macro fugliness should increase
readability and maintainability, at the price of some code duplication.  An
asm-generic common part would have reduced the loc, but we would end up with
new ARCH_HAS_xxx defines eventually.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:25 -07:00
Nick Piggin 7e675137a8 mm: introduce pte_special pte bit
s390 for one, cannot implement VM_MIXEDMAP with pfn_valid, due to their memory
model (which is more dynamic than most).  Instead, they had proposed to
implement it with an additional path through vm_normal_page(), using a bit in
the pte to determine whether or not the page should be refcounted:

vm_normal_page()
{
	...
        if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
                if (vma->vm_flags & VM_MIXEDMAP) {
#ifdef s390
			if (!mixedmap_refcount_pte(pte))
				return NULL;
#else
                        if (!pfn_valid(pfn))
                                return NULL;
#endif
                        goto out;
                }
	...
}

This is fine, however if we are allowed to use a bit in the pte to determine
refcountedness, we can use that to _completely_ replace all the vma based
schemes.  So instead of adding more cases to the already complex vma-based
scheme, we can have a clearly seperate and simple pte-based scheme (and get
slightly better code generation in the process):

vm_normal_page()
{
#ifdef s390
	if (!mixedmap_refcount_pte(pte))
		return NULL;
	return pte_page(pte);
#else
	...
#endif
}

And finally, we may rather make this concept usable by any architecture rather
than making it s390 only, so implement a new type of pte state for this.
Unfortunately the old vma based code must stay, because some architectures may
not be able to spare pte bits.  This makes vm_normal_page a little bit more
ugly than we would like, but the 2 cases are clearly seperate.

So introduce a pte_special pte state, and use it in mm/memory.c.  It is
currently a noop for all architectures, so this doesn't actually result in any
compiled code changes to mm/memory.o.

BTW:
I haven't put vm_normal_page() into arch code as-per an earlier suggestion.
The reason is that, regardless of where vm_normal_page is actually
implemented, the *abstraction* is still exactly the same. Also, while it
depends on whether the architecture has pte_special or not, that is the
only two possible cases, and it really isn't an arch specific function --
the role of the arch code should be to provide primitive functions and
accessors with which to build the core code; pte_special does that. We do
not want architectures to know or care about vm_normal_page itself, and
we definitely don't want them being able to invent something new there
out of sight of mm/ code. If we made vm_normal_page an arch function, then
we have to make vm_insert_mixed (next patch) an arch function too. So I
don't think moving it to arch code fundamentally improves any abstractions,
while it does practically make the code more difficult to follow, for both
mm and arch developers, and easier to misuse.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:23 -07:00
Linus Torvalds 42cadc8600 Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (147 commits)
  KVM: kill file->f_count abuse in kvm
  KVM: MMU: kvm_pv_mmu_op should not take mmap_sem
  KVM: SVM: remove selective CR0 comment
  KVM: SVM: remove now obsolete FIXME comment
  KVM: SVM: disable CR8 intercept when tpr is not masking interrupts
  KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled
  KVM: export kvm_lapic_set_tpr() to modules
  KVM: SVM: sync TPR value to V_TPR field in the VMCB
  KVM: ppc: PowerPC 440 KVM implementation
  KVM: Add MAINTAINERS entry for PowerPC KVM
  KVM: ppc: Add DCR access information to struct kvm_run
  ppc: Export tlb_44x_hwater for KVM
  KVM: Rename debugfs_dir to kvm_debugfs_dir
  KVM: x86 emulator: fix lea to really get the effective address
  KVM: x86 emulator: fix smsw and lmsw with a memory operand
  KVM: x86 emulator: initialize src.val and dst.val for register operands
  KVM: SVM: force a new asid when initializing the vmcb
  KVM: fix kvm_vcpu_kick vs __vcpu_run race
  KVM: add ioctls to save/store mpstate
  KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*
  ...
2008-04-27 10:13:52 -07:00
Marcelo Tosatti 62d9f0dbc9 KVM: add ioctls to save/store mpstate
So userspace can save/restore the mpstate during migration.

[avi: export the #define constants describing the value]
[christian: add s390 stubs]
[avi: ditto for ia64]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:16 +03:00
Avi Kivity a45352908b KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*
We wish to export it to userspace, so move it into the kvm namespace.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:04:13 +03:00
Feng (Eric) Liu 2714d1d3d6 KVM: Add trace markers
Trace markers allow userspace to trace execution of a virtual machine
in order to monitor its performance.

Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:19 +03:00
Joerg Roedel 53371b5098 KVM: SVM: add intercept for machine check exception
To properly forward a MCE occured while the guest is running to the host, we
have to intercept this exception and call the host handler by hand. This is
implemented by this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:18 +03:00
Anthony Liguori 35149e2129 KVM: MMU: Don't assume struct page for x86
This patch introduces a gfn_to_pfn() function and corresponding functions like
kvm_release_pfn_dirty().  Using these new functions, we can modify the x86
MMU to no longer assume that it can always get a struct page for any given gfn.

We don't want to eliminate gfn_to_page() entirely because a number of places
assume they can do gfn_to_page() and then kmap() the results.  When we support
IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will
succeed.

This does not implement support for avoiding reference counting for reserved
RAM or for IO memory.  However, it should make those things pretty straight
forward.

Since we're only introducing new common symbols, I don't think it will break
the non-x86 architectures but I haven't tested those.  I've tested Intel,
AMD, NPT, and hugetlbfs with Windows and Linux guests.

[avi: fix overflow when shifting left pfns by adding casts]

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:15 +03:00
Joerg Roedel 9c20456a32 KVM: function declaration parameter name cleanup
The kvm_host.h file for x86 declares the functions kvm_set_cr[0348]. In the
header file their second parameter is named cr0 in all cases. This patch
renames the parameters so that they match the function name.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:55 +03:00
Marcelo Tosatti 3200f405a1 KVM: MMU: unify slots_lock usage
Unify slots_lock acquision around vcpu_run(). This is simpler and less
error-prone.

Also fix some callsites that were not grabbing the lock properly.

[avi: drop slots_lock while in guest mode to avoid holding the lock
      for indefinite periods]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:52 +03:00
Izik Eidus 37817f2982 KVM: x86: hardware task switching support
This emulates the x86 hardware task switch mechanism in software, as it is
unsupported by either vmx or svm.  It allows operating systems which use it,
like freedos, to run as kvm guests.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:39 +03:00
Izik Eidus 2e4d265349 KVM: x86: add functions to get the cpl of vcpu
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:38 +03:00
Avi Kivity 69a9f69bb2 KVM: Move some x86 specific constants and structures to include/asm-x86
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:34 +03:00
Glauber Costa 3c62c62502 x86: make native_machine_shutdown non-static
it will allow external users to call it. It is mainly
useful for routines that will override its machine_ops
field for its own special purposes, but want to call the
normal shutdown routine after they're done

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:30 +03:00
Glauber Costa ed23dc6f5b x86: allow machine_crash_shutdown to be replaced
This patch a llows machine_crash_shutdown to
be replaced, just like any of the other functions
in machine_ops

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:29 +03:00
Marcelo Tosatti 2f333bcb4e KVM: MMU: hypercall based pte updates and TLB flushes
Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.

Don't report the feature if two dimensional paging is enabled.

[avi:
 - one mmu_op hypercall instead of one per op
 - allow 64-bit gpa on hypercall
 - don't pass host errors (-ENOMEM) to guest]

[akpm: warning fix on i386]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:27 +03:00
Avi Kivity 9f81128591 KVM: Provide unlocked version of emulator_write_phys()
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:26 +03:00
Marcelo Tosatti a28e4f5a62 KVM: add basic paravirt support
Add basic KVM paravirt support. Avoid vm-exits on IO delays.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:24 +03:00
Sheng Yang e0f63cb927 KVM: Add save/restore supporting of in kernel PIT
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:22 +03:00
Sheng Yang 7837699fa6 KVM: In kernel PIT model
The patch moves the PIT model from userspace to kernel, and increases
the timer accuracy greatly.

[marcelo: make last_injected_time per-guest]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-and-Acked-by: Alex Davis <alex14641@yahoo.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:21 +03:00
Avi Kivity 2d3ad1f40c KVM: Prefix control register accessors with kvm_ to avoid namespace pollution
Names like 'set_cr3()' look dangerously close to affecting the host.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:26 +03:00
Marcelo Tosatti 05da45583d KVM: MMU: large page support
Create large pages mappings if the guest PTE's are marked as such and
the underlying memory is hugetlbfs backed.  If the largepage contains
write-protected pages, a large pte is not used.

Gives a consistent 2% improvement for data copies on ram mounted
filesystem, without NPT/EPT.

Anthony measures a 4% improvement on 4-way kernbench, with NPT.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:25 +03:00
Marcelo Tosatti 2e53d63acb KVM: MMU: ignore zapped root pagetables
Mark zapped root pagetables as invalid and ignore such pages during lookup.

This is a problem with the cr3-target feature, where a zapped root table fools
the faulting code into creating a read-only mapping. The result is a lockup
if the instruction can't be emulated.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:25 +03:00
Amit Shah f11c3a8d84 KVM: Add stat counter for hypercalls
Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:24 +03:00
Glauber de Oliveira Costa 18068523d3 KVM: paravirtualized clocksource: host part
This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writing to it periodically.

The area is binary compatible with xen, as we use the same shadow_info
structure.

[marcelo: fix bad_page on MSR_KVM_SYSTEM_TIME]
[avi: save full value of the msr, even if enable bit is clear]
[avi: clear previous value of time_page]

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:22 +03:00
Joerg Roedel cc4b6871e7 KVM: export the load_pdptrs() function to modules
The load_pdptrs() function is required in the SVM module for NPT support.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:20 +03:00
Joerg Roedel 1855267210 KVM: export information about NPT to generic x86 code
The generic x86 code has to know if the specific implementation uses Nested
Paging. In the generic code Nested Paging is called Two Dimensional Paging
(TDP) to avoid confusion with (future) TDP implementations of other vendors.
This patch exports the availability of TDP to the generic x86 code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:19 +03:00
Joerg Roedel f2b4b7ddf6 KVM: make EFER_RESERVED_BITS configurable for architecture code
This patch give the SVM and VMX implementations the ability to add some bits
the guest can set in its EFER register.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:18 +03:00
Sheng Yang 2384d2b326 KVM: VMX: Enable Virtual Processor Identification (VPID)
To allow TLB entries to be retained across VM entry and VM exit, the VMM
can now identify distinct address spaces through a new virtual-processor ID
(VPID) field of the VMCS.

[avi: drop vpid_sync_all()]
[avi: add "cc" to asm constraints]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:17 +03:00
Dong, Eddie 1ae0a13def KVM: MMU: Simplify hash table indexing
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:14 +03:00
Peter Zijlstra 7f424a8b08 fix idle (arch, acpi and apm) and lockdep
OK, so 25-mm1 gave a lockdep error which made me look into this.

The first thing that I noticed was the horrible mess; the second thing I
saw was hacks like: 71e93d1561

The problem is that arch idle routines are somewhat inconsitent with
their IRQ state handling and instead of fixing _that_, we go paper over
the problem.

So the thing I've tried to do is set a standard for idle routines and
fix them all up to adhere to that. So the rules are:

  idle routines are entered with IRQs disabled
  idle routines will exit with IRQs enabled

Nearly all already did this in one form or another.

Merge the 32 and 64 bit bits so they no longer have different bugs.

As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
irq-enable.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-27 00:01:45 +02:00
Yinghai Lu 30a18d6c3f x86: multi pci root bus with different io resource range, on 64-bit
scan AMD opteron io/mmio routing to make sure every pci root bus get correct
resource range. Thus later pci scan could assign correct resource to device
with unassigned resource.

this can fix a system without _CRS for multi pci root bus.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu 871d5f8dd0 x86: get mp_bus_to_node early
Currently, on an amd k8 system with multi ht chains, the numa_node of
pci devices under /sys/devices/pci0000:80/* is always 0, even if that
chain is on node 1 or 2 or 3.

Workaround: pcibus_to_node(bus) is used when we want to get the node that
pci_device is on.

In struct device, we already have numa_node member, and we could use
dev_to_node()/set_dev_node() to get and set numa_node in the device.
set_dev_node is called in pci_device_add() with pcibus_to_node(bus),
and pcibus_to_node uses bus->sysdata for nodeid.

The problem is when pci_add_device is called, bus->sysdata is not assigned
correct nodeid yet. The result is that numa_node will always be 0.

pcibios_scan_root and pci_scan_root could take sysdata. So we need to get
mp_bus_to_node mapping before these two are called, and thus
get_mp_bus_to_node could get correct node for sysdata in root bus.

In scanning of the root bus, all child busses will take parent bus sysdata.
So all pci_device->dev.numa_node will be assigned correctly and automatically.

Later we could use dev_to_node(&pci_dev->dev) to get numa_node, and we
could also could make other bus specific device get the correct numa_node
too.

This is an updated version of pci_sysdata and Jeff's pci_domain patch.

[ mingo@elte.hu: build fix ]

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Linus Torvalds c3bf9bc243 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3:
  x86_64/mm: check and print vmemmap allocation continuous
  x86_64: fix setup_node_bootmem to support big mem excluding with memmap
  x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
  mm: allow reserve_bootmem() cross nodes
  mm: offset align in alloc_bootmem()
  mm: fix alloc_bootmem_core to use fast searching for all nodes
  mm: make mem_map allocation continuous
2008-04-26 14:04:32 -07:00
Yinghai Lu 1a27fc0a42 x86_64: fix setup_node_bootmem to support big mem excluding with memmap
typical case: four sockets system, every node has 4g ram, and we are using:

	memmap=10g$4g

to mask out memory on node1 and node2

when numa is enabled, early_node_mem is used to get node_data and node_bootmap.

if it can not get memory from the same node with find_e820_area(), it will
use alloc_bootmem to get buff from previous nodes.

so check it and print out some info about it.

need to move early_res_to_bootmem into every setup_node_bootmem.
and it takes range that node has. otherwise alloc_bootmem could return addr
that reserved early.

depends on "mm: make reserve_bootmem can crossed the nodes".

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:08 +02:00
Linus Torvalds 9b79ed952b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3:
  x86, bitops: select the generic bitmap search functions
  x86: include/asm-x86/pgalloc.h/bitops.h: checkpatch cleanups - formatting only
  x86: finalize bitops unification
  x86, UML: remove x86-specific implementations of find_first_bit
  x86: optimize find_first_bit for small bitmaps
  x86: switch 64-bit to generic find_first_bit
  x86: generic versions of find_first_(zero_)bit, convert i386
  bitops: use __fls for fls64 on 64-bit archs
  generic: implement __fls on all 64-bit archs
  generic: introduce a generic __fls implementation
  x86: merge the simple bitops and move them to bitops.h
  x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
  x86, uml: fix uml with generic find_next_bit for x86
  x86: change x86 to use generic find_next_bit
  uml: Kconfig cleanup
  uml: fix build error
2008-04-26 13:46:11 -07:00