Commit Graph

2277 Commits

Author SHA1 Message Date
Jeremy Fitzhardinge 9a4029fd34 xen: ignore RW mapping of RO pages in pagetable_init
When setting up the initial pagetable, which includes mappings of all
low physical memory, ignore a mapping which tries to set the RW bit on
an RO pte.  An RO pte indicates a page which is part of the current
pagetable, and so it cannot be allowed to become RW.

Once xen_pagetable_setup_done is called, set_pte reverts to its normal
behaviour.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Cc: ebiederm@xmission.com (Eric W. Biederman)
2007-07-18 08:47:43 -07:00
Jeremy Fitzhardinge f4f97b3ea9 xen: Complete pagetable pinning
Xen requires all active pagetables to be marked read-only.  When the
base of the pagetable is loaded into %cr3, the hypervisor validates
the entire pagetable and only allows the load to proceed if it all
checks out.

This is pretty slow, so to mitigate this cost Xen has a notion of
pinned pagetables.  Pinned pagetables are pagetables which are
considered to be active even if no processor's cr3 is pointing to is.
This means that it must remain read-only and all updates are validated
by the hypervisor.  This makes context switches much cheaper, because
the hypervisor doesn't need to revalidate the pagetable each time.

This also adds a new paravirt hook which is called during setup once
the zones and memory allocator have been initialized.  When the
init_mm pagetable is first built, the struct page array does not yet
exist, and so there's nowhere to put he init_mm pagetable's PG_pinned
flags.  Once the zones are initialized and the struct page array
exists, we can set the PG_pinned flags for those pages.

This patch also adds the Xen support for pte pages allocated out of
highmem (highpte) by implementing xen_kmap_atomic_pte.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Zach Amsden <zach@vmware.com>
2007-07-18 08:47:43 -07:00
Jeremy Fitzhardinge e738fca8d7 xen: configuration
Put config options for Xen after the core pieces are in place.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2007-07-18 08:47:43 -07:00
Jeremy Fitzhardinge 15c84731d6 xen: time implementation
Xen maintains a base clock which measures nanoseconds since system
boot.  This is provided to guests via a shared page which contains a
base time in ns, a tsc timestamp at that point and tsc frequency
parameters.  Guests can compute the current time by reading the tsc
and using it to extrapolate the current time from the basetime.  The
hypervisor makes sure that the frequency parameters are updated
regularly, paricularly if the tsc changes rate or stops.

This is implemented as a clocksource, so the interface to the rest of
the kernel is a simple clocksource which simply returns the current
time directly in nanoseconds.

Xen also provides a simple timer mechanism, which allows a timeout to
be set in the future.  When that time arrives, a timer event is sent
to the guest.  There are two timer interfaces:
 - An old one which also delivers a stream of (unused) ticks at 100Hz,
   and on the same event, the actual timer events.  The 100Hz ticks
   cause a lot of spurious wakeups, but are basically harmless.
 - The new timer interface doesn't have the 100Hz ticks, and can also
   fail if the specified time is in the past.

This code presents the Xen timer as a clockevent driver, and uses the
new interface by preference.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
2007-07-18 08:47:43 -07:00
Jeremy Fitzhardinge e46cdb66c8 xen: event channels
Xen implements interrupts in terms of event channels.  Each guest
domain gets 1024 event channels which can be used for a variety of
purposes, such as Xen timer events, inter-domain events,
inter-processor events (IPI) or for real hardware IRQs.

Within the kernel, we map the event channels to IRQs, and implement
the whole interrupt handling using a Xen irq_chip.

Rather than setting NR_IRQ to 1024 under PARAVIRT in order to
accomodate Xen, we create a dynamic mapping between event channels and
IRQs.  Ideally, Linux will eventually move towards dynamically
allocating per-irq structures, and we can use a 1:1 mapping between
event channels and irqs.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric W. Biederman <ebiederm@xmission.com>
2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge 3b827c1b3a xen: virtual mmu
Xen pagetable handling, including the machinery to implement direct
pagetables.

Xen presents the real CPU's pagetables directly to guests, with no
added shadowing or other layer of abstraction.  Naturally this means
the hypervisor must maintain close control over what the guest can put
into the pagetable.

When the guest modifies the pte/pmd/pgd, it must convert its
domain-specific notion of a "physical" pfn into a global machine frame
number (mfn) before inserting the entry into the pagetable.  Xen will
check to make sure the domain is allowed to create a mapping of the
given mfn.

Xen also requires that all mappings the guest has of its own active
pagetable are read-only.  This is relatively easy to implement in
Linux because all pagetables share the same pte pages for kernel
mappings, so updating the pte in one pagetable will implicitly update
the mapping in all pagetables.

Normally a pagetable becomes active when you point to it with cr3 (or
the Xen equivalent), but when you do so, Xen must check the whole
pagetable for correctness, which is clearly a performance problem.

Xen solves this with pinning which keeps a pagetable effectively
active even if its currently unused, which means that all the normal
update rules are enforced.  This means that it need not revalidate the
pagetable when loading cr3.

This patch has a first-cut implementation of pinning, but it is more
fully implemented in a later patch.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge 5ead97c84f xen: Core Xen implementation
This patch is a rollup of all the core pieces of the Xen
implementation, including:
 - booting and setup
 - pagetable setup
 - privileged instructions
 - segmentation
 - interrupt flags
 - upcalls
 - multicall batching

BOOTING AND SETUP

The vmlinux image is decorated with ELF notes which tell the Xen
domain builder what the kernel's requirements are; the domain builder
then constructs the address space accordingly and starts the kernel.

Xen has its own entrypoint for the kernel (contained in an ELF note).
The ELF notes are set up by xen-head.S, which is included into head.S.
In principle it could be linked separately, but it seems to provoke
lots of binutils bugs.

Because the domain builder starts the kernel in a fairly sane state
(32-bit protected mode, paging enabled, flat segments set up), there's
not a lot of setup needed before starting the kernel proper.  The main
steps are:
  1. Install the Xen paravirt_ops, which is simply a matter of a
     structure assignment.
  2. Set init_mm to use the Xen-supplied pagetables (analogous to the
     head.S generated pagetables in a native boot).
  3. Reserve address space for Xen, since it takes a chunk at the top
     of the address space for its own use.
  4. Call start_kernel()

PAGETABLE SETUP

Once we hit the main kernel boot sequence, it will end up calling back
via paravirt_ops to set up various pieces of Xen specific state.  One
of the critical things which requires a bit of extra care is the
construction of the initial init_mm pagetable.  Because Xen places
tight constraints on pagetables (an active pagetable must always be
valid, and must always be mapped read-only to the guest domain), we
need to be careful when constructing the new pagetable to keep these
constraints in mind.  It turns out that the easiest way to do this is
use the initial Xen-provided pagetable as a template, and then just
insert new mappings for memory where a mapping doesn't already exist.

This means that during pagetable setup, it uses a special version of
xen_set_pte which ignores any attempt to remap a read-only page as
read-write (since Xen will map its own initial pagetable as RO), but
lets other changes to the ptes happen, so that things like NX are set
properly.

PRIVILEGED INSTRUCTIONS AND SEGMENTATION

When the kernel runs under Xen, it runs in ring 1 rather than ring 0.
This means that it is more privileged than user-mode in ring 3, but it
still can't run privileged instructions directly.  Non-performance
critical instructions are dealt with by taking a privilege exception
and trapping into the hypervisor and emulating the instruction, but
more performance-critical instructions have their own specific
paravirt_ops.  In many cases we can avoid having to do any hypercalls
for these instructions, or the Xen implementation is quite different
from the normal native version.

The privileged instructions fall into the broad classes of:
  Segmentation: setting up the GDT and the GDT entries, LDT,
     TLS and so on.  Xen doesn't allow the GDT to be directly
     modified; all GDT updates are done via hypercalls where the new
     entries can be validated.  This is important because Xen uses
     segment limits to prevent the guest kernel from damaging the
     hypervisor itself.
  Traps and exceptions: Xen uses a special format for trap entrypoints,
     so when the kernel wants to set an IDT entry, it needs to be
     converted to the form Xen expects.  Xen sets int 0x80 up specially
     so that the trap goes straight from userspace into the guest kernel
     without going via the hypervisor.  sysenter isn't supported.
  Kernel stack: The esp0 entry is extracted from the tss and provided to
     Xen.
  TLB operations: the various TLB calls are mapped into corresponding
     Xen hypercalls.
  Control registers: all the control registers are privileged.  The most
     important is cr3, which points to the base of the current pagetable,
     and we handle it specially.

Another instruction we treat specially is CPUID, even though its not
privileged.  We want to control what CPU features are visible to the
rest of the kernel, and so CPUID ends up going into a paravirt_op.
Xen implements this mainly to disable the ACPI and APIC subsystems.

INTERRUPT FLAGS

Xen maintains its own separate flag for masking events, which is
contained within the per-cpu vcpu_info structure.  Because the guest
kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely
ignored (and must be, because even if a guest domain disables
interrupts for itself, it can't disable them overall).

(A note on terminology: "events" and interrupts are effectively
synonymous.  However, rather than using an "enable flag", Xen uses a
"mask flag", which blocks event delivery when it is non-zero.)

There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which
are implemented to manage the Xen event mask state.  The only thing
worth noting is that when events are unmasked, we need to explicitly
see if there's a pending event and call into the hypervisor to make
sure it gets delivered.

UPCALLS

Xen needs a couple of upcall (or callback) functions to be implemented
by each guest.  One is the event upcalls, which is how events
(interrupts, effectively) are delivered to the guests.  The other is
the failsafe callback, which is used to report errors in either
reloading a segment register, or caused by iret.  These are
implemented in i386/kernel/entry.S so they can jump into the normal
iret_exc path when necessary.

MULTICALL BATCHING

Xen provides a multicall mechanism, which allows multiple hypercalls
to be issued at once in order to mitigate the cost of trapping into
the hypervisor.  This is particularly useful for context switches,
since the 4-5 hypercalls they would normally need (reload cr3, update
TLS, maybe update LDT) can be reduced to one.  This patch implements a
generic batching mechanism for hypercalls, which gets used in many
places in the Xen code.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Ian Pratt <ian.pratt@xensource.com>
Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Cc: Adrian Bunk <bunk@stusta.de>
2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge 24037a8b69 Add nosegneg capability to the vsyscall page notes
Add the "nosegneg" fake capabilty to the vsyscall page notes. This is
used by the runtime linker to select a glibc version which then
disables negative-offset accesses to the thread-local segment via
%gs. These accesses require emulation in Xen (because segments are
truncated to protect the hypervisor address space) and avoiding them
provides a measurable performance boost.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Zachary Amsden <zach@vmware.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ulrich Drepper <drepper@redhat.com>
2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge 688340ea34 Add a sched_clock paravirt_op
The tsc-based get_scheduled_cycles interface is not a good match for
Xen's runstate accounting, which reports everything in nanoseconds.

This patch replaces this interface with a sched_clock interface, which
matches both Xen and VMI's requirements.

In order to do this, we:
   1. replace get_scheduled_cycles with sched_clock
   2. hoist cycles_2_ns into a common header
   3. update vmi accordingly

One thing to note: because sched_clock is implemented as a weak
function in kernel/sched.c, we must define a real function in order to
override this weak binding.  This means the usual paravirt_ops
technique of using an inline function won't work in this case.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: john stultz <johnstul@us.ibm.com>
2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge d572929cdd paravirt: helper to disable all IO space
In a virtual environment, device drivers such as legacy IDE will waste
quite a lot of time probing for their devices which will never appear.
This helper function allows a paravirt implementation to lay claim to
the whole iomem and ioport space, thereby disabling all device drivers
trying to claim IO resources.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge bdef40a6af paravirt: export __supported_pte_mask
__supported_pte_mask is needed when constructing pte values.  Xen
device drivers need to do this to make mappings of foreign pages (ie,
pages granted to us by other domains).

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge c70df74376 paravirt: make siblingmap functions visible
Paravirt implementations need to set the sibling map on new cpus.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge 724faa89cc paravirt: unstatic smp_store_cpu_info
Paravirt implementations need to store cpu info when bringing up cpus.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge 5378701324 paravirt: unstatic leave_mm
Make globally leave_mm visible, specifically so that Xen can use it to
shoot-down lazy uses of cr3.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge 6996d3b63f paravirt: add a hook for once the allocator is ready
Add a hook so that the paravirt backend knows when the allocator is
ready.  This is useful for the obvious reason that the allocator is
available, but the other side-effect of having the bootmem allocator
available is that each page now has an associated "struct page".

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge fdb4c338c8 paravirt: add an "mm" argument to alloc_pt
It's useful to know which mm is allocating a pagetable.  Xen uses this
to determine whether the pagetable being added to is pinned or not.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
2007-07-18 08:47:40 -07:00
Jeremy Fitzhardinge 810bab448e use elfnote.h to generate vsyscall notes.
Use existing elfnote.h to generate vsyscall notes, rather than doing
it locally.  Changes elfnote.h a bit to suit, since this is the first
asm user, and it wasn't quite right.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.com>
2007-07-18 08:47:40 -07:00
Jeremy Fitzhardinge 86313c488a usermodehelper: Tidy up waiting
Rather than using a tri-state integer for the wait flag in
call_usermodehelper_exec, define a proper enum, and use that.  I've
preserved the integer values so that any callers I've missed should
still work OK.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: David Howells <dhowells@redhat.com>
2007-07-18 08:47:40 -07:00
Amit Arora 97ac73506c sys_fallocate() implementation on i386, x86_64 and powerpc
fallocate() is a new system call being proposed here which will allow
applications to preallocate space to any file(s) in a file system.
Each file system implementation that wants to use this feature will need
to support an inode operation called ->fallocate().
Applications can use this feature to avoid fragmentation to certain
level and thus get faster access speed. With preallocation, applications
also get a guarantee of space for particular file(s) - even if later the
the system becomes full.

Currently, glibc provides an interface called posix_fallocate() which
can be used for similar cause. Though this has the advantage of working
on all file systems, but it is quite slow (since it writes zeroes to
each block that has to be preallocated). Without a doubt, file systems
can do this more efficiently within the kernel, by implementing
the proposed fallocate() system call. It is expected that
posix_fallocate() will be modified to call this new system call first
and incase the kernel/filesystem does not implement it, it should fall
back to the current implementation of writing zeroes to the new blocks.
ToDos:
1. Implementation on other architectures (other than i386, x86_64,
   and ppc). Patches for s390(x) and ia64 are already available from
   previous posts, but it was decided that they should be added later
   once fallocate is in the mainline. Hence not including those patches
   in this take.
2. Changes to glibc,
   a) to support fallocate() system call
   b) to make posix_fallocate() and posix_fallocate64() call fallocate()

Signed-off-by: Amit Arora <aarora@in.ibm.com>
2007-07-17 21:42:44 -04:00
Jeff Garzik 8e1c091ccc arch/i386/* fs/* ipc/*: mark variables with uninitialized_var()
Mark variables with uninitialized_var() if such a warning appears,
and analysis proves that the var is initialized properly on all paths
it is used.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-17 16:23:19 -04:00
Linus Torvalds 49c13b51a1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (80 commits)
  KVM: Use CPU_DYING for disabling virtualization
  KVM: Tune hotplug/suspend IPIs
  KVM: Keep track of which cpus have virtualization enabled
  SMP: Allow smp_call_function_single() to current cpu
  i386: Allow smp_call_function_single() to current cpu
  x86_64: Allow smp_call_function_single() to current cpu
  HOTPLUG: Adapt thermal throttle to CPU_DYING
  HOTPLUG: Adapt cpuset hotplug callback to CPU_DYING
  HOTPLUG: Add CPU_DYING notifier
  KVM: Clean up #includes
  KVM: Remove kvmfs in favor of the anonymous inodes source
  KVM: SVM: Reliably detect if SVM was disabled by BIOS
  KVM: VMX: Remove unnecessary code in vmx_tlb_flush()
  KVM: MMU: Fix Wrong tlb flush order
  KVM: VMX: Reinitialize the real-mode tss when entering real mode
  KVM: Avoid useless memory write when possible
  KVM: Fix x86 emulator writeback
  KVM: Add support for in-kernel pio handlers
  KVM: VMX: Fix interrupt checking on lightweight exit
  KVM: Adds support for in-kernel mmio handlers
  ...
2007-07-17 11:50:26 -07:00
Antonino A. Daplas 623e71b035 fbcon: allow fbcon to use the primary display driver
Allow fbcon to select the primary display adapter using the
fb_is_primary_device() arch-specific helper.  If a a primary adapter is
detected, fbcon will unbind the old adapter from the VT layer, then rebind
using the new adapter.  This requires that bind_/unbind_con_driver() be made
public.

Because this feature may produce unexpected behavior (from the user's POV),
this must be explicitly enabled in Kconfig.

[akpm@linux-foundation.org: export unbind_con_driver]
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:11 -07:00
Antonino A. Daplas 317b3c2167 fbdev: detect primary display device
Add function helper, fb_is_primary_device().  Given struct fb_info, it will
return a nonzero value if the device is the primary display.

Currently, only the i386 is supported where the function checks for the
IORESOURCE_ROM_SHADOW flag.

Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:11 -07:00
Andrew Morton f2890255b0 i386: speedup touch_nmi_watchdog
Avoid dirtying remote cpu's memory if it already has the correct value.

Cc: Andi Kleen <ak@suse.de>
Cc: Konrad Rzeszutek <konrad@darnok.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:04 -07:00
Ananth N Mavinakayanahalli 87a7defb0d Kprobes on select architectures no longer EXPERIMENTAL
Based on usage and testing over the past couple of years, kprobes on
i386, ia64, powerpc and x86_64 is no longer EXPERIMENTAL.

This is a follow-up to Robert P.J. Day's patch making "Instrumentation
support" non-EXPERIMENTAL:

	http://marc.info/?l=linux-kernel&m=118396955423812&w=2

Arch maintainers for sparc64, avr32 and s390 need to take a similar call.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:03 -07:00
Alexey Dobriyan f284ce7269 PTRACE_POKEDATA consolidation
Identical implementations of PTRACE_POKEDATA go into generic_ptrace_pokedata()
function.

AFAICS, fix bug on xtensa where successful PTRACE_POKEDATA will nevertheless
return EPERM.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:03 -07:00
Alexey Dobriyan 7664732315 PTRACE_PEEKDATA consolidation
Identical implementations of PTRACE_PEEKDATA go into generic_ptrace_peekdata()
function.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:03 -07:00
Pavel Emelianov bcdcd8e725 Report that kernel is tainted if there was an OOPS
If the kernel OOPSed or BUGed then it probably should be considered as
tainted.  Thus, all subsequent OOPSes and SysRq dumps will report the
tainted kernel.  This saves a lot of time explaining oddities in the
calltraces.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Added parisc patch from Matthew Wilson  -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:02 -07:00
Rafael J. Wysocki 8314418629 Freezer: make kernel threads nonfreezable by default
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves.  This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.

It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.

The patch causes all kernel threads to be nonfreezable by default (ie.  to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE.  It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear.  Additionally, it updates documentation to
describe the freezing of tasks more accurately.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:02 -07:00
Sam Ravnborg d3ab78560b kbuild: remove hardcoded apic_es7000 from modpost
Replace the hardcoded variable name apic_es7000 in modpost
with a __initdata_refok marker.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2007-07-16 23:24:51 +02:00
Heiko Carstens 608e261968 generic bug: use show_regs() instead of dump_stack()
The current generic bug implementation has a call to dump_stack() in case a
WARN_ON(whatever) gets hit.  Since report_bug(), which calls dump_stack(),
gets called from an exception handler we can do better: just pass the
pt_regs structure to report_bug() and pass it to show_regs() in case of a
warning.  This will give more debug informations like register contents,
etc...  In addition this avoids some pointless lines that dump_stack()
emits, since it includes a stack backtrace of the exception handler which
is of no interest in case of a warning.  E.g.  on s390 the following lines
are currently always present in a stack backtrace if dump_stack() gets
called from report_bug():

 [<000000000001517a>] show_trace+0x92/0xe8)
 [<0000000000015270>] show_stack+0xa0/0xd0
 [<00000000000152ce>] dump_stack+0x2e/0x3c
 [<0000000000195450>] report_bug+0x98/0xf8
 [<0000000000016cc8>] illegal_op+0x1fc/0x21c
 [<00000000000227d6>] sysc_return+0x0/0x10

Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:51 -07:00
Andrea Arcangeli cf99abace7 make seccomp zerocost in schedule
This follows a suggestion from Chuck Ebbert on how to make seccomp
absolutely zerocost in schedule too.  The only remaining footprint of
seccomp is in terms of the bzImage size that becomes a few bytes (perhaps
even a few kbytes) larger, measure it if you care in the embedded.

Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:50 -07:00
Jan Engelhardt 2a07c8f9cd Use menuconfig objects II - oprofile
Make a "menuconfig" out of the Kconfig objects "menu, ..., endmenu",
so that the user can disable all the options in that menu at once
instead of having to disable each option separately.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:40 -07:00
Eric W. Biderman b1c931e393 x86: initial fixmap support
Needed to get fixed virtual address for USB debug and earlycon with mmio.

Signed-off-by: Eric W. Biderman <ebiderman@xmisson.com>
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:35 -07:00
Avi Kivity de48935391 i386: Allow smp_call_function_single() to current cpu
This removes the requirement for callers to get_cpu() to check in simple
cases.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:50 +03:00
Avi Kivity 38ef6d195f HOTPLUG: Adapt thermal throttle to CPU_DYING
CPU_DYING is notified in atomic context, so no taking mutexes here.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:50 +03:00
Dave Jones 9a60ddbcb7 [CPUFREQ] Fix typos in powernow-k8 printk's.
Based on a patch from Joachim which didn't apply, so I fixed
it up by hand, and also corrected the surrounding indentation
a little.

Signed-off-by: Joachim.Deguara <joachim.deguara@amd.com>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-07-13 01:34:10 -04:00
Andrew Morton aac22d0a79 [CPUFREQ] powernow-k8 compile fix.
Make it compile on UP.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-07-13 01:29:51 -04:00
Adrian Bunk 68485695e5 [CPUFREQ] the overdue removal of X86_SPEEDSTEP_CENTRINO_ACPI
This patch contains the overdue removal of X86_SPEEDSTEP_CENTRINO_ACPI.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-07-13 01:29:51 -04:00
Rafał Bilski 905497c4b2 [CPUFREQ] Longhaul - Option to disable ACPI C3 support
On some motherboards ACPI C3 is available, but it isn't
causing frequency transition on VIA Nehemiah. Longhaul
wasn't working at all earlier, but due to
scaling_cur_speed returning true CPU frequency now, it
looks like CPU is getting stuck at highest frequency
since 2.6.21. I didn't find a reason. Halt is causing
frequency transition.

Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-07-13 01:29:50 -04:00
Linus Torvalds 702ed6ef37 Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Fix sysfs_create_file return value handling
  [CPUFREQ] ondemand: fix tickless accounting and software coordination bug
  [CPUFREQ] ondemand: add a check to avoid negative load calculation
  [CPUFREQ] Keep userspace governor quiet when it is not being used
  [CPUFREQ] Longhaul - Proper register access
  [CPUFREQ] Kconfig powernow-k8 driver should depend on ACPI P-States driver
  [CPUFREQ] Longhaul - Replace ACPI functions with direct I/O
  [CPUFREQ] Longhaul - Remove duplicate multipliers
  [CPUFREQ] Longhaul - Embedded "conservative"
  [CPUFREQ] acpi-cpufreq: Proper ReadModifyWrite of PERF_CTL MSR
  [CPUFREQ] check return value of sysfs_create_file
  [CPUFREQ] Longhaul - Check ACPI "BM DMA in progress" bit
  [CPUFREQ] Longhaul - Move old_ratio to correct place
  [CPUFREQ] Longhaul - VT8237 support
  [CPUFREQ] Longhaul - Use all kinds of support
  [CPUFREQ] powernow-k8: clarify number of cores.
2007-07-12 13:42:43 -07:00
Linus Torvalds 21ba0f88ae Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (34 commits)
  PCI: Only build PCI syscalls on architectures that want them
  PCI: limit pci_get_bus_and_slot to domain 0
  PCI: hotplug: acpiphp: avoid acpiphp "cannot get bridge info" PCI hotplug failure
  PCI: hotplug: acpiphp: remove hot plug parameter write to PCI host bridge
  PCI: hotplug: acpiphp: fix slot poweroff problem on systems without _PS3
  PCI: hotplug: pciehp: wait for 1 second after power off slot
  PCI: pci_set_power_state(): check for PM capabilities earlier
  PCI: cpci_hotplug: Convert to use the kthread API
  PCI: add pci_try_set_mwi
  PCI: pcie: remove SPIN_LOCK_UNLOCKED
  PCI: ROUND_UP macro cleanup in drivers/pci
  PCI: remove pci_dac_dma_... APIs
  PCI: pci-x-pci-express-read-control-interfaces cleanups
  PCI: Fix typo in include/linux/pci.h
  PCI: pci_ids, remove double or more empty lines
  PCI: pci_ids, add atheros and 3com_2 vendors
  PCI: pci_ids, reorder some entries
  PCI: i386: traps, change VENDOR to DEVICE
  PCI: ATM: lanai, change VENDOR to DEVICE
  PCI: Change all drivers to use pci_device->revision
  ...
2007-07-12 13:40:57 -07:00
H. Peter Anvin c397368232 Remove old i386 setup code
This removes the old i386 setup code.  This is done as a separate patch
to avoid breaking git bisect as some of the i386 code was also used by
the old x86-64 code.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:56 -07:00
H. Peter Anvin 4fd06960f1 Use the new x86 setup code for i386
This patch hooks the new x86 setup code into the Makefile machinery.  It
also adapts boot/tools/build.c to a two-file (as opposed to three-file)
universe, and simplifies it substantially.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin f2d98ae63d Linker script for the new x86 setup code
Linker script to define the layout of the new x86 setup code.
Includes assert for size overflow and a misaligned setup header.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin 626073132b Assembly header and main routine for new x86 setup code
The assembly header and initialization code, and the main() routine.
main.c also contains some miscellaneous very short routines.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin 7052fdd890 Code for actual protected-mode entry
This is the code which actually does the switch to protected mode,
including all preparation.  It is also responsible for invoking the
boot loader hooks, if present.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin 5e8ddcbe86 Video mode probing support for the new x86 setup code
Video mode probing for the new x86 setup code.  This code breaks down
different drivers into modules.  This code deliberately drops support
for a lot of the vendor-specific mode probing present in the assembly
version, since a lot of those probes have been found to be stale in
current versions of those chips -- frequently, support for those modes
have been dropped from recent video BIOSes due to space constraints,
but the video BIOS signatures are still the same.

However, additional drivers should be extremely straightforward to plug
in, if desirable.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin 337496eb73 Voyager support for the new x86 setup code
Voyager support for the new x86 setup code.  This implements the same
functionality as the assembly version.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin 449f2ab946 Memory probing support for the new x86 setup code
Probe memory (INT 15h: E820, E801, 88).

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin 3b53d3045b MCA support for new x86 setup code
MCA probing support for the new x86 setup code.  This implements the
same functionality as the assembly version.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin d13444a5a5 EDD probing code for the new x86 setup code
Probe EDD and MBR signatures, in order to make it easier to map
physical hard drives to BIOS drives.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin 31b54f40e1 CPU features verification for the new x86 setup code
Verify that the CPU has enough features to run the kernel.  This may
entail enabling features on some CPUs.

By doing this in the setup code we can be guaranteed to still be able to
write to the console through the BIOS.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin 0008ea39bd Version string for the new x86 setup code
Module which only includes the kernel version string.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin 1543610ad7 Console-writing code for the new x86 setup code
This implements writing text to the console, including printf().

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin e44c22f65f Command-line parsing code for the new x86 setup code
Simple command-line parser which allows us to access the kernel command
line from the setup code.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin 49df18fa3f APM probing code
APM probing code for the new x86 setup code.  This implements the
same functionality as the assembly version.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin 5a8a8128bc A20 handling code
A20 handling code for the new x86 setup code.  This implements the same
algorithms as the assembly version.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin 5be8656615 String-handling functions for the new x86 setup code.
strcmp(), memcpy(), memset(), as well as routines to copy to and from
other segments (as pointed to by fs and gs).

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin ad7e906d56 Simple bitops for the new x86 setup code.
A simple collection of bitops for the new x86 setup code.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:54 -07:00
H. Peter Anvin 62bd0337d0 Top header file for new x86 setup code
Top header file for the new x86 setup code.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:54 -07:00
H. Peter Anvin f7f4a5fbd2 Header file to produce 16-bit code with gcc
gcc for i386 can be used with the assembly prefix ".code16gcc" to generate
16-bit (real-mode) code.  This header file provides the assembly prefix.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:54 -07:00
H. Peter Anvin 48c7ae674f Make struct boot_params a real structure, and remove obsolete fields
Make struct boot_params a real structure, and remove the handling of
some obsolete fields, in particular hd*_info, which was only used by
the ST-506 driver, and likely to be wrong for that driver on any
modern BIOS.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:54 -07:00
H. Peter Anvin 9c25d134b3 Make definitions for struct e820entry and struct e820map consistent
Make definitions for struct e820entry and struct e820map
consistent between i386 and x86-64.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:54 -07:00
H. Peter Anvin 85414b693a Define zero-page offset 0x1e4 as a scratch field, and use it
The relocatable kernel code needs a scratch field for the decompressor
to determine its own location.  It was using a location inside
struct screen_info; reserve a free location and document it as scratch
instead.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:54 -07:00
Venki Pallipadi 1d67953f2b Use a new CPU feature word to cover features that are spread around
Some Intel features are spread around in different CPUID leafs like 0x5,
0x6 and 0xA.  Make this feature detection code common across i386 and
x86_64.

Display Intel Dynamic Acceleration feature in /proc/cpuinfo. This feature
will be enabled automatically by current acpi-cpufreq driver.

Refer to Intel Software Developer's Manual for more details about the feature.

Thanks to hpa (H Peter Anvin) for the making the actual code detecting the
scattered features data-driven.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:54 -07:00
H. Peter Anvin de32e04175 x86 Kconfig: change X86_MINIMUM_CPU_MODEL to X86_MINIMUM_CPU_FAMILY
The X86_MINIMUM_CPU_MODEL name isn't really right, so change it to
X86_MINIMUM_CPU_FAMILY.  Also, the default minimum should be 3, not 0.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:54 -07:00
H. Peter Anvin ec481536b1 Unify the CPU features vectors between i386 and x86-64
Unify the handling of the CPU features vectors between i386 and x86-64.
This also adopts the collapsing of features which are required at
compile-time into constant tests from x86-64 to i386.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:54 -07:00
Jiri Slaby c43eaa02ab PCI: i386: traps, change VENDOR to DEVICE
traps, change VENDOR to DEVICE

Change macro for SGI lithium (arch/i386/mach-visws/traps.c) device from
VENDOR to DEVICE, because it's a device id.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:02:10 -07:00
Auke Kok 44c10138fd PCI: Change all drivers to use pci_device->revision
Instead of all drivers reading pci config space to get the revision
ID, they can now use the pci_device->revision member.

This exposes some issues where drivers where reading a word or a dword
for the revision number, and adding useless error-handling around the
read. Some drivers even just read it for no purpose of all.

In devices where the revision ID is being copied over and used in what
appears to be the equivalent of hotpath, I have left the copy code
and the cached copy as not to influence the driver's performance.

Compile tested with make all{yes,mod}config on x86_64 and i386.

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:02:10 -07:00
Ingo Molnar bb29ab2686 sched: x86, track TSC-unstable events
track TSC-unstable events and propagate it to the scheduler code.
Also allow sched_clock() to be used when the TSC is unstable,
the rq_clock() wrapper creates a reliable clock out of it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09 18:51:59 +02:00
Ingo Molnar 0437e109e1 sched: zap the migration init / cache-hot balancing code
the SMP load-balancer uses the boot-time migration-cost estimation
code to attempt to improve the quality of balancing. The reason for
this code is that the discrete priority queues do not preserve
the order of scheduling accurately, so the load-balancer skips
tasks that were running on a CPU 'recently'.

this code is fundamental fragile: the boot-time migration cost detector
doesnt really work on systems that had large L3 caches, it caused boot
delays on large systems and the whole cache-hot concept made the
balancing code pretty undeterministic as well.

(and hey, i wrote most of it, so i can say it out loud that it sucks ;-)

under CFS the same purpose of cache affinity can be achieved without
any special cache-hot special-case: tasks are sorted in the 'timeline'
tree and the SMP balancer picks tasks from the left side of the
tree, thus the most cache-cold task is balanced automatically.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09 18:51:57 +02:00
Dave Jones 38377be88a Clean up E7520/7320/7525 quirk printk.
The printk level in this printk is bogus, as the previous printk
didn't have a terminating \n resulting in ..

Intel E7520/7320/7525 detected.<6>Disabling irq balancing and affinity

It also never printed a \n at all in the case where we didn't do
the quirk.

Change it to only make noise if it actually does something useful.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-07 13:53:13 -07:00
Andres Salomon 95069f89e8 GEODE: reboot fixup for geode machines with CS5536 boards
Writing to MSR 0x51400017 forces a hard reset on CS5536-based machines,
this has the reboot fixup do just that if such a board is detected.

Acked-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-06 11:45:11 -07:00
Vivek Goyal 071922c08c i386: es7000 build breakage fix
o Commit 1833d6bc72 broke the build if
  compiled with CONFIG_ES7000=y and CONFIG_X86_GENERICARCH=n

arch/i386/kernel/built-in.o(.init.text+0x4fa9): In function `acpi_parse_madt':
: undefined reference to `acpi_madt_oem_check'
arch/i386/kernel/built-in.o(.init.text+0x7406): In function `smp_read_mpc':
: undefined reference to `mps_oem_check'
arch/i386/kernel/built-in.o(.init.text+0x8990): In function
`connect_bsp_APIC':
: undefined reference to `enable_apic_mode'
make: *** [.tmp_vmlinux1] Error 1

o Fix the build issue. Provided the definitions of missing functions.

o Don't have ES7000 machine. Only compile tested.

Cc: Len Brown <lenb@kernel.org>
Cc: Natalie Protasevich <protasnb@gmail.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-06 10:23:43 -07:00
Loic Prylli d25c1ba2fa MTRR: Fix race causing set_mtrr to go into infinite loop
Processors synchronization in set_mtrr requires the .gate field to be set
after .count field is properly initialized.  Without an explicit barrier,
the compiler was reordering those memory stores.  That was sometimes
causing a processor (in ipi_handler) to see the .gate change and decrement
.count before the latter is set by set_mtrr() (which then hangs in a
infinite loop with irqs disabled).

Signed-off-by: Loic Prylli <loic@myri.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-06 10:23:43 -07:00
Jason Wessel 1e2e99f0e4 i386: fix regression, endless loop in ptrace singlestep over an int80
The commit 635cf99a80 introduced a
regression.  Executing a ptrace single step after certain int80
accesses will infinitely loop and never advance the PC.

The TIF_SINGLESTEP check should be done on the return from the syscall
and not before it.

I loops on each single step on the pop right after the int80 which writes out
to the console.  At that point you can issue as many single steps as you want
and it will not advance any further.

The test case is below:

/* Test whether singlestep through an int80 syscall works.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <asm/user.h>
#include <string.h>

static int child, status;
static struct user_regs_struct regs;

static void do_child()
{
	char str[80] = "child: int80 test\n";

	ptrace(PTRACE_TRACEME, 0, 0, 0);
	kill(getpid(), SIGUSR1);
	write(fileno(stdout),str,strlen(str));
	asm ("int $0x80" : : "a" (20)); /* getpid */
}

static void do_parent()
{
	unsigned long eip, expected = 0;
again:
	waitpid(child, &status, 0);
	if (WIFEXITED(status) || WIFSIGNALED(status))
		return;

	if (WIFSTOPPED(status)) {
		ptrace(PTRACE_GETREGS, child, 0, &regs);
		eip = regs.eip;
		if (expected)
			fprintf(stderr, "child stop @ %08lx, expected %08lx %s\n",
					eip, expected,
					eip == expected ? "" : " <== ERROR");

		if (*(unsigned short *)eip == 0x80cd) {
			fprintf(stderr, "int 0x80 at %08x\n", (unsigned int)eip);
			expected = eip + 2;
		} else
			expected = 0;

		ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
	}
	goto again;
}

int main(int argc, char * const argv[])
{
	child = fork();
	if (child)
		do_parent();
	else
		do_child();
	return 0;
}

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: <stable@kernel.org>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-06 10:23:43 -07:00
Linus Torvalds ba609a9d97 Remove some unused variables
When Andi reverted the HPET resource reservation (in commit
0f8dc2f065), he didn't remove the now
unused variables, which just causes gcc to be noisy.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-03 18:27:53 -07:00
Andi Kleen 5dcccd8d7e Revert perfctr reservation to 2.6.21 state
With this change it works again when the nmi watchdog is disabled.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-03 18:11:35 -07:00
Andi Kleen 0f8dc2f065 Revert HPET resource reservation
Matthias Lenk reports that the PCI subsystem would move the HPET on
SB400/SB600-based systems, where the HPET is in BAR1 of the SMbus
controller.

The reason? The ACPI layer registered the PCI MMIO range as being busy
too early, before PCI enumeration had happened, causing the PCI layer to
decide that it should relocate the resources somewhere else.

Firmware resources should be marked busy _after_ the PCI enumeration and
probing has happened, not before.

Remove the too-early reservation, we'll fix it up to do it properly
later.  In the meantime, this solves the regression.

Tested-by: Matthias Lenk <matthias.lenk@amd.com>
Cc: Aaron Durbin <adurbin@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-03 18:09:46 -07:00
Andrew Morton 84288ad89e i386: mtrr crash fix
Commit 3ebad59056 ("[PATCH] x86: Save and
restore the fixed-range MTRRs of the BSP when suspending") added mtrr
operations without verifying that the CPU has MTRRs.  Crashes transmeta
CPUs.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux@horizon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-01 12:29:44 -07:00
Linus Torvalds 4710bcce8e i386: remove bogus mtrr range check
Commit 9215da3320 "fixed" the MTRR range
check to not allow any MTRR's under the 1MB mark (since that's where the
fixed MTRR's are active).

However, that was totally bogus, since it's normal (and almost required)
to have a large variable MTRR that starts at 0, and covers some large
percentage of the whole RAM, and then using the fixed MTRR's to override
that large MTRR to handle the special ISA hole in the 640k-1M region.

The old check was bogus too (checking that no variable MTRR is used that
is entirely under the 1MB range), but at least it wasn't actively
detrimental, because no sane situation would ever trigger such MTRR
usage in the first place.

That said, the whole notion of not allowing variable MTRR's in the low
1MB is just stupid, so rather than revert the commit, this just removes
the whole sad and unnecessary check entirely.

Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Luca Palermo <darkmage@sabayonlinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-01 10:56:11 -07:00
Randy Dunlap 80581c43d0 mtrr/cyrix: fix sections
main.c::mtrr_add() or mtrr_del() [exported]
calls main.c::mtrr_add_page() or mtrr_del_page() or mtrr_restore() [resume]
calls main.c::set_mtrr()
calls main.c::ipi_handler()
calls main.c::mtrr_if->set_all() == which can be cyrix_set_all

WARNING: arch/i386/kernel/built-in.o(.text+0x8657): Section mismatch: reference to .init.data: (between 'cyrix_set_all' and 'centaur_get_free_region')
WARNING: arch/i386/kernel/built-in.o(.text+0x866b): Section mismatch: reference to .init.data: (between 'cyrix_set_all' and 'centaur_get_free_region')
WARNING: arch/i386/kernel/built-in.o(.text+0x867e): Section mismatch: reference to .init.data: (between 'cyrix_set_all' and 'centaur_get_free_region')
WARNING: arch/i386/kernel/built-in.o(.text+0x8684): Section mismatch: reference to .init.data: (between 'cyrix_set_all' and 'centaur_get_free_region')
WARNING: arch/i386/kernel/built-in.o(.text+0x868a): Section mismatch: reference to .init.data: (between 'cyrix_set_all' and 'centaur_get_free_region')

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-28 11:34:53 -07:00
Tian Kevin c8cbee61c9 ACPI: preserve the ebx value in acpi_copy_wakeup_routine
Register %ebx serves as the "global offset table base register" for
position-independent code.  For absolute code, %ebx serves as a local
register and has no specified role in the function calling sequence.  In
either case, a function must preserve the register value for the caller.

acpi_copy_wakeup_routine overrides %ebx without saving it, this may corrupt
the called data.

Kevin found that most time the value of Sx is saved in %esi, however
sometimes compiler also uses %ebx.  When this happens, suspends fails since
sleep value in ebx is changed by acpi_copy_wakeup_routine.

The same funtion in X86_64 doesn't have this problem.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Looks-okay-to: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-24 08:59:12 -07:00
Andi Kleen 9d9bbd4d24 i386: Make CMPXCHG64 only dependent on PAE
It is only used for PAE kernels in set_64bit.

The problem is that due to a old Windows bug many CPUs need magic MSRs
to enable CMPXCHG64, and we can't do that nicely early enough before
it is potentially used.

But since we only need it in PAE kernels so only force the checking
for CMPXCHG65 with PAE.

This fixes a boot failure on Transmeta Crusoe

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-22 18:41:18 -07:00
Arjan van de Ven 0864a4e201 Allow DEBUG_RODATA and KPROBES to co-exist
Do not mark the kernel text read only if KPROBES is in the kernel;
kprobes needs to hot-patch the kernel text to insert it's
instrumentation.

In this case, only mark the .rodata segment as read only.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Tested-by: S. P. Prasanna <prasanna@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: William Cohen <wcohen@redhat.com>
Cc: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-21 16:02:50 -07:00
Rafał Bilski 689eba77cb [CPUFREQ] Longhaul - Proper register access
In previous commit I used u32 for u16 register.
This code will work only when ACPI block address is set.
For now it is only for VT8235 and VT8237.

Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-06-21 12:57:53 -04:00
Yinghai Lu bf8c481742 x86_64: fix link warning between for .text and .init.text
WARNING: arch/x86_64/kernel/built-in.o(.text+0xace9): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/x86_64/kernel/built-in.o(.text+0xad09): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/x86_64/kernel/built-in.o(.text+0xad38): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: drivers/built-in.o(.text+0x3a680): Section mismatch: reference to .init.text:acpi_map_pxm_to_node (between 'acpi_get_node' and 'acpi_lock_ac_dir')

AK: also marked mtrr_bp_init __init to avoid some more warnings

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-20 14:27:26 -07:00
Andi Kleen 018d2ad0cc x86: change_page_attr bandaids
- Disable CLFLUSH again; it is still broken. Always do WBINVD.
- Always flush in the i386 case, not only when there are deferred pages.

These are both brute-force inefficient fixes, to be improved
next release cycle.

The changes to i386 are a little more extensive than strictly
needed (some dead code added), but it is more similar to the x86-64 version
now and the dead code will be used soon.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-20 14:27:26 -07:00
Andi Kleen 55181000cd x86: Disable KPROBES with DEBUG_RODATA for now
Right now Kprobes cannot write to the write protected kernel text when
DEBUG_RODATA is enabled. Disallow this in Kconfig for now.

Temporary fix for 2.6.22. In .23 add code to temporarily
unprotect it.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-20 14:27:26 -07:00
Andi Kleen 388c19e176 x86: Disable DAC on VIA bridges
Several reports that VIA bridges don't support DAC and corrupt
data.  I don't know if it's fixed, but let's just blacklist
them all for now.

It can be overwritten with iommu=usedac

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-20 14:27:25 -07:00
Björn Steinbrink da88ba17de perfctr-watchdog: fix interchanged parameters to release_{evntsel,perfctr}_nmi
Fix oops triggered during: echo 0 > /proc/sys/kernel/nmi_watchdog

The culprit seems to be 09198e68501a7e34737cd9264d266f42429abcdc:
[PATCH] i386: Clean up NMI watchdog code

In two places, the parameters to release_{evntsel,perfctr}_nmi
got interchanged during the cleanup.

Fix interchanged parameters to release_{evntsel,perfctr}_nmi.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-16 13:16:15 -07:00
Björn Steinbrink 54c6ed7562 i386: use the right wrapper to disable the NMI watchdog
When disabled through /proc/sys/kernel/nmi_watchdog, the NMI watchdog uses the
stop() method directly, which does not decrement the activity counter, leading
to a BUG().  Use the wrapper function instead to fix that.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-16 13:16:15 -07:00
Björn Steinbrink faa4cfa6b3 i386: fix NMI watchdog not reserving its MSRs
At system boot time, the NMI watchdog no longer reserved its MSRs, allowing
other subsystems to mess with them.  Fix that.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-16 13:16:15 -07:00
Zhang Rui 3cdf552be2 ACPI: Discard invalid elements in _PSS package
Make sure that the _PSS list is sorted in
descending order by typical power dissipation.

http://bugzilla.kernel.org/show_bug.cgi?id=7880

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2007-06-13 21:24:02 -04:00
Yoann Padioleau a17627ef88 potential parse error in ifdef part 3
Fix various bits of obviously-busted code which we're not happening to
compile, due to ifdefs.

Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-08 17:23:33 -07:00
Steven Rostedt e5e3c84b70 enable interrupts in user path of page fault.
This is a minor fix, but what is currently there is essentially wrong.
In do_page_fault, if the faulting address from user code happens to be
in kernel address space (int *p = (int*)-1; p = 0xbed;)  then the
do_page_fault handler will jump over the local_irq_enable with the

  goto bad_area_nosemaphore;

But the first line there sees this is user code and goes through the
process of sending a signal to send SIGSEGV to the user task. This whole
time interrupts are disabled and the task can not be preempted by a
higher priority task.

This patch always enables interrupts in the user path of the
bad_area_nosemaphore.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-07 17:05:03 -07:00
Joshua Hoblitt e8666b2718 [CPUFREQ] Kconfig powernow-k8 driver should depend on ACPI P-States driver
powernow-k8 really needs to use ACPI to function on SMP systems.
The current Kconfig allows us to build kernels which fail mysteriously
for some users due to us trying to automatically enable this, and
getting it wrong.  It's easier to just present this as an option
to the user.

Signed-off-by: Joshua Hoblitt <jhoblitt@ifa.hawaii.edu>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-06-06 17:23:46 -04:00
Rafał Bilski 275bc6b7f6 [CPUFREQ] Longhaul - Replace ACPI functions with direct I/O
Current version of "bm status" bit test works as long as
no USB device is in use. When USB device is plugged in ACPI
function in this context is always returning 1. Until reboot.
Direct I/O is working fine even when many USB devices are
connected.
Change bm_timeout value to less annoying. 1000 is still much
more then worst case observed and it is much better when status
bit gets stuck.

Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-06-06 17:22:17 -04:00
Andrew Morton 4c738480d2 mtrr atomicity fix
Rafael gets this on an SMP box with kernel preemption enabled, during
hibernation and restore (100% of the time):

Enabling non-boot CPUs ...
BUG: using smp_processor_id() in preemptible [00000001] code: bash/4514
caller is mtrr_save_state+0x9/0x40

Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-04 13:25:09 -07:00
Tear 4d2fafd17a ACPI: Remove Dell Optiplex GX240 from the ACPI blacklist
I have a Dell Optiplex GX240 and when I boot Linux, ACPI gets set up by only
acpi=ht.  dmesg shows the following line:

   DELL GX240 detected: force use of acpi=ht

Everything seemed to be fine.  However, I discovered that everything is not
fine.  The USB controller works so slowly that copying a few (uncached) 1
megabyte large photos from a USB-enabled digital camera takes many minutes
instead of a couple of seconds.

I am using Linux 2.6.21.1 on a Debian 4.0 ("Etch") system.

I thought that this might be related to ACPI.  So I tried to boot with _only_
"acpi=force" appended to the kernel command line.  Voila, the USB controller
started to work at full speed and copying photos from my digital camera took
only seconds.

I tested the system with "acpi=force" and could not find anything which did
not work.

I thought that this might be related to interrupts and APIC as well.  (Note
that this is APIC, not ACPI.) I tried booting with _only_ "noapic" and
"nolapic" appended to the command line.  Again, the USB controller started to
work at full speed.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2007-06-02 00:40:37 -04:00
Sam Ravnborg f8281a2b66 microcode: fix section mismatch warning
Fix the following section mismatch warnings in microcode.c:
WARNING: arch/i386/kernel/built-in.o(.init.text+0x3966): Section mismatch: reference to .exit.text: (between 'microcode_init' and 'parse_maxcpus')
WARNING: arch/i386/kernel/built-in.o(.init.text+0x3992): Section mismatch: reference to .exit.text: (between 'microcode_init' and 'parse_maxcpus')

The warning are caused by a function marked __init that
calls a function marked __exit.
Functions marked __exit may be discarded either during link or run-time
and thus the reference is not good.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-01 08:18:30 -07:00
Tim Gardner b9e82af823 Work around Dell E520 BIOS reboot bug
Force Dell E520 to use the BIOS to shutdown/reboot.

I have at least one report that this patch fixes shutdown/reboot
problems on the Dell E520 platform.

(Andi says: People can always set the boot option.  It hardly seems like a
critical issue needing a backport.)

Signed-off-by: Tim Gardner <tim.gardner@ubuntu.com>
Acked-by: Andi Kleen <ak@suse.de>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-01 08:18:28 -07:00
Chris Wright 0939c17c7b x86: fix oprofile double free
Chuck reports that the recent fix from Andi to oprofile
6c977aad03 introduces a double free.  Each
cpu's cpu_msrs is setup to point to cpu 0's, which causes free_msrs to free
cpu 0's pointers for_each_possible_cpu.  Rather than copy the pointers, do
a deep copy instead.

[acme@redhat.com: allocate_msrs() was using for_each_online_cpu()]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dave Jones <davej@redhat.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-01 08:18:28 -07:00
Alexey Dobriyan fa0aa866c8 Fix vmi.c compilation
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-01 08:18:27 -07:00
Ivan Kokshaysky 73a74ed3a6 PCI: i386: fixup for Siemens Nixdorf AG FSC Multiprocessor Interrupt Controllers
Wolfgang gets:

 PCI: Cannot allocate resource region 0 of device 0000:00:04.0
 PCI: Error while updating region 0000:00:04.0/0 (a8008000 != fec08000)

Note that the BAR seems to have high address bits hardwired to fec00000.
And device 0000:00:04.0 is

 00:04.0 System peripheral: Siemens Nixdorf AG FSC Multiprocessor Interrupt Controller (rev 02)

I'd guess that when we try to reassign this resource, PCI interrupts might
just stop working. This could explain SCSI timeouts and other weird things.

Cc: Wolfgang Erig <Wolfgang.Erig@gmx.de>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-31 16:56:37 -07:00
Linus Torvalds 8387c1a463 smpboot: fix cachesize comparison in smp_tune_scheduling()
Jarek Poplawski noted that boot_cpu_data.x86_cache_size is signed int
and can be < 0 too.

In fact we test for it. Except we assigned it to an unsigned value..

Cc: Jarek Poplawski <jarkao2@o2.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-31 07:55:16 -07:00
Rafał Bilski ce243823af [CPUFREQ] Longhaul - Remove duplicate multipliers
Remove duplicate multipliers in clock_ratio table. On 1,4GHz
Nehemiah two frequencies are present twice in table. It isn't
fatal, but with voltage scaling enabled each will be set twice.

Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-05-29 16:56:40 -04:00
Rafał Bilski 73e107d4a1 [CPUFREQ] Longhaul - Embedded "conservative"
Longhaul with voltage scaling enabled works great on Ezra
CPU (Longhaul ver. 2). As long as "conservative" governor is
used. Both "ondemand" and "userspace" can change voltage
from min to max at once. Motherboard unfortunatly turns off
when vid difference is big. Longhaul was printing warning
message, but it is not enough. Now driver will have
"conservative" governor built in and will split bigger
changes to smaller ones.

Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-05-29 16:56:40 -04:00
Venki Pallipadi 13424f6514 [CPUFREQ] acpi-cpufreq: Proper ReadModifyWrite of PERF_CTL MSR
During recent acpi-cpufreq changes, writing to PERF_CTL msr
changed from RMW of entire 64 bit to RMW of low 32 bit and clearing of
upper 32 bit. Fix it back to do a proper RMW of the MSR.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-05-29 16:56:40 -04:00
Rafał Bilski 489dc5cb18 [CPUFREQ] Longhaul - Check ACPI "BM DMA in progress" bit
It is good idea to wait for PCI bus to become idle before
frequency change. Thanks to ACPI it is possible. It makes
sense only when northbridge support is in use because it is
only case in which we can disable arbiter after check if PCI
bus is busy.

Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-05-29 16:56:39 -04:00
Rafał Bilski 1b11d4ca6d [CPUFREQ] Longhaul - Move old_ratio to correct place
Move one line where it should be. After first transition
Longhaul will skip frequency transition if destination
frequency is already set.

Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-05-29 16:56:39 -04:00
Rafał Bilski 920dd0fbba [CPUFREQ] Longhaul - VT8237 support
Looks like VT8237 has the same bits which VT8235 has.
Poke registers if it is found.

Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-05-29 16:56:39 -04:00
Rafał Bilski 7d5edcc028 [CPUFREQ] Longhaul - Use all kinds of support
This patch is removing southbridge support as separate
kind of support. Instead it is used to make other kinds
of support more stable. Also northbridge and ACPI C3
support both will be used if both are available.

Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-05-29 16:56:39 -04:00
William Lee Irwin III af669c9729 i386 bigsmp: section mismatch fixes
WARNING: arch/i386/mach-generic/built-in.o(.data+0xc4): Section mismatch: reference to .init.text: (between 'apic_bigsmp' and 'cpu.4905')

This appears to be resolvable by removing all the __init and __initdata
qualifiers from arch/i386/mach-generic/bigsmp.c

Signed-off-by: William Irwin <bill.irwin@oracle.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-23 20:14:15 -07:00
Stefan Richter 1dbf37e8ad i386, x86-64: show that CONFIG_HOTPLUG_CPU is required for suspend on SMP
It's not sufficiently documented that CONFIG_HOTPLUG_CPU is required for
suspend/hibernation on SMP.

Point out the non-obvious.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-23 20:14:14 -07:00
Linus Torvalds 080e89270a Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fix
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fix:
  mm/slab: fix section mismatch warning
  mm: fix section mismatch warnings
  init/main: use __init_refok to fix section mismatch
  kbuild: introduce __init_refok/__initdata_refok to supress section mismatch warnings
  all-archs: consolidate .data section definition in asm-generic
  all-archs: consolidate .text section definition in asm-generic
  kbuild: add "Section mismatch" warning whitelist for powerpc
  kbuild: make better section mismatch reports on i386, arm and mips
  kbuild: make modpost section warnings clearer
  kconfig: search harder for curses library in check-lxdialog.sh
  kbuild: include limits.h in sumversion.c for PATH_MAX
  powerpc: Fix the MODALIAS generation in modpost for of devices
2007-05-21 12:03:04 -07:00
Brian Gerst 17304383eb i386: fix PGE mask
cr4 is a 32-bit register, so casting the mask to an unsigned char is wrong,
as it clears more than the PGE bit.

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:56:57 -07:00
Andi Kleen 39427d6e59 i386: Enable CX8/PGE CPUID bits early on VIA C3
Fix boot failures with the early CPUID checking on VIA C3

Includes fixes from Christian Volkmann

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:56:57 -07:00
Christian Volkmann 4c1f59d8be i386: Fix wrong CPU error message in early boot path
- boot/setup.S did not print "PANIC: CPU too old for this kernel"
  ( not visible, also the message did not match )
- I add "# missed before: set ds"
  => somebody should check if I am right with the way to set.
  => seems to be a generic error in setup.S not to set "ds" for error messages.

AK: extracted patch out of other changes
AK: also couldn't find any other case where ds is wrong
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:56:57 -07:00
Andi Kleen c12ceb766e i386: Clear MCE flag on AMD K6
It reports machine check capability in CPUID, but doesn't actually
implement all the necessary MSRs of the standard Intel machine
check architecture.

This fixes a boot failure on K6s recently introduced.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:56:57 -07:00
Andi Kleen 6c977aad03 i386: Fix K8/core2 oprofile on multiple CPUs
Only try to allocate MSRs once instead of for every CPU.

This assumes the MSRs are the same on all CPUs which is currently
true. P4-HT is a special case for different SMT threads, but the code
always saves/restores all MSRs so it works identical.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:56:56 -07:00
Andi Kleen 20c3a3d0dd i386: Update defconfig
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:56:56 -07:00
Alexey Dobriyan e8edc6e03a Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being dependency for significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:18:19 -07:00
Sam Ravnborg ca967258b6 all-archs: consolidate .data section definition in asm-generic
With this consolidation we can now modify the .data
section definition in one spot for all archs.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2007-05-19 09:11:57 +02:00
Sam Ravnborg 7664709b44 all-archs: consolidate .text section definition in asm-generic
Move definition of .text section to asm-generic.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2007-05-19 09:11:57 +02:00
Dave Jones 904f7a3f04 [CPUFREQ] powernow-k8: clarify number of cores.
Indicate number of processors and cores more cleanly
in startup messages.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-05-18 13:25:12 -04:00
Linus Torvalds b46522394d Revert "[PATCH] x86: Drop cc-options call for all options supported in gcc 3.2+"
This reverts commit c8fdd24725.

It turns out the kernel was correct, and the gcc complaint was a gcc
bug.  The preferred stack boundary is expressed not in bytes, but in the
the log2() of the preferred boundary, so "-mpreferred-stack-boundary=2"
is in fact exactly what we want, but a gcc that is compiled for x86-64
will consider it an error (because the 64-bit calling sequence says that
the stack should be 16-byte aligned) even if we are then using "-m32" to
generate 32-bit code.

Noted-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: Jan Hubicka <jh@suse.cz>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17 20:18:11 -07:00
Hugh Dickins bb49b32fec i386: don't check_pgt_cache in flush_tlb_mm
No other architecture calls check_pgt_cache() from within flush_tlb_mm(),
and i386 is already calling check_pgt_cache() from the usual places,
tlb_finish_mmu() and cpu_idle() (the latter being odd, but not unusual).
flush_tlb_mm() has no business to be freeing pages: remove that line, which
sneaked in with slub's i386 support.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Andi Kleen <ak@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17 05:23:05 -07:00
Bernhard Walle df652fe173 i386/x86-64: fix section mismatch
WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to
.init.text:mtrr_bp_init from .text between 'id entify_cpu' (at offset 0x6571)
and 'IRQ0x20_interrupt'

It's because identify_cpu() which is __cpuinit calls mtrr_bp_init() which is
__init(). __cpuinit() expands to nothing if CONFIG_HOTPLUG_CPU=y and so the
call is illegal.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17 05:23:04 -07:00
Linus Torvalds 28aa483f80 x86: Fix discontigmem + non-HIGHMEM compile
It's not necessarily a very sane configuration, but people running "make
randconfig" noticed it wouldn't compile.  This fixes some obvious
problems in discontig.c to allow a clean compile.

Acked-by: andrew hendry <andrew.hendry@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-15 18:45:49 -07:00
Linus Torvalds 1ca9bc4f2a Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Correct revision mask for powernow-k8
  [CPUFREQ] powernow-k7: fix MHz rounding issue with perflib
  [CPUFREQ] Support rev H AMD64s in powernow-k8
2007-05-15 12:10:00 -07:00
Jeremy Fitzhardinge 6a3ee3d552 i386: fix voyager build
This adds an smp_ops for voyager, and hooks things up appropriately.  This is
the first baby-step to making subarch runtime switchable.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-15 08:54:01 -07:00
Jeremy Fitzhardinge 297d9c035e i386: move common parts of smp into their own file
Several parts of kernel/smp.c and smpboot.c are generally useful for other
subarchitectures and paravirt_ops implementations, so make them available for
reuse.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-15 08:54:00 -07:00
Dave Jones 99fbe1ac21 [CPUFREQ] Correct revision mask for powernow-k8
Mark Langsdorf points out that the correct define for this
revision bump is 0x80000.  Also to save us having to keep
renaming the #define, give it a more meaningful name.

Signed-off-by: Dave Jones <davej@redhat.com>
2007-05-14 18:27:29 -04:00
Linus Torvalds faa8b6c3c2 Revert "ipmi: add new IPMI nmi watchdog handling"
This reverts commit f64da958df.

Andi Kleen is unhappy with the changes, and they really do not seem
worth it.  IPMI could use DIE_NMI_IPI instead of the new callback, even
though that ends up having its own set of problems too, mainly because
the IPMI code cannot really know the NMI was from IPMI or not.

Manually fix up conflicts in arch/x86_64/kernel/traps.c and
drivers/char/ipmi/ipmi_watchdog.c.

Cc: Andi Kleen <ak@suse.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Corey Minyard <minyard@acm.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-14 15:24:24 -07:00
Daniel Drake dc2585eb47 [CPUFREQ] powernow-k7: fix MHz rounding issue with perflib
When the PST tables are broken, powernow-k7 uses ACPI's processor_perflib to
deduce the available frequency multipliers from the _PSS tables.

Upon frequency change, processor_perflib performs some verification on the
frequency (checks that it's within allowable bounds).

powernow-k7 deals with absolute frequencies in KHz, whereas perflib only
deals with MHz values. When performing the above verification, perflib
multiplies the MHz values by 1000 to obtain the KHz value.

We then end up with situations like the following:
 - powernow-k7 multiplies the multiplier by the FSB, and obtains a value
   such as 1266768 KHz
 - perflib belives the same state has frequency of 1266 MHz
 - acpi_processor_ppc_notifier calls cpufreq_verify_within_limits to verify
   that 1266768 is in the allowable range of 0 to 1266000 (i.e. 1266 * 1000)
 - it's not, so that frequency is rejected
 - the maximum CPU frequency is not reachable

This patch solves the problem by rounding up the MHz values stored in perflib's
tables. Additionally it corrects a broken URL.

It also fixes http://bugzilla.kernel.org/show_bug.cgi?id=8255 although this
case is a bit different: the frequencies in the _PSS tables are wildly wrong,
but we get better results if we force ACPI to respect the fsb * multiplier
calculations (even though it seems that the multiplier values aren't entirely
correct either).

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-05-13 17:25:13 -04:00
Dave Jones 30046e5885 [CPUFREQ] Support rev H AMD64s in powernow-k8
Reported-by: Calvin Dodge <caldodge@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-05-13 11:55:14 -04:00
Christoph Lameter f1d1a842d8 SLUB: i386 support
SLUB cannot run on i386 at this point because i386 uses the page->private and
page->index field of slab pages for the pgd cache.

Make SLUB run on i386 by replacing the pgd slab cache with a quicklist.
Limit the changes as much as possible. Leave the improvised linked list in place
etc etc. This has been working here for a couple of weeks now.

Acked-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-12 11:26:22 -07:00
Andi Kleen fd0581bbb4 i386: Fix compilation of verify_cpu.S on old binutils
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 12:53:00 -07:00
Davide Libenzi fdb902b122 signal/timer/event: eventfd wire up x86 arches
This patch wires the eventfd system call to the x86 architectures.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 08:29:37 -07:00
Davide Libenzi 57ac889850 signal/timer/event: timerfd wire up x86 arches
This patch wires the timerfd system call to the x86 architectures.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 08:29:36 -07:00
Davide Libenzi 2121e24bd8 signal/timer/event: signalfd wire up x86 arches
This patch wires the signalfd system call to the x86 architectures.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 08:29:36 -07:00
Eric W. Biederman 5a18c92aab Revert "[PATCH] paravirt: Add startup infrastructure for paravirtualization"
This reverts commit c9ccf30d77.

Entering the kernel at startup_32 without passing our real mode data in
%esi, and without guaranteeing that physical and virtual addresses are
identity mapped makes head.S impossible to maintain.

The only user of this infrastructure is lguest which is not merged so
nothing we currently support will break by removing this over designed
nightmare, and only the pending lguest patches will be affected.  The
pending Xen patches have a different entry point that they use.

We are currently discussing what Xen and lguest need to do to boot the
kernel in a more normal fashion so using startup_32 in this weird manner is
clearly not their long term direction.

So let's remove this code in head.S before it causes brain damage to people
trying to maintain head.S

Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Zachary Amsden <zach@vmware.com>
CC: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-10 09:26:53 -07:00
Linus Torvalds 9a9136e270 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
  sound: convert "sound" subdirectory to UTF-8
  MAINTAINERS: Add cxacru website/mailing list
  include files: convert "include" subdirectory to UTF-8
  general: convert "kernel" subdirectory to UTF-8
  documentation: convert the Documentation directory to UTF-8
  Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
  remove broken URLs from net drivers' output
  Magic number prefix consistency change to Documentation/magic-number.txt
  trivial: s/i_sem /i_mutex/
  fix file specification in comments
  drivers/base/platform.c: fix small typo in doc
  misc doc and kconfig typos
  Remove obsolete fat_cvf help text
  Fix occurrences of "the the "
  Fix minor typoes in kernel/module.c
  Kconfig: Remove reference to external mqueue library
  Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
  Correct comments in genrtc.c to refer to correct /proc file.
  Fix more "deprecated" spellos.
  Fix "deprecated" typoes.
  ...

Fix trivial comment conflict in kernel/relay.c.
2007-05-09 12:54:17 -07:00
H. Peter Anvin 21c42bd8db i386: cpu/transmeta.c: fix definition of USER686
The definition of USER686 is supposed to be a mask of feature bits,
not an OR of feature numbers!  It happened to work anyway on the only
processor affected, simply by pure coincidence.  Fix.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:49:33 -07:00
David Rientjes affd872ebb i386: voyager: use __maybe_unused
Replace automatic variable instances of __attribute__((unused)) with
__maybe_unused in mca_nmi_hook().

Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:57 -07:00
David Rientjes f6744c02bc i386 pci: use __maybe_unused
Use the new macro here

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Roman Zippel c9f4f06d31 wrap access to thread_info
Recently a few direct accesses to the thread_info in the task structure snuck
back, so this wraps them with the appropriate wrapper.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Rafael J. Wysocki 455c017ae3 microcode: use suspend-related CPU hotplug notifications
Make the microcode driver use the suspend-related CPU hotplug notifications
to handle the CPU hotplug events occuring during system-wide suspend and
resume transitions.  Remove the global variable suspend_cpu_hotplug
previously used for this purpose.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Rafael J. Wysocki 8bb7844286 Add suspend-related notifications for CPU hotplug
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress.  This
patch introduces such notifications and causes them to be used during
suspend and resume transitions.  It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).

[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Fernando Luis Vazquez Cao a36166c6ef Use the APIC to determine the hardware processor id - i386
hard_smp_processor_id used to be just a macro that hard-coded
hard_smp_processor_id to 0 in the non SMP case.  When booting non SMP kernels
on hardware where the boot ioapic id is not 0 this turns out to be a problem.
This is happens frequently in the case of kdump and once in a great while in
the case of real hardware.

Use the APIC to determine the hardware processor id in both UP and SMP kernels
to fix this issue.

Notice that hard_smp_processor_id is only used by SMP code or by code that
works with apics so we do not need to handle the case when apics are not
present and hard_smp_processor_id should never be called there.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:48 -07:00
Uwe Kleine-König 5886269962 fix file specification in comments
Many files include the filename at the beginning, serveral used a wrong one.

Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 08:58:16 +02:00
Robert P. J. Day fd2dbc92e3 Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
Fix a couple grammatical errors in arch/i386/Kconfig.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 07:23:41 +02:00
David Sterba 3dde6ad8fc Fix trivial typos in Kconfig* files
Fix several typos in help text in Kconfig* files.

Signed-off-by: David Sterba <dave@jikos.cz>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 07:12:20 +02:00
Linus Torvalds 01e73be3c8 Revert "fbdev: ignore VESA modes if framebuffer is disabled"
This reverts commit 464bdd33e9.

Peter Anvin correctly points out that VESA modes have nothing to do with
frame buffers per se - they are often just regular extended text modes.
Disabling them just because we don't have frame buffer support is very
wrong.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Antonino A. Daplas <adaplas@gmail.com>,
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 20:12:30 -07:00
Linus Torvalds 36f021b579 Merge branch 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6: (32 commits)
  Use menuconfig objects - hwmon
  hwmon/smsc47b397: Use dynamic sysfs callbacks
  hwmon/smsc47b397: Convert to a platform driver
  hwmon/w83781d: Deprecate W83627HF support
  hwmon/w83781d: Use dynamic sysfs callbacks
  hwmon/w83781d: Be less i2c_client-centric
  hwmon/w83781d: Clean up conversion macros
  hwmon/w83781d: No longer use i2c-isa
  hwmon/ams: Do not print error on systems without apple motion sensor
  hwmon/ams: Fix I2C read retry logic
  hwmon: New AD7416, AD7417 and AD7418 driver
  hwmon/coretemp: Add documentation
  hwmon: New coretemp driver
  i386: Use functions from library in msr driver
  i386: Add safe variants of rdmsr_on_cpu and wrmsr_on_cpu
  hwmon/lm75: Use dynamic sysfs callbacks
  hwmon/lm78: Use dynamic sysfs callbacks
  hwmon/lm78: Be less i2c_client-centric
  hwmon/lm78: No longer use i2c-isa
  hwmon: New max6650 driver
  ...
2007-05-08 12:07:28 -07:00
Antonino A. Daplas 464bdd33e9 fbdev: ignore VESA modes if framebuffer is disabled
If the option vga=<VESA graphics mode> is added to the boot parameter, it will
activate graphics mode, but without any framebuffer support, the user is left
with an unusable display.

Change the behavior such that the user is instead prompted for another mode
(ala vga=ask).

NOTE: People can always use vbetool to set a graphics mode if this is really
desired, but the number of people doing this approaches zero.

Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:26 -07:00
Bjorn Helgaas 7e92b4fc34 x86, serial: convert legacy COM ports to platform devices
Make x86 COM ports into platform devices and don't probe for them
if we have PNP.

This prevents double discovery, where a device was found both by
the legacy probe and by 8250_pnp, e.g.,

    serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
    00:02: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A

This also means IRDA devices without a UART PNP ID will no longer be
claimed by the serial driver, which might require changes in IRDA
drivers and administration.

In addition to this patch, you may need to configure a setserial init
script, e.g., /etc/init.d/setserial, so it doesn't poke legacy UART
stuff back in.  On Debian, "dpkg-reconfigure setserial" with the "kernel"
option does this.

To force the old legacy probe behavior even when we have PNPBIOS or
ACPI, load the new legacy_serial module (or build 8250 static) with
the "legacy_serial.force" option.

[akpm@linux-foundation.org: fix makefiles]
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Keith Owens <kaos@ocs.com.au>
Cc: Len Brown <lenb@kernel.org>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Jean Tourrilhes <jt@hpl.hp.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Russell King <rmk+serial@arm.linux.org.uk>
Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:23 -07:00
Bernhard Walle b34942fed0 Add IRQF_IRQPOLL flag on i386
Add IRQF_IRQPOLL to timer interrupts on i386.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:22 -07:00
Ananth N Mavinakayanahalli bf8f6e5b3e Kprobes: The ON/OFF knob thru debugfs
This patch provides a debugfs knob to turn kprobes on/off

o A new file /debug/kprobes/enabled indicates if kprobes is enabled or
  not (default enabled)
o Echoing 0 to this file will disarm all installed probes
o Any new probe registration when disabled will register the probe but
  not arm it. A message will be printed out in such a case.
o When a value 1 is echoed to the file, all probes (including ones
  registered in the intervening period) will be enabled
o Unregistration will happen irrespective of whether probes are globally
  enabled or not.
o Update Documentation/kprobes.txt to reflect these changes. While there
  also update the doc to make it current.

We are also looking at providing sysrq key support to tie to the disabling
feature provided by this patch.

[akpm@linux-foundation.org: Use bool like a bool!]
[akpm@linux-foundation.org: add printk facility levels]
[cornelia.huck@de.ibm.com: Add the missing arch_trampoline_kprobe() for s390]
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:19 -07:00
Christoph Hellwig 4c4308cb93 kprobes: kretprobes simplifications
- consolidate duplicate code in all arch_prepare_kretprobe instances
   into common code
 - replace various odd helpers that use hlist_for_each_entry to get
   the first elemenet of a list with either a hlist_for_each_entry_save
   or an opencoded access to the first element in the caller
 - inline add_rp_inst into it's only remaining caller
 - use kretprobe_inst_table_head instead of opencoding it

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:19 -07:00
Ulrich Drepper 1c710c896e utimensat implementation
Implement utimensat(2) which is an extension to futimesat(2) in that it

a) supports nano-second resolution for the timestamps
b) allows to selectively ignore the atime/mtime value
c) allows to selectively use the current time for either atime or mtime
d) supports changing the atime/mtime of a symlink itself along the lines
   of the BSD lutimes(3) functions

For this change the internally used do_utimes() functions was changed to
accept a timespec time value and an additional flags parameter.

Additionally the sys_utime function was changed to match compat_sys_utime
which already use do_utimes instead of duplicating the work.

Also, the completely missing futimensat() functionality is added.  We have
such a function in glibc but we have to resort to using /proc/self/fd/* which
not everybody likes (chroot etc).

Test application (the syscall number will need per-arch editing):

#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <sys/time.h>
#include <stddef.h>
#include <syscall.h>

#define __NR_utimensat 280

#define UTIME_NOW       ((1l << 30) - 1l)
#define UTIME_OMIT      ((1l << 30) - 2l)

int
main(void)
{
  int status = 0;

  int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666);
  if (fd == -1)
    error (1, errno, "failed to create test file \"ttt\"");

  struct stat64 st1;
  if (fstat64 (fd, &st1) != 0)
    error (1, errno, "fstat failed");

  struct timespec t[2];
  t[0].tv_sec = 0;
  t[0].tv_nsec = 0;
  t[1].tv_sec = 0;
  t[1].tv_nsec = 0;
  if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
    error (1, errno, "utimensat failed");

  struct stat64 st2;
  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
    {
      puts ("atim not reset to zero");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
    {
      puts ("mtim not reset to zero");
      status = 1;
    }
  if (status != 0)
    goto out;

  t[0] = st1.st_atim;
  t[1].tv_sec = 0;
  t[1].tv_nsec = UTIME_OMIT;
  if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
      || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
    {
      puts ("atim not set");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
    {
      puts ("mtim changed from zero");
      status = 1;
    }
  if (status != 0)
    goto out;

  t[0].tv_sec = 0;
  t[0].tv_nsec = UTIME_OMIT;
  t[1] = st1.st_mtim;
  if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
      || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
    {
      puts ("mtim changed from original time");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec
      || st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec)
    {
      puts ("mtim not set");
      status = 1;
    }
  if (status != 0)
    goto out;

  sleep (2);

  t[0].tv_sec = 0;
  t[0].tv_nsec = UTIME_NOW;
  t[1].tv_sec = 0;
  t[1].tv_nsec = UTIME_NOW;
  if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  struct timeval tv;
  gettimeofday(&tv,NULL);

  if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec
      || st2.st_atim.tv_sec > tv.tv_sec)
    {
      puts ("atim not set to NOW");
      status = 1;
    }
  if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec
      || st2.st_mtim.tv_sec > tv.tv_sec)
    {
      puts ("mtim not set to NOW");
      status = 1;
    }

  if (symlink ("ttt", "tttsym") != 0)
    error (1, errno, "cannot create symlink");

  t[0].tv_sec = 0;
  t[0].tv_nsec = 0;
  t[1].tv_sec = 0;
  t[1].tv_nsec = 0;
  if (syscall(__NR_utimensat, AT_FDCWD, "tttsym", t, AT_SYMLINK_NOFOLLOW) != 0)
    error (1, errno, "utimensat failed");

  if (lstat64 ("tttsym", &st2) != 0)
    error (1, errno, "lstat failed");

  if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
    {
      puts ("symlink atim not reset to zero");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
    {
      puts ("symlink mtim not reset to zero");
      status = 1;
    }
  if (status != 0)
    goto out;

  t[0].tv_sec = 1;
  t[0].tv_nsec = 0;
  t[1].tv_sec = 1;
  t[1].tv_nsec = 0;
  if (syscall(__NR_utimensat, fd, NULL, t, 0) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != 1 || st2.st_atim.tv_nsec != 0)
    {
      puts ("atim not reset to one");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != 1 || st2.st_mtim.tv_nsec != 0)
    {
      puts ("mtim not reset to one");
      status = 1;
    }

  if (status == 0)
     puts ("all OK");

 out:
  close (fd);
  unlink ("ttt");
  unlink ("tttsym");

  return status;
}

[akpm@linux-foundation.org: add missing i386 syscall table entry]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:18 -07:00
Guennadi Liakhovetski b247e8aaf2 dma_declare_coherent_memory wrong allocation
dma_declare_coherent_memory() allocates a bitmap 1 bit per page, it
calculates the bitmap size based on size of long, but allocates bytes...
Thanks to James Bottomley for clarifications and corrections.

Signed-off-by: G. Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:14 -07:00
Alan Cox c991938018 apm: fix incorrect comment
HZ has not always been 100Hz for some time.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:10 -07:00
Bjorn Helgaas 873ec74615 EFI: warn only for pre-1.00 system tables
We used to warn unless the EFI system table major revision was exactly 1.
But EFI 2.00 firmware is starting to appear, and the 2.00 changes don't
affect anything in Linux.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:10 -07:00
Ananth N Mavinakayanahalli 0f95b7fc83 Kprobes: print details of kretprobe on assertion failure
In certain cases like when the real return address can't be found or when
the number of tracked calls to a kretprobed function is less than the
number of returns, we may not be able to find the correct return address
after processing a kretprobe.  Currently we just do a BUG_ON, but no
information is provided about the actual failing kretprobe.

Print out details of the kretprobe before calling BUG().

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:08 -07:00
Randy Dunlap e63340ae6b header cleaning: don't include smp_lock.h when not used
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Christoph Hellwig 1eeb66a1bb move die notifier handling to common code
This patch moves the die notifier handling to common code.  Previous
various architectures had exactly the same code for it.  Note that the new
code is compiled unconditionally, this should be understood as an appel to
the other architecture maintainer to implement support for it aswell (aka
sprinkling a notify_die or two in the proper place)

arm had a notifiy_die that did something totally different, I renamed it to
arm_notify_die as part of the patch and made it static to the file it's
declared and used at.  avr32 used to pass slightly less information through
this interface and I brought it into line with the other architectures.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:04 -07:00
Corey Minyard f64da958df ipmi: add new IPMI nmi watchdog handling
Convert over to the new NMI handling for getting IPMI watchdog timeouts via an
NMI.  This add config options to know if there is the ability to receive NMIs
and if it has an NMI post processing call.  Then it modifies the IPMI watchdog
to take advantage of this so that it can know if an NMI comes in.

It also adds testing that the IPMI NMI watchdog works.

Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:58 -07:00
Akinobu Mita 0e6b9c98be use SLAB_PANIC flag cleanup
Use SLAB_PANIC and delete duplicated panic().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Ian Molton <spyro@f2s.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:57 -07:00
Nicolas Boichat 78a62d2c98 i386: Use functions from library in msr driver
Use safe MSR functions provided by arch/*/lib/msr-on-cpu.c in
arch/i386/kernel/msr.c.

Signed-off-by: Nicolas Boichat <nicolas@boichat.ch>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-08 17:22:01 +02:00
Rudolf Marek 4e9baad8f5 i386: Add safe variants of rdmsr_on_cpu and wrmsr_on_cpu
Add safe (exception handled) variants of rdmsr_on_cpu and wrmsr_on_cpu.
You should use these when the target MSR may not actually exist, as
doing so could trigger an exception which the regular functions do not
handle. The safe variants are slower, though.

The upcoming coretemp hardware monitoring driver will need this.

Signed-off-by: Rudolf Marek <r.marek@assembler.cz>
Cc: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-08 17:22:01 +02:00
Benjamin Herrenschmidt 5a8130f2b1 get_unmapped_area handles MAP_FIXED on i386
Handle MAP_FIXED in i386 hugetlb_get_unmapped_area(), just call
prepare_hugepage_range.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:56 -07:00
Christoph Lameter 81819f0fc8 SLUB core
This is a new slab allocator which was motivated by the complexity of the
existing code in mm/slab.c. It attempts to address a variety of concerns
with the existing implementation.

A. Management of object queues

   A particular concern was the complex management of the numerous object
   queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
   each allocating CPU and use objects from a slab directly instead of
   queueing them up.

B. Storage overhead of object queues

   SLAB Object queues exist per node, per CPU. The alien cache queue even
   has a queue array that contain a queue for each processor on each
   node. For very large systems the number of queues and the number of
   objects that may be caught in those queues grows exponentially. On our
   systems with 1k nodes / processors we have several gigabytes just tied up
   for storing references to objects for those queues  This does not include
   the objects that could be on those queues. One fears that the whole
   memory of the machine could one day be consumed by those queues.

C. SLAB meta data overhead

   SLAB has overhead at the beginning of each slab. This means that data
   cannot be naturally aligned at the beginning of a slab block. SLUB keeps
   all meta data in the corresponding page_struct. Objects can be naturally
   aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte
   boundaries and can fit tightly into a 4k page with no bytes left over.
   SLAB cannot do this.

D. SLAB has a complex cache reaper

   SLUB does not need a cache reaper for UP systems. On SMP systems
   the per CPU slab may be pushed back into partial list but that
   operation is simple and does not require an iteration over a list
   of objects. SLAB expires per CPU, shared and alien object queues
   during cache reaping which may cause strange hold offs.

E. SLAB has complex NUMA policy layer support

   SLUB pushes NUMA policy handling into the page allocator. This means that
   allocation is coarser (SLUB does interleave on a page level) but that
   situation was also present before 2.6.13. SLABs application of
   policies to individual slab objects allocated in SLAB is
   certainly a performance concern due to the frequent references to
   memory policies which may lead a sequence of objects to come from
   one node after another. SLUB will get a slab full of objects
   from one node and then will switch to the next.

F. Reduction of the size of partial slab lists

   SLAB has per node partial lists. This means that over time a large
   number of partial slabs may accumulate on those lists. These can
   only be reused if allocator occur on specific nodes. SLUB has a global
   pool of partial slabs and will consume slabs from that pool to
   decrease fragmentation.

G. Tunables

   SLAB has sophisticated tuning abilities for each slab cache. One can
   manipulate the queue sizes in detail. However, filling the queues still
   requires the uses of the spin lock to check out slabs. SLUB has a global
   parameter (min_slab_order) for tuning. Increasing the minimum slab
   order can decrease the locking overhead. The bigger the slab order the
   less motions of pages between per CPU and partial lists occur and the
   better SLUB will be scaling.

G. Slab merging

   We often have slab caches with similar parameters. SLUB detects those
   on boot up and merges them into the corresponding general caches. This
   leads to more effective memory use. About 50% of all caches can
   be eliminated through slab merging. This will also decrease
   slab fragmentation because partial allocated slabs can be filled
   up again. Slab merging can be switched off by specifying
   slub_nomerge on boot up.

   Note that merging can expose heretofore unknown bugs in the kernel
   because corrupted objects may now be placed differently and corrupt
   differing neighboring objects. Enable sanity checks to find those.

H. Diagnostics

   The current slab diagnostics are difficult to use and require a
   recompilation of the kernel. SLUB contains debugging code that
   is always available (but is kept out of the hot code paths).
   SLUB diagnostics can be enabled via the "slab_debug" option.
   Parameters can be specified to select a single or a group of
   slab caches for diagnostics. This means that the system is running
   with the usual performance and it is much more likely that
   race conditions can be reproduced.

I. Resiliency

   If basic sanity checks are on then SLUB is capable of detecting
   common error conditions and recover as best as possible to allow the
   system to continue.

J. Tracing

   Tracing can be enabled via the slab_debug=T,<slabcache> option
   during boot. SLUB will then protocol all actions on that slabcache
   and dump the object contents on free.

K. On demand DMA cache creation.

   Generally DMA caches are not needed. If a kmalloc is used with
   __GFP_DMA then just create this single slabcache that is needed.
   For systems that have no ZONE_DMA requirement the support is
   completely eliminated.

L. Performance increase

   Some benchmarks have shown speed improvements on kernbench in the
   range of 5-10%. The locking overhead of slub is based on the
   underlying base allocation size. If we can reliably allocate
   larger order pages then it is possible to increase slub
   performance much further. The anti-fragmentation patches may
   enable further performance increases.

Tested on:
i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator

SLUB Boot options

slub_nomerge		Disable merging of slabs
slub_min_order=x	Require a minimum order for slab caches. This
			increases the managed chunk size and therefore
			reduces meta data and locking overhead.
slub_min_objects=x	Mininum objects per slab. Default is 8.
slub_max_order=x	Avoid generating slabs larger than order specified.
slub_debug		Enable all diagnostics for all caches
slub_debug=<options>	Enable selective options for all caches
slub_debug=<o>,<cache>	Enable selective options for a certain set of
			caches

Available Debug options
F		Double Free checking, sanity and resiliency
R		Red zoning
P		Object / padding poisoning
U		Track last free / alloc
T		Trace all allocs / frees (only use for individual slabs).

To use SLUB: Apply this patch and then select SLUB as the default slab
allocator.

[hugh@veritas.com: fix an oops-causing locking error]
[akpm@linux-foundation.org: various stupid cleanups and small fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:53 -07:00
Linus Torvalds e3ebadd95c Revert "[PATCH] x86: __pa and __pa_symbol address space separation"
This was broken.  It adds complexity, for no good reason.  Rather than
separate __pa() and __pa_symbol(), we should deprecate __pa_symbol(),
and preferably __pa() too - and just use "virt_to_phys()" instead, which
is more readable and has nicer semantics.

However, right now, just undo the separation, and make __pa_symbol() be
the exact same as __pa().  That fixes the bugs this patch introduced,
and we can do the fairly obvious cleanups later.

Do the new __phys_addr() function (which is now the actual workhorse for
the unified __pa()/__pa_symbol()) as a real external function, that way
all the potential issues with compile/link-time optimizations of
constant symbol addresses go away, and we can also, if we choose to, add
more sanity-checking of the argument.

Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 08:44:24 -07:00
Linus Torvalds ea62ccd00f Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits)
  [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall
  [PATCH] i386: type may be unused
  [PATCH] i386: Some additional chipset register values validation.
  [PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split.
  [PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff
  [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu
  [PATCH] i386: white space fixes in i387.h
  [PATCH] i386: Drop noisy e820 debugging printks
  [PATCH] x86-64: Fix allnoconfig error in genapic_flat.c
  [PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems
  [PATCH] x86-64: Share identical video.S between i386 and x86-64
  [PATCH] x86-64: Remove CONFIG_REORDER
  [PATCH] x86-64: Print type and size correctly for unknown compat ioctls
  [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0)
  [PATCH] i386: Little cleanups in smpboot.c
  [PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning
  [PATCH] x86: Use RDTSCP for synchronous get_cycles if possible
  [PATCH] i386: Add X86_FEATURE_RDTSCP
  [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
  [PATCH] i386: Implement alternative_io for i386
  ...

Fix up trivial conflict in include/linux/highmem.h manually.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-05 14:55:20 -07:00
Linus Torvalds fabb5c4e4a Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/voyager-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/voyager-2.6:
  [VOYAGER] add smp alternatives
  [VOYAGER] Use modern techniques to setup and teardown low identiy mappings.
  [VOYAGER] Convert the monitor thread to use the kthread API
  [VOYAGER] clockevents driver: bring voyager in to line
  [VOYAGER] clockevents: correct boot cpu is zero assumption
  [VOYAGER] add smp_call_function_single
2007-05-05 13:30:23 -07:00
Linus Torvalds 89661adaae Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (59 commits)
  PCI: Free resource files in error path of pci_create_sysfs_dev_files()
  pci-quirks: disable MSI on RS400-200 and RS480
  PCI hotplug: Use menuconfig objects
  PCI: ZT5550 CPCI Hotplug driver fix
  PCI: rpaphp: Remove semaphores
  PCI: rpaphp: Ensure more pcibios_add/pcibios_remove symmetry
  PCI: rpaphp: Use pcibios_remove_pci_devices() symmetrically
  PCI: rpaphp: Document is_php_dn()
  PCI: rpaphp: Document find_php_slot()
  PCI: rpaphp: Rename rpaphp_register_pci_slot() to rpaphp_enable_slot()
  PCI: rpaphp: refactor tail call to rpaphp_register_slot()
  PCI: rpaphp: remove rpaphp_set_attention_status()
  PCI: rpaphp: remove print_slot_pci_funcs()
  PCI: rpaphp: Remove setup_pci_slot()
  PCI: rpaphp: remove a call that does nothing but a pointer lookup
  PCI: rpaphp: Remove another wrappered function
  PCI: rpaphp: Remve another call that is a wrapper
  PCI: rpaphp: remove a function that does nothing but wrap debug printks
  PCI: rpaphp: Remove un-needed goto
  PCI: rpaphp: Fix a memleak; slot->location string was never freed
  ...
2007-05-04 18:04:29 -07:00
Linus Torvalds ded1504dfa Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Report the number of processors in PowerNow-k8 correctly
  [CPUFREQ] do not declare undefined functions
  [CPUFREQ] cleanup kconfig options
  [CPUFREQ] Longhaul - Revert Longhaul ver. 2
  [CPUFREQ] Remove deprecated /proc/acpi/processor/performance write support
  [CPUFREQ] Fix limited cpufreq when booted on battery
  Fix preemption warnings in speedstep-centrino.c
  [CPUFREQ] Longhaul - Correct PCI code
  [CPUFREQ] p4-clockmod: switch to rdmsr_on_cpu/wrmsr_on_cpu
2007-05-04 17:38:48 -07:00
Chuck Ebbert f14e313650 PCI: add debug information to resource collision message
Add more information to PCI resource collision message
to help with debugging.

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:38 -07:00
Michael Ellerman 7fe3730de7 MSI: arch must connect the irq and the msi_desc
set_irq_msi() currently connects an irq_desc to an msi_desc. The archs call
it at some point in their setup routine, and then the generic code sets up the
reverse mapping from the msi_desc back to the irq.

set_irq_msi() should do both connections, making it the one and only call
required to connect an irq with it's MSI desc and vice versa.

The arch code MUST call set_irq_msi(), and it must do so only once it's sure
it's not going to fail the irq allocation.

Given that there's no need for the arch to return the irq anymore, the return
value from the arch setup routine just becomes 0 for success and anything else
for failure.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:38 -07:00
Dan Williams f282b97021 msi: introduce ARCH_SUPPORTS_MSI Kconfig option (rev2)
Allows architectures to advertise that they support MSI rather than listing
each architecture as a PCI_MSI dependency.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:37 -07:00
Jesse Barnes 40ee9e9f8d PCI: fix sysfs rom file creation for BIOS ROM shadows
At one time, if a BIOS ROM shadow was detected for the boot video
device (stored at offset 0xc0000), we'd set a special resource flag,
IORESOURCE_ROM_SHADOW, so that the sysfs ROM file code could handle
it properly.  That broke along the way somewhere though, so current
kernels will be missing 'rom' files in sysfs if the video device
doesn't have an explicit ROM BAR.

This patch fixes the regression by moving the video fixup quirk to a
little later in the boot cycle (to avoid having its work undone by
PCI resource allocation) and checking in the PCI sysfs code whether
a rom file should be created due to a shadow resource, which is also
moved to a little later in the boot cycle so it will occur after the
video fixup.  Tested and works on my i386 test box.

Signed-off-by:  Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:35 -07:00
Jean Delvare 6473d160b4 PCI: Cleanup the includes of <linux/pci.h>
I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.

In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.

My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:

arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c

I would welcome test reports for these files. I am fine with removing
the untested files from the patch if the general opinion is that these
changes aren't safe. The tested part would still be nice to have.

Note that this patch depends on another header fixup patch I submitted
to LKML yesterday:
  [PATCH] scatterlist.h needs types.h
  http://lkml.org/lkml/2007/3/01/141

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:35 -07:00
Thomas Renninger 35060b6a9a [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall
In arch/i386/cpu/common.c there is:
cpu_devs[X86_VENDOR_INTEL]
cpu_devs[X86_VENDOR_CYRIX]
cpu_devs[X86_VENDOR_AMD]
...
They are all filled with data early.
The data (struct) got set to NULL  for all, but Intel in different
late_initcall (exit_cpu_vendor) calls.
I don't see what sense this makes at all, maybe something that got
forgotten with the HOTPLUG_CPU extenstions?

Please check/review whether initdata, cpuinitdata is still ok and this
still works with HOTPLUG_CPU and without, it should...

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: davej@redhat.com
2007-05-02 19:27:22 +02:00
David Rientjes a3193348d4 [PATCH] i386: type may be unused
In the case of !CONFIG_PCI_DIRECT && !CONFIG_PCI_MMCONFIG, type is
unreferened.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:22 +02:00
Olivier Galibert b5229dbb85 [PATCH] i386: Some additional chipset register values validation.
On i945, a mmconfig range hitting the f0000000-ffffffff zone conflicts
with the APIC registers and others.  Consider it invalid.

On E7520, values 0000 and f000 for the window register are defined
invalid in the documentation.

I haven't seen a bios use these values, but who trusts biosen these
days?

Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Andi Kleen <ak@suse.de>

 arch/i386/pci/mmconfig-shared.c |   25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)
2007-05-02 19:27:22 +02:00
Bill Irwin 6c2af35820 [PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split.
Only 1GB-aligned kernel/user splits are now handled for PAE. The
2GB/2GB split attempts to avoid aliasing vmallocspace with the 1:1
mapping for physical memory by using an actual split of 1.875/2.125
to accommodate 128MB of vmallocspace out of what would otherwise
be a full 2GB for userspace. That attempt disturbs the alignment
required by PAE for 2GB/2GB splits, and furthermore does not provide
a 2GB/2GB split as advertised.

This patch resolves the issues here in two manners. The first is
by providing a true 2GB/2GB split in addition to the 1.875/2.125
split. The second is by renaming the 1.875/2.125 split to
CONFIG_VMSPLIT_2G_OPT analogously to CONFIG_VMSPLIT_3G_OPT, which
performs a similar manuever to avoid aliasing vmallocspace with
the 1:1 mapping for physical memory around the 3GB boundary. With
the 1.875/2.125 split properly-named, its config option is then
tagged as depending on !HIGHMEM to express the PAE implementation's
current inability to deal with such unaligned splits.

This patch is essentially a combination of two patches, one written
by Eric Biederman and the other by Eric Dumazet. If they could add
their Signed-off-by: to this, I'd be much obliged.

Signed-off-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Mark Lord <lkml@rtr.ca>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
2007-05-02 19:27:22 +02:00
Andi Kleen 1306383282 [PATCH] i386: Drop noisy e820 debugging printks
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:21 +02:00
Andi Kleen c812d6c198 [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0)
access_ok checks this case anyways, no need to check twice.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:21 +02:00
Andi Kleen ec1180db2c [PATCH] i386: Little cleanups in smpboot.c
- Remove #if that is always set
- Fix warning

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:21 +02:00
Andi Kleen 3aefbe0746 [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
Syncs up with x86-64.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen c7f81c9453 [PATCH] i386: Verify important CPUID bits in real mode
Check some CPUID bits that are needed for compiler generated early in boot.
When the system is still in real mode before changing the VESA BIOS mode
it is possible to still display an visible error message on the screen.

Similar to x86-64.

Includes cleanups from Eric Biederman

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen 484ad39365 [PATCH] i386: Drop -traditional in arch/i386/boot
Needed for followon patch

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen 09198e6850 [PATCH] i386: Clean up NMI watchdog code
- Introduce a wd_ops structure
- Convert the various nmi watchdogs over to it
- This allows to split the perfctr reservation from the watchdog
setup cleanly.
- Do perfctr reservation globally as it should have always been
- Remove dead code referenced only by unused EXPORT_SYMBOLs

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen 889f21ce27 [PATCH] i386: fix wrong comment for syscall stack layout
`ret_from_sys_call' label no longer exist and `syscall_exit' label was
introduced instead.

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:19 +02:00
Eric W. Biederman f26d6a2bbc [PATCH] i386: convert to the kthread API
This patch just trivial converts from calling kernel_thread and daemonize
to just calling kthread_run.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:19 +02:00
Daniel Walker df3624aa29 [PATCH] i386: remove xtime_lock'ing around cpufreq notifier
The locking of the xtime_lock around the cpu notifier is unessesary now.
At one time the tsc was used after a frequency change for timekeeping, but
the re-write of timekeeping no longer uses the TSC unless the frequency is
constant.

The variables that are changed in this section of code had also once been
used for timekeeping, but not any longer ..

Signed-off-by: Daniel Walker <dwalker@mvista.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:18 +02:00
Joachim Deguara 2f3c30e6a8 [PATCH] i386: check capability
Currently the i386 architecture checks the family for mce capability and this
removes that and uses the CPUID information.  Tested on a K8 revE and a
family10h processor.

This eliminates checking of a set AMD procesor family if mce is
allowed and relies on the information being in CPUID.

Signed-off-by: Joachim Deguara <joachim.deguara@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:18 +02:00
Keshavamurthy, Anil S 1bdae4583e [PATCH] i386: clean up flush_tlb_others fn
Cleanup flush_tlb_others(), no functional change.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:18 +02:00
Hisashi Hifumi 62dbc210e2 [PATCH] i386: replace spin_lock_irqsave with spin_lock
IRQ is already disabled through local_irq_disable().  So
spin_lock_irqsave() can be replaced with spin_lock().

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:18 +02:00
Keshavamurthy, Anil S e8a72ffa3a [PATCH] i386: avoid checking for cpu gone when CONFIG_HOTPLUG_CPU not defined
Avoid checking for cpu gone in mm hot path when CONFIG_HOTPLUG_CPU is not
defined.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:18 +02:00
Jeremy Fitzhardinge 0260c196c9 [PATCH] i386: PARAVIRT: fix startup_ipi_hook config dependency
startup_ipi_hook depends on CONFIG_X86_LOCAL_APIC, so move it to the
right part of the paravirt_ops initialization.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:18 +02:00
Randy Dunlap 25c16b992c [PATCH] i386: fix mtrr sections
Fix section mismatch warnings in mtrr code.
Fix line length on one source line.

WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.data: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x103)
WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.text: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x180)
WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.text: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x199)
WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.text: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x1c1)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:18 +02:00
Fernando Luis [** ISO-8859-1 charset **] VzquezCao f5efb41e79 [PATCH] i386: Use safe_apic_wait_icr_idle in safe_apic_wait_icr_idle - i386
Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is
NMI_VECTOR to avoid potential hangups in the event of crash when kdump
tries to stop the other CPUs.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:18 +02:00
Fernando Luis [** ISO-8859-1 charset **] VzquezCao 45ae5e968e [PATCH] i386: __send_IPI_dest_field - i386
Implement __send_IPI_dest_field which can be used to send IPIs when the
"destination shorthand" field of the ICR is set to 00 (destination
field). Use it whenever possible.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:18 +02:00
Fernando Luis VazquezCao 4312fa8157 [PATCH] i386: use safe_apic_wait_icr_idle in smpboot.c
__inquire_remote_apic is used for APIC debugging, so use
safe_apic_wait_icr_idle  instead of apic_wait_icr_idle to avoid possible
lockups when APIC delivery fails.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:17 +02:00
Fernando Luis VazquezCao ae08e43eec [PATCH] i386: use safe_apic_wait_icr_idle - i386
The functionality provided by the new safe_apic_wait_icr_idle is being
open-coded all over "kernel/smpboot.c". Use safe_apic_wait_icr_idle
instead to consolidate code and ease maintenance.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:17 +02:00
Fernando Luis VazquezCao f2b218dd61 [PATCH] i386: safe_apic_wait_icr_idle - i386
apic_wait_icr_idle looks like this:

static __inline__ void apic_wait_icr_idle(void)
{
  while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
    cpu_relax();
}

The busy loop in this function would not be problematic if the
corresponding status bit in the ICR were always updated, but that does
not seem to be the case under certain crash scenarios. Kdump uses an IPI
to stop the other CPUs in the event of a crash, but when any of the
other CPUs are locked-up inside the NMI handler the CPU that sends the
IPI will end up looping forever in the ICR check, effectively
hard-locking the whole system.

Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:

"A local APIC unit indicates successful dispatch of an IPI by
resetting the Delivery Status bit in the Interrupt Command
Register (ICR). The operating system polls the delivery status
bit after sending an INIT or STARTUP IPI until the command has
been dispatched.

A period of 20 microseconds should be sufficient for IPI dispatch
to complete under normal operating conditions. If the IPI is not
successfully dispatched, the operating system can abort the
command. Alternatively, the operating system can retry the IPI by
writing the lower 32-bit double word of the ICR. This “time-out”
mechanism can be implemented through an external interrupt, if
interrupts are enabled on the processor, or through execution of
an instruction or time-stamp counter spin loop."

Intel's documentation suggests the implementation of a time-out
mechanism, which, by the way, is already being open-coded in some parts
of the kernel that tinker with ICR.

Create a apic_wait_icr_idle replacement that implements the time-out
mechanism and that can be used to solve the aforementioned problem.

AK: moved both functions out of line
AK: added improved loop from Keith Owens

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:17 +02:00
Bernhard Kaindl de938c51d5 [PATCH] i386: Enable support for fixed-range IORRs to keep RdMem & WrMem in sync
If our copy of the MTRRs of the BSP has RdMem or WrMem set, and
we are running on an AMD64/K8 system, the boot CPU must have had
MtrrFixDramEn and MtrrFixDramModEn set (otherwise our RDMSR would
have copied these bits cleared), so we set them on this CPU as well.

This allows us to keep the AMD64/K8 RdMem and WrMem bits in sync
across the CPUs of SMP systems in order to fullfill the duty of
system software to "initialize and maintain MTRR consistency
across all processors." as written in the AMD and Intel manuals.

If an WRMSR instruction fails because MtrrFixDramModEn is not
set, I expect that also the Intel-style MTRR bits are not updated.

AK: minor cleanup, moved MSR defines around

Signed-off-by: Bernhard Kaindl <bk@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
2007-05-02 19:27:17 +02:00
Bernhard Kaindl 3ebad59056 [PATCH] x86: Save and restore the fixed-range MTRRs of the BSP when suspending
Note: This patch didn'nt need an update since it's initial post.

Some BIOSes may modify fixed-range MTRRs in SMM, e.g. when they
transition the system into ACPI mode, which is entered thru an SMI,
triggered by Linux in acpi_enable().

SMIs which cause that Linux is interrupted and BIOS code is
executed (which may change e.g. fixed-range MTRRs) in SMM may
be raised by an embedded system controller which is often found
in notebooks also at other occasions.

If we would not update our copy of the fixed-range MTRRs before
suspending to RAM or to disk, restore_processor_state() would
set the fixed-range MTRRs of the BSP using old backup values
which may be outdated and this could cause the system to fail
later during resume.

This patch ensures that our copy of the fixed-range MTRRs
is updated when saving the boot processor state on suspend
to disk and suspend to RAM.

In combination with other patches this allows to fix s2ram
and s2disk on the Acer Ferrari 1000 notebook and at least
s2disk on the Acer Ferrari 5000 notebook.

Signed-off-by: Bernhard Kaindl <bk@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
2007-05-02 19:27:17 +02:00
Bernhard Kaindl 2b1f6278d7 [PATCH] x86: Save the MTRRs of the BSP before booting an AP
Applied fix by Andew Morton:
http://lkml.org/lkml/2007/4/8/88 - Fix `make headers_check'.

AMD and Intel x86 CPU manuals state that it is the responsibility of
system software to initialize and maintain MTRR consistency across
all processors in Multi-Processing Environments.

Quote from page 188 of the AMD64 System Programming manual (Volume 2):

7.6.5 MTRRs in Multi-Processing Environments

"In multi-processing environments, the MTRRs located in all processors must
characterize memory in the same way. Generally, this means that identical
values are written to the MTRRs used by the processors." (short omission here)
"Failure to do so may result in coherency violations or loss of atomicity.
Processor implementations do not check the MTRR settings in other processors
to ensure consistency. It is the responsibility of system software to
initialize and maintain MTRR consistency across all processors."

Current Linux MTRR code already implements the above in the case that the
BIOS does not properly initialize MTRRs on the secondary processors,
but the case where the fixed-range MTRRs of the boot processor are changed
after Linux started to boot, before the initialsation of a secondary
processor, is not handled yet.

In this case, secondary processors are currently initialized by Linux
with MTRRs which the boot processor had very early, when mtrr_bp_init()
did run, but not with the MTRRs which the boot processor uses at the
time when that secondary processors is actually booted,
causing differing MTRR contents on the secondary processors.

Such situation happens on Acer Ferrari 1000 and 5000 notebooks where the
BIOS enables and sets AMD-specific IORR bits in the fixed-range MTRRs
of the boot processor when it transitions the system into ACPI mode.
The SMI handler of the BIOS does this in SMM, entered while Linux ACPI
code runs acpi_enable().

Other occasions where the SMI handler of the BIOS may change bits in
the MTRRs could occur as well. To initialize newly booted secodary
processors with the fixed-range MTRRs which the boot processor uses
at that time, this patch saves the fixed-range MTRRs of the boot
processor before new secondary processors are started. When the
secondary processors run their Linux initialisation code, their
fixed-range MTRRs will be updated with the saved fixed-range MTRRs.

If CONFIG_MTRR is not set, we define mtrr_save_state
as an empty statement because there is nothing to do.

Possible TODOs:

*) CPU-hotplugging outside of SMP suspend/resume is not yet tested
   with this patch.

*) If, even in this case, an AP never runs i386/do_boot_cpu or x86_64/cpu_up,
   then the calls to mtrr_save_state() could be replaced by calls to
   mtrr_save_fixed_ranges(NULL) and  mtrr_save_state() would not be
   needed.

   That would need either verification of the CPU-hotplug code or
   at least a test on a >2 CPU machine.

*) The MTRRs of other running processors are not yet checked at this
   time but it might be interesting to syncronize the MTTRs of all
   processors before booting. That would be an incremental patch,
   but of rather low priority since there is no machine known so
   far which would require this.

AK: moved prototypes on x86-64 around to fix warnings

Signed-off-by: Bernhard Kaindl <bk@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
2007-05-02 19:27:17 +02:00
Bernhard Kaindl 2b3b4835c9 [PATCH] x86: Adds mtrr_save_fixed_ranges() for use in two later patches.
In this current implementation which is used in other patches,
mtrr_save_fixed_ranges() accepts a dummy void pointer because
in the current implementation of one of these patches, this
function may be called from smp_call_function_single() which
requires that this function takes a void pointer argument.

This function calls get_fixed_ranges(), passing mtrr_state.fixed_ranges
which is the element of the static struct which stores our current
backup of the fixed-range MTRR values which all CPUs shall be
using.

Because  mtrr_save_fixed_ranges calls get_fixed_ranges after
kernel initialisation time, __init needs to be removed from
the declaration of get_fixed_ranges().

If CONFIG_MTRR is not set, we define mtrr_save_fixed_ranges
as an empty statement because there is nothing to do.

AK: Moved prototypes for x86-64 around to fix warnings

Signed-off-by: Bernhard Kaindl <bk@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
2007-05-02 19:27:17 +02:00
Jeremy Fitzhardinge 03df4f6ee9 [PATCH] i386: Clean up ELF note generation
Three cleanups:

1: ELF notes are never mapped, so there's no need to have any access
flags in their phdr.

2: When generating them from asm, tell the assembler to use a SHT_NOTE
section type.  There doesn't seem to be a way to do this from C.

3: Use ANSI rather than traditional cpp behaviour to stringify the
macro argument.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
2007-05-02 19:27:17 +02:00
Andi Kleen 21564fd2a3 [PATCH] i386: PARAVIRT: Export paravirt_ops for non GPL modules too
Otherwise non GPL modules cannot even do basic operations
like disabling interrupts anymore, which would be excessive.

Longer term should split the single structure up into
internal and external symbols and not export the internal
ones at all.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:17 +02:00
Jeremy Fitzhardinge 441d40dca0 [PATCH] x86: PARAVIRT: Jeremy Fitzhardinge <jeremy@goop.org>
The other symbols used to delineate the alt-instructions sections have the
form __foo/__foo_end.  Rename parainstructions to match.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:16 +02:00
Zachary Amsden e0bb864397 [PATCH] i386: Convert VMI timer to use clock events
Convert VMI timer to use clock events, making it properly able to use the NO_HZ
infrastructure.  On UP systems, with no local APIC, we just continue to route
these events through the PIT.  On systems with a local APIC, or SMP, we provide
a single source interrupt chip which creates the local timer IRQ.  It actually
gets delivered by the APIC hardware, but we don't want to use the same local
APIC clocksource processing, so we create our own handler here.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Dan Hecht <dhecht@vmware.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
2007-05-02 19:27:16 +02:00
Zachary Amsden eeef9c68aa [PATCH] i386: Implement vmi_kmap_atomic_pte
Implement vmi_kmap_atomic_pte in terms of the backend set_linear_mapping
operation.  The conversion is rather straighforward; call kmap_atomic
and then inform the hypervisor of the page mapping.

The _flush_tlb damage is due to macros being pulled in from highmem.h.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:16 +02:00
Zachary Amsden 9f53a729db [PATCH] i386: Now that the VDSO can be relocated, we can support it in VMI configurations.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:16 +02:00
Zachary Amsden 18420001d6 [PATCH] i386: Clean up arch/i386/kernel/cpu/mcheck/p4.c
No, just no.  You do not use goto to skip a code block.  You do not
return an obvious variable from a singly-inlined function and give
the function a return value.  You don't put unexplained comments
about kmalloc in code which doesn't do dynamic allocation.  And
you don't leave stray warnings around for no good reason.

Also, when possible, it is better to use block scoped variables
because gcc can sometime generate better code.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge 959b4fdfe7 [PATCH] i386: PARAVIRT: Allow boot-time disable of paravirt_ops patching
Add "noreplace-paravirt" to disable paravirt_ops patching.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:16 +02:00
Zachary Amsden 752783c050 [PATCH] i386: In compat mode, the return value here was uninitialized.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge 9ce8c2ed12 [PATCH] i386: map enough initial memory to create lowmem mappings
head.S creates the very initial pagetable for the kernel.  This just
maps enough space for the kernel itself, and an allocation bitmap.
The amount of mapped memory is rounded up to 4Mbytes, and so this
typically ends up mapping 8Mbytes of memory.

When booting, pagetable_init() needs to create mappings for all
lowmem, and the pagetables for these mappings are allocated from the
free pages around the kernel in low memory.  If the number of
pagetable pages + kernel size exceeds head.S's initial mapping, it
will end up faulting on an unmapped page.  This will only happen with
specific combinations of kernel size and memory size.

This patch makes sure that head.S also maps enough space to fit the
kernel pagetables as well as the kernel itself.  It ends up using an
additional two pages of unreclaimable memory.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge c5413fbe89 [PATCH] i386: Fix UP gdt bugs
Fixes two problems with the GDT when compiling for uniprocessor:
 - There's no percpu segment, so trying to load its selector into %fs fails.
   Use a null selector instead.
 - The real gdt needs to be loaded at some point.  Do it in cpu_init().

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge 7c3576d261 [PATCH] i386: Convert PDA into the percpu section
Currently x86 (similar to x84-64) has a special per-cpu structure
called "i386_pda" which can be easily and efficiently referenced via
the %fs register.  An ELF section is more flexible than a structure,
allowing any piece of code to use this area.  Indeed, such a section
already exists: the per-cpu area.

So this patch:
(1) Removes the PDA and uses per-cpu variables for each current member.
(2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU.
(3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which
    can be used to calculate addresses for this CPU's variables.
(4) Simplifies startup, because %fs doesn't need to be loaded with a
    special segment at early boot; it can be deferred until the first
    percpu area is allocated (or never for UP).

The result is less code and one less x86-specific concept.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge 7a61d35d4b [PATCH] i386: Page-align the GDT
Xen wants a dedicated page for the GDT.  I believe VMI likes it too.
lguest, KVM and native don't care.

Simple transformation to page-aligned "struct gdt_page".

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge 35c7422649 [PATCH] x86: deflate stack usage in lib/inflate.c
inflate_fixed and huft_build together use around 2.7k of stack.  When
using 4k stacks, I saw stack overflows from interrupts arriving while
unpacking the root initrd:

do_IRQ: stack overflow: 384
 [<c0106b64>] show_trace_log_lvl+0x1a/0x30
 [<c01075e6>] show_trace+0x12/0x14
 [<c010763f>] dump_stack+0x16/0x18
 [<c0107ca4>] do_IRQ+0x6d/0xd9
 [<c010202b>] xen_evtchn_do_upcall+0x6e/0xa2
 [<c0106781>] xen_hypervisor_callback+0x25/0x2c
 [<c010116c>] xen_restore_fl+0x27/0x29
 [<c0330f63>] _spin_unlock_irqrestore+0x4a/0x50
 [<c0117aab>] change_page_attr+0x577/0x584
 [<c0117b45>] kernel_map_pages+0x8d/0xb4
 [<c016a314>] cache_alloc_refill+0x53f/0x632
 [<c016a6c2>] __kmalloc+0xc1/0x10d
 [<c0463d34>] malloc+0x10/0x12
 [<c04641c1>] huft_build+0x2a7/0x5fa
 [<c04645a5>] inflate_fixed+0x91/0x136
 [<c04657e2>] unpack_to_rootfs+0x5f2/0x8c1
 [<c0465acf>] populate_rootfs+0x1e/0xe4

(This was under Xen, but there's no reason it couldn't happen on bare
  hardware.)

This patch mallocs the local variables, thereby reducing the stack
usage to sane levels.

Also, up the heap size for the kernel decompressor to deal with the
extra allocation.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Tim Yamin <plasmaroo@gentoo.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge 4cdd9c8931 [PATCH] i386: PARAVIRT: drop unused ptep_get_and_clear
In shadow mode hypervisors, ptep_get_and_clear achieves the desired
purpose of keeping the shadows in sync by issuing a native_get_and_clear,
followed by a call to pte_update, which indicates the PTE has been
modified.

Direct mode hypervisors (Xen) have no need for this anyway, and will trap
the update using writable pagetables.

This means no hypervisor makes use of ptep_get_and_clear; there is no
reason to have it in the paravirt-ops structure.  Change confusing
terminology about raw vs. native functions into consistent use of
native_pte_xxx for operations which do not invoke paravirt-ops.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge 7b2f27f4e1 [PATCH] i386: PARAVIRT: flush lazy mmu updates on kunmap_atomic
kunmap_atomic should flush any pending lazy mmu updates, mainly to be
consistent with kmap_atomic, and to preserve its normal behaviour.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge ce6234b529 [PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages
Xen and VMI both have special requirements when mapping a highmem pte
page into the kernel address space.  These can be dealt with by adding
a new kmap_atomic_pte() function for mapping highptes, and hooking it
into the paravirt_ops infrastructure.

Xen specifically wants to map the pte page RO, so this patch exposes a
helper function, kmap_atomic_prot, which maps the page with the
specified page protections.

This also adds a kmap_flush_unused() function to clear out the cached
kmap mappings.  Xen needs this to clear out any potential stray RW
mappings of pages which will become part of a pagetable.

[ Zach - vmi.c will need some attention after this patch.  It wasn't
  immediately obvious to me what needs to be done. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge a27fe809b8 [PATCH] i386: PARAVIRT: revert map_pt_hook.
Back out the map_pt_hook to clear the way for kmap_atomic_pte.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge d4c104771a [PATCH] i386: PARAVIRT: add flush_tlb_others paravirt_op
This patch adds a pv_op for flush_tlb_others.  Linux running on native
hardware uses cross-CPU IPIs to flush the TLB on any CPU which may
have a particular mm's pagetable entries cached in its TLB.  This is
inefficient in a paravirtualized environment, since the hypervisor
knows which real CPUs actually contain cached mappings, which may be a
small subset of a guest's VCPUs.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge 63f70270cc [PATCH] i386: PARAVIRT: add common patching machinery
Implement the actual patching machinery.  paravirt_patch_default()
contains the logic to automatically patch a callsite based on a few
simple rules:

 - if the paravirt_op function is paravirt_nop, then patch nops
 - if the paravirt_op function is a jmp target, then jmp to it
 - if the paravirt_op function is callable and doesn't clobber too much
    for the callsite, call it directly

paravirt_patch_default is suitable as a default implementation of
paravirt_ops.patch, will remove most of the expensive indirect calls
in favour of either a direct call or a pile of nops.

Backends may implement their own patcher, however.  There are several
helper functions to help with this:

paravirt_patch_nop	nop out a callsite
paravirt_patch_ignore	leave the callsite as-is
paravirt_patch_call	patch a call if the caller and callee
			have compatible clobbers
paravirt_patch_jmp	patch in a jmp
paravirt_patch_insns	patch some literal instructions over
			the callsite, if they fit

This patch also implements more direct patches for the native case, so
that when running on native hardware many common operations are
implemented inline.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Acked-by: Ingo Molnar <mingo@elte.hu>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge 42c24fa22e [PATCH] i386: PARAVIRT: Fix patch site clobbers to include return register
Fix a few clobbers to include the return register.  The clobbers set
is the set of all registers modified (or may be modified) by the code
snippet, regardless of whether it was deliberate or accidental.

Also, make sure that callsites which are used in contexts which don't
allow clobbers actually save and restore all clobberable registers.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge d582203578 [PATCH] i386: PARAVIRT: Use patch site IDs computed from offset in paravirt_ops structure
Use patch type identifiers derived from the offset of the operation in
the paravirt_ops structure.  This avoids having to maintain a separate
enum for patch site types.

Also, since the identifier is derived from the offset into
paravirt_ops, the offset can be derived from the identifier.  This is
used to remove replicated information in the various callsite macros,
which has been a source of bugs in the past.

This patch also drops the fused save_fl+cli operation, which doesn't
really add much and makes things more complex - specifically because
it breaks the 1:1 relationship between identifiers and offsets.  If
this operation turns out to be particularly beneficial, then the right
answer is to define a new entrypoint for it.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge 98de032b68 [PATCH] i386: PARAVIRT: rename struct paravirt_patch to paravirt_patch_site for clarity
Rename struct paravirt_patch to paravirt_patch_site, so that it
clearly refers to a callsite, and not the patch which may be applied
to that callsite.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge d6dd61c831 [PATCH] x86: PARAVIRT: add hooks to intercept mm creation and destruction
Add hooks to allow a paravirt implementation to track the lifetime of
an mm.  Paravirtualization requires three hooks, but only two are
needed in common code.  They are:

arch_dup_mmap, which is called when a new mmap is created at fork

arch_exit_mmap, which is called when the last process reference to an
  mm is dropped, which typically happens on exit and exec.

The third hook is activate_mm, which is called from the arch-specific
activate_mm() macro/function, and so doesn't need stub versions for
other architectures.  It's called when an mm is first used.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: linux-arch@vger.kernel.org
Cc: James Bottomley <James.Bottomley@SteelEye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge 5311ab62cd [PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing
Normally when running in PAE mode, the 4th PMD maps the kernel address space,
which can be shared among all processes (since they all need the same kernel
mappings).

Xen, however, does not allow guests to have the kernel pmd shared between page
tables, so parameterize pgtable.c to allow both modes of operation.

There are several side-effects of this.  One is that vmalloc will update the
kernel address space mappings, and those updates need to be propagated into
all processes if the kernel mappings are not intrinsically shared.  In the
non-PAE case, this is done by maintaining a pgd_list of all processes; this
list is used when all process pagetables must be updated.  pgd_list is
threaded via otherwise unused entries in the page structure for the pgd, which
means that the pgd must be page-sized for this to work.

Normally the PAE pgd is only 4x64 byte entries large, but Xen requires the PAE
pgd to page aligned anyway, so this patch forces the pgd to be page
aligned+sized when the kernel pmd is unshared, to accomodate both these
requirements.

Also, since there may be several distinct kernel pmds (if the user/kernel
split is below 3G), there's no point in allocating them from a slab cache;
they're just allocated with get_free_page and initialized appropriately.  (Of
course the could be cached if there is just a single kernel pmd - which is the
default with a 3G user/kernel split - but it doesn't seem worthwhile to add
yet another case into this code).

[ Many thanks to wli for review comments. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge b239fb2501 [PATCH] i386: PARAVIRT: Hooks to set up initial pagetable
This patch introduces paravirt_ops hooks to control how the kernel's
initial pagetable is set up.

In the case of a native boot, the very early bootstrap code creates a
simple non-PAE pagetable to map the kernel and physical memory.  When
the VM subsystem is initialized, it creates a proper pagetable which
respects the PAE mode, large pages, etc.

When booting under a hypervisor, there are many possibilities for what
paging environment the hypervisor establishes for the guest kernel, so
the constructon of the kernel's pagetable depends on the hypervisor.

In the case of Xen, the hypervisor boots the kernel with a fully
constructed pagetable, which is already using PAE if necessary.  Also,
Xen requires particular care when constructing pagetables to make sure
all pagetables are always mapped read-only.

In order to make this easier, kernel's initial pagetable construction
has been changed to only allocate and initialize a pagetable page if
there's no page already present in the pagetable.  This allows the Xen
paravirt backend to make a copy of the hypervisor-provided pagetable,
allowing the kernel to establish any more mappings it needs while
keeping the existing ones.

A slightly subtle point which is worth highlighting here is that Xen
requires all kernel mappings to share the same pte_t pages between all
pagetables, so that updating a kernel page's mapping in one pagetable
is reflected in all other pagetables.  This makes it possible to
allocate a page and attach it to a pagetable without having to
explicitly enumerate that page's mapping in all pagetables.

And:

+From: "Eric W. Biederman" <ebiederm@xmission.com>

If we don't set the leaf page table entries it is quite possible that
will inherit and incorrect page table entry from the initial boot
page table setup in head.S.  So we need to redo the effort here,
so we pick up PSE, PGE and the like.

Hypervisors like Xen require that their page tables be read-only,
which is slightly incompatible with our low identity mappings, however
I discussed this with Jeremy he has modified the Xen early set_pte
function to avoid problems in this area.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge 3dc494e86d [PATCH] i386: PARAVIRT: Add pagetable accessors to pack and unpack pagetable entries
Add a set of accessors to pack, unpack and modify page table entries
(at all levels).  This allows a paravirt implementation to control the
contents of pgd/pmd/pte entries.  For example, Xen uses this to
convert the (pseudo-)physical address into a machine address when
populating a pagetable entry, and converting back to pphys address
when an entry is read.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge 4587623360 [PATCH] i386: PARAVIRT: use paravirt_nop to consistently mark no-op operations
Add a _paravirt_nop function for use as a stub for no-op operations,
and paravirt_nop #defined void * version to make using it easier
(since all its uses are as a void *).

This is useful to allow the patcher to automatically identify noop
operations so it can simply nop out the callsite.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
[mingo] but only as a cleanup of the current open-coded (void *) casts.
My problem with this is that it loses the types. Not that there is much
to check for, but still, this adds some assumptions about how function
calls look like
2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge 7f63c41c6c [PATCH] i386: PARAVIRT: Remove CONFIG_DEBUG_PARAVIRT
Remove CONFIG_DEBUG_PARAVIRT.  When inlining code, this option
attempts to trash registers in the patch-site's "clobber" field, on
the grounds that this should find bugs with incorrect clobbers.
Unfortunately, the clobber field really means "registers modified by
this patch site", which includes return values.

Because of this, this option has outlived its usefulness, so remove
it.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2007-05-02 19:27:13 +02:00
Rusty Russell a75c54f933 [PATCH] i386: i386 separate hardware-defined TSS from Linux additions
On Thu, 2007-03-29 at 13:16 +0200, Andi Kleen wrote:
> Please clean it up properly with two structs.

Not sure about this, now I've done it.  Running it here.

If you like it, I can do x86-64 as well.

==
lguest defines its own TSS struct because the "struct tss_struct"
contains linux-specific additions.  Andi asked me to split the struct
in processor.h.

Unfortunately it makes usage a little awkward.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge b7fb4af06c [PATCH] i386: Allow boot-time disable of SMP altinstructions
Add "noreplace-smp" to disable SMP instruction replacement.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge d0175ab644 [PATCH] i386: Remove smp_alt_instructions
The .smp_altinstructions section and its corresponding symbols are
completely unused, so remove them.

Also, remove stray #ifdef __KENREL__ in asm-i386/alternative.h

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge b6e3590f81 [PATCH] x86: Allow percpu variables to be page-aligned
Let's allow page-alignment in general for per-cpu data (wanted by Xen, and
Ingo suggested KVM as well).

Because larger alignments can use more room, we increase the max per-cpu
memory to 64k rather than 32k: it's getting a little tight.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:12 +02:00
Andi Kleen de90c5ce83 [PATCH] i386: Enable bank 0 on non K7 Athlon
As a bug workaround bank 0 on K7s is normally disabled, but no need
to do that on other AMD CPUs.

Cc: davej@redhat.com

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge d479d2cc08 [PATCH] i386: Update smp_call_function* comments
Update documentation for i386 smp_call_function* functions.

As reported by Randy Dunlap <rdunlap@xenotime.net>

[ I've posted this before but it seems to have been lost along the way. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Randy Dunlap <rdunlap@xenotime.net>
2007-05-02 19:27:12 +02:00
Jan Engelhardt 7946331856 [PATCH] i386: Use menuconfig objects - APM
(I hope Andi is the right one to Cc, otherwise please add, thanks!)

Use menuconfigs instead of menus, so the whole menu can be disabled at
once instead of going through all options.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:12 +02:00
Andi Kleen f039b75471 [PATCH] x86: Don't use MWAIT on AMD Family 10
It doesn't put the CPU into deeper sleep states, so it's better to use the standard
idle loop to save power. But allow to reenable it anyways for benchmarking.

I also removed the obsolete idle=halt on i386

Cc: andreas.herrmann@amd.com

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge 1dbf527c51 [PATCH] i386: Make COMPAT_VDSO runtime selectable.
Now that relocation of the VDSO for COMPAT_VDSO users is done at
runtime rather than compile time, it is possible to enable/disable
compat mode at runtime.

This patch allows you to enable COMPAT_VDSO mode with "vdso=2" on the
kernel command line, or via sysctl.  (Switching on a running system
shouldn't be done lightly; any process which was relying on the compat
VDSO will be upset if it goes away.)

The COMPAT_VDSO config option still exists, but if enabled it just
makes vdso_enabled default to VDSO_COMPAT.

+From: Hugh Dickins <hugh@veritas.com>

Fix oops from i386-make-compat_vdso-runtime-selectable.patch.

Even mingetty at system startup finds it easy to trigger an oops
while reading /proc/PID/maps: though it has a good hold on the mm
itself, that cannot stop exit_mm() from resetting tsk->mm to NULL.

(It is usually show_map()'s call to get_gate_vma() which oopses,
and I expect we could change that to check priv->tail_vma instead;
but no matter, even m_start()'s call just after get_task_mm() is racy.)

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Jan Beulich" <JBeulich@novell.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge d4f7a2c18e [PATCH] i386: Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO
Some versions of libc can't deal with a VDSO which doesn't have its
ELF headers matching its mapped address.  COMPAT_VDSO maps the VDSO at
a specific system-wide fixed address.  Previously this was all done at
build time, on the grounds that the fixed VDSO address is always at
the top of the address space.  However, a hypervisor may reserve some
of that address space, pushing the fixmap address down.

This patch does the adjustment dynamically at runtime, depending on
the runtime location of the VDSO fixmap.

[ Patch has been through several hands: Jan Beulich wrote the orignal
  version; Zach reworked it, and Jeremy converted it to relocate phdrs
  as well as sections. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Jan Beulich" <JBeulich@novell.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge a6c4e076ee [PATCH] i386: clean up identify_cpu
identify_cpu() is used to identify both the boot CPU and secondary
CPUs, but it performs some actions which only apply to the boot CPU.
Those functions are therefore really __init functions, but because
they're called by identify_cpu(), they must be marked __cpuinit.

This patch splits identify_cpu() into identify_boot_cpu() and
identify_secondary_cpu(), and calls the appropriate init functions
from each.  Also, identify_boot_cpu() and all the functions it
dominates are marked __init.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge 1353ebb4b4 [PATCH] i386: Clean up asm-i386/bugs.h
Most of asm-i386/bugs.h is code which should be in a C file, so put it there.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-02 19:27:12 +02:00
Jan Beulich b92e9fac40 [PATCH] x86: fix amd64-agp aperture validation
Under CONFIG_DISCONTIGMEM, assuming that a !pfn_valid() implies all
subsequent pfn-s are also invalid is wrong. Thus replace this by
explicitly checking against the E820 map.

AK: make e820 on x86-64 not initdata

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
2007-05-02 19:27:11 +02:00
Andi Kleen bbba11c35b [PATCH] i386: Remove unneeded externs in nmi.c
All were already in some header
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:11 +02:00
Jeremy Fitzhardinge 07f3331c6b [PATCH] i386: Add machine_ops interface to abstract halting and rebooting
machine_ops is an interface for the machine_* functions defined in
<linux/reboot.h>.  This is intended to allow hypervisors to intercept
the reboot process, but it could be used to implement other x86
subarchtecture reboots.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:11 +02:00
Jeremy Fitzhardinge 01a2f43556 [PATCH] i386: Add smp_ops interface
Add a smp_ops interface.  This abstracts the API defined by
<linux/smp.h> for use within arch/i386.  The primary intent is that it
be used by a paravirtualizing hypervisor to implement SMP, but it
could also be used by non-APIC-using sub-architectures.

This is related to CONFIG_PARAVIRT, but is implemented unconditionally
since it is simpler that way and not a highly performance-sensitive
interface.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
2007-05-02 19:27:11 +02:00
Rusty Russell 4fbb596881 [PATCH] i386: cleanup GDT Access
Now we have an explicit per-cpu GDT variable, we don't need to keep the
descriptors around to use them to find the GDT: expose cpu_gdt directly.

We could go further and make load_gdt() pack the descriptor for us, or even
assume it means "load the current cpu's GDT" which is what it always does.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:11 +02:00
Adrian Bunk bd8559c38e [PATCH] x86: remove UNEXPECTED_IO_APIC()
Many years ago, UNEXPECTED_IO_APIC() contained printk()'s (but nothing more).

Now that it's completely empty for years, we can as well remove it.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:11 +02:00
Adrian Bunk ca906e4231 [PATCH] x86: sys_ioperm() prototype cleanup
- there's no reason for duplicating the prototype from
  include/linux/syscalls.h in include/asm-x86_64/unistd.h
- every file should #include the headers containing the prototypes for
  it's global functions

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:10 +02:00
Parag Warudkar 05f36927ed [PATCH] i386: remove the APM_RTC_IS_GMT config option.
Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:10 +02:00
Parag Warudkar 78eea47ac3 [PATCH] i386: get rid of unused variables
Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:10 +02:00
Jan Beulich 6fb14755a6 [PATCH] x86: tighten kernel image page access rights
On x86-64, kernel memory freed after init can be entirely unmapped instead
of just getting 'poisoned' by overwriting with a debug pattern.

On i386 and x86-64 (under CONFIG_DEBUG_RODATA), kernel text and bug table
can also be write-protected.

Compared to the first version, this one prevents re-creating deleted
mappings in the kernel image range on x86-64, if those got removed
previously. This, together with the original changes, prevents temporarily
having inconsistent mappings when cacheability attributes are being
changed on such pages (e.g. from AGP code). While on i386 such duplicate
mappings don't exist, the same change is done there, too, both for
consistency and because checking pte_present() before using various other
pte_XXX functions is a requirement anyway. At once, i386 code gets
adjusted to use pte_huge() instead of open coding this.

AK: split out cpa() changes

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:10 +02:00
Jan Beulich d01ad8dd56 [PATCH] x86: Improve handling of kernel mappings in change_page_attr
Fix various broken corner cases in i386 and x86-64 change_page_attr.

AK: split off from tighten kernel image access rights

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:10 +02:00
Rusty Russell 90a0a06aa8 [PATCH] i386: rationalize paravirt wrappers
paravirt.c used to implement native versions of all low-level
functions.  Far cleaner is to have the native versions exposed in the
headers and as inline native_XXX, and if !CONFIG_PARAVIRT, then simply
#define XXX native_XXX.

There are several nice side effects:

1) write_dt_entry() now takes the correct "struct Xgt_desc_struct *"
   not "void *".

2) load_TLS is reintroduced to the for loop, not manually unrolled
   with a #error in case the bounds ever change.

3) Macros become inlines, with type checking.

4) Access to the native versions is trivial for KVM, lguest, Xen and
   others who might want it.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:10 +02:00
Sebastien Dugue 52de74dd39 [PATCH] i386: Rename boot_gdt_table to boot_gdt
Rename boot_gdt_table to boot_gdt to avoid the duplicate T(able).

Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:10 +02:00
Rusty Russell d2cbcc49e2 [PATCH] i386: clean up cpu_init()
We now have cpu_init() and secondary_cpu_init() doing nothing but calling
_cpu_init() with the same arguments.  Rename _cpu_init() to cpu_init() and use
it as a replcement for secondary_cpu_init().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:10 +02:00
Rusty Russell bf50467204 [PATCH] i386: Use per-cpu GDT immediately upon boot
Now we are no longer dynamically allocating the GDT, we don't need the
"cpu_gdt_table" at all: we can switch straight from "boot_gdt_table" to the
per-cpu GDT.  This means initializing the cpu_gdt array in C.

The boot CPU uses the per-cpu var directly, then in smp_prepare_cpus() it
switches to the per-cpu copy just allocated.  For secondary CPUs, the
early_gdt_descr is set to point directly to their per-cpu copy.

For UP the code is very simple: it keeps using the "per-cpu" GDT as per SMP,
but we never have to move.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:10 +02:00
Rusty Russell ae1ee11be7 [PATCH] i386: Use per-cpu variables for GDT, PDA
Allocating PDA and GDT at boot is a pain.  Using simple per-cpu variables adds
happiness (although we need the GDT page-aligned for Xen, which we do in a
followup patch).

[akpm@linux-foundation.org: build fix]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:10 +02:00
Bernhard Walle 8f9aeca7a0 [PATCH] x86: add command line length to boot protocol
Because the command line is increased to 2048 characters after 2.6.21, it's
not possible for boot loaders and userspace tools to determine the length
of the command line the kernel can understand.  The benefit of knowing the
length is that users can be warned if the command line size is too long
which prevents surprise if things don't work after bootup.

This patch updates the boot protocol to contain a field called
"cmdline_size" that contain the length of the command line (excluding the
terminating zero).

The patch also adds missing fields (of protocol version 2.05) to the x86_64
setup code.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Alon Bar-Lev <alon.barlev@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:10 +02:00
Andrew Morton 1b523fb549 [PATCH] i386: VDSO_PRELINK warning fix
The lguest patches somehow managed to trigger this:

In file included from arch/i386/lguest/lguest.c:38:
include/asm/asm-offsets.h:67:1: warning: "VDSO_PRELINK" redefined
In file included from include/linux/elf.h:7,
                 from include/linux/module.h:15,
                 from include/linux/device.h:21,
                 from include/linux/interrupt.h:15,
                 from arch/i386/lguest/lguest.c:27:
include/asm/elf.h:140:1: warning: this is the location of the previous definition

I assume that using the same identifier twice was a bad idea..

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:09 +02:00
Joerg Roedel d824395c59 [PATCH] x86: remove constant_tsc reporting from /proc/cpuinfo' power flags
remove the reporting of the constant_tsc flag from the "power management"
field in /proc/cpuinfo.  The NULL value there was replaced by "" because
the former would result in a printout of [8] if the flag is set.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:09 +02:00
Ahmed S. Darwish 8280c0c58e [PATCH] i386: fix GDT's number of quadwords in comment
Fix comments to represent the true number of quadwords in GDT.

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:09 +02:00
Adrian Bunk 8eb68faed9 [PATCH] i386: vmi_pmd_clear() static
This patch makes the needlessly global vmi_pmd_clear() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:09 +02:00
john stultz 5a90cf205c [PATCH] x86: Log reason why TSC was marked unstable
Change mark_tsc_unstable() so it takes a string argument, which holds the
reason the TSC was marked unstable.

This is then displayed the first time mark_tsc_unstable is called.

This should help us better debug why the TSC was marked unstable on certain
systems and allow us to make sure we're not being overly paranoid when
throwing out this troublesome clocksource.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:08 +02:00
Adrian Bunk 2714221985 [PATCH] i386: workaround for a -Wmissing-prototypes warning
Work around a warning with -Wmissing-prototypes in
arch/i386/kernel/asm-offsets.c

The warning isn't gcc's fault - asm-offsets.c is simply a special file.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:08 +02:00
Ken Chen e48b30c189 [PATCH] i386: type cast clean up for find_next_zero_bit
clean up unneeded type cast by properly declare data type.

Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:08 +02:00
Adrian Bunk 30a1528d3b [PATCH] i386: make struct vmi_ops static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:08 +02:00
Vivek Goyal 1833d6bc72 [PATCH] i386: modpost apic related warning fixes
o Modpost generates warnings for i386 if compiled with CONFIG_RELOCATABLE=y

WARNING: vmlinux - Section mismatch: reference to .init.text:find_unisys_acpi_oem_table from .text between 'acpi_madt_oem_check' (at offset 0xc0101eda) and 'enable_apic_mode'
WARNING: vmlinux - Section mismatch: reference to .init.text:acpi_get_table_header_early from .text between 'acpi_madt_oem_check' (at offset 0xc0101ef0) and 'enable_apic_mode'
WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'acpi_madt_oem_check' (at offset 0xc0101f2e) and 'enable_apic_mode'
WARNING: vmlinux - Section mismatch: reference to .init.text:setup_unisys from .text between 'acpi_madt_oem_check' (at offset 0xc0101f37) and 'enable_apic_mode'WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'mps_oem_check' (at offset 0xc0101ec7) and 'acpi_madt_oem_check'
WARNING: vmlinux - Section mismatch: reference to .init.text:es7000_sw_apic from .text between 'enable_apic_mode' (at offset 0xc0101f48) and 'check_apicid_present'

o Some functions which are inline (acpi_madt_oem_check) are not inlined by
  compiler as these functions are accessed using function pointer. These
  functions are put in .text section and they in-turn access __init type
  functions hence modpost generates warnings.

o Do not iniline acpi_madt_oem_check, instead make it __init.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:08 +02:00
Vivek Goyal 0dbf7028c0 [PATCH] x86: __pa and __pa_symbol address space separation
Currently __pa_symbol is for use with symbols in the kernel address
map and __pa is for use with pointers into the physical memory map.
But the code is implemented so you can usually interchange the two.

__pa which is much more common can be implemented much more cheaply
if it is it doesn't have to worry about any other kernel address
spaces.  This is especially true with a relocatable kernel as
__pa_symbol needs to peform an extra variable read to resolve
the address.

There is a third macro that is added for the vsyscall data
__pa_vsymbol for finding the physical addesses of vsyscall pages.

Most of this patch is simply sorting through the references to
__pa or __pa_symbol and using the proper one.  A little of
it is continuing to use a physical address when we have it
instead of recalculating it several times.

swapper_pgd is now NULL.  leave_mm now uses init_mm.pgd
and init_mm.pgd is initialized at boot (instead of compile time)
to the physmem virtual mapping of init_level4_pgd.  The
physical address changed.

Except for the for EMPTY_ZERO page all of the remaining references
to __pa_symbol appear to be during kernel initialization.  So this
should reduce the cost of __pa in the common case, even on a relocated
kernel.

As this is technically a semantic change we need to be on the lookout
for anything I missed.  But it works for me (tm).

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:07 +02:00
Vivek Goyal 49c3df6aaa [PATCH] x86: Move swsusp __pa() dependent code to arch portion
o __pa() should be used only on kernel linearly mapped virtual addresses
  and not on kernel text and data addresses.

o Hibernation code needs to determine the physical address associated
  with kernel symbol to mark a section boundary which contains pages which
  don't have to be saved and restored during hibernate/resume operation.

o Move this piece of code in arch dependent section. So that architectures
  which don't have kernel text/data mapped into kernel linearly mapped
  region can come up with their own ways of determining physical addresses
  associated with a kernel text.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:07 +02:00
Jeremy Fitzhardinge 19d1743315 [PATCH] i386: Simplify smp_call_function*() by using common implementation
smp_call_function and smp_call_function_single are almost complete
duplicates of the same logic.  This patch combines them by
implementing them in terms of the more general
smp_call_function_mask().

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Andi Kleen <ak@suse.de>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Ingo Molnar <mingo@elte.hu>
2007-05-02 19:27:06 +02:00
Lasse Collin b0b1ff653a [PATCH] i386: Fix usage of -mtune when X86_GENERIC=y or CONFIG_MCORE2=y
Hi!

I sent this simple patch to lkml about two weeks ago and also cc'ed
to Linus, but seems that the patch got ignored. I decided to write to
you, because you have modified the relevant file most recently.

Below is a copy of the mail that is also available at
<http://lkml.org/lkml/2007/2/28/230>.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:06 +02:00
Jeremy Fitzhardinge 973efae21b [PATCH] i386: clean up mach_reboot_fixups
The reboot_fixups stuff seems to be a bit of a mess, specifically the
header is in linux/ when its a purely i386-specific piece of code.  I'm
not sure why it has its config option; its only currently needed for
"geode-gx1/cs5530a", so perhaps whatever config option controls that
hardware should enable this?

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:06 +02:00
Andi Kleen c8fdd24725 [PATCH] x86: Drop cc-options call for all options supported in gcc 3.2+
The kernel only supports gcc 3.2+ now so it doesn't make sense
anymore to explicitely check for options this compiler version
already has.

This actually fixes a bug. The -mprefered-stack-boundary check
never worked because gcc rightly complains

  CC      arch/i386/kernel/asm-offsets.s
cc1: -mpreferred-stack-boundary=2 is not between 4 and 12

We just never saw the error because of cc-options.
I changed it to 4 to actually work.

Tested by compiling i386 and x86-64 defconfig with gcc 3.2.

Should speed up the build time a tiny bit and improve
stack usage on i386 slightly.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:06 +02:00
Andi Kleen 2a12652c03 [PATCH] i386: Support Oprofile for AMD Family 10 CPUs
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:06 +02:00
Prarit Bhargava 86c0baf123 [PATCH] i386: Change sysenter_setup to __cpuinit & improve __INIT, __INITDATA
Change sysenter_setup to __cpuinit.
Change __INIT & __INITDATA to be cpu hotplug aware.

Resolve MODPOST warnings similar to:

WARNING: vmlinux - Section mismatch: reference to .init.text:sysenter_setup from
 .text between 'identify_cpu' (at offset 0xc040a380) and 'detect_ht'

and

WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_int80_end
from .text between 'sysenter_setup' (at offset 0xc041a269) and 'enable_sep_cpu'
WARNING: vmlinux - Section mismatch: reference to
.init.data:vsyscall_int80_start from .text between 'sysenter_setup' (at offset
0xc041a26e) and 'enable_sep_cpu'
WARNING: vmlinux - Section mismatch: reference to
.init.data:vsyscall_sysenter_end from .text between 'sysenter_setup' (at offset
0xc041a275) and 'enable_sep_cpu'
WARNING: vmlinux - Section mismatch: reference to
.init.data:vsyscall_sysenter_start from .text between 'sysenter_setup' (at
offset 0xc041a27a) and 'enable_sep_cpu'

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:05 +02:00
Prarit Bhargava e319af1d87 [PATCH] i386: Add __init to probe_bigsmp
Add __init to probe_bigsmp.  All callers are __init and data being examined
is __initdata.

Resolves MODPOST warning similar to:

WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between 'probe_bigsmp' (at offset 0xc0401e56) and 'init_apic_ldr'

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:05 +02:00
Stephane Eranian bf8696ed6d [PATCH] i386: i386 make NMI use PERFCTR1 for architectural perfmon (take 2)
Hello,

This patch against 2.6.20-git14 makes the NMI watchdog use PERFSEL1/PERFCTR1
instead of PERFSEL0/PERFCTR0 on processors supporting Intel architectural
perfmon, such as Intel Core 2. Although all PMU events can work on
both counters, the Precise Event-Based Sampling (PEBS) requires that the
event be in PERFCTR0 to work correctly (see section 18.14.4.1 in the
IA32 SDM Vol 3b).

A similar patch for x86-64 is to follow.

Changelog:
        - make the i386 NMI watchdog use PERFSEL1/PERFCTR1 instead of PERFSEL0/PERFCTR0
          on processors supporting the Intel architectural perfmon (e.g. Core 2 Duo).
          This allows PEBS to work when the NMI watchdog is active.

signed-off-by: stephane eranian <eranian@hpl.hp.com>

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:05 +02:00
Andi Kleen d189518342 [PATCH] x86: Fix i386 and x86_64 fault information pollution
a userspace fault or a kernelspace fault which will result in the
immediate death of the process.  They should not be filled in as a
result of a kernelspace fault which can be fixed up.

Otherwise, if the process is handling SIGSEGV and examining the fault
information, this can result in the kernel space fault trashing the
previously stored fault information if it arrives between the
userspace fault happening and the SIGSEGV being delivered to the process.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
--
 arch/i386/kernel/traps.c   |   24 ++++++++++++++++++------
 arch/x86_64/kernel/traps.c |   30 +++++++++++++++++++++++-------
 2 files changed, 41 insertions(+), 13 deletions(-)
2007-05-02 19:27:05 +02:00
Jan Beulich 00e065ea58 [PATCH] i386: Add dwarf2 annotations to *_user and checksum functions
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:05 +02:00
Rene Herman 0adad171c2 [PATCH] i386: probe_roms() cleanup
Remove the assumption that if the first page of a legacy ROM is mapped,
it'll all be mapped. This'll also stop people reading this code from
wondering if they're looking at a bug...

Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Martin Murray <murrayma@citi.umich.edu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:05 +02:00
Simon Arlott 0949be3509 [PATCH] i386: Add an option for the VIA C7 which sets appropriate L1 cache
The VIA C7 is a 686 (with TSC) that supports MMX, SSE and SSE2, it also has
a cache line length of 64 according to
http://www.digit-life.com/articles2/cpu/rmma-via-c7.html.  This patch sets
gcc to -march=686 and select s the correct cache shift.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:05 +02:00
takada f5e8861583 [PATCH] i386: pit_latch_buggy has no effect
Eliminated the arch/i386/kernel/timers in 2.6.18, use clocksoures instead.
pit_latch_buggy was referred in timers/timer_tsc.c, and currently removed.
Therefore nobody refer it.

Until 2.6.17, MediaGX's TSC works correctly.  after 2.6.18, warned "TSC
appears to be running slowly.  Marking it as unstable".  So marked unstable
TSC when CS55x0.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:05 +02:00
Jan Beulich 9215da3320 [PATCH] i386: mtrr range check correction
Whether a region is below 1Mb is determined by its start rather than
its end.

This hunk got erroneously dropped from a previous patch.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:05 +02:00
Jeremy Fitzhardinge f76c392380 [PATCH] i386: No need to use -traditional for processing asm in i386/kernel/
No need to use -traditional for processing asm in i386/kernel/

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:05 +02:00
Jan Beulich 9964cf7d77 [PATCH] x86: consolidate smp_send_stop()
Synchronize i386's smp_send_stop() with x86-64's in only try-locking
the call lock to prevent deadlocks when called from panic().
In both version, disable interrupts before clearing the CPU off the
online map to eliminate races with IRQ handlers inspecting this map.
Also in both versions, save/restore interrupts rather than disabling/
enabling them.
On x86-64, eliminate one function used here by folding it into its
single caller, convert to static, and rename for consistency with i386
(lkcd may like this).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:05 +02:00
Jan Beulich 28609f6e49 [PATCH] i386: adjustments to page table dump during oops (v4)
- make the page table contents printing PAE capable
- make sure the address stored in current->thread.cr2 is unmodified
  from what was read from CR2
- don't call oops_may_print() multiple times, when one time suffices
- print pte even in highpte case, as long as the pte page isn't in
  actually in high memory (which is specifically the case for all page
  tables covering kernel space)

(Changes to v3: Use sizeof()*2 rather than the suggested sizeof()*4 for
printing width, use fixed 16-nibble width for PAE, and also apply the
max_low_pfn range check to the middle level lookup on PAE.)

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:04 +02:00
Ingo Molnar 3c43f03908 [PATCH] x86: default to physical mode on hotplug CPU kernels
Default to physical mode on hotplug CPU kernels.  Furher simplify and clean up
the APIC initialization code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-05-02 19:27:04 +02:00
Andrew Morton a86f34b49f [PATCH] x86: revert x86_64-mm-fix-the-irqbalance-quirk-for-e7320-e7520-e7525
Obsoleted by Ingo's genapic stuff.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:04 +02:00
Andrew Morton fb27145d6a [PATCH] i386: revert i386-fix-the-verify_quirk_intel_irqbalance
This is unneeded with Ingo's genapic rework.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:04 +02:00
Andi Kleen bdd0dc5210 [PATCH] i386: Update defconfig
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:04 +02:00
James Bottomley d6444514b8 [VOYAGER] add smp alternatives
It's about time voyager had them

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2007-05-01 10:13:46 -05:00
Eric W. Biederman 9d0e59a341 [VOYAGER] Use modern techniques to setup and teardown low identiy mappings.
This is a trivial and hopefully obviously correct patch to setup
and teardown the identity mappings the way the rest of arch/i386
does.

My new page table setup code will break some assumptions below so
this is my attempt to keep voyager working.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2007-05-01 10:11:32 -05:00
Christoph Hellwig f3402a4e52 [VOYAGER] Convert the monitor thread to use the kthread API
full kthread conversion on the voyager power switch handling thread.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2007-05-01 10:09:29 -05:00
James Bottomley 9f483519be [VOYAGER] clockevents driver: bring voyager in to line
The irq0 timer interrupt should be initiallised identically with
mach-default.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2007-05-01 10:07:20 -05:00
James Bottomley 2feae2158a [VOYAGER] clockevents: correct boot cpu is zero assumption
This isn't true for voyager, so alter setup_pit_timer() to initialise
the cpumask from the current processor id (which should be the boot
processor) rather than defaulting to zero.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2007-05-01 10:06:42 -05:00
Mark Langsdorf 2e49762063 [CPUFREQ] Report the number of processors in PowerNow-k8 correctly
The PowerNow! driver for Opteron reports the number of cores
in the system, but claims to report the number of processors.
Fix this minor cosmetic bug.

Signed-off-by: Bhavana Nagendra <bhavana.nagendra@amd.com>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-04-30 15:48:29 -04:00
David Rientjes b96e80e323 [CPUFREQ] do not declare undefined functions
fill_powernow_table_pstate() and fill_powernow_table_fidvid() are only
defined and used for X86_POWERNOW_K8_ACPI.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-04-30 15:42:48 -04:00
James Bottomley 0293ca814b [VOYAGER] add smp_call_function_single
This apparently has msr users now, so add it to the voyager HAL

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2007-04-30 11:24:05 -05:00
Linus Torvalds f73b0a08ea Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: (86 commits)
  SPIN_LOCK_UNLOCKED cleanup in drivers/ata/pata_winbond.c
  drivers/ata/pata_cmd640.c: fix build with CONFIG_PM=n
  pata_hpt37x: Further small fixes
  pata_hpt3x2n: Add HPT371N support and other bits
  ata: printk warning fixes
  libata: separate ATA_EHI_DID_RESET into DID_SOFTRESET and DID_HARDRESET
  ahci: consolidate common port flags
  ata_timing: ensure t->cycle is always correct
  libata: add missing call to ->cable_detect() in new EH path
  pata_amd: remove contamination added during cable_detect conversion
  libata: Handle drives that require a spin-up command before first access
  libata: HPA support
  libata: kill probe_ent and related helpers
  libata: convert the remaining PATA drivers to new init model
  libata: convert the remaining SATA drivers to new init model
  libata: convert ata_pci_init_native_mode() users to new init model
  libata: convert drivers with combined SATA/PATA ports to new init model
  libata: add init helpers including ata_pci_prepare_native_host()
  libata: convert native PCI host handling to new init model
  libata: convert legacy PCI host handling to new init model
  ...
2007-04-29 10:48:21 -07:00
Jeff Garzik 8cdfb29c0c libata/IDE: remove combined mode quirk
Both old-IDE and libata should be able handle all controllers and
devices found using normal resource reservation methods.

This eliminates the awful, low-performing split-driver configuration
where old-IDE drove the PATA portion of a PCI device, in PIO-only mode,
and libata drove the SATA portion of the /same/ PCI device, in DMA mode.
Typically vendors would ship SATA hard drive / PATA optical
configuration, which would lend itself to slow (PIO-only) CD-ROM
performance.

For Intel users running in combined mode, it is now wholly dependent on
your driver choice (potentially link order, if you compile both drivers
in) whether old-IDE or libata will drive your hardware.

In either case, you will get full performance from both SATA and PATA
ports now, without having to pass a kernel command line parameter.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-04-28 14:15:59 -04:00
Rafal Bilski 07844252ff [CPUFREQ] Longhaul - Revert Longhaul ver. 2
There is something wrong with this code. It needs more
testing. It is better to disable it for now because support
for some machines will be broken.

Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-04-26 14:32:03 -04:00
Dave Jones e8e49190f6 Fix preemption warnings in speedstep-centrino.c
BUG: using smp_processor_id() in preemptible [00000001] code:
kondemand/0/2473
caller is centrino_target+0xfb/0x600
[<401e3646>] debug_smp_processor_id+0x9e/0xb0
[<40112afb>] centrino_target+0xfb/0x600
[<40112a00>] centrino_target+0x0/0x600
[<40305bd9>] __cpufreq_driver_target+0x5c/0x6b
[<f897a537>] do_dbs_timer+0x1bc/0x208 [cpufreq_ondemand]
[<40134a46>] run_workqueue+0x85/0x125
[<40374f7f>] _spin_lock_irqsave+0x18/0x66
[<f897a37b>] do_dbs_timer+0x0/0x208 [cpufreq_ondemand]
[<401353fb>] worker_thread+0xf9/0x124
[<401213b9>] default_wake_function+0x0/0xc
[<40135302>] worker_thread+0x0/0x124
[<40137b37>] kthread+0xb0/0xd9
[<40137a87>] kthread+0x0/0xd9
[<40104b2f>] kernel_thread_helper+0x7/0x10

Signed-off-by: Dave Jones <davej@redhat.com>
2007-04-26 14:32:02 -04:00
Rafał Bilski fb48e15645 [CPUFREQ] Longhaul - Correct PCI code
Replace obsolete pci_find_device with pci_get_device.

Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-04-26 14:32:02 -04:00
Alexey Dobriyan 551948bc44 [CPUFREQ] p4-clockmod: switch to rdmsr_on_cpu/wrmsr_on_cpu
Dances with cpumasks go away.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-04-26 14:32:02 -04:00
Zachary Amsden afd3810d9b ACPI: Remove a warning about unused variable in !CONFIG_ACPI compilation.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2007-04-25 15:32:23 -04:00
Thierry Vignaud b2983f10f8 ACPI: prevent ACPI quirk warning mass spamming in logs
The following patch prevent this warning to be displayed again & again (eg:
nine times on my NForce2 motherboard) and thus improve signal to noise
ratio in logs.

The ATI quirk below probably needs a similar "fix" but I don't have
the hardware to test.

Btw arch/x86_64/kernel/early-quirks.c::nvidia_bugs() would probably need to
be synced (but I don't have an x86_64 NVidia motherboard to boot test it).
Still it shows the usefullity of the recent x86 merge thread.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Thierry Vignaud <tvignaud@mandriva.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2007-04-25 15:31:30 -04:00
Andi Kleen 8689b517be [PATCH] i386: Fix some warnings added by earlier patch
Signed-off-by: Andi Kleen <ak@suse.de>
2007-04-24 13:05:37 +02:00
Andi Kleen 9ce883becb [PATCH] x86: Remove noreplacement option
noreplacement is dangerous on modern systems because it will not replace the
context switch FNSAVE with SSE aware FXSAVE. But other places in the kernel still assume
SSE and do FXSAVE and the CPU will then access FXSAVE information with
FNSAVE and cause corruption.

Easiest way to avoid this is to remove the option. It was mostly for paranoia
reasons anyways and alternative()s have been stable for some time.

Thanks to Jeremy F. for reporting and helping debug it.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-04-24 13:05:37 +02:00
Dave Jones 7ab77e03c1 Longhaul - Revert ACPI C3 on Longhaul ver. 2
Support for Longhaul ver.  2 broke driver for VIA C3 Eden 600MHz with
Samuel 2 core.  Processor is not able to switch frequency anymore.  I
don't know much about this issue at the moment, but until (if ever) I
will know why, this part should be reversed.

Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20 22:56:29 -07:00
Andi Kleen 1714f9bfc9 [PATCH] x86: Fix potential overflow in perfctr reservation
While reviewing this code again I found a potential overflow of the bitmap.
The p4 oprofile can theoretically set bits beyond the reservation bitmap for
specific configurations. Avoid that by sizing the bitmaps properly.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-04-16 10:30:27 +02:00
Andi Kleen 08269c6d38 [PATCH] x86: Fix gcc 4.2 _proxy_pda workaround
Due to an over aggressive optimizer gcc 4.2 cannot optimize away _proxy_pda
in all cases (counter intuitive, but true).  This breaks loading of some
modules.

The earlier workaround to just export a dummy symbol didn't work unfortunately
because the module code ignores exports with 0 value.

Make it 1 instead.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-04-16 10:30:27 +02:00
Zachary Amsden 0492c37137 Fix VMI relocation processing logic error
Fix logic error in VMI relocation processing.  NOPs would always cause
a BUG_ON to fire because the != RELOCATION_NONE in the first if clause
precluding the == VMI_RELOCATION_NOP in the second clause.  Make these
direct equality tests and just warn for unsupported relocation types
(which should never happen), falling back to native in that case.

Thanks to Anthony Liguori for noting this!

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-14 21:48:36 -07:00
Andrew Morton c2481cc4a8 [PATCH] i386: irqbalance_disable() section fix
WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text:irqbalance_disable from .text between 'quirk_intel_irqbalance' (at offset 0x80a5) and 'i8237A_suspend'

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-08 19:47:55 -07:00
Zachary Amsden 49f1971051 [PATCH] Proper fix for highmem kmap_atomic functions for VMI for 2.6.21
Since lazy MMU batching mode still allows interrupts to enter, it is
possible for interrupt handlers to try to use kmap_atomic, which fails when
lazy mode is active, since the PTE update to highmem will be delayed.  The
best workaround is to issue an explicit flush in kmap_atomic_functions
case; this is the only way nested PTE updates can happen in the interrupt
handler.

Thanks to Jeremy Fitzhardinge for noting the bug and suggestions on a fix.

This patch gets reverted again when we start 2.6.22 and the bug gets fixed
differently.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-08 19:47:55 -07:00
Linus Torvalds 9a5ee4cc9e Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6:
  [PATCH] x86: Don't probe for DDC on VBE1.2
  [PATCH] x86-64: Increase NMI watchdog probing timeout
  [PATCH] x86-64: Let oprofile reserve MSR on all CPUs
  [PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E
2007-04-02 11:41:55 -07:00
Thomas Gleixner b6a8b316c6 [PATCH] i386: fix file_read_actor() and pipe_read() for original i386 systems
The __copy_to_user_inatomic() calls in file_read_actor() and pipe_read()
are broken on original i386 machines, where WP-works-ok == false, as
__copy_to_user_inatomic() on such systems calls functions which might
sleep and/or contain cond_resched() calls inside of a kmap_atomic()
region.

The original check for WP-works-ok was in access_ok(), but got moved
during the 2.5 series to fix a race vs. swap.

Return the number of bytes to copy in the case where we are in an atomic
region, so the non atomic code pathes in file_read_actor() and
pipe_read() are taken.

This could be optimized to avoid the kmap_atomicby moving the check for
WP-works-ok into fault_in_pages_writeable(), but this is more intrusive
and can be done later.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-02 10:07:25 -07:00
Rafael J. Wysocki 1d64b9cb1d [PATCH] Fix microcode-related suspend problem
Fix the regression resulting from the recent change of suspend code
ordering that causes systems based on Intel x86 CPUs using the microcode
driver to hang during the resume.

The problem occurs since the microcode driver uses request_firmware() in
its CPU hotplug notifier, which is called after tasks has been frozen and
hangs.  It can be fixed by telling the microcode driver to use the
microcode stored in memory during the resume instead of trying to load it
from disk.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Adrian Bunk <bunk@stusta.de>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Maxim <maximlevitsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-02 10:06:09 -07:00
Zwane Mwaikambo a369a7100d [PATCH] x86: Don't probe for DDC on VBE1.2
VBE1.2 doesn't support function 15h (DDC) resulting in a 'hang' whilst
uncompressing kernel with some video cards. Make sure we check VBE version
before fiddling around with DDC.

http://bugzilla.kernel.org/show_bug.cgi?id=1458

Opened: 2003-10-30 09:12 Last update: 2007-02-13 22:03

Much thanks to Tobias Hain for help in testing and investigating the bug.
Tested on;

i386, Chips & Technologies 65548 VESA VBE 1.2
CONFIG_VIDEO_SELECT=Y
CONFIG_FIRMWARE_EDID=Y

Untested on x86_64.

Signed-off-by: Zwane Mwaikambo <zwane@infradead.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-04-02 12:14:12 +02:00
Andi Kleen 0fb2ebfcb5 [PATCH] x86-64: Increase NMI watchdog probing timeout
A 4 core Opteron needs longer than 10 ticks for this.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-04-02 12:14:12 +02:00
Andi Kleen 89e07569e4 [PATCH] x86-64: Let oprofile reserve MSR on all CPUs
The MSR reservation is per CPU and oprofile would only allocate them
on the CPU it was initialized on. Change this to handle all CPUs.

This also fixes a warning about unprotected use of smp_processor_id()
in preemptible kernels.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-04-02 12:14:12 +02:00
Andi Kleen 3556ddfa92 [PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E
AMD dual core laptops with C1E do not run the APIC timer correctly
when they go idle. Previously the code assumed this only happened
on C2 or deeper.  But not all of these systems report support C2.

Use a AMD supplied snippet to detect C1E being enabled and then disable
local apic timer use.

This supercedes an earlier workaround using DMI detection of specific systems.

Thanks to Mark Langsdorf for the detection snippet.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-04-02 12:14:12 +02:00
Maxim Levitsky 399afa4fc9 [PATCH] Add suspend/resume for HPET
This adds support of suspend/resume on i386 for HPET, which fixes a
number of timer-related failures around STR.

Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Acked-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-29 10:25:32 -07:00
Eric W. Biederman bba6f6fc68 [PATCH] MSI-X: fix resume crash
So I think the right solution is to simply make pci_enable_device just
flip enable bits and move the rest of the work someplace else.

However a thorough cleanup is a little extreme for this point in the
release cycle, so I think a quick hack that makes the code not stomp the
irq when msi irq's are enabled should be the first fix.  Then we can
later make the code not change the irqs at all.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-28 13:59:37 -07:00
Thomas Gleixner c7f6d15ff2 [PATCH] i386: Fix bogus return value in hpet_next_event()
The clockevents / tick management code expects an error value, when the
event is already expired. hpet_next_event() returns 1 in that case.

Fix it to return the proper -ETIME error code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:08:08 -07:00
Matt Domsch f7a9dae7c4 pci: set pci=bfsort for PowerEdge R900
This patch automatically enables pci=bfsort for the Dell PowerEdge
R900.  This is necessary to ensure the onboard NICs enumerate in the
proper order, similar to the other systems already on the list.

Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-03-26 14:13:07 -07:00
Thomas Gleixner d9a5c0a4e0 [PATCH] i386: Prevent early access to TSC to avoid crash on TSCless systems
commit f9690982b8 removed the check for
cpu_khz from sched_clock(), which prevented early access to the TSC by
non obvious magic.

This is harmless as long as the CPU has a TSC. On TSCless systems this
results in an illegal instruction trap.

Replace tsc_disabled and tsc_unstable by tsc_enabled, which is only set
when the tsc is available and not unstable.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-24 15:45:53 -07:00
Thomas Gleixner e585bef815 [PATCH] i386: add command line option "local_apic_timer_c2_ok"
It turned out that it is almost impossible to trust ACPI, BIOS & Co.
regarding the C states. This was the reason to switch the local apic
timer off in C2 state already. OTOH there are sane and well behaving
systems, which get punished by that decision.

Allow the user to confirm that the local apic timer is trustworthy in C2
state. This keeps the default behaviour on the safe side.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-23 10:21:02 -07:00
Ingo Molnar 4edc5db83f [PATCH] setup_boot_APIC_clock() irq-enable fix
latest -git triggers an irqtrace/lockdep warning of a leaked
irqs-off condition:

  BUG: at kernel/fork.c:1033 copy_process()

after some debugging it turns out that commit ca1b940c accidentally left
interrupts disabled - which trickled down all the way to the first time
we fork a kernel thread and triggered the warning.

the fix is to re-enable interrupts in the 'else' branch of
setup_boot_APIC_clock()'s pmtimers calibration path.

Reported-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@brown.paperbag.linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-22 19:42:31 -07:00
Thomas Gleixner ad62ca2bd8 [PATCH] i386: disable local apic timer via command line or dmi quirk
The local APIC timer stops to work in deeper C-States.  This is handled by
the ACPI code and a broadcast mechanism in the clockevents / tick managment
code.

Some systems do not expose the deeper C-States to the kernel, but switch
into deeper C-States behind the kernels back.  This delays the local apic
timer interrupts for ever and makes the systems unusable.

Add a command line option to disable the local apic timer and a dmi
quirk for known broken systems.

Andi sayeth:

  While not wrong by itself i think it is still better to use some heuristic
  -- like "has battery in ACPI" With the DMI table if the problem is more wide
  spread we will just continue extending it.

  But anyways should be ok now for .21 although I'm not really happy with
  it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Grudgingly-acked-by: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-22 19:39:05 -07:00
Thomas Gleixner 6b3964cde7 [PATCH] i386: clockevents fix breakage on Geode/Cyrix PIT implementations
The PIT has no dedicated mode for shut down. The only way to disable PIT
is to put it into one shot mode. AMD implementations of PIT on Geode
(also observed on Cyrix) are confused by an "empty" transition from
CLOCK_EVT_MODE_UNUSED to CLOCK_EVT_MODE_SHUTDOWN, which puts the PIT
into one shot mode momentarily.

I realized after staring helpless at the bug report
http://bugzilla.kernel.org/show_bug.cgi?id=8027 for quite a while, that
the only change, which might influence the bogomips calibration, is the
above transition during the PIT initialization.

Avoiding the unnecessary switch to oneshot and later to periodic mode
fixes the weird bogomips value and also the resulting slowness.

The fix is confirmed on OLPC and another Geode based box.

Note: this is unrelated to the Dual Core problem discussed here:
http://lkml.org/lkml/2007/3/17/48

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-22 19:33:30 -07:00
Thomas Gleixner ca1b940ce6 [PATCH] i386: trust the PM-Timer calibration of the local APIC timer
When PM-Timer is available for local APIC timer calibration we can skip the
verification of the calibrated time value.  The resulting error is quite
small on a bunch of evaluated platforms and is less harming than the
observed false positives.

We need to keep the verification on systems, which have no PM-Timer to
avoid bogus local APIC timer calibrations in the range of factor 2-10,
which can be observed when swicthing off the PM-timer support in the kernel
configuration.

The wrong calibration values are probably caused by SMM code trying to
emulate a PS/2 keyboard from a (maybe connected or not) USB keyboard.  This
prohibits the accurate delivery of PIT interrupts, which are used to
calibrate the local APIC timer.  Unfortunately we have no way to disable
this BIOS misfeature in the early boot process.

Add also the dropped cpu_relax() back to the wait loops.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-18 11:35:08 -07:00
Andi Kleen 92b35e910f [PATCH] x86: Export _proxy_pda for gcc 4.2
The symbol is not actually used, but the compiler unforunately generates
a (unused) reference to it. This can happen even in modules. So export it.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-03-16 21:07:36 +01:00
Guillaume Chazarain 28f36f8fbf [PATCH] i386: Don't use the TSC in sched_clock if unstable
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f9690982b8c2f9a2c65acdc113e758ec356676a3
caused a regression by letting sched_clock use the TSC even when cpufreq
disabled it. This caused scheduling weirdnesses.

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-03-16 21:07:36 +01:00
Andi Kleen 302cf930cb [PATCH] i386: Enforce GPLness of VMI ROM
VMI ROMs are pretty intimate to the kernel, so enforce their GPLness.

No \0 tricks checking for now

This rules out BSD/MIT modules for now, sorry -- the trouble is those
could come without source.

Acked-by: Zachary Amsden <zach@vmware.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-03-16 21:07:36 +01:00
Andi Kleen 33a40bfd4f [PATCH] i386: Update defconfig
Signed-off-by: Andi Kleen <ak@suse.de>
2007-03-16 21:07:36 +01:00
Linus Torvalds 8ce5e3e45e Disable NMI watchdog by default properly
This reverts commit 6ebf622b25 and
replaces it with one that actually works.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-14 17:53:43 -07:00