Commit Graph

13278 Commits

Author SHA1 Message Date
Sridhar Samudrala 827bf12236 [SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv()
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 13:36:30 -07:00
Jamal Hadi Salim 5a6d34162f [XFRM] SPD info TLV aggregation
Aggregate the SPD info TLVs.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 12:55:39 -07:00
Jamal Hadi Salim af11e31609 [XFRM] SAD info TLV aggregationx
Aggregate the SAD info TLVs.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 12:55:13 -07:00
Jennifer Hunt 561e036006 [AF_IUCV]: Implementation of a skb backlog queue
With the inital implementation we missed to implement a skb backlog
queue . The result is that socket receive processing tossed packets.
Since AF_IUCV connections are working synchronously it leads to
connection hangs. Problems with read, close and select also occured.

Using a skb backlog queue is fixing all of these problems .

Signed-off-by: Jennifer Hunt <jenhunt@us.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 12:22:07 -07:00
Martin Schwidefsky cf8ba7a955 [S390] add hardware capability support (ELF_HWCAP).
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-05-04 18:48:35 +02:00
Cornelia Huck 52706ec903 [S390] cio: Deprecate read_dev_chars() and read_conf_data{,_lpm}().
These helper functions are a leftover from 2.4 sync I/O and are a
notorious source for bugs. They lead to device driver specific code
creeping into cio, and some issues can't really be fixed at all.

Device drivers can easily implement those functions themselves in a
more robust manner, so let's get rid of them.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-05-04 18:48:25 +02:00
Christoph Hellwig 33464e3b57 [S390] get rid of kprobes notifier call chain.
And here's a port of the powerpc patch to get rid of the notifier
chain completely to s390.  It's ontop of Martins patch as that one
is in mainline already.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-05-04 18:48:24 +02:00
Eric Dumazet db3459d1a7 [IPV6]: Some cleanups in include/net/ipv6.h
1) struct ip6_flowlabel : moves 'users' field to avoid two 32bits
   holes for 64bit arches. Shrinks by 8 bytes sizeof(struct
   ip6_flowlabel)

2) ipv6_addr_cmp() and ipv6_addr_copy() dont need (void *) casts :
   Compiler might take into account natural alignement of in6_addr
   structs to emit better code for memcpy()/memcmp() Casts to (void *)
   force byte accesses.

3) ipv6_addr_prefix() optimization :

Better to clear whole struct, as compiler can emit better code for
memset(addr, 0, 16) (2 stores on x86_64), and avoid some conditional
branches.

# size vmlinux.after vmlinux.before
   text    data     bss     dec     hex filename
5262262  647612  557432 6467306  62aeea vmlinux.after
5262550  647612  557432 6467594  62b00a vmlinux.before

thats 288 bytes saved.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 17:39:04 -07:00
Pavel Emelianov 7562f876cd [NET]: Rework dev_base via list_head (v3)
Cleanup of dev_base list use, with the aim to simplify making device
list per-namespace. In almost every occasion, use of dev_base variable
and dev->next pointer could be easily replaced by for_each_netdev
loop. A few most complicated places were converted to using
first_netdev()/next_netdev().

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 15:13:45 -07:00
Michael Chan 27a005b883 [BNX2]: Add support for 5709 Serdes.
Add PCI ID and code to support the 5709 Serdes PHY.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 13:23:41 -07:00
Michael Chan 427c2196b9 [ETHTOOL]: Add 2.5G bit definitions.
Add 2.5G supported and advertising bit definitions.  2.5G is supported
by the bnx2 driver.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 13:17:25 -07:00
Sascha Hauer ff4bfb2163 [ARM] 4328/1: Move i.MX UART regs to driver
This patch moves the i.MX UART register descriptions from
include/asm-arm/arch-imx/imx-regs.h to the serial driver itself.
This helps using the driver on other architectures like mx31

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 20:24:21 +01:00
Sascha Hauer fe7fdb80e9 [ARM] 4329/1: fix position of NETX_SYSTEM_REG
This patch fixes the position of the netx reset control register

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 20:22:49 +01:00
Russell King 5559bca8e6 [ARM] ecard: Convert card type enum to a flag
'type' in the struct expansion_card is only used to indicate
whether this card is an EASI card or not.  Therefore, having
it as an enum is wasteful (and introduces additional noise
when we come to remove the enum.)  Convert it to a mere flag
instead.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 14:16:56 +01:00
Russell King c0b04d1b2c [ARM] ecard: Move private ecard junk out of asm/ecard.h
Move ecard.c private junk from asm/ecard.h to a local header file.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 14:16:56 +01:00
Andrew Victor 7776a94c31 [ARM] 4352/1: AT91: Platform data for LCD and AC97.
Define resources, platform_device and device registration functions for
the LCD and AC97 controllers on the AT91SAM9263.
Also update the AT91SAM9261 to use the common atmel_lcdfb driver.

Signed-off-by: Nicolas Ferre <nicolas.ferre@rfo.atmel.com>
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 14:10:22 +01:00
Andrew Victor ce813b97e5 [ARM] 4350/1: AT91: Hardware header for ADC peripheral
Definitions for Analog-to-Digital Converter (ADC) found on the Atmel
AT91SAM9260 processor.

Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 14:10:20 +01:00
Dan Williams d2dd8b1fed [ARM] 4342/2: iop13xx: add resource definitions for the tpmi units
The tpmi units interface with the SAS controller on iop348.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 14:03:54 +01:00
Dan Williams e90ddd813d [ARM] 4348/4: iop3xx: Give Linux control over PCI initialization
Currently the iop3xx platform support code assumes that RedBoot is the
bootloader and has already initialized the ATU.  Linux should handle this
initialization for three reasons:

1/ The memory map that RedBoot sets up is not optimal (page_to_dma and
virt_to_phys return different addresses).  The effect of this is that using
the dma mapping API for the internal bus dma units generates pci bus
addresses that are incorrect for the internal bus.

2/ Not all iop platforms use RedBoot

3/ If the ATU is already initialized it indicates that the iop is an add-in
card in another host, it does not own the PCI bus, and should not be
re-initialized.

Changelog:
* rather than change nr_controllers to zero, simply do not call
  pci_common_init

Cc: Lennert Buytenhek <kernel@wantstofly.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 14:02:48 +01:00
Patrick McHardy fc38582db9 [NETFILTER]: bridge netfilter: consolidate header pushing/pulling code
Consolidate the common push/pull sequences into a few helper functions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:36:16 -07:00
Jorge Boncompte c2a1910b06 [NETFILTER]: nf_nat_proto_gre: do not modify/corrupt GREv0 packets through NAT
While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack
and nat modules to a 2.4.32 kernel I noticed that the gre_key function
returns a wrong pointer to the GRE key of a version 0 packet thus
corrupting the packet payload.

The intended behaviour for GREv0 packets is to act like
nf_conntrack_proto_generic/nf_nat_proto_unknown so I have ripped the
offending functions (not used anymore) and modified the
nf_nat_proto_gre modules to not touch version 0 (non PPTP) packets.

Signed-off-by: Jorge Boncompte <jorge@dti2.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:34:42 -07:00
Ilpo Järvinen 0ec96822d5 [TCP]: Use S+L catcher only with SACK for now
TCP has a transitional state when SACK is not in use during
which this invariant is temporarily broken. Without SACK,
tcp_clean_rtx_queue does not decrement sacked_out. Therefore
calls to tcp_sync_left_out before sacked_out is again
corrected by tcp_fastretrans_alert can trigger this trap as
sacked_out still has couple of segments that are already out
of window.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:30:34 -07:00
Patrick McHardy 4e9cac2ba4 [NET]: Add __dev_getfirstbyhwtype
Add __dev_getfirstbyhwtype for callers that don't want a reference but
some data from the device and thus need to take the rtnl anyway.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:28:13 -07:00
Randy Dunlap be52178b9f [NET] skbuff: fix kernel-doc
Fix skbuff.h kernel-doc:
linux-2.6.21-git4//include/linux/skbuff.h:316): No description found for parameter 'transport_header'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:16:20 -07:00
David Howells ef4533f8af [AFS]: Make the match_*() functions take const options.
Make the match_*() functions take a const pointer to the options table
and make strings pointers in the options table const too.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:10:39 -07:00
Eric Dumazet 709525fad8 [IPV6]: Get rid of __HAVE_ARCH_ADDR_SET.
__HAVE_ARCH_ADDR_SET seems unused these days, just get rid of it.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:08:43 -07:00
Avi Kivity 2ff81f70b5 KVM: Remove unused 'instruction_length'
As we no longer emulate in userspace, this is meaningless.  We don't
compute it on SVM anyway.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:32 +03:00
Avi Kivity 02c8320972 KVM: Don't require explicit indication of completion of mmio or pio
It is illegal not to return from a pio or mmio request without completing
it, as mmio or pio is an atomic operation.  Therefore, we can simplify
the userspace interface by avoiding the completion indication.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:32 +03:00
Avi Kivity b8836737d9 KVM: Add fpu get/set operations
These are really helpful when migrating an floating point app to another
machine.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:28 +03:00
Avi Kivity e8207547d2 KVM: Add physical memory aliasing feature
With this, we can specify that accesses to one physical memory range will
be remapped to another.  This is useful for the vga window at 0xa0000 which
is used as a movable window into the (much larger) framebuffer.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:28 +03:00
Avi Kivity 039576c03c KVM: Avoid guest virtual addresses in string pio userspace interface
The current string pio interface communicates using guest virtual addresses,
relying on userspace to translate addresses and to check permissions.  This
interface cannot fully support guest smp, as the check needs to take into
account two pages at one in case an unaligned string transfer straddles a
page boundary.

Change the interface not to communicate guest addresses at all; instead use
a buffer page (mmaped by userspace) and do transfers there.  The kernel
manages the virtual to physical translation and can perform the checks
atomically by taking the appropriate locks.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:25 +03:00
Avi Kivity 07c45a366d KVM: Allow kernel to select size of mmap() buffer
This allows us to store offsets in the kernel/user kvm_run area, and be
sure that userspace has them mapped.  As offsets can be outside the
kvm_run struct, userspace has no way of knowing how much to mmap.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00
Avi Kivity 1961d276c8 KVM: Add guest mode signal mask
Allow a special signal mask to be used while executing in guest mode.  This
allows signals to be used to interrupt a vcpu without requiring signal
delivery to a userspace handler, which is quite expensive.  Userspace still
receives -EINTR and can get the signal via sigwait().

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00
Avi Kivity 1b19f3e61d KVM: Add a special exit reason when exiting due to an interrupt
This is redundant, as we also return -EINTR from the ioctl, but it
allows us to examine the exit_reason field on resume without seeing
old data.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00
Avi Kivity 8eb7d334bd KVM: Fold kvm_run::exit_type into kvm_run::exit_reason
Currently, userspace is told about the nature of the last exit from the
guest using two fields, exit_type and exit_reason, where exit_type has
just two enumerations (and no need for more).  So fold exit_type into
exit_reason, reducing the complexity of determining what really happened.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00
Avi Kivity b4e63f560b KVM: Allow userspace to process hypercalls which have no kernel handler
This is useful for paravirtualized graphics devices, for example.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00
Avi Kivity 5d308f4550 KVM: Add method to check for backwards-compatible API extensions
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00
Avi Kivity 739872c56f KVM: Renumber ioctls
The recent changes have left the ioctl numbers in complete disarray.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:23 +03:00
Avi Kivity 2a4dac3952 KVM: Remove minor wart from KVM_CREATE_VCPU ioctl
That ioctl does not transfer any data, so it should be an _IO rather than an
_IOW.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:23 +03:00
Avi Kivity 106b552b43 KVM: Remove the 'emulated' field from the userspace interface
We no longer emulate single instructions in userspace.  Instead, we service
mmio or pio requests.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:23 +03:00
Avi Kivity 06465c5a3a KVM: Handle cpuid in the kernel instead of punting to userspace
KVM used to handle cpuid by letting userspace decide what values to
return to the guest.  We now handle cpuid completely in the kernel.  We
still let userspace decide which values the guest will see by having
userspace set up the value table beforehand (this is necessary to allow
management software to set the cpu features to the least common denominator,
so that live migration can work).

The motivation for the change is that kvm kernel code can be impacted by
cpuid features, for example the x86 emulator.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:23 +03:00
Avi Kivity 46fc147788 KVM: Do not communicate to userspace through cpu registers during PIO
Currently when passing the a PIO emulation request to userspace, we
rely on userspace updating %rax (on 'in' instructions) and %rsi/%rdi/%rcx
(on string instructions).  This (a) requires two extra ioctls for getting
and setting the registers and (b) is unfriendly to non-x86 archs, when
they get kvm ports.

So fix by doing the register fixups in the kernel and passing to userspace
only an abstract description of the PIO to be done.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:23 +03:00
Avi Kivity 9a2bb7f486 KVM: Use a shared page for kernel/user communication when runing a vcpu
Instead of passing a 'struct kvm_run' back and forth between the kernel and
userspace, allocate a page and allow the user to mmap() it.  This reduces
needless copying and makes the interface expandable by providing lots of
free space.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:23 +03:00
Avi Kivity ff42697436 KVM: Export <linux/kvm.h>
This allows users to actually build prgrams that use kvm without
the entire source tree.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:22 +03:00
Avi Kivity bbe4432e66 KVM: Use own minor number
Use the minor number (232) allocated to kvm by lanana.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:22 +03:00
Adrian Bunk ecf36501bc PCI: the overdue removal of pci_module_init()
Unless we finally completely remove it, people will always add new users.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:38 -07:00
Adrian Bunk 5adc55da4a PCI: remove the broken PCI_MULTITHREAD_PROBE option
This patch removes the PCI_MULTITHREAD_PROBE option that had already 
been marked as broken.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:38 -07:00
Michael Ellerman 032de8e2fe MSI: Give archs the option to free all MSI/Xs at once.
This patch introduces an optional function, arch_teardown_msi_irqs(),
which gives an arch the opportunity to do per-device teardown for
MSI/X. If that's not required, the default version simply calls
arch_teardown_msi_irq() for each msi irq required.

arch_teardown_msi_irqs() is simply passed a pdev, attached to the pdev
is a list of msi_descs, it is up to the arch to free the irq associated
with each of these as appropriate.

For archs that _don't_ implement arch_teardown_msi_irqs(), all msi_descs
with irq == 0 are considered unallocated, and the arch teardown routine
is not called on them.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:38 -07:00
Michael Ellerman 9c8313343c MSI: Give archs the option to allocate all MSI/Xs at once.
This patch introduces an optional function, arch_setup_msi_irqs(),
(note the plural) which gives an arch the opportunity to do per-device
setup for MSI/X and then allocate all the requested MSI/Xs at once.

If that's not required by the arch, the default version simply calls
arch_setup_msi_irq() for each MSI irq required.

arch_setup_msi_irqs() is passed a pdev, attached to the pdev is a list
of msi_descs with irq == 0, it is up to the arch to connect these up to
an irq (via set_irq_msi()) or return an error. For convenience the number
of vectors and the type are passed also.

All msi_descs with irq != 0 are considered allocated, and the arch
teardown routine will be called on them when necessary.

The existing semantics of pci_enable_msix() are that if the requested
number of irqs can not be allocated, the maximum number that _could_ be
allocated is returned. To support that, we define that in case of an
error from arch_setup_msi_irqs(), the number of msi_descs with irq != 0
are considered allocated, and are counted toward the "max that could be
allocated".


Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:38 -07:00
Michael Ellerman 314e77b3ee MSI: Remove dev->first_msi_irq
Now that we keep a list of msi descriptors, we don't need first_msi_irq
in the pci dev.

If we somehow have zero MSIs configured list_entry() will give us weird
oopes or nice memory corruption bugs. So be paranoid. Add BUG_ONs and also
a check in pci_msi_check_device() to make sure nvec > 0.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:37 -07:00
Michael Ellerman 4aa9bc955d MSI: Use a list instead of the custom link structure
The msi descriptors are linked together with what looks a lot like
a linked list, but isn't a struct list_head list. Make it one.

The only complication is that previously we walked a list of irqs, and
got the descriptor for each with get_irq_msi(). Now we have a list of
descriptors and need to get the irq out of it, so it needs to be in the
actual struct msi_desc. We use 0 to indicate no irq is setup.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:37 -07:00
Michael Ellerman 65891215e6 PCI: Create alloc_pci_dev(), the one true way to create a struct pci_dev
There are currently several places in the kernel where we kmalloc()
a struct pci_dev and start initialising it. It'd be preferable to
have an allocator so we can ensure the pci_dev is correctly initialised
in one place.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:37 -07:00
Michael Ellerman c9953a73e9 MSI: Add an arch_msi_check_device()
Add an arch_check_device(), which gives archs a chance to check the input
to pci_enable_msi/x. The arch might be interested in the value of nvec so
pass it in. Propagate the error value returned from the arch routine out
to the caller.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:37 -07:00
Sergei Shtylyov 0da0ead901 PCI: define pci_request/release_regions() for CONFIG_PCI=n
Balance declarations of pci_request_regions() and pci_release_regions() with
empty inline definitions for the CONFIG_PCI=n case -- otherwise my patch to
drivers/net/3c59x.c in the -mm tree doesn't compile. :-)

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:35 -07:00
Jean Delvare 6473d160b4 PCI: Cleanup the includes of <linux/pci.h>
I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.

In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.

My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:

arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c

I would welcome test reports for these files. I am fine with removing
the untested files from the patch if the general opinion is that these
changes aren't safe. The tested part would still be nice to have.

Note that this patch depends on another header fixup patch I submitted
to LKML yesterday:
  [PATCH] scatterlist.h needs types.h
  http://lkml.org/lkml/2007/3/01/141

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:35 -07:00
Jean Delvare a9dfd281a7 PCI: scatterlist.h needs types.h
Most architectures' scatterlist.h use the type dma_addr_t, but omit to
include <asm/types.h> which defines it.  This could lead to build failures,
so let's add the missing includes.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:34 -07:00
Brian King f7bdd12d23 pci: New PCI-E reset API
Adds a new API which can be used to issue various types
of PCI-E reset, including PCI-E warm reset and PCI-E hot reset.
This is needed for an ipr PCI-E adapter which does not properly
implement BIST. Running BIST on this adapter results in PCI-E
errors. The only reliable reset mechanism that exists on this
hardware is PCI Fundamental reset (warm reset). Since driving
this type of reset is architecture unique, this provides the
necessary hooks for architectures to add this support.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:34 -07:00
Greg Kroah-Hartman 823bccfc40 remove "struct subsystem" as it is no longer needed
We need to work on cleaning up the relationship between kobjects, ksets and
ktypes.  The removal of 'struct subsystem' is the first step of this,
especially as it is not really needed at all.

Thanks to Kay for fixing the bugs in this patch.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 18:57:59 -07:00
Sam Ravnborg dc24f0e708 kbuild: remove dependency on input.h from file2alias
Almost all definitions used by file2alias was already
present in mod_devicetable.h.
Added the last definition and killed the input.h usage.

The errornous include was pointed out
by: Jan Engelhardt <jengelh@linux01.gwdg.de>

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Cc: Deepak Saxena <dsaxena@plexity.net>
2007-05-02 20:58:08 +02:00
Jan Kiszka c41bf8fa5e [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu
There are two callers of __unlazy_fpu, unlazy_fpu and __switch_to, and
none of them appear to require additional preempt_disable/enable here.
Let's open-code save_init_fpu in __unlazy_fpu to save a few ops.

Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:21 +02:00
Jan Kiszka 02b64dab56 [PATCH] i386: white space fixes in i387.h
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:21 +02:00
Andi Kleen c5bcb5635a [PATCH] x86: Use RDTSCP for synchronous get_cycles if possible
RDTSCP is already synchronous and doesn't need an explicit CPUID.
This is a little faster and more importantly avoids VMEXITs on Hypervisors.

Original patch from Joerg Roedel, but reworked by AK
Also includes miscompilation fix by Eric Biederman

Cc: "Joerg Roedel" <joerg.roedel@amd.com>

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:21 +02:00
Andi Kleen 9bccb23dc5 [PATCH] i386: Add X86_FEATURE_RDTSCP
Following x86-64
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen 3aefbe0746 [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
Syncs up with x86-64.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen e859dc553c [PATCH] i386: Implement alternative_io for i386
Ported from x86-64.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen 3671df8572 [PATCH] i386: Evaluate constant cpu features at runtime
Redefine cpu_has() to evaluate cpu features already checked in early
boot at compile time.  This way the compiler might eliminate some dead code.
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen c7f81c9453 [PATCH] i386: Verify important CPUID bits in real mode
Check some CPUID bits that are needed for compiler generated early in boot.
When the system is still in real mode before changing the VESA BIOS mode
it is possible to still display an visible error message on the screen.

Similar to x86-64.

Includes cleanups from Eric Biederman

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen 05cb007dac [PATCH] x86-64: Use the 32bit wd_ops for 64bit too.
This mainly removes a lot of code, replacing it with calls into the new 32bit
perfctr-watchdog.c

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen 09198e6850 [PATCH] i386: Clean up NMI watchdog code
- Introduce a wd_ops structure
- Convert the various nmi watchdogs over to it
- This allows to split the perfctr reservation from the watchdog
setup cleanly.
- Do perfctr reservation globally as it should have always been
- Remove dead code referenced only by unused EXPORT_SYMBOLs

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Zachary Amsden 9e5e3162b2 [PATCH] i386: pte simplify ops
Add comment and condense code to make use of native_local_ptep_get_and_clear
function.  Also, it turns out the 2-level and 3-level paging definitions were
identical, so move the common definition into pgtable.h

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:19 +02:00
Zachary Amsden 142dd97591 [PATCH] i386: pte xchg optimization
In situations where page table updates need only be made locally, and there is
no cross-processor A/D bit races involved, we need not use the heavyweight
xchg instruction to atomically fetch and clear page table entries.  Instead,
we can just read and clear them directly.

This introduces a neat optimization for non-SMP kernels; drop the atomic xchg
operations from page table updates.

Thanks to Michel Lespinasse for noting this potential optimization.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:19 +02:00
Zachary Amsden c2c1accd4b [PATCH] i386: pte clear optimization
When exiting from an address space, no special hypervisor notification of page
table updates needs to occur; direct page table hypervisors, such as Xen,
switch to another address space first (init_mm) and unprotects the page tables
to avoid the cost of trapping to the hypervisor for each pte_clear.  Shadow
mode hypervisors, such as VMI and lhype don't need to do the extra work of
calling through paravirt-ops, and can just directly clear the page table
entries without notifiying the hypervisor, since all the page tables are about
to be freed.

So introduce native_pte_clear functions which bypass any paravirt-ops
notification.  This results in a significant performance win for VMI and
removes some indirect calls from zap_pte_range.

Note the 3-level paging already had a native_pte_clear function, thus
demanding argument conformance and extra args for the 2-level definition.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:19 +02:00
Andi Kleen 57a4f91ae5 [PATCH] x86-64: Auto compute __NR_syscall_max at compile time
No need to maintain it anymore

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:18 +02:00
Fernando Luis [** ISO-8859-1 charset **] VzquezCao 70ae77f497 [PATCH] x86-64: Use safe_apic_wait_icr_idle in __send_IPI_dest_field - x86_64
Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is
NMI_VECTOR to avoid potential hangups in the event of crash when kdump
tries to stop the other CPUs.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:18 +02:00
Fernando Luis [** ISO-8859-1 charset **] VzquezCao 9062d888aa [PATCH] x86-64: __send_IPI_dest_field - x86_64
Implement __send_IPI_dest_field which can be used to send IPIs when the
"destination shorthand" field of the ICR is set to 00 (destination
field). Use it whenever possible.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:18 +02:00
Fernando Luis VazquezCao 8339e9fba3 [PATCH] x86-64: safe_apic_wait_icr_idle - x86_64
apic_wait_icr_idle looks like this:

static __inline__ void apic_wait_icr_idle(void)
{
  while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
    cpu_relax();
}

The busy loop in this function would not be problematic if the
corresponding status bit in the ICR were always updated, but that does
not seem to be the case under certain crash scenarios. Kdump uses an IPI
to stop the other CPUs in the event of a crash, but when any of the
other CPUs are locked-up inside the NMI handler the CPU that sends the
IPI will end up looping forever in the ICR check, effectively
hard-locking the whole system.

Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:

"A local APIC unit indicates successful dispatch of an IPI by
resetting the Delivery Status bit in the Interrupt Command
Register (ICR). The operating system polls the delivery status
bit after sending an INIT or STARTUP IPI until the command has
been dispatched.

A period of 20 microseconds should be sufficient for IPI dispatch
to complete under normal operating conditions. If the IPI is not
successfully dispatched, the operating system can abort the
command. Alternatively, the operating system can retry the IPI by
writing the lower 32-bit double word of the ICR. This “time-out”
mechanism can be implemented through an external interrupt, if
interrupts are enabled on the processor, or through execution of
an instruction or time-stamp counter spin loop."

Intel's documentation suggests the implementation of a time-out
mechanism, which, by the way, is already being open-coded in some parts
of the kernel that tinker with ICR.

Create a apic_wait_icr_idle replacement that implements the time-out
mechanism and that can be used to solve the aforementioned problem.

AK: moved both functions out of line
AK: Added improved loop from Keith Owens

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:17 +02:00
Fernando Luis VazquezCao f2b218dd61 [PATCH] i386: safe_apic_wait_icr_idle - i386
apic_wait_icr_idle looks like this:

static __inline__ void apic_wait_icr_idle(void)
{
  while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
    cpu_relax();
}

The busy loop in this function would not be problematic if the
corresponding status bit in the ICR were always updated, but that does
not seem to be the case under certain crash scenarios. Kdump uses an IPI
to stop the other CPUs in the event of a crash, but when any of the
other CPUs are locked-up inside the NMI handler the CPU that sends the
IPI will end up looping forever in the ICR check, effectively
hard-locking the whole system.

Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:

"A local APIC unit indicates successful dispatch of an IPI by
resetting the Delivery Status bit in the Interrupt Command
Register (ICR). The operating system polls the delivery status
bit after sending an INIT or STARTUP IPI until the command has
been dispatched.

A period of 20 microseconds should be sufficient for IPI dispatch
to complete under normal operating conditions. If the IPI is not
successfully dispatched, the operating system can abort the
command. Alternatively, the operating system can retry the IPI by
writing the lower 32-bit double word of the ICR. This “time-out”
mechanism can be implemented through an external interrupt, if
interrupts are enabled on the processor, or through execution of
an instruction or time-stamp counter spin loop."

Intel's documentation suggests the implementation of a time-out
mechanism, which, by the way, is already being open-coded in some parts
of the kernel that tinker with ICR.

Create a apic_wait_icr_idle replacement that implements the time-out
mechanism and that can be used to solve the aforementioned problem.

AK: moved both functions out of line
AK: added improved loop from Keith Owens

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:17 +02:00
Bernhard Kaindl de938c51d5 [PATCH] i386: Enable support for fixed-range IORRs to keep RdMem & WrMem in sync
If our copy of the MTRRs of the BSP has RdMem or WrMem set, and
we are running on an AMD64/K8 system, the boot CPU must have had
MtrrFixDramEn and MtrrFixDramModEn set (otherwise our RDMSR would
have copied these bits cleared), so we set them on this CPU as well.

This allows us to keep the AMD64/K8 RdMem and WrMem bits in sync
across the CPUs of SMP systems in order to fullfill the duty of
system software to "initialize and maintain MTRR consistency
across all processors." as written in the AMD and Intel manuals.

If an WRMSR instruction fails because MtrrFixDramModEn is not
set, I expect that also the Intel-style MTRR bits are not updated.

AK: minor cleanup, moved MSR defines around

Signed-off-by: Bernhard Kaindl <bk@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
2007-05-02 19:27:17 +02:00
Bernhard Kaindl 2b1f6278d7 [PATCH] x86: Save the MTRRs of the BSP before booting an AP
Applied fix by Andew Morton:
http://lkml.org/lkml/2007/4/8/88 - Fix `make headers_check'.

AMD and Intel x86 CPU manuals state that it is the responsibility of
system software to initialize and maintain MTRR consistency across
all processors in Multi-Processing Environments.

Quote from page 188 of the AMD64 System Programming manual (Volume 2):

7.6.5 MTRRs in Multi-Processing Environments

"In multi-processing environments, the MTRRs located in all processors must
characterize memory in the same way. Generally, this means that identical
values are written to the MTRRs used by the processors." (short omission here)
"Failure to do so may result in coherency violations or loss of atomicity.
Processor implementations do not check the MTRR settings in other processors
to ensure consistency. It is the responsibility of system software to
initialize and maintain MTRR consistency across all processors."

Current Linux MTRR code already implements the above in the case that the
BIOS does not properly initialize MTRRs on the secondary processors,
but the case where the fixed-range MTRRs of the boot processor are changed
after Linux started to boot, before the initialsation of a secondary
processor, is not handled yet.

In this case, secondary processors are currently initialized by Linux
with MTRRs which the boot processor had very early, when mtrr_bp_init()
did run, but not with the MTRRs which the boot processor uses at the
time when that secondary processors is actually booted,
causing differing MTRR contents on the secondary processors.

Such situation happens on Acer Ferrari 1000 and 5000 notebooks where the
BIOS enables and sets AMD-specific IORR bits in the fixed-range MTRRs
of the boot processor when it transitions the system into ACPI mode.
The SMI handler of the BIOS does this in SMM, entered while Linux ACPI
code runs acpi_enable().

Other occasions where the SMI handler of the BIOS may change bits in
the MTRRs could occur as well. To initialize newly booted secodary
processors with the fixed-range MTRRs which the boot processor uses
at that time, this patch saves the fixed-range MTRRs of the boot
processor before new secondary processors are started. When the
secondary processors run their Linux initialisation code, their
fixed-range MTRRs will be updated with the saved fixed-range MTRRs.

If CONFIG_MTRR is not set, we define mtrr_save_state
as an empty statement because there is nothing to do.

Possible TODOs:

*) CPU-hotplugging outside of SMP suspend/resume is not yet tested
   with this patch.

*) If, even in this case, an AP never runs i386/do_boot_cpu or x86_64/cpu_up,
   then the calls to mtrr_save_state() could be replaced by calls to
   mtrr_save_fixed_ranges(NULL) and  mtrr_save_state() would not be
   needed.

   That would need either verification of the CPU-hotplug code or
   at least a test on a >2 CPU machine.

*) The MTRRs of other running processors are not yet checked at this
   time but it might be interesting to syncronize the MTTRs of all
   processors before booting. That would be an incremental patch,
   but of rather low priority since there is no machine known so
   far which would require this.

AK: moved prototypes on x86-64 around to fix warnings

Signed-off-by: Bernhard Kaindl <bk@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
2007-05-02 19:27:17 +02:00
Bernhard Kaindl 2b3b4835c9 [PATCH] x86: Adds mtrr_save_fixed_ranges() for use in two later patches.
In this current implementation which is used in other patches,
mtrr_save_fixed_ranges() accepts a dummy void pointer because
in the current implementation of one of these patches, this
function may be called from smp_call_function_single() which
requires that this function takes a void pointer argument.

This function calls get_fixed_ranges(), passing mtrr_state.fixed_ranges
which is the element of the static struct which stores our current
backup of the fixed-range MTRR values which all CPUs shall be
using.

Because  mtrr_save_fixed_ranges calls get_fixed_ranges after
kernel initialisation time, __init needs to be removed from
the declaration of get_fixed_ranges().

If CONFIG_MTRR is not set, we define mtrr_save_fixed_ranges
as an empty statement because there is nothing to do.

AK: Moved prototypes for x86-64 around to fix warnings

Signed-off-by: Bernhard Kaindl <bk@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
2007-05-02 19:27:17 +02:00
Andi Kleen 856f44ff4a [PATCH] x86-64: Move mtrr prototypes from proto.h to mtrr.h
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:17 +02:00
Jeremy Fitzhardinge 03df4f6ee9 [PATCH] i386: Clean up ELF note generation
Three cleanups:

1: ELF notes are never mapped, so there's no need to have any access
flags in their phdr.

2: When generating them from asm, tell the assembler to use a SHT_NOTE
section type.  There doesn't seem to be a way to do this from C.

3: Use ANSI rather than traditional cpp behaviour to stringify the
macro argument.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
2007-05-02 19:27:17 +02:00
Jeremy Fitzhardinge 441d40dca0 [PATCH] x86: PARAVIRT: Jeremy Fitzhardinge <jeremy@goop.org>
The other symbols used to delineate the alt-instructions sections have the
form __foo/__foo_end.  Rename parainstructions to match.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:16 +02:00
Zachary Amsden e0bb864397 [PATCH] i386: Convert VMI timer to use clock events
Convert VMI timer to use clock events, making it properly able to use the NO_HZ
infrastructure.  On UP systems, with no local APIC, we just continue to route
these events through the PIT.  On systems with a local APIC, or SMP, we provide
a single source interrupt chip which creates the local timer IRQ.  It actually
gets delivered by the APIC hardware, but we don't want to use the same local
APIC clocksource processing, so we create our own handler here.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Dan Hecht <dhecht@vmware.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge 57decbda6a [PATCH] x86: update for i386 and x86-64 check_bugs
Remove spurious comments, headers and keywords from x86-64 bugs.[ch].

Use identify_boot_cpu()

AK: merged with other patch

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge c5413fbe89 [PATCH] i386: Fix UP gdt bugs
Fixes two problems with the GDT when compiling for uniprocessor:
 - There's no percpu segment, so trying to load its selector into %fs fails.
   Use a null selector instead.
 - The real gdt needs to be loaded at some point.  Do it in cpu_init().

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge 1956c73bb5 [PATCH] i386: Define per_cpu_offset
Define per_cpu_offset in asm-i386/percpu.h when SMP defined, like
asm-generic/percpu.h does for UP.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge 978c038ec9 [PATCH] i386: cleanups to help using per-cpu variables from asm
This patch does a few small cleanups:
 - use PER_CPU_NAME to generate the names of per-cpu variables
 - use lea to add the per_cpu offset in PER_CPU(), because it doesn't
   affect condition flags
 - add PER_CPU_VAR which allows direct access to pre-cpu variables
   with the %fs: prefix on SMP.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge 7c3576d261 [PATCH] i386: Convert PDA into the percpu section
Currently x86 (similar to x84-64) has a special per-cpu structure
called "i386_pda" which can be easily and efficiently referenced via
the %fs register.  An ELF section is more flexible than a structure,
allowing any piece of code to use this area.  Indeed, such a section
already exists: the per-cpu area.

So this patch:
(1) Removes the PDA and uses per-cpu variables for each current member.
(2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU.
(3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which
    can be used to calculate addresses for this CPU's variables.
(4) Simplifies startup, because %fs doesn't need to be loaded with a
    special segment at early boot; it can be deferred until the first
    percpu area is allocated (or never for UP).

The result is less code and one less x86-specific concept.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge 7a61d35d4b [PATCH] i386: Page-align the GDT
Xen wants a dedicated page for the GDT.  I believe VMI likes it too.
lguest, KVM and native don't care.

Simple transformation to page-aligned "struct gdt_page".

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge 4cdd9c8931 [PATCH] i386: PARAVIRT: drop unused ptep_get_and_clear
In shadow mode hypervisors, ptep_get_and_clear achieves the desired
purpose of keeping the shadows in sync by issuing a native_get_and_clear,
followed by a call to pte_update, which indicates the PTE has been
modified.

Direct mode hypervisors (Xen) have no need for this anyway, and will trap
the update using writable pagetables.

This means no hypervisor makes use of ptep_get_and_clear; there is no
reason to have it in the paravirt-ops structure.  Change confusing
terminology about raw vs. native functions into consistent use of
native_pte_xxx for operations which do not invoke paravirt-ops.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge 1a45b7aaa5 [PATCH] i386: PARAVIRT: Clean up paravirt patchable wrappers
Replace all the open-coded macros for generating calls with a pair of
more general macros (__PVOP_CALL/VCALL), and redefine all the
PVOP_V?CALL[0-4] in terms of them.

[ Andrew, Andi: this should slot in immediately after "Document asm-i386/paravirt.h"
  (paravirt_ops-document-asm-i386-paravirth.patch) ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge 4e0fa85602 [PATCH] i386: PARAVIRT: Use enums for paravirt lazy flush modi
Remove #defines, add enum for PARAVIRT_LAZY_FLUSH.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge ce6234b529 [PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages
Xen and VMI both have special requirements when mapping a highmem pte
page into the kernel address space.  These can be dealt with by adding
a new kmap_atomic_pte() function for mapping highptes, and hooking it
into the paravirt_ops infrastructure.

Xen specifically wants to map the pte page RO, so this patch exposes a
helper function, kmap_atomic_prot, which maps the page with the
specified page protections.

This also adds a kmap_flush_unused() function to clear out the cached
kmap mappings.  Xen needs this to clear out any potential stray RW
mappings of pages which will become part of a pagetable.

[ Zach - vmi.c will need some attention after this patch.  It wasn't
  immediately obvious to me what needs to be done. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge a27fe809b8 [PATCH] i386: PARAVIRT: revert map_pt_hook.
Back out the map_pt_hook to clear the way for kmap_atomic_pte.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge d4c104771a [PATCH] i386: PARAVIRT: add flush_tlb_others paravirt_op
This patch adds a pv_op for flush_tlb_others.  Linux running on native
hardware uses cross-CPU IPIs to flush the TLB on any CPU which may
have a particular mm's pagetable entries cached in its TLB.  This is
inefficient in a paravirtualized environment, since the hypervisor
knows which real CPUs actually contain cached mappings, which may be a
small subset of a guest's VCPUs.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge 63f70270cc [PATCH] i386: PARAVIRT: add common patching machinery
Implement the actual patching machinery.  paravirt_patch_default()
contains the logic to automatically patch a callsite based on a few
simple rules:

 - if the paravirt_op function is paravirt_nop, then patch nops
 - if the paravirt_op function is a jmp target, then jmp to it
 - if the paravirt_op function is callable and doesn't clobber too much
    for the callsite, call it directly

paravirt_patch_default is suitable as a default implementation of
paravirt_ops.patch, will remove most of the expensive indirect calls
in favour of either a direct call or a pile of nops.

Backends may implement their own patcher, however.  There are several
helper functions to help with this:

paravirt_patch_nop	nop out a callsite
paravirt_patch_ignore	leave the callsite as-is
paravirt_patch_call	patch a call if the caller and callee
			have compatible clobbers
paravirt_patch_jmp	patch in a jmp
paravirt_patch_insns	patch some literal instructions over
			the callsite, if they fit

This patch also implements more direct patches for the native case, so
that when running on native hardware many common operations are
implemented inline.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Acked-by: Ingo Molnar <mingo@elte.hu>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge 294688c028 [PATCH] i386: PARAVIRT: Document asm-i386/paravirt.h
Clean things up, and broadly document:
 - the paravirt_ops functions themselves
 - the patching mechanism

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge f8822f4201 [PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make them patchable
Wrap a set of interesting paravirt_ops calls in a wrapper which makes
the callsites available for patching.  Unfortunately this is pretty
ugly because there's no way to get gcc to generate a function call,
but also wrap just the callsite itself with the necessary labels.

This patch supports functions with 0-4 arguments, and either void or
returning a value.  64-bit arguments must be split into a pair of
32-bit arguments (lower word first).  Small structures are returned in
registers.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge 42c24fa22e [PATCH] i386: PARAVIRT: Fix patch site clobbers to include return register
Fix a few clobbers to include the return register.  The clobbers set
is the set of all registers modified (or may be modified) by the code
snippet, regardless of whether it was deliberate or accidental.

Also, make sure that callsites which are used in contexts which don't
allow clobbers actually save and restore all clobberable registers.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
2007-05-02 19:27:14 +02:00