Commit Graph

31331 Commits

Author SHA1 Message Date
Grant Likely aafbf16b89 powerpc/5200: Add 'simple-bus' to the of_platform probe list.
To better match the ePAPR specification, device nodes which claim
"simple-bus" compatibility should be probed by default.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2009-02-26 23:19:36 -07:00
Grzegorz Bernacki 86f5a4a7d7 powerpc/5200: On the digsy-mtc, configure PSC4 and PSC5 as UARTs
On digsy MTC PSC4 and PSC5 should be configured as UART, not PSC3 and PSC4.

Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2009-02-26 22:55:29 -07:00
Grzegorz Bernacki 652b2db16f powerpc/5200: Add digsy-mtc support to mpc5200_defconfig
The following options are enabled to support the digsy-mtc.
 - LXT phy
 - AT24 eeprom
 - RTC (DS1337)
 - MTD partitioning based on OF description

Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2009-02-26 22:55:03 -07:00
Benjamin Herrenschmidt 1ac00cc213 powerpc/44x: Fix address decoding setup of PCI 2.x cells
The PCI 2.x cells used on some 44x SoCs only let us configure the decode
for the low 32-bit of the incoming PLB addresses. The top 4 bits (this
is a 36-bit bus) are hard wired to different values depending on the
specific SoC in use. Our code used to work "by accident" until I added
support for the ISA memory holes and while at it added more validity
checking of the addresses.

This patch should bring it back to working condition. It still relies
on the device-tree being correct but that's somewhat a pre-requisite
for anything to work anyway.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2009-02-27 09:30:17 +11:00
Kyle McMartin f6be37fdc6 x86: enable DMAR by default
Now that the obvious bugs have been worked out, specifically
the iwlagn issue, and the write buffer errata, DMAR should be safe
to turn back on by default. (We've had it on since those patches were
first written a few weeks ago, without any noticeable bug reports
(most have been due to the dma-api debug patchset.))

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 20:59:47 +01:00
Herbert Xu a760a6656e crypto: api - Fix module load deadlock with fallback algorithms
With the mandatory algorithm testing at registration, we have
now created a deadlock with algorithms requiring fallbacks.
This can happen if the module containing the algorithm requiring
fallback is loaded first, without the fallback module being loaded
first.  The system will then try to test the new algorithm, find
that it needs to load a fallback, and then try to load that.

As both algorithms share the same module alias, it can attempt
to load the original algorithm again and block indefinitely.

As algorithms requiring fallbacks are a special case, we can fix
this by giving them a different module alias than the rest.  Then
it's just a matter of using the right aliases according to what
algorithms we're trying to find.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2009-02-26 14:06:31 +08:00
Mark Nelson f72b728bf1 powerpc: Fix 64bit __copy_tofrom_user() regression
This fixes a regression introduced by commit
a4e22f02f5 ("powerpc: Update 64bit
__copy_tofrom_user() using CPU_FTR_UNALIGNED_LD_STD").

The same bug that existed in the 64bit memcpy() also exists here so fix
it here too. The fix is the same as that applied to memcpy() with the
addition of fixes for the exception handling code required for
__copy_tofrom_user().

This stops us reading beyond the end of the source region we were told
to copy.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-26 14:02:54 +11:00
Mark Nelson e423b9ecd6 powerpc: Fix 64bit memcpy() regression
This fixes a regression introduced by commit
25d6e2d7c5 ("powerpc: Update 64bit memcpy()
using CPU_FTR_UNALIGNED_LD_STD").

This commit allowed CPUs that have the CPU_FTR_UNALIGNED_LD_STD CPU
feature bit present to do the memcpy() with unaligned load doubles. But,
along with this came a bug where our final load double would read bytes
beyond a page boundary and into the next (unmapped) page. This was caught
by enabling CONFIG_DEBUG_PAGEALLOC,

The fix was to read only the number of bytes that we need to store rather
than reading a full 8-byte doubleword and storing only a portion of that.

In order to minimise the amount of existing code touched we use the
original do_tail for the src_unaligned case.

Below is an example of the regression, as reported by Sachin Sant:

Unable to handle kernel paging request for data at address 0xc00000003f380000
Faulting instruction address: 0xc000000000039574
cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020]
    pc: c000000000039574: .memcpy+0x74/0x244
    lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
    sp: c00000003baf32a0
   msr: 8000000000009032
   dar: c00000003f380000
 dsisr: 40000000
  current = 0xc00000003e54b010
  paca    = 0xc000000000a53680
    pid   = 1840, comm = readahead
enter ? for help
[link register   ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
[c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3]
(unreliab
le)
[c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
[c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c
[c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678
[c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68
[c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c
[c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120
[c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3]
[c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260
[c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010
[c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4
[c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4
[c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8
[c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874
[c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140
[c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38
[c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-26 14:02:53 +11:00
Michael Neuling 49f297f8df powerpc: Fix load/store float double alignment handler
When we introduced VSX, we changed the way FPRs are stored in the
thread_struct.  Unfortunately we missed the load/store float double
alignment handler code when updating how we access FPRs in the
thread_struct.

Below fixes this and merges the little/big endian case.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-26 14:02:53 +11:00
Roel Kluin 5b5923975f [IA64] Don't go beyond iosapic_intr_info's arraysize
vi arch/ia64/kernel/iosapic.c +142
static struct iosapic_intr_info {
	...
} iosapic_intr_info[NR_IRQS];

But at line 510 we have:
	for (i = 0; i <= NR_IRQS; i++) {

s/<=/</

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-02-25 11:50:53 -08:00
Roel Kluin aa2f63c954 [IA64] Do not go beyond ARRAY_SIZE of unw.hash
static struct {

... :114
        unsigned short hash[UNW_HASH_SIZE];

... :2152
	for (index = 0; index <= UNW_HASH_SIZE; ++index) {

This is a bug, isn't it?

s/<=/</

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-02-25 11:48:04 -08:00
Kyle McMartin 6b1ff036d4 [IA64] enable setting DMAR on by default
The previous commit which introduced the DMAR_DEFAULT_ON setting in
drivers/pci/dmar.c neglected to add the ability for ia64 to enable
the IOMMU by default. Rectify that mistake, doh!

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-02-25 11:40:27 -08:00
Jeremy Fitzhardinge 55d8085671 xen: disable interrupts early, as start_kernel expects
This avoids a lockdep warning from:
	if (DEBUG_LOCKS_WARN_ON(unlikely(!early_boot_irqs_enabled)))
		return;
in trace_hardirqs_on_caller();

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:51:57 +01:00
Linus Torvalds f8dacde8c0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Fix crashes in jbusmc_print_dimm()
2009-02-25 09:31:56 -08:00
Venkatesh Pallipadi 4ab0d47d0a gpu/drm, x86, PAT: io_mapping_create_wc and resource_size_t
io_mapping_create_wc should take a resource_size_t parameter in place of
unsigned long. With unsigned long, there will be no way to map greater than 4GB
address in i386/32 bit.

On x86, greater than 4GB addresses cannot be mapped on i386 without PAE. Return
error for such a case.

Patch also adds a structure for io_mapping, that saves the base, size and
type on HAVE_ATOMIC_IOMAP archs, that can be used to verify the offset on
io_mapping_map calls.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 13:09:51 +01:00
James Bottomley ddf9499b3d x86, Voyager: fix compile by lifting the degeneracy of phys_cpu_present_map
This was changed to a physmap_t giving a clashing symbol redefinition,
but actually using a physmap_t consumes rather a lot of space on x86,
so stick with a private copy renamed with a voyager_ prefix and made
static.  Nothing outside of the Voyager code uses it, anyway.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 12:50:11 -08:00
Mark Brown c8532db7f2 [ARM] 5411/1: S3C64XX: Fix EINT unmask
Currently the unmask function for EINT interrupts was setting the mask
bit rather than clearing it.  This was also previously reported and
fixed by Kyungmin Park <kyungmin.park@samsung.com> and others.

Acked-By: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-24 19:12:31 +00:00
Russell King 531660ef56 Add i2c_board_info for RiscPC PCF8583
Add the necessary i2c_board_info structure to fix the lack of PCF8583
RTC on RiscPC.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
2009-02-24 19:19:50 +01:00
Anton Blanchard 501cb16d3c powerpc: Randomise PIEs
Randomise ELF_ET_DYN_BASE, which is used when loading position independent
executables.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:21 +11:00
Anton Blanchard 002b0ec73d powerpc: Increase stack gap on 64bit binaries
On 64bit there is a possibility our stack and mmap randomisation will put
the two close enough such that we can't expand our stack to match the ulimit
specified.

To avoid this, start the upper mmap address at 1GB + 128MB below the top of our
address space, so in the worst case we end up with the same ~128MB hole as in
32bit. This works because we randomise the stack over a 1GB range.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:21 +11:00
Anton Blanchard a5adc91a4b powerpc: Ensure random space between stack and mmaps
get_random_int() returns the same value within a 1 jiffy interval. This means
that the mmap and stack regions will almost always end up the same distance
apart, making a relative offset based attack possible.

To fix this, shift the randomness we use for the mmap region by 1 bit.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:21 +11:00
Anton Blanchard 912f9ee21c powerpc: Randomise the brk region
Randomize the heap.

before:
tundro2:~ # sleep 1 & cat /proc/${!}/maps | grep heap
10017000-10118000 rw-p 10017000 00:00 0                                  [heap]
10017000-10118000 rw-p 10017000 00:00 0                                  [heap]
10017000-10118000 rw-p 10017000 00:00 0                                  [heap]
10017000-10118000 rw-p 10017000 00:00 0                                  [heap]
10017000-10118000 rw-p 10017000 00:00 0                                  [heap]

after
tundro2:~ # sleep 1 & cat /proc/${!}/maps | grep heap
19419000-1951a000 rw-p 19419000 00:00 0                                  [heap]
325ff000-32700000 rw-p 325ff000 00:00 0                                  [heap]
1a97c000-1aa7d000 rw-p 1a97c000 00:00 0                                  [heap]
1cc60000-1cd61000 rw-p 1cc60000 00:00 0                                  [heap]
1afa9000-1b0aa000 rw-p 1afa9000 00:00 0                                  [heap]

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:20 +11:00
Anton Blanchard d839088cae powerpc: Randomise lower bits of stack address
Randomise the lower bits of the stack address. More randomisation is good for
security but the scatter can also help with SMT threads that share an L1. A
quick test case shows this working:

int main()
{
	int sp;
	printf("%x\n", (unsigned long)&sp & 4095);
}

before:
80
80
80
80
80

after:
610
490
300
6b0
d80

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:20 +11:00
Anton Blanchard 2dadb987e0 powerpc: More stack randomisation for 64bit binaries
At the moment we randomise the stack by 8MB on 32bit and 64bit tasks. Since we
have a lot more address space to play with on 64bit, lets do what x86 does and
increase that randomisation to 1GB:

before:
# for i in seq `1 10` ; do sleep 1 & cat /proc/${!}/maps | grep stack; done
fffffebc000-fffffed1000 rw-p ffffffeb000 00:00 0       [stack]
ffffff5a000-ffffff6f000 rw-p ffffffeb000 00:00 0       [stack]
fffffdb2000-fffffdc7000 rw-p ffffffeb000 00:00 0       [stack]
fffffd3e000-fffffd53000 rw-p ffffffeb000 00:00 0       [stack]
fffffad9000-fffffaee000 rw-p ffffffeb000 00:00 0       [stack]

after:
# for i in seq `1 10` ; do sleep 1 & cat /proc/${!}/maps | grep stack; done
ffff5c27000-ffff5c3c000 rw-p ffffffeb000 00:00 0       [stack]
fffebe5e000-fffebe73000 rw-p ffffffeb000 00:00 0       [stack]
fffcb298000-fffcb2ad000 rw-p ffffffeb000 00:00 0       [stack]
fffc719d000-fffc71b2000 rw-p ffffffeb000 00:00 0       [stack]
fffe01af000-fffe01c4000 rw-p ffffffeb000 00:00 0       [stack]

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:07 +11:00
Anton Blanchard 9f14c42d75 powerpc: Randomise mmap start address
Randomise mmap start address - 8MB on 32bit and 1GB on 64bit tasks.
Until ppc32 uses the mmap.c functionality, this is ppc64 specific.

Before:

# ./test & cat /proc/${!}/maps|tail -2|head -1
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0

After:
# ./test & cat /proc/${!}/maps|tail -2|head -1
f718b000-f7b8c000 rw-p f718b000 00:00 0
f7551000-f7f52000 rw-p f7551000 00:00 0
f6ee7000-f78e8000 rw-p f6ee7000 00:00 0
f74d4000-f7ed5000 rw-p f74d4000 00:00 0
f6e9d000-f789e000 rw-p f6e9d000 00:00 0

Similar for 64bit, but with 1GB of scatter:
# ./test & cat /proc/${!}/maps|tail -2|head -1
fffb97b5000-fffb97b6000 rw-p fffb97b5000 00:00 0
fffce9a3000-fffce9a4000 rw-p fffce9a3000 00:00 0
fffeaaf2000-fffeaaf3000 rw-p fffeaaf2000 00:00 0
fffd88ac000-fffd88ad000 rw-p fffd88ac000 00:00 0
fffbc62e000-fffbc62f000 rw-p fffbc62e000 00:00 0

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:07 +11:00
Anton Blanchard 13a2cb3694 powerpc: Rearrange mmap.c
Rearrange mmap.c to better match the x86 version.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:06 +11:00
Anton Blanchard a465f9b694 powerpc: Move is_32bit_task
Move is_32bit_task into asm/thread_info.h, that allows us to test for
32/64bit tasks without an ugly CONFIG_PPC64 ifdef.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:06 +11:00
Giuliano Pochini 4c4ece3cf8 powerpc/powermac: Hotplug /sys entries are missing
On Wed, 18 Feb 2009 22:18:21 +0100
Giuliano Pochini <pochini@shiny.it> wrote:

Since 2.6.28, /sys/devices/system/cpu/cpu*/online don't exist anymore
on 32-bit PowerMacs due to change in the generic powerpc code.

Signed-off-by: Giuliano Pochini <pochini@shiny.it>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:06 +11:00
Benjamin Krill 41fd81cc56 powerpc/cell: Add rtas rtc calls for the QPACE platform
The new firmware release exports further RTC calls. This
patch adds these calls to the QPACE platform setup file.

Signed-off-by: Benjamin Krill <ben@codiert.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:05 +11:00
Michael Neuling 553631e25f powerpc: Fix load/store float double alignment handler
When we introduced VSX, we changed the way FPRs are stored in the
thread_struct.  Unfortunately we missed the load/store float double
alignment handler code when updating how we access FPRs in the
thread_struct.

Below fixes this and merges the little/big endian case.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:05 +11:00
Michael Neuling 545bba1824 powerpc: Add alignment handler for new lfiwzx instruction
lfiwzx is a new floating point load instruction in 2.06 that needs an
alignment handler for Linux.

Turns out to be the worlds easiest handler to add.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:04 +11:00
Nathan Fontenot 0f16ef7fd3 powerpc/numa: Cleanup hot_add_scn_to_nid
This patch reworks the hot_add_scn_to_nid and its supporting functions
to make them easier to understand.  There are no functional changes in
this patch and has been tested on machine with memory represented in the
device tree as memory nodes and in the ibm,dynamic-memory property.

My previous patch that introduced support for hotplug memory add on
systems whose memory was represented by the ibm,dynamic-memory property
of the device tree only left the code more unintelligible.  This
will hopefully makes things easier to understand.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:04 +11:00
Brian King f52862f407 powerpc/pseries: Fix partition migration hang under load
While testing partition migration with heavy CPU load using
shared processors, it was observed that sometimes the migration
would never complete and would appear to hang. Currently, the
migration code assumes that if H_SUCCESS is returned from the H_JOIN
then the migration is complete and the processor is waking up on
the target system. If there was an outstanding PROD to the processor
when the H_JOIN is called, however, it will return H_SUCCESS on the source
system, causing the migration to hang, or in some scenarios cause
the kernel to crash on the complete call waking the caller
of rtas_percpu_suspend_me. Fix this by calling H_JOIN multiple times
if necessary during the migration.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:04 +11:00
Michael Ellerman 448e2ca0e3 powerpc/pseries: Implement a quota system for MSIs
There are hardware limitations on the number of available MSIs,
which firmware expresses using a property named "ibm,pe-total-#msi".
This property tells us how many MSIs are available for devices below
the point in the PCI tree where we find the property.

For old firmwares which don't have the property, we assume there are
8 MSIs available per "partitionable endpoint" (PE). The PE can be
found using existing EEH code, which uses the methods described in
PAPR. For our purposes we want the parent of the node that's
identified using this method.

When a driver requests n MSIs for a device, we first establish where
the "ibm,pe-total-#msi" property above that device is, or we find the
PE if the property is not found. In both cases we call this node
the "pe_dn".

We then count all non-bridge devices below the pe_dn, to establish
how many devices in total may need MSIs. The quota is then simply the
total available divided by the number of devices, if the request is
less than or equal to the quota, the request is fine and we're done.

If the request is greater than the quota, we try to determine if there
are any "spare" MSIs which we can give to this device. Spare MSIs are
found by looking for other devices which can never use their full
quota, because their "req#msi(-x)" property is less than the quota.

If we find any spare, we divide the spares by the number of devices
that could request more than their quota. This ensures the spare
MSIs are spread evenly amongst all over-quota requestors.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:03 +11:00
Michael Ellerman d523cc379d powerpc/pseries: Return req#msi(-x) if request is larger
If a driver asks for more MSIs than the devices "req#msi(-x)" property,
we currently return -ENOSPC. This doesn't give the driver any chance to
make a new request with a number that might work.

So if "req#msi(-x)" is less than the request, return its value. To be
100% safe, make sure we return an error if req_msi == 0.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:03 +11:00
Kumar Gala 620165f971 powerpc: Add support for using doorbells for SMP IPI
The e500mc supports the new msgsnd/doorbell mechanisms that were added in
the Power ISA 2.05 architecture.  We use the normal level doorbell for
doing SMP IPIs at this point.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:03 +11:00
Arnd Bergmann 6ed8d12849 powerpc/cell: Fix dependency in cpufreq
cbe_cpufreq has a partial dependency on cbe_cpufreq_pmi, which cannot
be easily expressed in Kconfig. This fixes it by introducing an
extra Kconfig symbol CBE_CPUFREQ_PMI_ENABLE. To make the dependency
clearer, turn PPC_PMI into an automatic symbol.

Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:02 +11:00
Jeremy Kerr 74254647e0 powerpc/spufs: Constify context contents and coredump callback constants
The spufs context directory contents definitions are not changed after
initialisation, so we can declare them as const. We can do the same
with the spu coredump reader callbacks too.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:59 +11:00
Jeremy Kerr 3688b46b89 powerpc/spufs: Clear purge status before setting up isolated mode
Currently, we may setup the MFC for isolated mode initilaisation with
the purge still active. This means that DMAs required to perform the
init do not happen.

This change clears the purge status after doing the purge, so that
the isolated init can proceed.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:59 +11:00
Jeremy Kerr 60ee031940 powerpc/spufs: Use correct return value for spu_handle_mm_fault
Currently, spu_handle_mm_fault disregards the 'ret' variable and always
returns -EFAULT on error.

This change refactos spu_handle_mm_fault a little, to return the
ret variable as appropriate. This allows us to combine the error and
sucess paths.

Also, remove the #if-0-ed IS_VALID_EA() check, it has never been
used.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:58 +11:00
Anton Blanchard 13870b6575 powerpc/mm: Reduce hashtable size when using 64kB pages
At the moment we size the hashtable based on 4kB pages / 2, even on a
64kB kernel. This results in a hashtable that is much larger than it
needs to be.

Grab the real page size and size the hashtable based on that

Note: This only has effect on non hypervisor machines.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:58 +11:00
Ilya Yanok 33f00dcedb powerpc: Rework dma-noncoherent to use generic vmalloc layer
This patch rewrites consistent dma allocations support to use vmalloc
layer to allocate virtual memory space from vmalloc pool and get rid
of CONFIG_CONSISTENT_{START,SIZE}.

This greatly simplifies the code by effectively removing a custom
allocator we had for virtual space.

Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:57 +11:00
Kumar Gala 812d904e39 powerpc: Fix warnings from make headers_check
include/asm/bootx.h:12: include of <linux/types.h> is preferred over <asm/types.h>
include/asm/bootx.h:57: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/elf.h:5: include of <linux/types.h> is preferred over <asm/types.h>
include/asm/kvm.h:23: include of <linux/types.h> is preferred over <asm/types.h>
include/asm/kvm.h:26: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/ps3fb.h:33: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/spu_info.h:27: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/swab.h:11: include of <linux/types.h> is preferred over <asm/types.h>

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:57 +11:00
Tom Arbuckle f81786913a powerpc/pci: Fix PCI<->OF matching of old style multifunc devices
Old OF variants used to create a 'dummy' parent node "multifunc-device"
for devices with more than one PCI function. Our code that matches OF
nodes to PCI devices dealt with that in one place but not in another,
this fixes it.

This has the practical effect of fixing interrupt routing of multifunction
PCI cards on some older PowerMac machines.

Signed-off-by: Tom Arbuckle <tom.d.arbuckle@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:57 +11:00
Kumar Gala 16c57b3620 powerpc: Unify opcode definitions and support
Create a new header that becomes a single location for defining PowerPC
opcodes used by code that is either generationg instructions
at runtime (fixups, debug, etc.), emulating instructions, or just
compiling instructions old assemblers don't know about.

We currently don't handle the floating point emulation or alignment decode
as both are better handled by the specific decode support they already
have.

Added support for the new dcbzl, dcbal, msgsnd, tlbilx, & wait instructions
since older assemblers don't know about them.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:56 +11:00
Steven Rostedt bb9b903527 powerpc, ftrace: use create_branch lib function
Impact: clean up, remove duplicate code

When ftrace was first ported to PowerPC, there existed a
create_function_call that would create the instruction to make a call
to a given address. Unfortunately, this call expected to write to
the address it was given, and since it used the address to calculate
the offset, it could not be faked.

ftrace needed a way to create the instruction without actually writing
that instruction to the text section. So ftrace had to implement its
own code.

Now we have create_branch in the code patching library, which does
exactly what ftrace needs. This patch replaces ftrace's implementation
with the library function.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:56 +11:00
Steven Rostedt b54dcfe108 powerpc, ftrace: use unsigned int for instruction manipulation
The original port of ftrace to PowerPC kept a lot of the code used
by x86. Some of this code was to handle x86's 5 byte instruction.
This was handled by using character arrays to manipulate the
code.

PowerPC has a consistent 4 byte instruction. Using unsigned ints
makes the code more efficient as well as more readable.
By converting to use unsigned ints to represent instructions,
I was able to remove the side effects that were needed for
manipulating character strings.

  i.e. memcpy and memcmp

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:55 +11:00
Steven Rostedt 60ce8f7260 powerpc32, ftrace: dynamic function graph tracer
This patch gets function graph tracing working with dynamic function
tracer on PowerPC32.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:55 +11:00
Steven Rostedt fad4f47cc8 powerpc32, ftrace: port function graph tracer to ppc32, static only
This patch ports the function graph tracer for PowerPC, but only
for static function tracing.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:55 +11:00
Steven Rostedt bf528a3a9b powerpc32, ftrace: save and restore mcount regs with macro
Impact: clean up

Use a macro to save and restore the registers for PowerPC32,
since that code is duplicated.

This is similar to the work done by Cyrill Gorcunov for the
mcount code in x86_64.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:54 +11:00
Steven Rostedt bb7253403f powerpc64, ftrace: save toc only on modules for function graph
The TOCS used by modules are different than the one used by
the core kernel code. The function graph tracer must save and
restore the TOC whenever it traces a module call. But this
is an added overhead to burden the majority of core kernel
code being traced.

Benjamin Herrenschmidt suggested in testing the entry of
the call to tell if it is a core kernel function or a module.
He recommended using the REGION_ID() macro to perform this test.

This patch implements Benjamin's idea, and uses a different
return_to_handler routine dependent on if the entry is a core
kernel function or not. The module version saves the TOC, where as
the core kernel version does not.

Geoff Lavand tested on PS3.

Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:54 +11:00
Steven Rostedt 4654288847 powerpc64, tracing: add function graph tracer with dynamic tracing
This is the port of the function graph tracer to PowerPC with
dynamic tracing.

Geoff Lavand tested on PS3.

Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:54 +11:00
Steven Rostedt 6794c78243 powerpc64: port of the function graph tracer
This is a port of the function graph tracer that was written by
Frederic Weisbecker for the x86.

This only works for PPC64 at the moment and only for static tracing.
PPC32 and dynamic function graph tracing support will come later.

The trace produces a visual calling of functions:

 # tracer: function_graph
 #
 # CPU  DURATION                  FUNCTION CALLS
 # |     |   |                     |   |   |   |
  0)   2.224 us    |                        }
  0) ! 271.024 us  |                      }
  0) ! 320.080 us  |                    }
  0) ! 324.656 us  |                  }
  0) ! 329.136 us  |                }
  0)               |                .put_prev_task_fair() {
  0)               |                  .update_curr() {
  0)   2.240 us    |                    .update_min_vruntime();
  0)   6.512 us    |                  }
  0)   2.528 us    |                  .__enqueue_entity();
  0) + 15.536 us   |                }
  0)               |                .pick_next_task_fair() {
  0)   2.032 us    |                  .__pick_next_entity();
  0)   2.064 us    |                  .__clear_buddies();
  0)               |                  .set_next_entity() {
  0)   2.672 us    |                    .__dequeue_entity();
  0)   6.864 us    |                  }

Geoff Lavand tested on PS3.

Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:53 +11:00
Steven Rostedt 17be5b3ddf powerpc, ftrace: fix compile error when modules not configured
Michael Neuling reported a compile bug when dynamic ftrace was
configured in and modules were not. This was due to the ftrace
code referencing module specific structures.

Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:53 +11:00
Steven Rostedt 44e1d064b9 ftrace, powerpc: replace debug macro with proper pr_deug
Impact: cleanup

The PowerPC ftrace code uses a hacked up DEBUGP macro for prints.
This patch converts it to the standard pr_debug.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:52 +11:00
Benjamin Herrenschmidt 35f88e6b06 Merge commit 'ftrace/function-graph' into next 2009-02-23 10:47:23 +11:00
Andrei Birjukov d82ad6d683 [ARM] at91: fix for Atmel AT91 powersaving
We've discovered that our AT91SAM9260 board consumed too much power when
returning from a slowclock low-power mode.  RAM self-refresh is enabled in
a bootloader in our case, this is how we saw a difference.  Estimated ca.
30mA more on 4V battery than the same state before powersaving.

After a small research we found that there seems to be a bogus
sdram_selfrefresh_disable() call at the end of at91_pm_enter() call, which
overwrites the LPR register with uninitialized value.  Please find the
suggested patch attached.

This patch fixes correct restoring of LPR register of the Atmel AT91 SDRAM
controller when returning from a power saving mode.

Signed-off-by: Andrei Birjukov <andrei.birjukov@artecdesign.ee>
Acked-by: Andrew Victor <linux@maxim.org.za>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-22 22:37:21 +00:00
Rafael J. Wysocki 770824bdc4 PM: Split up sysdev_[suspend|resume] from device_power_[down|up]
Move the sysdev_suspend/resume from the callee to the callers, with
no real change in semantics, so that we can rework the disabling of
interrupts during suspend/hibernation.

This is based on an earlier patch from Linus.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-22 10:33:44 -08:00
Linus Torvalds 936577c61d x86: Add IRQF_TIMER to legacy x86 timer interrupt descriptors
Right now nobody cares, but the suspend/resume code will eventually want
to suspend device interrupts without suspending the timer, and will
depend on this flag to know.

The modern x86 timer infrastructure uses the local APIC timers and never
shows up as a device interrupt at all, so it isn't affected and doesn't
need any of this.

Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-22 10:27:49 -08:00
Linus Torvalds 7c24af498f Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPI: remove CONFIG_ACPI_SYSTEM
  fujitsu-laptop: Use RFKILL support bitmask from firmware
  x86_64: Fix S3 fail path
  x86_64: acpi/wakeup_64 cleanup
  battery: don't assume we are fully charged when not charging or discharging
  ACPI: EC: Add delay for slow MSI controller
2009-02-22 09:28:46 -08:00
Geert Uytterhoeven 3d92e8f3ae m68k: atari - Rename "mfp" to "st_mfp"
http://kisskb.ellerman.id.au/kisskb/buildresult/72115/:
| net/mac80211/ieee80211_i.h:327: error: syntax error before 'volatile'
| net/mac80211/ieee80211_i.h:350: error: syntax error before '}' token
| net/mac80211/ieee80211_i.h:455: error: field 'sta' has incomplete type
| distcc[19430] ERROR: compile net/mac80211/main.c on sprygo/32 failed

This is caused by

| # define mfp ((*(volatile struct MFP*)MFP_BAS))

in arch/m68k/include/asm/atarihw.h, which conflicts with the new "mfp" enum in
net/mac80211/ieee80211_i.h.

Rename "mfp" to "st_mfp", as it's a way too generic name for a global #define.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-22 09:23:02 -08:00
Jiri Slaby 6defa2fe20 x86_64: Fix S3 fail path
As acpi_enter_sleep_state can fail, take this into account in
do_suspend_lowlevel and don't return to the do_suspend_lowlevel's
caller. This would break (currently) fpu status and preempt count.

Technically, this means use `call' instead of `jmp' and `jmp' to
the `resume_point' after the `call' (i.e. if
acpi_enter_sleep_state returns=fails). `resume_point' will handle
the restore of fpu and preempt count gracefully.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-02-21 21:58:18 -05:00
Jiri Slaby e6bd6760c9 x86_64: acpi/wakeup_64 cleanup
- remove %ds re-set, it's already set in wakeup_long64
- remove double labels and alignment (ENTRY already adds both)
- use meaningful resume point labelname
- skip alignment while jumping from wakeup_long64 to the resume point
- remove .size, .type and unused labels
[v2]
- added ENDPROCs

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-02-21 21:58:18 -05:00
Linus Torvalds 460c1338fc Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mce: remove incorrect __cpuinit for mce_cpu_features()
  MAINTAINERS: paravirt-ops maintainers update
2009-02-21 09:15:39 -08:00
H. Peter Anvin cc3ca22063 x86, mce: remove incorrect __cpuinit for mce_cpu_features()
Impact: Bug fix on UP

Checkin 6ec68bff3c81e776a455f6aca95c8c5f1d630198:
    x86, mce: reinitialize per cpu features on resume

introduced a call to mce_cpu_features() in the resume path, in order
for the MCE machinery to get properly reinitialized after a resume.
However, this function (and its successors) was flagged __cpuinit,
which becomes __init on UP configurations (on SMP suspend/resume
requires CPU hotplug and so this would not be seen.)

Remove the offending __cpuinit annotations for mce_cpu_features() and
its successor functions.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-20 23:40:40 -08:00
Linus Torvalds be71cb5b52 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: use the right protections for split-up pagetables
  x86, vmi: TSC going backwards check in vmi clocksource
2009-02-20 18:03:07 -08:00
Wei Yongjun d9190913b7 mn10300: fix typo && -> || in arch/mn10300/unit-asb2305/pci.c
Fix the typo && -> ||.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-20 17:57:48 -08:00
David Howells 58bafe72ad mn10300: fix oprofile
oprofile for MN10300 seems to have been broken by the advent of the new
tracing framework.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-20 17:57:48 -08:00
Luca Bigliardi 41a9e64ca4 uml: fix vde network backend in user mode linux
* Replace kmalloc() with uml_kmalloc() (fix build failure)

* Remove unnecessary UM_KERN_INFO in printk() (don't display '<6>' while
  printing info)

Signed-off-by: Luca Bigliardi <shammash@artha.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: WANG Cong <wangcong@zeuux.org>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-20 17:57:48 -08:00
Ingo Molnar 07a66d7c53 x86: use the right protections for split-up pagetables
Steven Rostedt found a bug in where in his modified kernel
ftrace was unable to modify the kernel text, due to the PMD
itself having been marked read-only as well in
split_large_page().

The fix, suggested by Linus, is to not try to 'clone' the
reference protection of a huge-page, but to use the standard
(and permissive) page protection bits of KERNPG_TABLE.

The 'cloning' makes sense for the ptes but it's a confused and
incorrect concept at the page table level - because the
pagetable entry is a set of all ptes and hence cannot
'clone' any single protection attribute - the ptes can be any
mixture of protections.

With the permissive KERNPG_TABLE, even if the pte protections
get changed after this point (due to ftrace doing code-patching
or other similar activities like kprobes), the resulting combined
protections will still be correct and the pte's restrictive
(or permissive) protections will control it.

Also update the comment.

This bug was there for a long time but has not caused visible
problems before as it needs a rather large read-only area to
trigger. Steve possibly hacked his kernel with some really
large arrays or so. Anyway, the bug is definitely worth fixing.

[ Huang Ying also experienced problems in this area when writing
  the EFI code, but the real bug in split_large_page() was not
  realized back then. ]

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Huang Ying <ying.huang@intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 08:35:03 +01:00
Alok N Kataria 48ffc70b67 x86, vmi: TSC going backwards check in vmi clocksource
Impact: fix time warps under vmware

Similar to the check for TSC going backwards in the TSC clocksource,
we also need this check for VMI clocksource.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
2009-02-20 07:53:08 +01:00
Linus Torvalds a5e7536388 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] xen_domu build fix
  [IA64] fixes configs and add default config for ia64 xen domU
  [IA64] Remove redundant cpu_clear() in __cpu_disable path
  [IA64] Revert "prevent ia64 from invoking irq handlers on offline CPUs"
  [IA64] bte_copy of BTE_MAX_XFER trips BUG_ON.
  [IA64] Build fix for __early_pfn_to_nid() undefined link error
2009-02-19 13:09:20 -08:00
Tony Luck ec8148de85 [IA64] xen_domu build fix
arch/ia64/xen/xen_pv_ops.c:156: error: xen_init_ops causes a section type conflict
arch/ia64/xen/xen_pv_ops.c:340: error: xen_iosapic_ops causes a section type conflict

Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-02-19 12:05:00 -08:00
Isaku Yamahata 1d5b20f490 [IA64] fixes configs and add default config for ia64 xen domU
This patch fixes xen related Kconfigs and add default config
file for ia64 xen domU.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
2009-02-19 11:39:06 -08:00
Alex Chiang c0acdea214 [IA64] Remove redundant cpu_clear() in __cpu_disable path
The second call to cpu_clear() is redundant, as we've already removed
the CPU from cpu_online_map before calling migrate_platform_irqs().

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
2009-02-19 11:32:50 -08:00
Alex Chiang 66db2e6331 [IA64] Revert "prevent ia64 from invoking irq handlers on offline CPUs"
This reverts commit e7b140365b.

Commit e7b14036 removes the targetted disabled CPU from the
cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.

Paul McKenney states that the reasoning behind the patch was to
prevent irq handlers from running on CPUs marked offline because:

	RCU happily ignores CPUs that don't have their bits set in
	cpu_online_map, so if there are RCU read-side critical sections
	in the irq handlers being run, RCU will ignore them.  If the
	other CPUs were running, they might sequence through the RCU
	state machine, which could result in data structures being
	yanked out from under those irq handlers, which in turn could
	result in oopses or worse.

Unfortunately, both ia64 functions above look at cpu_online_map to find
a new CPU to migrate interrupts onto. This means we can potentially
migrate an interrupt off ourself back to... ourself. Uh oh.

This causes an oops when we finally try to process pending interrupts on
the CPU we want to disable. The oops results from calling __do_IRQ with
a NULL pt_regs:

Unable to handle kernel NULL pointer dereference (address 0000000000000040)
Call Trace:
 [<a000000100016930>] show_stack+0x50/0xa0
                                sp=e0000009c922fa00 bsp=e0000009c92214d0
 [<a0000001000171a0>] show_regs+0x820/0x860
                                sp=e0000009c922fbd0 bsp=e0000009c9221478
 [<a00000010003c700>] die+0x1a0/0x2e0
                                sp=e0000009c922fbd0 bsp=e0000009c9221438
 [<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
                                sp=e0000009c922fbd0 bsp=e0000009c92213d8
 [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e0000009c922fc60 bsp=e0000009c92213d8
 [<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
                                sp=e0000009c922fe30 bsp=e0000009c9221398
 [<a00000010003bb90>] timer_interrupt+0x170/0x3e0
                                sp=e0000009c922fe30 bsp=e0000009c9221330
 [<a00000010013a800>] handle_IRQ_event+0x80/0x120
                                sp=e0000009c922fe30 bsp=e0000009c92212f8
 [<a00000010013aa00>] __do_IRQ+0x160/0x4a0
                                sp=e0000009c922fe30 bsp=e0000009c9221290
 [<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
                                sp=e0000009c922fe30 bsp=e0000009c9221208
 [<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
                                sp=e0000009c922fe30 bsp=e0000009c92211a8
 [<a00000010005bd80>] __cpu_disable+0x140/0x240
                                sp=e0000009c922fe30 bsp=e0000009c9221168
 [<a0000001006c5870>] take_cpu_down+0x50/0xa0
                                sp=e0000009c922fe30 bsp=e0000009c9221148
 [<a000000100122610>] stop_cpu+0xd0/0x200
                                sp=e0000009c922fe30 bsp=e0000009c92210f0
 [<a0000001000e0440>] kthread+0xc0/0x140
                                sp=e0000009c922fe30 bsp=e0000009c92210c8
 [<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
                                sp=e0000009c922fe30 bsp=e0000009c92210a0
 [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
                                sp=e0000009c922fe30 bsp=e0000009c92210a0

I don't like this revert because it is fragile. ia64 is getting lucky
because we seem to only ever process timer interrupts in this path, but
if we ever race with an IPI here, we definitely use RCU and have the
potential of hitting an oops that Paul describes above.

Patching ia64's timer_interrupt() to check for NULL pt_regs is
insufficient though, as we still hit the above oops.

As a short term solution, I do think that this revert is the right
answer. The revert hold up under repeated testing (24+ hour test runs)
with this setup:

	- 8-way rx6600
	- randomly toggling CPU online/offline state every 2 seconds
	- running CPU exercisers, memory hog, disk exercisers, and
	  network stressors
	- average system load around ~160

In the long term, we really need to figure out why we set pt_regs = NULL
in ia64_process_pending_intr(). If it turns out that it is unnecessary
to do so, then we could safely re-introduce e7b14036 (along with some
other logic to be smarter about migrating interrupts).

One final note: x86 also removes the disabled CPU from cpu_online_map
and then re-enables interrupts for 1ms, presumably to handle any pending
interrupts:

arch/x86/kernel/irq_32.c (and irq_64.c):
cpu_disable_common:
	[remove cpu from cpu_online_map]

	fixup_irqs():
		for_each_irq:
			[break CPU affinities]

		local_irq_enable();
		mdelay(1);
		local_irq_disable();

So they are doing implicitly what ia64 is doing explicitly.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
2009-02-19 11:32:26 -08:00
Robin Holt 39d481cba2 [IA64] bte_copy of BTE_MAX_XFER trips BUG_ON.
BTE_MAX_XFER is wrong.  It is one greater than the number of cache
lines the BTE is actually able to transfer.  If you request a transfer
of exactly BTE_MAX_XFER size, you trip a very cryptic BUG_ON() which
should certainly be made more clear.

This patch fixes that constant and also cleans up the BUG_ON()s in
arch/ia64/sn/kernel/bte.c to test one condition per line.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
2009-02-19 11:29:31 -08:00
Tony Luck 334f85b647 [IA64] Build fix for __early_pfn_to_nid() undefined link error
ia64 only defines __early_pfn_to_nid() for SPARSEMEM && NUMA configurations,
so the recent:

	commit: f2dbcfa738
	mm: clean up for early_pfn_to_nid()

ends up with some link problems for certain configuration files.

Fix arch/ia64/Kconfig to only define HAVE_ARCH_EARLY_PFN_TO_NID in the
cases where we do provide this function.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-02-19 11:22:36 -08:00
Linus Torvalds 402a917aca Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] 5405/1: ep93xx: remove unused gesbc9312.h header
  [ARM] 5404/1: Fix condition in arm_elf_read_implies_exec() to set READ_IMPLIES_EXEC
  [ARM] omap: fix clock reparenting in omap2_clk_set_parent()
  [ARM] 5403/1: pxa25x_ep_fifo_flush() *ep->reg_udccs always set to 0
  [ARM] 5402/1: fix a case of wrap-around in sanity_check_meminfo()
  [ARM] 5401/1: Orion: fix edge triggered GPIO interrupt support
  [ARM] 5400/1: Add support for inverted rdy_busy pin for Atmel nand device controller
  [ARM] 5391/1: AT91: Enable GPIO clocks earlier
  [ARM] 5390/1: AT91: Watchdog fixes
  [ARM] 5398/1: Add Wan ZongShun to MAINTAINERS for W90P910
  [ARM] omap: fix _omap2_clksel_get_src_field()
  [ARM] omap: fix omap2_divisor_to_clksel() error return value
2009-02-19 09:52:12 -08:00
Linus Torvalds bcf8951fc2 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mce: fix ifdef for 64bit thermal apic vector clear on shutdown
  x86, mce: use force_sig_info to kill process in machine check
  x86, mce: reinitialize per cpu features on resume
  x86, rcu: fix strange load average and ksoftirqd behavior
2009-02-19 09:14:35 -08:00
Hartley Sweeten 9dd446f657 [ARM] 5405/1: ep93xx: remove unused gesbc9312.h header
Remove the gesbc9312.h header since it is unused.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-19 16:13:02 +00:00
Makito SHIOKAWA 9da616fb99 [ARM] 5404/1: Fix condition in arm_elf_read_implies_exec() to set READ_IMPLIES_EXEC
READ_IMPLIES_EXEC must be set when:
o binary _is_ an executable stack (i.e. not EXSTACK_DISABLE_X)
o processor architecture is _under_ ARMv6 (XN bit is supported from ARMv6)

Signed-off-by: Makito SHIOKAWA <lkhmkt@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-19 14:45:27 +00:00
Heiko Carstens 23d75d9cad [S390] fix "mem=" handling in case of standby memory
Standby memory detected with the sclp interface gets always registered
with add_memory calls without considering the limitationt that the
"mem=" kernel paramater implies.
So fix this and only register standby memory that is below the specified
limit.
This fixes zfcpdump since it uses "mem=32M". In case there is appr.
2GB standby memory present all of usable memory would be used for the
struct pages needed for standby memory.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-02-19 15:19:19 +01:00
Christian Borntraeger d5cd0343d2 [S390] Fix timeval regression on s390
commit aa5e97ce4b
[PATCH] improve precision of process accounting.

Introduced a timing regression:
-bash-3.2# time ls
real    0m0.006s
user    0m1.754s
sys     0m1.094s

The problem was introduced by an error in cputime_to_timeval.
Cputime is now 1/4096 microsecond, therefore, we have to divide
the remainder with 4096 to get the microseconds.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-02-19 15:19:19 +01:00
Russell King 41f3103fcf [ARM] omap: fix clock reparenting in omap2_clk_set_parent()
When changing the parent of a clock, it is necessary to keep the
clock use counts balanced otherwise things the parent state will
get corrupted.  Since we already disable and re-enable the clock,
we might as well use the recursive versions instead.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-19 13:25:16 +00:00
Nicolas Pitre 3fd9825c42 [ARM] 5402/1: fix a case of wrap-around in sanity_check_meminfo()
In the non highmem case, if two memory banks of 1GB each are provided,
the second bank would evade suppression since its virtual base would
be 0.  Fix this by disallowing any memory bank which virtual base
address is found to be lower than PAGE_OFFSET.

Reported-by: Lennert Buytenhek <buytenh@marvell.com>

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-19 09:49:45 +00:00
KAMEZAWA Hiroyuki cc2559bccc mm: fix memmap init for handling memory hole
Now, early_pfn_in_nid(PFN, NID) may returns false if PFN is a hole.
and memmap initialization was not done. This was a trouble for
sparc boot.

To fix this, the PFN should be initialized and marked as PG_reserved.
This patch changes early_pfn_in_nid() return true if PFN is a hole.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reported-by: David Miller <davem@davemlloft.net>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:55 -08:00
KAMEZAWA Hiroyuki f2dbcfa738 mm: clean up for early_pfn_to_nid()
What's happening is that the assertion in mm/page_alloc.c:move_freepages()
is triggering:

	BUG_ON(page_zone(start_page) != page_zone(end_page));

Once I knew this is what was happening, I added some annotations:

	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
		printk(KERN_ERR "move_freepages: Bogus zones: "
		       "start_page[%p] end_page[%p] zone[%p]\n",
		       start_page, end_page, zone);
		printk(KERN_ERR "move_freepages: "
		       "start_zone[%p] end_zone[%p]\n",
		       page_zone(start_page), page_zone(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
		       page_to_pfn(start_page), page_to_pfn(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_nid[%d] end_nid[%d]\n",
		       page_to_nid(start_page), page_to_nid(end_page));
 ...

And here's what I got:

	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
	move_freepages: start_nid[1] end_nid[0]

My memory layout on this box is:

[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x00000000 -> 0x0081ff5d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[8] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x00020000
[    0.000000]     1: 0x00800000 -> 0x0081f7ff
[    0.000000]     1: 0x0081f800 -> 0x0081fe50
[    0.000000]     1: 0x0081fed1 -> 0x0081fed8
[    0.000000]     1: 0x0081feda -> 0x0081fedb
[    0.000000]     1: 0x0081fedd -> 0x0081fee5
[    0.000000]     1: 0x0081fee7 -> 0x0081ff51
[    0.000000]     1: 0x0081ff59 -> 0x0081ff5d

So it's a block move in that 0x81f600-->0x81f7ff region which triggers
the problem.

This patch:

Declaration of early_pfn_to_nid() is scattered over per-arch include
files, and it seems it's complicated to know when the declaration is used.
 I think it makes fix-for-memmap-init not easy.

This patch moves all declaration to include/linux/mm.h

After this,
  if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use static definition in include/linux/mm.h
  else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use generic definition in mm/page_alloc.c
  else
     -> per-arch back end function will be called.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: David Miller <davem@davemlloft.net>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:55 -08:00
Steven Rostedt 712406a6bf tracing/function-graph-tracer: make arch generic push pop functions
There is nothing really arch specific of the push and pop functions
used by the function graph tracer. This patch moves them to generic
code.

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-18 13:43:04 -05:00
Benjamin Herrenschmidt 3b7faeb49e Merge commit 'kumar/next' into next 2009-02-18 13:23:30 +11:00
Benjamin Herrenschmidt 82a0a1cc8f Merge commit 'origin/master' into next
Manual merge of:
	arch/powerpc/include/asm/pgtable-ppc32.h
2009-02-18 13:19:25 +11:00
Andi Kleen 07db1c140e x86, mce: fix ifdef for 64bit thermal apic vector clear on shutdown
Impact: Bugfix

The ifdef for the apic clear on shutdown for the 64bit intel thermal
vector was incorrect and never triggered. Fix that.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:24:34 -08:00
Andi Kleen 380851bc6b x86, mce: use force_sig_info to kill process in machine check
Impact: bug fix (with tolerant == 3)

do_exit cannot be called directly from the exception handler because
it can sleep and the exception handler runs on the exception stack.
Use force_sig() instead.

Based on a earlier patch by Ying Huang who debugged the problem.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:24:31 -08:00
Andi Kleen 6ec68bff3c x86, mce: reinitialize per cpu features on resume
Impact: Bug fix

This fixes a long standing bug in the machine check code. On resume the
boot CPU wouldn't get its vendor specific state like thermal handling
reinitialized. This means the boot cpu wouldn't ever get any thermal
events reported again.

Call the respective initialization functions on resume

v2: Remove ancient init because they don't have a resume device anyways.
    Pointed out by Thomas Gleixner.
v3: Now fix the Subject too to reflect v2 change

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:24:28 -08:00
Nicolas Pitre fd4b9b3650 [ARM] 5401/1: Orion: fix edge triggered GPIO interrupt support
The GPIO interrupts can be configured as either level triggered or edge
triggered, with a default of level triggered.  When an edge triggered
interrupt is requested, the gpio_irq_set_type method is called which
currently switches the given IRQ descriptor between two struct irq_chip
instances: orion_gpio_irq_level_chip and orion_gpio_irq_edge_chip. This
happens via __setup_irq() which also calls irq_chip_set_defaults() to
assign default methods to uninitialized ones.  The problem is that
irq_chip_set_defaults() is called before the irq_chip reference is
switched, leaving the new irq_chip (orion_gpio_irq_edge_chip in this
case) with uninitialized methods such as chip->startup() causing a kernel
oops.

Many solutions are possible, such as making irq_chip_set_defaults() global
and calling it from gpio_irq_set_type(), or calling __irq_set_trigger()
before irq_chip_set_defaults() in __setup_irq().  But those require
modifications to the generic IRQ code which might have adverse effect on
other architectures, and that would still be a fragile arrangement.
Manually copying the missing methods from within gpio_irq_set_type()
would be really ugly and it would break again the day new methods with
automatic defaults are added.

A better solution is to have a single irq_chip instance which can deal
with both edge and level triggered interrupts.  It is also a good idea
to switch the IRQ handler instead, as the edge IRQ handler allows for
one edge IRQ event to be queued as the IRQ is actually masked only when
that second IRQ is received, at which point the hardware can queue an
additional IRQ event, making edge triggered interrupts a bit more
reliable.

Tested-by: Martin Michlmayr <tbm@cyrius.com>

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-17 22:37:09 +00:00
Linus Torvalds f8effd1a4a Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  doc: mmiotrace.txt, buffer size control change
  trace: mmiotrace to the tracer menu in Kconfig
  mmiotrace: count events lost due to not recording
2009-02-17 14:29:15 -08:00
Linus Torvalds 35010334aa Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, vm86: fix preemption bug
  x86, olpc: fix model detection without OFW
  x86, hpet: fix for LS21 + HPET = boot hang
  x86: CPA avoid repeated lazy mmu flush
  x86: warn if arch_flush_lazy_mmu_cpu is called in preemptible context
  x86/paravirt: make arch_flush_lazy_mmu/cpu disable preemption
  x86, pat: fix warn_on_once() while mapping 0-1MB range with /dev/mem
  x86/cpa: make sure cpa is safe to call in lazy mmu mode
  x86, ptrace, mm: fix double-free on race
2009-02-17 14:27:39 -08:00
Linus Torvalds b30b774930 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/vsx: Fix VSX alignment handler for regs 32-63
  powerpc/ps3: Move ps3_mm_add_memory to device_initcall
  powerpc/mm: Fix numa reserve bootmem page selection
  powerpc/mm: Fix _PAGE_CHG_MASK to protect _PAGE_SPECIAL
2009-02-17 14:23:49 -08:00
Linus Torvalds 39a65762d4 Merge branch 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: Flush volatile msrs before emulating rdmsr
  KVM: Fix assigned devices circular locking dependency
  KVM: x86: fix LAPIC pending count calculation
  KVM: Fix INTx for device assignment
  KVM: MMU: Map device MMIO as UC in EPT
  KVM: x86: disable kvmclock on non constant TSC hosts
  KVM: PIT: fix i8254 pending count read
  KVM: Fix racy in kvm_free_assigned_irq
  KVM: Add kvm_arch_sync_events to sync with asynchronize events
  KVM: mmu_notifiers release method
  KVM: Avoid using CONFIG_ in userspace visible headers
  KVM: ia64: fix fp fault/trap handler
2009-02-17 14:04:32 -08:00
Paul E. McKenney bf51935f3e x86, rcu: fix strange load average and ksoftirqd behavior
Damien Wyart reported high ksoftirqd CPU usage (20%) on an
otherwise idle system.

The function-graph trace Damien provided:

>   799.521187 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.521371 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.521555 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.521738 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.521934 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.522068 |   1)  ksoftir-2324  |               |                rcu_check_callbacks() {
>   799.522208 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.522392 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.522575 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.522759 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.522956 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.523074 |   1)  ksoftir-2324  |               |                  rcu_check_callbacks() {
>   799.523214 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.523397 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.523579 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.523762 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.523960 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.524079 |   1)  ksoftir-2324  |               |                  rcu_check_callbacks() {
>   799.524220 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.524403 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.524587 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.524770 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
> [ . . . ]

Shows rcu_check_callbacks() being invoked way too often. It should be called
once per jiffy, and here it is called no less than 22 times in about
3.5 milliseconds, meaning one call every 160 microseconds or so.

Why do we need to call rcu_pending() and rcu_check_callbacks() from the
idle loop of 32-bit x86, especially given that no other architecture does
this?

The following patch removes the call to rcu_pending() and
rcu_check_callbacks() from the x86 32-bit idle loop in order to
reduce the softirq load on idle systems.

Reported-by: Damien Wyart <damien.wyart@free.fr>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 22:47:45 +01:00
Gregory CLEMENT 744f659272 [ARM] 5400/1: Add support for inverted rdy_busy pin for Atmel nand device controller
Add support for inverted rdy_busy pin for Atmel nand device controller
It will fix building error on NeoCore926 board.

Acked-by: Andrew Victor <linux@maxim.org.za>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Gregory CLEMENT <gclement@adeneo.adetelgroup.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-16 21:40:39 +00:00
Rusty Russell 1371be0f7c cpumask: Use cpu_*_mask accessors code: alpha
Impact: use new API, fix SMP bug.

Use the new accessors rather than frobbing bits directly.

This also removes the bug introduced in ee0c468b (alpha: compile
fixes) which had Alpha setting bits on an on-stack cpumask, not the
cpu_online_map.

Cc: Richard Henderson <rth@twiddle.net>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-02-16 17:32:00 +10:30
Rusty Russell a0abd520fd cpumask: fix powernow-k8: partial revert of 2fdf66b491
Impact: fix powernow-k8 when acpi=off (or other error).

There was a spurious change introduced into powernow-k8 in this patch:
so that we try to "restore" the cpus_allowed we never saved.  We revert
that file.

See lkml "[PATCH] x86/powernow: fix cpus_allowed brokage when
acpi=off" from Yinghai for the bug report.

Cc: Mike Travis <travis@sgi.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-02-16 17:31:59 +10:30
Pekka Paalanen 6bc5c366b1 trace: mmiotrace to the tracer menu in Kconfig
Impact: cosmetic change in Kconfig menu layout

This patch was originally suggested by Peter Zijlstra, but seems it
was forgotten.

CONFIG_MMIOTRACE and CONFIG_MMIOTRACE_TEST were selectable
directly under the Kernel hacking / debugging menu in the kernel
configuration system. They were present only for x86 and x86_64.

Other tracers that use the ftrace tracing framework are in their own
sub-menu. This patch moves the mmiotrace configuration options there.
Since the Kconfig file, where the tracer menu is, is not architecture
specific, HAVE_MMIOTRACE_SUPPORT is introduced and provided only by
x86/x86_64. CONFIG_MMIOTRACE now depends on it.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-15 20:03:28 +01:00
Thomas Gleixner be716615fe x86, vm86: fix preemption bug
Commit 3d2a71a596 ("x86, traps: converge
do_debug handlers") changed the preemption disable logic of do_debug()
so vm86_handle_trap() is called with preemption disabled resulting in:

 BUG: sleeping function called from invalid context at include/linux/kernel.h:155
 in_atomic(): 1, irqs_disabled(): 0, pid: 3005, name: dosemu.bin
 Pid: 3005, comm: dosemu.bin Tainted: G        W  2.6.29-rc1 #51
 Call Trace:
  [<c050d669>] copy_to_user+0x33/0x108
  [<c04181f4>] save_v86_state+0x65/0x149
  [<c0418531>] handle_vm86_trap+0x20/0x8f
  [<c064e345>] do_debug+0x15b/0x1a4
  [<c064df1f>] debug_stack_correct+0x27/0x2c
  [<c040365b>] sysenter_do_call+0x12/0x2f
 BUG: scheduling while atomic: dosemu.bin/3005/0x10000001

Restore the original calling convention and reenable preemption before
calling handle_vm86_trap().

Reported-by: Michal Suchanek <hramrach@centrum.cz>
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-15 10:46:13 +01:00
Avi Kivity 516a1a7e9d KVM: VMX: Flush volatile msrs before emulating rdmsr
Some msrs (notable MSR_KERNEL_GS_BASE) are held in the processor registers
and need to be flushed to the vcpu struture before they can be read.

This fixes cygwin longjmp() failure on Windows x64.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-02-15 02:47:39 +02:00
Marcelo Tosatti b682b814e3 KVM: x86: fix LAPIC pending count calculation
Simplify LAPIC TMCCT calculation by using hrtimer provided
function to query remaining time until expiration.

Fixes host hang with nested ESX.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-02-15 02:47:38 +02:00
Sheng Yang 2aaf69dcee KVM: MMU: Map device MMIO as UC in EPT
Software are not allow to access device MMIO using cacheable memory type, the
patch limit MMIO region with UC and WC(guest can select WC using PAT and
PCD/PWT).

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-02-15 02:47:37 +02:00
Marcelo Tosatti abe6655dd6 KVM: x86: disable kvmclock on non constant TSC hosts
This is better.

Currently, this code path is posing us big troubles,
and we won't have a decent patch in time. So, temporarily
disable it.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-02-15 02:47:36 +02:00
Marcelo Tosatti d2a8284e8f KVM: PIT: fix i8254 pending count read
count_load_time assignment is bogus: its supposed to contain what it
means, not the expiration time.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-02-15 02:47:36 +02:00
Sheng Yang ba4cef31d5 KVM: Fix racy in kvm_free_assigned_irq
In the past, kvm_get_kvm() and kvm_put_kvm() was called in assigned device irq
handler and interrupt_work, in order to prevent cancel_work_sync() in
kvm_free_assigned_irq got a illegal state when waiting for interrupt_work done.
But it's tricky and still got two problems:

1. A bug ignored two conditions that cancel_work_sync() would return true result
in a additional kvm_put_kvm().

2. If interrupt type is MSI, we would got a window between cancel_work_sync()
and free_irq(), which interrupt would be injected again...

This patch discard the reference count used for irq handler and interrupt_work,
and ensure the legal state by moving the free function at the very beginning of
kvm_destroy_vm(). And the patch fix the second bug by disable irq before
cancel_work_sync(), which may result in nested disable of irq but OK for we are
going to free it.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-02-15 02:47:36 +02:00
Sheng Yang ad8ba2cd44 KVM: Add kvm_arch_sync_events to sync with asynchronize events
kvm_arch_sync_events is introduced to quiet down all other events may happen
contemporary with VM destroy process, like IRQ handler and work struct for
assigned device.

For kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(), so
the state of KVM here is legal and can provide a environment to quiet down other
events.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-02-15 02:47:36 +02:00
Avi Kivity 7a0eb1960e KVM: Avoid using CONFIG_ in userspace visible headers
Kconfig symbols are not available in userspace, and are not stripped by
headers-install.  Avoid their use by adding #defines in <asm/kvm.h> to
suit each architecture.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-02-15 02:47:35 +02:00
Yang Zhang d39123a486 KVM: ia64: fix fp fault/trap handler
The floating-point registers f6-f11 is used by vmm and
saved in kvm-pt-regs, so should set the correct bit mask
and the pointer in fp_state, otherwise, fpswa may touch
vmm's fp registers instead of guests'.

In addition, for fp trap handling,  since the instruction
which leads to fp trap is completely executed, so can't
use retry machanism to re-execute it, because it may
pollute some registers.

Signed-off-by: Yang Zhang <yang.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-02-15 02:47:35 +02:00
Chris Ball e49590b6dd x86, olpc: fix model detection without OFW
Impact: fix "garbled display, laptop is unusable" bug

Commit e51a1ac2df ("x86, olpc: fix endian
bug in openfirmware workaround") breaks model comparison on OLPC; the value
0xc2 needs to be scaled up by olpc_board().

The pre-patch version was wrong, but accidentally worked anyway
(big-endian 0xc2 is big enough to satisfy all other board revisions,
but little endian 0xc2 is not).

Signed-off-by: Chris Ball <cjb@laptop.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andres Salomon <dilinger@queued.net>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-14 23:05:25 +01:00
Madhulika Madishetty 6c71209023 AMCC PPC 460SX redwood SoC platform initial framework
This patch contains initial framework for the AMCC Redwood board.

Signed-off-by: Madhulika Madishetty <mmadishetty@amcc.com>
Signed-off-by: Tirumala Marri <tmarri@amcc.com>
Signed-off-by: Feng Kan <fkan@amcc.com>
Signed-off-by: Vidhyananth Venkatasamy <vvenkatasamy@amcc.com>
Signed-off-by: Preetesh Parekh <pparekh@amcc.com>
Acked-by: Loc Ho <lho@amcc.com>
Acked-by: Feng Kan <fkan@amcc.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2009-02-14 14:41:29 -05:00
Benjamin Herrenschmidt 41b6a085e4 powerpc/4xx: Enable PCI domains on 4xx
4xx chips commonly now have multiple PHBs, there is no reason to not
enable PCI domains on them. The main issue with PCI domains is X but
currently its already somewhat busted for other reasons such as the
36-bit physical address space, which I'm fixing separately.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2009-02-14 14:40:11 -05:00
Benjamin Herrenschmidt 018f76ec51 powerpc/4xx: Add missing USB and i2c devices to Canyonlands
This adds the device-tree entries for a handful of devices on the
Canyonlands board, such as the EHCI and OHCI controllers, the real
time clock and the AD7414 thermal monitor.

I also updated the defconfig to enable various options related to
these devices.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2009-02-14 14:40:08 -05:00
Yuri Tikhonov e12401222f powerpc/44x: Support for 256KB PAGE_SIZE
This patch adds support for 256KB pages on ppc44x-based boards.

For simplification of implementation with 256KB pages we still assume
2-level paging. As a side effect this leads to wasting extra memory space
reserved for PTE tables: only 1/4 of pages allocated for PTEs are
actually used. But this may be an acceptable trade-off to achieve the
high performance we have with big PAGE_SIZEs in some applications (e.g.
RAID).

Also with 256KB PAGE_SIZE we increase THREAD_SIZE up to 32KB to minimize
the risk of stack overflows in the cases of on-stack arrays, which size
depends on the page size (e.g. multipage BIOs, NTFS, etc.).

With 256KB PAGE_SIZE we need to decrease the PKMAP_ORDER at least down
to 9, otherwise all high memory (2 ^ 10 * PAGE_SIZE == 256MB) we'll be
occupied by PKMAP addresses leaving no place for vmalloc. We do not
separate PKMAP_ORDER for 256K from 16K/64K PAGE_SIZE here; actually that
value of 10 in support for 16K/64K had been selected rather intuitively.
Thus now for all cases of PAGE_SIZE on ppc44x (including the default, 4KB,
one) we have 512 pages for PKMAP.

Because ELF standard supports only page sizes up to 64K, then you should
use binutils later than 2.17.50.0.3 with '-zmax-page-size' set to 256K
for building applications, which are to be run with the 256KB-page sized
kernel. If using the older binutils, then you should patch them like follows:

	--- binutils/bfd/elf32-ppc.c.orig
	+++ binutils/bfd/elf32-ppc.c

	-#define ELF_MAXPAGESIZE                0x10000
	+#define ELF_MAXPAGESIZE                0x40000

One more restriction we currently have with 256KB page sizes is inability
to use shmem safely, so, for now, the 256KB is available only if you turn
the CONFIG_SHMEM option off (another variant is to use BROKEN).
Though, if you need shmem with 256KB pages, you can always remove the !SHMEM
dependency in 'config PPC_256K_PAGES', and use the workaround available here:
 http://lkml.org/lkml/2008/12/19/20

Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2009-02-14 14:40:04 -05:00
Andrew Victor 2b768b6cdb [ARM] 5391/1: AT91: Enable GPIO clocks earlier
Enable the GPIO clocks earlier in the initialization sequence.  This
allow the board-setup code to read and set GPIO pins.

Signed-off-by: Marc Pignat <marc.pignat@hevs.ch>
Signed-off-by: Andrew Victor <linux@maxim.org.za>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-14 16:03:36 +00:00
Andrew Victor 2af29b7861 [ARM] 5390/1: AT91: Watchdog fixes
The recently merged AT91SAM9 watchdog driver uses the
AT91SAM9X_WATCHDOG config variable, whereas the original version of
the driver (and the platform support code) used AT91SAM9_WATCHDOG.
This causes the watchdog platform_device to never be registered, and
therefore the driver not to be initialized.

This patch:
- updates the platform support code to use AT91SAM9X_WATCHDOG.
- includes <linux/io.h> to fix compile error (same fix as was applied
to at91rm9200_wdt.c)
- fixes comment regarding watchdog clock-rates in at91rm9200.

Signed-off-by: Andrew Victor <linux@maxim.org.za>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-14 16:01:57 +00:00
Russell King abf239657b [ARM] omap: fix _omap2_clksel_get_src_field()
_omap2_clksel_get_src_field() was returning the first entry which was
either the default _or_ applicable to the SoC.  This is wrong - we
should be returning the first default which is applicable to the SoC.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-14 13:25:38 +00:00
Russell King 9132f1b453 [ARM] omap: fix omap2_divisor_to_clksel() error return value
The error checks for omap2_divisor_to_clksel() and comment disagree with
the actual value returned on error.  Fix this to return the correct error
value.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-14 13:24:10 +00:00
john stultz b13e24644c x86, hpet: fix for LS21 + HPET = boot hang
Between 2.6.23 and 2.6.24-rc1 a change was made that broke IBM LS21
systems that had the HPET enabled in the BIOS, resulting in boot hangs
for x86_64.

Specifically commit b8ce335906, which
merges the i386 and x86_64 HPET code.

Prior to this commit, when we setup the HPET timers in x86_64, we did
the following:

	hpet_writel(HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
                    HPET_TN_32BIT, HPET_T0_CFG);

However after the i386/x86_64 HPET merge, we do the following:

	cfg = hpet_readl(HPET_Tn_CFG(timer));
	cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC |
			HPET_TN_SETVAL | HPET_TN_32BIT;
	hpet_writel(cfg, HPET_Tn_CFG(timer));

However on LS21s with HPET enabled in the BIOS, the HPET_T0_CFG register
boots with Level triggered interrupts (HPET_TN_LEVEL) enabled. This
causes the periodic interrupt to be not so periodic, and that results in
the boot time hang I reported earlier in the delay calibration.

My fix: Always disable HPET_TN_LEVEL when setting up periodic mode.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-13 09:15:46 +01:00
Michael Neuling 26456dcfb8 powerpc/vsx: Fix VSX alignment handler for regs 32-63
Fix the VSX alignment handler for VSX registers > 32.  32-63 are stored
in the VMX part of the thread_struct not the FPR part.

Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: stable@kernel.org (2.6.27 & .28 please)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-13 16:37:45 +11:00
Geoff Levand 0047656e2a powerpc/ps3: Move ps3_mm_add_memory to device_initcall
Change the PS3 hotplug memory routine ps3_mm_add_memory() from
a core_initcall to a device_initcall.

core_initcall routines run before the powerpc topology_init()
startup routine, which is a subsys_initcall, resulting in
failure of ps3_mm_add_memory() when CONFIG_NUMA=y.  When
ps3_mm_add_memory() fails the system will boot with just the
128 MiB of boot memory

Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-13 16:37:45 +11:00
Dave Hansen 06eccea6c3 powerpc/mm: Fix numa reserve bootmem page selection
Fix the powerpc NUMA reserve bootmem page selection logic.

commit 8f64e1f2d1 (powerpc: Reserve
in bootmem lmb reserved regions that cross NUMA nodes) changed
the logic for how the powerpc LMB reserved regions were converted
to bootmen reserved regions.  As the folowing discussion reports,
the new logic was not correct.

mark_reserved_regions_for_nid() goes through each LMB on the
system that specifies a reserved area.  It searches for
active regions that intersect with that LMB and are on the
specified node.  It attempts to bootmem-reserve only the area
where the active region and the reserved LMB intersect.  We
can not reserve things on other nodes as they may not have
bootmem structures allocated, yet.

We base the size of the bootmem reservation on two possible
things.  Normally, we just make the reservation start and
stop exactly at the start and end of the LMB.

However, the LMB reservations are not aware of NUMA nodes and
on occasion a single LMB may cross into several adjacent
active regions.  Those may even be on different NUMA nodes
and will require separate calls to the bootmem reserve
functions.  So, the bootmem reservation must be trimmed to
fit inside the current active region.

That's all fine and dandy, but we trim the reservation
in a page-aligned fashion.  That's bad because we start the
reservation at a non-page-aligned address: physbase.

The reservation may only span 2 bytes, but that those bytes
may span two pfns and cause a reserve_size of 2*PAGE_SIZE.

Take the case where you reserve 0x2 bytes at 0x0fff and
where the active region ends at 0x1000.  You'll jump into
that if() statment, but node_ar.end_pfn=0x1 and
start_pfn=0x0.  You'll end up with a reserve_size=0x1000,
and then call

  reserve_bootmem_node(node, physbase=0xfff, size=0x1000);

0x1000 may not be on the same node as 0xfff.  Oops.

In almost all the vm code, end_<anything> is not inclusive.
If you have an end_pfn of 0x1234, page 0x1234 is not
included in the range.  Using PFN_UP instead of the
(>> >> PAGE_SHIFT) will make this consistent with the other VM
code.

We also need to do math for the reserved size with physbase
instead of start_pfn.  node_ar.end_pfn << PAGE_SHIFT is
*precisely* the end of the node.  However,
(start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning
of the reserved area.  That is, of course, physbase.
If we don't use physbase here, the reserve_size can be
made too large.

From: Dave Hansen <dave@linux.vnet.ibm.com>
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>  Tested on PS3.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-13 16:37:45 +11:00
Philippe Gerum fbc78b07ba powerpc/mm: Fix _PAGE_CHG_MASK to protect _PAGE_SPECIAL
Fix _PAGE_CHG_MASK so that pte_modify() does not affect the _PAGE_SPECIAL bit.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-13 16:37:44 +11:00
Kumar Gala 96a8bac589 powerpc/fsl-booke: Fix compile warning
arch/powerpc/mm/fsl_booke_mmu.c: In function 'adjust_total_lowmem':
arch/powerpc/mm/fsl_booke_mmu.c:221: warning: format '%ld' expects type 'long int', but argument 3 has type 'phys_addr_t'

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-12 16:54:53 -06:00
Kumar Gala 70fe3af840 powerpc/book-3e: Introduce concept of Book-3e MMU
The Power ISA 2.06 spec introduces a standard MMU programming model that
is based on the Freescale Book-E MMU programing model.  The Freescale
version is pretty backwards compatiable with the ISA 2.06 definition so
we are starting to refactor some of the Freescale code so it can be
easily shared.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-12 16:51:33 -06:00
Kumar Gala d66c82ea45 powerpc/fsl-booke: Add new ISA 2.06 page sizes and MAS defines
The Power ISA 2.06 added power of two page sizes to the embedded MMU
architecture.  Its done it such a way to be code compatiable with the
existing HW.  Made the minor code changes to support both power of two
and power of four page sizes.  Also added some new MAS bits and macros
that are defined as part of the 2.06 ISA.  Renamed some things to use
the 'Book-3e' concept to convey the new MMU that is based on the
Freescale Book-E MMU programming model.

Note, its still invalid to try and use a page size that isn't supported
by cpu.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-12 16:37:11 -06:00
Thomas Gleixner 7ad9de6ac8 x86: CPA avoid repeated lazy mmu flush
Impact: Flush the lazy MMU only once

Pending mmu updates only need to be flushed once to bring the
in-memory pagetable state up to date.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-02-12 23:11:58 +01:00
Thomas Gleixner 34b0900d32 x86: warn if arch_flush_lazy_mmu_cpu is called in preemptible context
Impact: Catch cases where lazy MMU state is active in a preemtible context

arch_flush_lazy_mmu_cpu() has been changed to disable preemption so
the checks in enter/leave will never trigger. Put the preemtible()
check into arch_flush_lazy_mmu_cpu() to catch such cases.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-02-12 23:11:58 +01:00
Jeremy Fitzhardinge d85cf93da6 x86/paravirt: make arch_flush_lazy_mmu/cpu disable preemption
Impact: avoid access to percpu vars in preempible context

They are intended to be used whenever there's the possibility
that there's some stale state which is going to be overwritten
with a queued update, or to force a state change when we may be
in lazy mode.  Either way, we could end up calling it with
preemption enabled, so wrap the functions in their own little
preempt-disable section so they can be safely called in any
context (though preemption should never be enabled if we're actually
in a lazy state).

(Move out of line to avoid #include dependencies.)
    
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-02-12 23:11:58 +01:00
Tobias Klauser 270c5609e2 sh: Storage class should be before const qualifier
The C99 specification states in section 6.11.5:

The placement of a storage-class specifier other than at the
beginning of the declaration specifiers in a declaration is an
obsolescent feature.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-02-12 17:26:09 +09:00
Suresh Siddha be03d9e802 x86, pat: fix warn_on_once() while mapping 0-1MB range with /dev/mem
Jeff Mahoney reported:

> With Suse's hwinfo tool, on -tip:
> WARNING: at arch/x86/mm/pat.c:637 reserve_pfn_range+0x5b/0x26d()

reserve_pfn_range() is not tracking the memory range below 1MB
as non-RAM and as such is inconsistent with similar checks in
reserve_memtype() and free_memtype()

Rename the pagerange_is_ram() to pat_pagerange_is_ram() and add the
"track legacy 1MB region as non RAM" condition.

And also, fix reserve_pfn_range() to return -EINVAL, when the pfn
range is RAM. This is to be consistent with this API design.

Reported-and-tested-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-12 08:27:27 +01:00
Jeremy Fitzhardinge 4f06b0436b x86/cpa: make sure cpa is safe to call in lazy mmu mode
Impact: fix race leading to crash under KVM and Xen

The CPA code may be called while we're in lazy mmu update mode - for
example, when using DEBUG_PAGE_ALLOC and doing a slab allocation
in an interrupt handler which interrupted a lazy mmu update.  In this
case, the in-memory pagetable state may be out of date due to pending
queued updates.  We need to flush any pending updates before inspecting
the page table.  Similarly, we must explicitly flush any modifications
CPA may have made (which comes down to flushing queued operations when
flushing the TLB).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-12 08:27:26 +01:00
Linus Torvalds 94dba89533 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  timers: fix TIMER_ABSTIME for process wide cpu timers
  timers: split process wide cpu clocks/timers, fix
  x86: clean up hpet timer reinit
  timers: split process wide cpu clocks/timers, remove spurious warning
  timers: split process wide cpu clocks/timers
  signal: re-add dead task accumulation stats.
  x86: fix hpet timer reinit for x86_64
  sched: fix nohz load balancer on cpu offline
2009-02-11 08:24:32 -08:00
Linus Torvalds 9ce04f9238 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ptrace, x86: fix the usage of ptrace_fork()
  i8327: fix outb() parameter order
  x86: fix math_emu register frame access
  x86: math_emu info cleanup
  x86: include correct %gs in a.out core dump
  x86, vmi: put a missing paravirt_release_pmd in pgd_dtor
  x86: find nr_irqs_gsi with mp_ioapic_routing
  x86: add clflush before monitor for Intel 7400 series
  x86: disable intel_iommu support by default
  x86: don't apply __supported_pte_mask to non-present ptes
  x86: fix grammar in user-visible BIOS warning
  x86/Kconfig.cpu: make Kconfig help readable in the console
  x86, 64-bit: print DMI info in the oops trace
2009-02-11 08:23:22 -08:00
Linus Torvalds b3f2caaaa8 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing, x86: fix constraint for parent variable
  tracing, x86: fix fixup section to return to original code
  profiling: fix broken profiling regression
2009-02-11 08:22:26 -08:00
Linus Torvalds 93431dd7af Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] Update default configuration.
  [S390] dasd: fix race in dasd timer handling
  [S390] dasd: bus_id -> dev_name() conversion.
  [S390] Fix init irq proc build break.
  [S390] vdso: fix per cpu vdso pointer in lowcore
2009-02-11 08:21:29 -08:00
Linus Torvalds da8dbb88db Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/mm: Fix _PAGE_COHERENT support on classic ppc32 HW
2009-02-11 08:21:11 -08:00
Markus Metzger 9f339e7028 x86, ptrace, mm: fix double-free on race
Ptrace_detach() races with __ptrace_unlink() if the traced task is
reaped while detaching. This might cause a double-free of the BTS
buffer.

Change the ptrace_detach() path to only do the memory accounting in
ptrace_bts_detach() and leave the buffer free to ptrace_bts_untrace()
which will be called from __ptrace_unlink().

The fix follows a proposal from Oleg Nesterov.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-11 15:44:20 +01:00
Martin Schwidefsky 95ec807e0a [S390] Update default configuration.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-02-11 10:37:40 +01:00
Sachin Sant 0addff8151 [S390] Fix init irq proc build break.
Embed init_irq_proc(s390) within CONFIG_PROC_FS to fix a build break.

Signed-off-by : Sachin Sant <sachinp@in.ibm.com>
2009-02-11 10:37:39 +01:00
Martin Schwidefsky d5e842c4b7 [S390] vdso: fix per cpu vdso pointer in lowcore
The vdso_per_cpu_data entry in the lowcore structure uses __u32
instead of __u64. If the data page is above 4GB the pointer is
truncated and the kernel crashes.

Reported-by: Mijo Safradin <mijo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-02-11 10:37:39 +01:00
Steven Rostedt f47a454db9 tracing, x86: fix constraint for parent variable
The constraint used for retrieving and restoring the parent function
pointer is incorrect. The parent variable is a pointer, and the
address of the pointer is modified by the asm statement and not
the pointer itself. It is incorrect to pass it in as an output
constraint since the asm will never update the pointer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-11 10:06:13 +01:00
David S. Miller 1b0e235cc9 sparc64: Fix crashes in jbusmc_print_dimm()
Return was missing for the case where there is no dimm
info match.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-11 00:54:07 -08:00
Kumar Gala a2404746f1 powerpc/85xx: Added 36-bit physical device tree for mpc8572ds board
Added a device tree that should be identical to mpc8572ds.dtb except
the physical addresses for all IO are above the 4G boundary.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-11 00:18:26 -06:00
Kumar Gala ca34040c40 powerpc/85xx: Fixed PCI IO region sizes in mpc8572ds*.dts
The PCI IO region sizes where incorrectly set to 1M instead of 64k.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-11 00:18:24 -06:00