Commit Graph

93 Commits

Author SHA1 Message Date
Benjamin Herrenschmidt 27f574c223 memblock: Expose MEMBLOCK_ALLOC_ANYWHERE
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-05 12:56:06 +10:00
Benjamin Herrenschmidt 28be7072ce memblock/powerpc: Use new accessors
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-04 14:39:01 +10:00
Benjamin Herrenschmidt e3239ff92a memblock: Rename memblock_region to memblock_type and memblock_property to memblock_region
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-04 14:21:49 +10:00
Benjamin Herrenschmidt 4b8692c022 powerpc/mm: Add some debug output when hash insertion fails
This adds some debug output to our MMU hash code to print out some
useful debug data if the hypervisor refuses the insertion (which
should normally never happen).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
2010-07-23 12:56:56 +10:00
Benjamin Herrenschmidt ca91e6c09d powerpc/mm: Move around testing of _PAGE_PRESENT in hash code
Instead of adding _PAGE_PRESENT to the access permission mask
in each low level routine independently, we add it once from
hash_page().

We also move the preliminary access check (the racy one before
the PTE is locked) up so it applies to the huge page case. This
duplicates code in __hash_page_huge() which we'll remove in a
subsequent patch to fix a race in there.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-23 08:53:23 +10:00
Yinghai Lu 95f72d1ed4 lmb: rename to memblock
via following scripts

      FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

      sed -i \
        -e 's/lmb/memblock/g' \
        -e 's/LMB/MEMBLOCK/g' \
        $FILES

      for N in $(find . -name lmb.[ch]); do
        M=$(echo $N | sed 's/lmb/memblock/g')
        mv $N $M
      done

and remove some wrong change like lmbench and dlmb etc.

also move memblock.c from lib/ to mm/

Suggested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-14 17:14:00 +10:00
David Gibson a1128f8f0f powerpc/mm: Fix stupid bug in subpge protection handling
Commit d28513bc7f ("Fix bug in pagetable
cache cleanup with CONFIG_PPC_SUBPAGE_PROT"), itself a fix for
breakage caused by an earlier clean up patch of mine, contains a
stupid bug.  I changed the parameters of the subpage_protection()
function, but failed to update one of the callers.

This patch fixes it, and replaces a void * with a typed pointer so
that the compiler will warn on such an error in future.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-18 14:55:44 +11:00
Sachin P. Sant 5c33991987 powerpc/mm: Fix hash_utils_64.c compile errors with DEBUG enabled.
This time without the funny characters.

Fix following build errors generated with DEBUG=1

cc1: warnings being treated as errors
arch/powerpc/mm/hash_utils_64.c: In function 'htab_dt_scan_page_sizes':
arch/powerpc/mm/hash_utils_64.c:343: error: format '%04x' expects type 'unsigned int', but argument 4 has type 'long unsigned int'
arch/powerpc/mm/hash_utils_64.c:343: error: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int'
arch/powerpc/mm/hash_utils_64.c: In function 'htab_initialize':
arch/powerpc/mm/hash_utils_64.c:666: error: format '%x' expects type 'unsigned int', but argument 4 has type 'long unsigned int'
... SNIP ...

Signed-off-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-18 14:54:27 +11:00
David Gibson d28513bc7f powerpc/mm: Fix pgtable cache cleanup with CONFIG_PPC_SUBPAGE_PROT
Commit a0668cdc15 cleans up the handling
of kmem_caches for allocating various levels of pagetables.
Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to
the latter's cleverly hidden technique of adding some extra allocation
space to the top level page directory to store the extra information
it needs.

Since that extra allocation really doesn't fit into the cleaned up
page directory allocating scheme, this patch alters
CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct
subpage_prot_table as part of the mm_context_t.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-08 15:59:33 +11:00
Benjamin Herrenschmidt 5a7b4193e5 Revert "powerpc/mm: Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT"
This reverts commit c045256d14.

It breaks build when CONFIG_PPC_SUBPAGE_PROT is not set. I will
commit a fixed version separately

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-02 09:28:35 +11:00
David Gibson c045256d14 powerpc/mm: Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT
Commit a0668cdc15 cleans up the handling
of kmem_caches for allocating various levels of pagetables.
Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to
the latter's cleverly hidden technique of adding some extra allocation
space to the top level page directory to store the extra information
it needs.

Since that extra allocation really doesn't fit into the cleaned up
page directory allocating scheme, this patch alters
CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct
subpage_prot_table as part of the mm_context_t.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-27 14:24:29 +11:00
Alexander Graf 4ab79aa801 Export symbols for KVM module
We want to be able to build KVM as a module. To enable us doing so, we
need some more exports from core Linux parts.

This patch exports all functions and variables that are required for KVM.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:50:24 +11:00
David Gibson 0895ecda79 powerpc/mm: Bring hugepage PTE accessor functions back into sync with normal accessors
The hugepage arch code provides a number of hook functions/macros
which mirror the functionality of various normal page pte access
functions.  Various changes in the normal page accessors (in
particular BenH's recent changes to the handling of lazy icache
flushing and PAGE_EXEC) have caused the hugepage versions to get out
of sync with the originals.  In some cases, this is a bug, at least on
some MMU types.

One of the reasons that some hooks were not identical to the normal
page versions, is that the fact we're dealing with a hugepage needed
to be passed down do use the correct dcache-icache flush function.
This patch makes the main flush_dcache_icache_page() function hugepage
aware (by checking for the PageCompound flag).  That in turn means we
can make set_huge_pte_at() just a call to set_pte_at() bringing it
back into sync.  As a bonus, this lets us remove the
hash_huge_page_do_lazy_icache() function, replacing it with a call to
the hash_page_do_lazy_icache() function it was based on.

Some other hugepage pte access hooks - huge_ptep_get_and_clear() and
huge_ptep_clear_flush() - are not so easily unified, but this patch at
least brings them back into sync with the current versions of the
corresponding normal page functions.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:21:23 +11:00
David Gibson d1837cba5d powerpc/mm: Cleanup initialization of hugepages on powerpc
This patch simplifies the logic used to initialize hugepages on
powerpc.  The somewhat oddly named set_huge_psize() is renamed to
add_huge_page_size() and now does all necessary verification of
whether it's given a valid hugepage sizes (instead of just some) and
instantiates the generic hstate structure (but no more).

hugetlbpage_init() now steps through the available pagesizes, checks
if they're valid for hugepages by calling add_huge_page_size() and
initializes the kmem_caches for the hugepage pagetables.  This means
we can now eliminate the mmu_huge_psizes array, since we no longer
need to pass the sizing information for the pagetable caches from
set_huge_psize() into hugetlbpage_init()

Determination of the default huge page size is also moved from the
hash code into the general hugepage code.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:58 +11:00
David Gibson a4fe3ce769 powerpc/mm: Allow more flexible layouts for hugepage pagetables
Currently each available hugepage size uses a slightly different
pagetable layout: that is, the bottem level table of pointers to
hugepages is a different size, and may branch off from the normal page
tables at a different level.  Every hugepage aware path that needs to
walk the pagetables must therefore look up the hugepage size from the
slice info first, and work out the correct way to walk the pagetables
accordingly.  Future hardware is likely to add more possible hugepage
sizes, more layout options and more mess.

This patch, therefore reworks the handling of hugepage pagetables to
reduce this complexity.  In the new scheme, instead of having to
consult the slice mask, pagetable walking code can check a flag in the
PGD/PUD/PMD entries to see where to branch off to hugepage pagetables,
and the entry also contains the information (eseentially hugepage
shift) necessary to then interpret that table without recourse to the
slice mask.  This scheme can be extended neatly to handle multiple
levels of self-describing "special" hugepage pagetables, although for
now we assume only one level exists.

This approach means that only the pagetable allocation path needs to
know how the pagetables should be set out.  All other (hugepage)
pagetable walking paths can just interpret the structure as they go.

There already was a flag bit in PGD/PUD/PMD entries for hugepage
directory pointers, but it was only used for debug.  We alter that
flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable
pointer (normally it would be 1 since the pointer lies in the linear
mapping).  This means that asm pagetable walking can test for (and
punt on) hugepage pointers with the same test that checks for
unpopulated page directory entries (beq becomes bge), since hugepage
pointers will always be positive, and normal pointers always negative.

While we're at it, we get rid of the confusing (and grep defeating)
#defining of hugepte_shift to be the same thing as mmu_huge_psizes.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:58 +11:00
Michael Ellerman 24f1ce803c powerpc: Fix crash on CPU hotplug
early_init_mmu_secondary() is called at CPU hotplug time, so it
must be marked as __cpuinit, not __init.

Caused by 757c74d2 ("powerpc/mm: Introduce early_init_mmu() on 64-bit").

Tested-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-22 14:56:34 +10:00
Benjamin Herrenschmidt 757c74d298 powerpc/mm: Introduce early_init_mmu() on 64-bit
This moves some MMU related init code out of setup_64.c into hash_utils_64.c
and calls it early_init_mmu() and early_init_mmu_secondary(). This will
make it easier to plug in a new MMU type.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:34 +11:00
Rusty Russell 56aa4129e8 cpumask: Use mm_cpumask() wrapper instead of cpu_vm_mask
Makes code futureproof against the impending change to mm->cpu_vm_mask.

It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:29 +11:00
Anton Blanchard 13870b6575 powerpc/mm: Reduce hashtable size when using 64kB pages
At the moment we size the hashtable based on 4kB pages / 2, even on a
64kB kernel. This results in a hashtable that is much larger than it
needs to be.

Grab the real page size and size the hashtable based on that

Note: This only has effect on non hypervisor machines.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:58 +11:00
Jon Tollefson 4792adbac9 powerpc: Don't use a 16G page if beyond mem= limits
If mem= is used on the boot command line to limit memory then the memory block where a 16G page resides may not be available.

Thanks to Michael Ellerman for finding the problem.

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-22 15:01:21 +11:00
David Gibson f5ea64dcba powerpc: Get USE_STRICT_MM_TYPECHECKS working again
The typesafe version of the powerpc pagetable handling (with
USE_STRICT_MM_TYPECHECKS defined) has bitrotted again.  This patch
makes a bunch of small fixes to get it back to building status.

It's still not enabled by default as gcc still generates worse
code with it for some reason.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-14 10:35:27 +11:00
Paul Mackerras 549e8152de powerpc: Make the 64-bit kernel as a position-independent executable
This implements CONFIG_RELOCATABLE for 64-bit by making the kernel as
a position-independent executable (PIE) when it is set.  This involves
processing the dynamic relocations in the image in the early stages of
booting, even if the kernel is being run at the address it is linked at,
since the linker does not necessarily fill in words in the image for
which there are dynamic relocations.  (In fact the linker does fill in
such words for 64-bit executables, though not for 32-bit executables,
so in principle we could avoid calling relocate() entirely when we're
running a 64-bit kernel at the linked address.)

The dynamic relocations are processed by a new function relocate(addr),
where the addr parameter is the virtual address where the image will be
run.  In fact we call it twice; once before calling prom_init, and again
when starting the main kernel.  This means that reloc_offset() returns
0 in prom_init (since it has been relocated to the address it is running
at), which necessitated a few adjustments.

This also changes __va and __pa to use an equivalent definition that is
simpler.  With the relocatable kernel, PAGE_OFFSET and MEMORY_START are
constants (for 64-bit) whereas PHYSICAL_START is a variable (and
KERNELBASE ideally should be too, but isn't yet).

With this, relocatable kernels still copy themselves down to physical
address 0 and run there.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-15 11:08:38 -07:00
Paul Mackerras 7e392f8c29 Merge branch 'linux-2.6' 2008-09-10 11:36:13 +10:00
Paul Mackerras 9e88ba4e45 powerpc: Only make kernel text pages of linear mapping executable
Commit bc033b63bb ("powerpc/mm: Fix
attribute confusion with htab_bolt_mapping()") moved the check for
whether we should make pages of the linear mapping executable from
htab_bolt_mapping into its callers, including htab_initialize.
A side-effect of this is that the decision is now made once for
each contiguous section in the LMB array rather than for each page
individually.  This can often mean that the whole of the linear
mapping ends up being executable.

This reverts to the previous behaviour, where individual pages are
checked for being part of the kernel text or not, by moving the check
back down into htab_bolt_mapping.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-03 20:53:22 +10:00
Tony Breeds e16a9c0990 powerpc: Guard htab_dt_scan_hugepage_blocks appropriately
htab_dt_scan_hugepage_blocks is only used when CONFIG_HUGETLB_PAGE is
defined, so guard the declaration likewise.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-20 16:34:57 +10:00
Benjamin Herrenschmidt bc033b63bb powerpc/mm: Fix attribute confusion with htab_bolt_mapping()
The function htab_bolt_mapping() is used to create permanent
mappings in the MMU hash table, for example, in order to create
the linear mapping of vmemmap.  It's also used by early boot
ioremap (before mem_init_done).

However, the way ioremap uses it is incorrect as it passes it the
protection flags in the "linux PTE" form while htab_bolt_mapping()
expects them in the hash table format.  This is made more confusing by
the fact that some of those flags are actually in the same position in
both cases.

This fixes it all by making htab_bolt_mapping() take normal linux
protection flags instead, and use a little helper to convert them to
htab flags. Callers can now use the usual PAGE_* definitions safely.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

 arch/powerpc/include/asm/mmu-hash64.h |    2 -
 arch/powerpc/mm/hash_utils_64.c       |   65 ++++++++++++++++++++--------------
 arch/powerpc/mm/init_64.c             |    9 +---
 3 files changed, 44 insertions(+), 32 deletions(-)
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-11 10:09:56 +10:00
Jon Tollefson 0d9ea75443 powerpc: support multiple hugepage sizes
Instead of using the variable mmu_huge_psize to keep track of the huge
page size we use an array of MMU_PAGE_* values.  For each supported huge
page size we need to know the hugepte_shift value and have a
pgtable_cache.  The hstate or an mmu_huge_psizes index is passed to
functions so that they know which huge page size they should use.

The hugepage sizes 16M and 64K are setup(if available on the hardware) so
that they don't have to be set on the boot cmd line in order to use them.
The number of 16G pages have to be specified at boot-time though (e.g.
hugepagesz=16G hugepages=5).

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:19 -07:00
Jon Tollefson 658013e93e powerpc: scan device tree for gigantic pages
The 16G huge pages have to be reserved in the HMC prior to boot.  The
location of the pages are placed in the device tree.  This patch adds code
to scan the device tree during very early boot and save these page
locations until hugetlbfs is ready for them.

Acked-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:19 -07:00
Paul Mackerras 3a8247cc2c powerpc: Only demote individual slices rather than whole process
At present, if we have a kernel with a 64kB page size, and some
process maps something that has to be mapped with 4kB pages (such as a
cache-inhibited mapping on POWER5+, or the eHCA infiniband queue-pair
pages), we change the process to use 4kB pages everywhere.  This hurts
the performance of HPC programs that access eHCA from userspace.

With this patch, the kernel will only demote the slice(s) containing
the eHCA or cache-inhibited mappings, leaving the remaining slices
able to use 64kB hardware pages.

This also changes the slice_get_unmapped_area code so that it is
willing to place a 64k-page mapping into (or across) a 4k-page slice
if there is no better alternative, i.e. if the program specified
MAP_FIXED or if there is not sufficient space available in slices that
are either empty or already have 64k-page mappings in them.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-01 11:27:57 +10:00
Paul Mackerras fcff474ea5 Merge branch 'linux-2.6' into powerpc-next 2008-05-16 23:13:42 +10:00
Benjamin Herrenschmidt cec08e7a94 [POWERPC] vmemmap fixes to use smaller pages
This changes vmemmap to use a different region (region 0xf) of the
address space, and to configure the page size of that region
dynamically at boot.

The problem with the current approach of always using 16M pages is that
it's not well suited to machines that have small amounts of memory such
as small partitions on pseries, or PS3's.

In fact, on the PS3, failure to allocate the 16M page backing vmmemmap
tends to prevent hotplugging the HV's "additional" memory, thus limiting
the available memory even more, from my experience down to something
like 80M total, which makes it really not very useable.

The logic used by my match to choose the vmemmap page size is:

 - If 16M pages are available and there's 1G or more RAM at boot,
   use that size.
 - Else if 64K pages are available, use that
 - Else use 4K pages

I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages)
and it seems to work fine.

Note that I intend to change the way we organize the kernel regions &
SLBs so the actual region will change from 0xf back to something else at
one point, as I simplify the SLB miss handler, but that will be for a
later patch.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-15 20:49:25 +10:00
Michael Ellerman 572fb578de [POWERPC] Move declaration of tce variables into mmu-hash64.h
... instead of having extern declarations in a .c file.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-14 22:31:47 +10:00
Michael Ellerman 09de9ff872 [POWERPC] Fix sparse warnings in arch/powerpc/mm
Make two vmemmap helpers static in init_64.c
Make stab variables static in stab.c
Make psize defs static in hash_utils_64.c

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-14 22:31:46 +10:00
Stephen Rothwell ae86f0088d [POWERPC] htab_remove_mapping is only used by MEMORY_HOTPLUG
This eliminates a warning in builds that don't define
CONFIG_MEMORY_HOTPLUG.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-07 13:49:25 +10:00
Badari Pulavarty 52db9b4426 [POWERPC] Add error return from htab_remove_mapping()
If the platform doesn't support hpte_removebolted(), gracefully
return failure rather than success.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-01 20:43:08 +11:00
Paul Mackerras 54f53f2b94 Merge branch 'linux-2.6' 2008-03-26 08:44:18 +11:00
Paul Mackerras cfe666b145 [POWERPC] Don't use 64k pages for ioremap on pSeries
On pSeries, the hypervisor doesn't let us map in the eHEA ethernet
adapter using 64k pages, and thus the ehea driver will fail if 64k
pages are configured.  This works around the problem by always
using 4k pages for ioremap on pSeries (but not on other platforms).
A better fix would be to check whether the partition could ever
have an eHEA adapter, and only force 4k pages if it could, but this
will do for 2.6.25.

This is based on an earlier patch by Tony Breeds.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-03-24 17:41:22 +11:00
Paul Mackerras bed04a4413 Merge branch 'linux-2.6' 2008-03-13 15:26:33 +11:00
Michael Ellerman 31bf111944 [POWERPC] Fix large hash table allocation on Cell blades
My recent hack to allocate the hash table under 1GB on cell was poorly
tested, *cough*. It turns out on blades with large amounts of memory we
fail to allocate the hash table at all. This is because RTAS has been
instantiated just below 768MB, and 0-x MB are used by the kernel,
leaving no areas that are both large enough and also naturally-aligned.

For the cell IOMMU hack the page tables must be under 2GB, so use that
as the limit instead. This has been tested on real hardware and boots
happily.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-03-13 10:10:26 +11:00
Badari Pulavarty f8c8803bda [POWERPC] Add code for removing HPTEs for parts of the linear mapping
For memory remove, we need to clean up htab mappings for the
section of the memory we are removing.

This implements support for removing htab bolted mappings for pSeries
logical partitions.  Other sub-archs may need to implement similar
functionality for hotplug memory remove to work on them.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-02-26 22:17:03 +11:00
David S. Miller d9b2b2a277 [LIB]: Make PowerPC LMB code generic so sparc64 can use it too.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-13 16:56:49 -08:00
Michael Ellerman 41d824bf61 [POWERPC] Allocate the hash table under 1G on cell
In order to support the fixed IOMMU mapping (in a subsequent patch),
we need the hash table to be inside the IOMMUs DMA window.  This is
usually 2G, but let's make sure the hash table is under 1G as that
will satisfy the IOMMU requirements and also means the hash table will
be on node 0.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-31 12:11:09 +11:00
Paul Mackerras fa28237cfc [POWERPC] Provide a way to protect 4k subpages when using 64k pages
Using 64k pages on 64-bit PowerPC systems makes life difficult for
emulators that are trying to emulate an ISA, such as x86, which use a
smaller page size, since the emulator can no longer use the MMU and
the normal system calls for controlling page protections.  Of course,
the emulator can emulate the MMU by checking and possibly remapping
the address for each memory access in software, but that is pretty
slow.

This provides a facility for such programs to control the access
permissions on individual 4k sub-pages of 64k pages.  The idea is
that the emulator supplies an array of protection masks to apply to a
specified range of virtual addresses.  These masks are applied at the
level where hardware PTEs are inserted into the hardware page table
based on the Linux PTEs, so the Linux PTEs are not affected.  Note
that this new mechanism does not allow any access that would otherwise
be prohibited; it can only prohibit accesses that would otherwise be
allowed.  This new facility is only available on 64-bit PowerPC and
only when the kernel is configured for 64k pages.

The masks are supplied using a new subpage_prot system call, which
takes a starting virtual address and length, and a pointer to an array
of protection masks in memory.  The array has a 32-bit word per 64k
page to be protected; each 32-bit word consists of 16 2-bit fields,
for which 0 allows any access (that is otherwise allowed), 1 prevents
write accesses, and 2 or 3 prevent any access.

Implicit in this is that the regions of the address space that are
protected are switched to use 4k hardware pages rather than 64k
hardware pages (on machines with hardware 64k page support).  In fact
the whole process is switched to use 4k hardware pages when the
subpage_prot system call is used, but this could be improved in future
to switch only the affected segments.

The subpage protection bits are stored in a 3 level tree akin to the
page table tree.  The top level of this tree is stored in a structure
that is appended to the top level of the page table tree, i.e., the
pgd array.  Since it will often only be 32-bit addresses (below 4GB)
that are protected, the pointers to the first four bottom level pages
are also stored in this structure (each bottom level page contains the
protection bits for 1GB of address space), so the protection bits for
addresses below 4GB can be accessed with one fewer loads than those
for higher addresses.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-24 10:06:01 +11:00
Jon Tollefson 4ec161cf73 [POWERPC] Add hugepagesz boot-time parameter
This adds the hugepagesz boot-time parameter for ppc64.  It lets one
pick the size for huge pages.  The choices available are 64K and 16M
when the base page size is 4k.  It defaults to 16M (previously the
only only choice) if nothing or an invalid choice is specified.

Tested 64K huge pages successfully with the libhugetlbfs 1.2.

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-17 14:57:36 +11:00
Michael Neuling 584f8b71a2 [POWERPC] Use SLB size from the device tree
Currently we hardwire the number of SLBs to 64, but PAPR says we
should use the ibm,slb-size property to obtain the number of SLB
entries.  This uses this property instead of assuming 64.  If no
property is found, we assume 64 entries as before.

This soft patches the SLB handler, so it shouldn't change performance
at all.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-11 13:45:56 +11:00
will schmidt aa39be09df [POWERPC] Include udbg.h when using udbg_printf
This fixes the error
	error: implicit declaration of function "udbg_printf"

We have a few spots where we reference udbg_printf() without #including
udbg.h.  These are within #ifdef DEBUG blocks, so unnoticed until we do
a #define DEBUG or #define DEBUG_LOW nearby.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:31 +11:00
Benjamin Herrenschmidt f6ab0b922c [POWERPC] powerpc: Fix demotion of segments to 4K pages
When demoting a process to use 4K HW pages (instead of 64K), which
happens under various circumstances such as doing cache inhibited
mappings on machines that do not support 64K CI pages, the assembly
hash code calls back into the C function flush_hash_page().  This
function prototype was recently changed to accomodate for 1T segments
but the assembly call site was not updated, causing applications that
do demotion to hang.  In addition, when updating the per-CPU PACA for
the new sizes, we didn't properly update the slice "map", thus causing
the SLB miss code to re-insert segments for the wrong size.

This fixes both and adds a warning comment next to the C
implementation to try to avoid problems next time someone changes it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-29 14:34:14 +11:00
Olof Johansson f66bce5e6a [POWERPC] Add 1TB workaround for PA6T
PA6T has a bug where the slbie instruction does not honor the large
segment bit.  As a result, we have to always use slbia when switching
context.

We don't have to worry about changing the slbie's during fault processing,
since they should never be replacing one VSID with another using the
same ESID.  I.e. there's no risk for inserting duplicate entries due to a
failed slbie of the old entry.  So as long as we clear it out on context
switch we should be fine.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-17 22:30:09 +10:00
Olof Johansson f5534004e5 [POWERPC] Fix 1TB segment detection
Buglet in the 1TB detection makes it return after checking the first
property word, even if it's not a match.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-17 22:30:08 +10:00
Paul Mackerras 1189be6508 [POWERPC] Use 1TB segments
This makes the kernel use 1TB segments for all kernel mappings and for
user addresses of 1TB and above, on machines which support them
(currently POWER5+, POWER6 and PA6T).

We detect that the machine supports 1TB segments by looking at the
ibm,processor-segment-sizes property in the device tree.

We don't currently use 1TB segments for user addresses < 1T, since
that would effectively prevent 32-bit processes from using huge pages
unless we also had a way to revert to using 256MB segments.  That
would be possible but would involve extra complications (such as
keeping track of which segment size was used when HPTEs were inserted)
and is not addressed here.

Parts of this patch were originally written by Ben Herrenschmidt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-12 14:05:17 +10:00