OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
David Gibson	d1837cba5d	powerpc/mm: Cleanup initialization of hugepages on powerpc This patch simplifies the logic used to initialize hugepages on powerpc. The somewhat oddly named set_huge_psize() is renamed to add_huge_page_size() and now does all necessary verification of whether it's given a valid hugepage sizes (instead of just some) and instantiates the generic hstate structure (but no more). hugetlbpage_init() now steps through the available pagesizes, checks if they're valid for hugepages by calling add_huge_page_size() and initializes the kmem_caches for the hugepage pagetables. This means we can now eliminate the mmu_huge_psizes array, since we no longer need to pass the sizing information for the pagetable caches from set_huge_psize() into hugetlbpage_init() Determination of the default huge page size is also moved from the hash code into the general hugepage code. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:20:58 +11:00
David Gibson	a4fe3ce769	powerpc/mm: Allow more flexible layouts for hugepage pagetables Currently each available hugepage size uses a slightly different pagetable layout: that is, the bottem level table of pointers to hugepages is a different size, and may branch off from the normal page tables at a different level. Every hugepage aware path that needs to walk the pagetables must therefore look up the hugepage size from the slice info first, and work out the correct way to walk the pagetables accordingly. Future hardware is likely to add more possible hugepage sizes, more layout options and more mess. This patch, therefore reworks the handling of hugepage pagetables to reduce this complexity. In the new scheme, instead of having to consult the slice mask, pagetable walking code can check a flag in the PGD/PUD/PMD entries to see where to branch off to hugepage pagetables, and the entry also contains the information (eseentially hugepage shift) necessary to then interpret that table without recourse to the slice mask. This scheme can be extended neatly to handle multiple levels of self-describing "special" hugepage pagetables, although for now we assume only one level exists. This approach means that only the pagetable allocation path needs to know how the pagetables should be set out. All other (hugepage) pagetable walking paths can just interpret the structure as they go. There already was a flag bit in PGD/PUD/PMD entries for hugepage directory pointers, but it was only used for debug. We alter that flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable pointer (normally it would be 1 since the pointer lies in the linear mapping). This means that asm pagetable walking can test for (and punt on) hugepage pointers with the same test that checks for unpopulated page directory entries (beq becomes bge), since hugepage pointers will always be positive, and normal pointers always negative. While we're at it, we get rid of the confusing (and grep defeating) #defining of hugepte_shift to be the same thing as mmu_huge_psizes. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:20:58 +11:00
Michael Ellerman	24f1ce803c	powerpc: Fix crash on CPU hotplug early_init_mmu_secondary() is called at CPU hotplug time, so it must be marked as __cpuinit, not __init. Caused by `757c74d2` ("powerpc/mm: Introduce early_init_mmu() on 64-bit"). Tested-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2009-04-22 14:56:34 +10:00
Benjamin Herrenschmidt	757c74d298	powerpc/mm: Introduce early_init_mmu() on 64-bit This moves some MMU related init code out of setup_64.c into hash_utils_64.c and calls it early_init_mmu() and early_init_mmu_secondary(). This will make it easier to plug in a new MMU type. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:34 +11:00
Rusty Russell	56aa4129e8	cpumask: Use mm_cpumask() wrapper instead of cpu_vm_mask Makes code futureproof against the impending change to mm->cpu_vm_mask. It's also a chance to use the new cpumask_ ops which take a pointer (the older ones are deprecated, but there's no hurry for arch code). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:29 +11:00
Anton Blanchard	13870b6575	powerpc/mm: Reduce hashtable size when using 64kB pages At the moment we size the hashtable based on 4kB pages / 2, even on a 64kB kernel. This results in a hashtable that is much larger than it needs to be. Grab the real page size and size the hashtable based on that Note: This only has effect on non hypervisor machines. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 10:48:58 +11:00
Jon Tollefson	4792adbac9	powerpc: Don't use a 16G page if beyond mem= limits If mem= is used on the boot command line to limit memory then the memory block where a 16G page resides may not be available. Thanks to Michael Ellerman for finding the problem. Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-22 15:01:21 +11:00
David Gibson	f5ea64dcba	powerpc: Get USE_STRICT_MM_TYPECHECKS working again The typesafe version of the powerpc pagetable handling (with USE_STRICT_MM_TYPECHECKS defined) has bitrotted again. This patch makes a bunch of small fixes to get it back to building status. It's still not enabled by default as gcc still generates worse code with it for some reason. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-14 10:35:27 +11:00
Paul Mackerras	549e8152de	powerpc: Make the 64-bit kernel as a position-independent executable This implements CONFIG_RELOCATABLE for 64-bit by making the kernel as a position-independent executable (PIE) when it is set. This involves processing the dynamic relocations in the image in the early stages of booting, even if the kernel is being run at the address it is linked at, since the linker does not necessarily fill in words in the image for which there are dynamic relocations. (In fact the linker does fill in such words for 64-bit executables, though not for 32-bit executables, so in principle we could avoid calling relocate() entirely when we're running a 64-bit kernel at the linked address.) The dynamic relocations are processed by a new function relocate(addr), where the addr parameter is the virtual address where the image will be run. In fact we call it twice; once before calling prom_init, and again when starting the main kernel. This means that reloc_offset() returns 0 in prom_init (since it has been relocated to the address it is running at), which necessitated a few adjustments. This also changes __va and __pa to use an equivalent definition that is simpler. With the relocatable kernel, PAGE_OFFSET and MEMORY_START are constants (for 64-bit) whereas PHYSICAL_START is a variable (and KERNELBASE ideally should be too, but isn't yet). With this, relocatable kernels still copy themselves down to physical address 0 and run there. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-09-15 11:08:38 -07:00
Paul Mackerras	7e392f8c29	Merge branch 'linux-2.6'	2008-09-10 11:36:13 +10:00
Paul Mackerras	9e88ba4e45	powerpc: Only make kernel text pages of linear mapping executable Commit `bc033b63bb` ("powerpc/mm: Fix attribute confusion with htab_bolt_mapping()") moved the check for whether we should make pages of the linear mapping executable from htab_bolt_mapping into its callers, including htab_initialize. A side-effect of this is that the decision is now made once for each contiguous section in the LMB array rather than for each page individually. This can often mean that the whole of the linear mapping ends up being executable. This reverts to the previous behaviour, where individual pages are checked for being part of the kernel text or not, by moving the check back down into htab_bolt_mapping. Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-09-03 20:53:22 +10:00
Tony Breeds	e16a9c0990	powerpc: Guard htab_dt_scan_hugepage_blocks appropriately htab_dt_scan_hugepage_blocks is only used when CONFIG_HUGETLB_PAGE is defined, so guard the declaration likewise. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-20 16:34:57 +10:00
Benjamin Herrenschmidt	bc033b63bb	powerpc/mm: Fix attribute confusion with htab_bolt_mapping() The function htab_bolt_mapping() is used to create permanent mappings in the MMU hash table, for example, in order to create the linear mapping of vmemmap. It's also used by early boot ioremap (before mem_init_done). However, the way ioremap uses it is incorrect as it passes it the protection flags in the "linux PTE" form while htab_bolt_mapping() expects them in the hash table format. This is made more confusing by the fact that some of those flags are actually in the same position in both cases. This fixes it all by making htab_bolt_mapping() take normal linux protection flags instead, and use a little helper to convert them to htab flags. Callers can now use the usual PAGE_* definitions safely. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> arch/powerpc/include/asm/mmu-hash64.h \| 2 - arch/powerpc/mm/hash_utils_64.c \| 65 ++++++++++++++++++++-------------- arch/powerpc/mm/init_64.c \| 9 +--- 3 files changed, 44 insertions(+), 32 deletions(-) Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-11 10:09:56 +10:00
Jon Tollefson	0d9ea75443	powerpc: support multiple hugepage sizes Instead of using the variable mmu_huge_psize to keep track of the huge page size we use an array of MMU_PAGE_* values. For each supported huge page size we need to know the hugepte_shift value and have a pgtable_cache. The hstate or an mmu_huge_psizes index is passed to functions so that they know which huge page size they should use. The hugepage sizes 16M and 64K are setup(if available on the hardware) so that they don't have to be set on the boot cmd line in order to use them. The number of 16G pages have to be specified at boot-time though (e.g. hugepagesz=16G hugepages=5). Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:19 -07:00
Jon Tollefson	658013e93e	powerpc: scan device tree for gigantic pages The 16G huge pages have to be reserved in the HMC prior to boot. The location of the pages are placed in the device tree. This patch adds code to scan the device tree during very early boot and save these page locations until hugetlbfs is ready for them. Acked-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:19 -07:00
Paul Mackerras	3a8247cc2c	powerpc: Only demote individual slices rather than whole process At present, if we have a kernel with a 64kB page size, and some process maps something that has to be mapped with 4kB pages (such as a cache-inhibited mapping on POWER5+, or the eHCA infiniband queue-pair pages), we change the process to use 4kB pages everywhere. This hurts the performance of HPC programs that access eHCA from userspace. With this patch, the kernel will only demote the slice(s) containing the eHCA or cache-inhibited mappings, leaving the remaining slices able to use 64kB hardware pages. This also changes the slice_get_unmapped_area code so that it is willing to place a 64k-page mapping into (or across) a 4k-page slice if there is no better alternative, i.e. if the program specified MAP_FIXED or if there is not sufficient space available in slices that are either empty or already have 64k-page mappings in them. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-01 11:27:57 +10:00
Paul Mackerras	fcff474ea5	Merge branch 'linux-2.6' into powerpc-next	2008-05-16 23:13:42 +10:00
Benjamin Herrenschmidt	cec08e7a94	[POWERPC] vmemmap fixes to use smaller pages This changes vmemmap to use a different region (region 0xf) of the address space, and to configure the page size of that region dynamically at boot. The problem with the current approach of always using 16M pages is that it's not well suited to machines that have small amounts of memory such as small partitions on pseries, or PS3's. In fact, on the PS3, failure to allocate the 16M page backing vmmemmap tends to prevent hotplugging the HV's "additional" memory, thus limiting the available memory even more, from my experience down to something like 80M total, which makes it really not very useable. The logic used by my match to choose the vmemmap page size is: - If 16M pages are available and there's 1G or more RAM at boot, use that size. - Else if 64K pages are available, use that - Else use 4K pages I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages) and it seems to work fine. Note that I intend to change the way we organize the kernel regions & SLBs so the actual region will change from 0xf back to something else at one point, as I simplify the SLB miss handler, but that will be for a later patch. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-15 20:49:25 +10:00
Michael Ellerman	572fb578de	[POWERPC] Move declaration of tce variables into mmu-hash64.h ... instead of having extern declarations in a .c file. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-14 22:31:47 +10:00
Michael Ellerman	09de9ff872	[POWERPC] Fix sparse warnings in arch/powerpc/mm Make two vmemmap helpers static in init_64.c Make stab variables static in stab.c Make psize defs static in hash_utils_64.c Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-14 22:31:46 +10:00
Stephen Rothwell	ae86f0088d	[POWERPC] htab_remove_mapping is only used by MEMORY_HOTPLUG This eliminates a warning in builds that don't define CONFIG_MEMORY_HOTPLUG. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-07 13:49:25 +10:00
Badari Pulavarty	52db9b4426	[POWERPC] Add error return from htab_remove_mapping() If the platform doesn't support hpte_removebolted(), gracefully return failure rather than success. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-01 20:43:08 +11:00
Paul Mackerras	54f53f2b94	Merge branch 'linux-2.6'	2008-03-26 08:44:18 +11:00
Paul Mackerras	cfe666b145	[POWERPC] Don't use 64k pages for ioremap on pSeries On pSeries, the hypervisor doesn't let us map in the eHEA ethernet adapter using 64k pages, and thus the ehea driver will fail if 64k pages are configured. This works around the problem by always using 4k pages for ioremap on pSeries (but not on other platforms). A better fix would be to check whether the partition could ever have an eHEA adapter, and only force 4k pages if it could, but this will do for 2.6.25. This is based on an earlier patch by Tony Breeds. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-03-24 17:41:22 +11:00
Paul Mackerras	bed04a4413	Merge branch 'linux-2.6'	2008-03-13 15:26:33 +11:00
Michael Ellerman	31bf111944	[POWERPC] Fix large hash table allocation on Cell blades My recent hack to allocate the hash table under 1GB on cell was poorly tested, cough. It turns out on blades with large amounts of memory we fail to allocate the hash table at all. This is because RTAS has been instantiated just below 768MB, and 0-x MB are used by the kernel, leaving no areas that are both large enough and also naturally-aligned. For the cell IOMMU hack the page tables must be under 2GB, so use that as the limit instead. This has been tested on real hardware and boots happily. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-03-13 10:10:26 +11:00
Badari Pulavarty	f8c8803bda	[POWERPC] Add code for removing HPTEs for parts of the linear mapping For memory remove, we need to clean up htab mappings for the section of the memory we are removing. This implements support for removing htab bolted mappings for pSeries logical partitions. Other sub-archs may need to implement similar functionality for hotplug memory remove to work on them. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-02-26 22:17:03 +11:00
David S. Miller	d9b2b2a277	[LIB]: Make PowerPC LMB code generic so sparc64 can use it too. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-13 16:56:49 -08:00
Michael Ellerman	41d824bf61	[POWERPC] Allocate the hash table under 1G on cell In order to support the fixed IOMMU mapping (in a subsequent patch), we need the hash table to be inside the IOMMUs DMA window. This is usually 2G, but let's make sure the hash table is under 1G as that will satisfy the IOMMU requirements and also means the hash table will be on node 0. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-01-31 12:11:09 +11:00
Paul Mackerras	fa28237cfc	[POWERPC] Provide a way to protect 4k subpages when using 64k pages Using 64k pages on 64-bit PowerPC systems makes life difficult for emulators that are trying to emulate an ISA, such as x86, which use a smaller page size, since the emulator can no longer use the MMU and the normal system calls for controlling page protections. Of course, the emulator can emulate the MMU by checking and possibly remapping the address for each memory access in software, but that is pretty slow. This provides a facility for such programs to control the access permissions on individual 4k sub-pages of 64k pages. The idea is that the emulator supplies an array of protection masks to apply to a specified range of virtual addresses. These masks are applied at the level where hardware PTEs are inserted into the hardware page table based on the Linux PTEs, so the Linux PTEs are not affected. Note that this new mechanism does not allow any access that would otherwise be prohibited; it can only prohibit accesses that would otherwise be allowed. This new facility is only available on 64-bit PowerPC and only when the kernel is configured for 64k pages. The masks are supplied using a new subpage_prot system call, which takes a starting virtual address and length, and a pointer to an array of protection masks in memory. The array has a 32-bit word per 64k page to be protected; each 32-bit word consists of 16 2-bit fields, for which 0 allows any access (that is otherwise allowed), 1 prevents write accesses, and 2 or 3 prevent any access. Implicit in this is that the regions of the address space that are protected are switched to use 4k hardware pages rather than 64k hardware pages (on machines with hardware 64k page support). In fact the whole process is switched to use 4k hardware pages when the subpage_prot system call is used, but this could be improved in future to switch only the affected segments. The subpage protection bits are stored in a 3 level tree akin to the page table tree. The top level of this tree is stored in a structure that is appended to the top level of the page table tree, i.e., the pgd array. Since it will often only be 32-bit addresses (below 4GB) that are protected, the pointers to the first four bottom level pages are also stored in this structure (each bottom level page contains the protection bits for 1GB of address space), so the protection bits for addresses below 4GB can be accessed with one fewer loads than those for higher addresses. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-01-24 10:06:01 +11:00
Jon Tollefson	4ec161cf73	[POWERPC] Add hugepagesz boot-time parameter This adds the hugepagesz boot-time parameter for ppc64. It lets one pick the size for huge pages. The choices available are 64K and 16M when the base page size is 4k. It defaults to 16M (previously the only only choice) if nothing or an invalid choice is specified. Tested 64K huge pages successfully with the libhugetlbfs 1.2. Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-01-17 14:57:36 +11:00
Michael Neuling	584f8b71a2	[POWERPC] Use SLB size from the device tree Currently we hardwire the number of SLBs to 64, but PAPR says we should use the ibm,slb-size property to obtain the number of SLB entries. This uses this property instead of assuming 64. If no property is found, we assume 64 entries as before. This soft patches the SLB handler, so it shouldn't change performance at all. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-12-11 13:45:56 +11:00
will schmidt	aa39be09df	[POWERPC] Include udbg.h when using udbg_printf This fixes the error error: implicit declaration of function "udbg_printf" We have a few spots where we reference udbg_printf() without #including udbg.h. These are within #ifdef DEBUG blocks, so unnoticed until we do a #define DEBUG or #define DEBUG_LOW nearby. Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-11-08 14:15:31 +11:00
Benjamin Herrenschmidt	f6ab0b922c	[POWERPC] powerpc: Fix demotion of segments to 4K pages When demoting a process to use 4K HW pages (instead of 64K), which happens under various circumstances such as doing cache inhibited mappings on machines that do not support 64K CI pages, the assembly hash code calls back into the C function flush_hash_page(). This function prototype was recently changed to accomodate for 1T segments but the assembly call site was not updated, causing applications that do demotion to hang. In addition, when updating the per-CPU PACA for the new sizes, we didn't properly update the slice "map", thus causing the SLB miss code to re-insert segments for the wrong size. This fixes both and adds a warning comment next to the C implementation to try to avoid problems next time someone changes it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-10-29 14:34:14 +11:00
Olof Johansson	f66bce5e6a	[POWERPC] Add 1TB workaround for PA6T PA6T has a bug where the slbie instruction does not honor the large segment bit. As a result, we have to always use slbia when switching context. We don't have to worry about changing the slbie's during fault processing, since they should never be replacing one VSID with another using the same ESID. I.e. there's no risk for inserting duplicate entries due to a failed slbie of the old entry. So as long as we clear it out on context switch we should be fine. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-10-17 22:30:09 +10:00
Olof Johansson	f5534004e5	[POWERPC] Fix 1TB segment detection Buglet in the 1TB detection makes it return after checking the first property word, even if it's not a match. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-10-17 22:30:08 +10:00
Paul Mackerras	1189be6508	[POWERPC] Use 1TB segments This makes the kernel use 1TB segments for all kernel mappings and for user addresses of 1TB and above, on machines which support them (currently POWER5+, POWER6 and PA6T). We detect that the machine supports 1TB segments by looking at the ibm,processor-segment-sizes property in the device tree. We don't currently use 1TB segments for user addresses < 1T, since that would effectively prevent 32-bit processes from using huge pages unless we also had a way to revert to using 256MB segments. That would be possible but would involve extra complications (such as keeping track of which segment size was used when HPTEs were inserted) and is not addressed here. Parts of this patch were originally written by Ben Herrenschmidt. Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-10-12 14:05:17 +10:00
Stephen Rothwell	e8ff0646e5	[POWERPC] Tidy up CONFIG_PPC_MM_SLICES code This removes some of the #ifdefs from .c files. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-08-17 11:01:59 +10:00
Jesper Juhl	9420dc65ff	[POWERPC] Clean out a bunch of duplicate includes This removes several duplicate includes from arch/powerpc/. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-08-17 11:01:51 +10:00
Ilpo JÃ¤rvinen	2b02d13996	[POWERPC] Fix invalid semicolon after if statement A similar fix to netfilter from Eric Dumazet inspired me to look around a bit by using some grep/sed stuff as looking for this kind of bugs seemed easy to automate. This is one of them I found where it looks like this semicolon is not valid. Signed-off-by: Ilpo JÃ¤rvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-08-17 10:48:52 +10:00
Michael Neuling	67439b76f2	[POWERPC] Fixes for the SLB shadow buffer code On a machine with hardware 64kB pages and a kernel configured for a 64kB base page size, we need to change the vmalloc segment from 64kB pages to 4kB pages if some driver creates a non-cacheable mapping in the vmalloc area. However, we never updated with SLB shadow buffer. This fixes it. Thanks to paulus for finding this. Also added some write barriers to ensure the shadow buffer contents are always consistent. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-08-03 19:36:01 +10:00
Geert Uytterhoeven	1e57ba8ddd	[POWERPC] cell: CONFIG_SPE_BASE is a typo The config symbol for SPE support is called CONFIG_SPU_BASE, not CONFIG_SPE_BASE. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-07-22 21:30:57 +10:00
David Gibson	8e561e7eda	[POWERPC] Kill typedef-ed structs for hash PTEs and BATs Using typedefs to rename structure types if frowned on by CodingStyle. However, we do so for the hash PTE structure on both ppc32 (where it's called "PTE") and ppc64 (where it's called "hpte_t"). On ppc32 we also have such a typedef for the BATs ("BAT"). This removes this unhelpful use of typedefs, in the process bringing ppc32 and ppc64 closer together, by using the name "struct hash_pte" in both cases. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-06-14 22:30:16 +10:00
Jon Tollefson	5b82583185	[POWERPC] Correct #endif comment Fix up comment on two #endifs to match their #ifs. Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> ---- hash_utils_64.c \| 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-17 21:11:19 +10:00
Benjamin Herrenschmidt	16c2d47623	[POWERPC] Add ability to 4K kernel to hash in 64K pages This adds the ability for a kernel compiled with 4K page size to have special slices containing 64K pages and hash the right type of hash PTEs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-09 16:35:00 +10:00
Benjamin Herrenschmidt	d0f13e3c20	[POWERPC] Introduce address space "slices" The basic issue is to be able to do what hugetlbfs does but with different page sizes for some other special filesystems; more specifically, my need is: - Huge pages - SPE local store mappings using 64K pages on a 4K base page size kernel on Cell - Some special 4K segments in 64K-page kernels for mapping a dodgy type of powerpc-specific infiniband hardware that requires 4K MMU mappings for various reasons I won't explain here. The main issues are: - To maintain/keep track of the page size per "segment" (as we can only have one page size per segment on powerpc, which are 256MB divisions of the address space). - To make sure special mappings stay within their allotted "segments" (including MAP_FIXED crap) - To make sure everybody else doesn't mmap/brk/grow_stack into a "segment" that is used for a special mapping Some of the necessary mechanisms to handle that were present in the hugetlbfs code, but mostly in ways not suitable for anything else. The patch relies on some changes to the generic get_unmapped_area() that just got merged. It still hijacks hugetlb callbacks here or there as the generic code hasn't been entirely cleaned up yet but that shouldn't be a problem. So what is a slice ? Well, I re-used the mechanism used formerly by our hugetlbfs implementation which divides the address space in "meta-segments" which I called "slices". The division is done using 256MB slices below 4G, and 1T slices above. Thus the address space is divided currently into 16 "low" slices and 16 "high" slices. (Special case: high slice 0 is the area between 4G and 1T). Doing so simplifies significantly the tracking of segments and avoids having to keep track of all the 256MB segments in the address space. While I used the "concepts" of hugetlbfs, I mostly re-implemented everything in a more generic way and "ported" hugetlbfs to it. Slices can have an associated page size, which is encoded in the mmu context and used by the SLB miss handler to set the segment sizes. The hash code currently doesn't care, it has a specific check for hugepages, though I might add a mechanism to provide per-slice hash mapping functions in the future. The slice code provide a pair of "generic" get_unmapped_area() (bottomup and topdown) functions that should work with any slice size. There is some trickiness here so I would appreciate people to have a look at the implementation of these and let me know if I got something wrong. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-09 16:35:00 +10:00
Benjamin Herrenschmidt	16f1c74675	[POWERPC] Small fixes & cleanups in segment page size demotion The code for demoting segments to 4K had some issues, like for example, when using _PAGE_4K_PFN flag, the first CPU to hit it would do the demotion, but other CPUs hitting the same page wouldn't properly flush their SLBs if mmu_ci_restriction isn't set. There are also potential issues with hash_preload not handling _PAGE_4K_PFN. All of these are non issues on current hardware but might bite us in the future. This patch thus fixes it by: - Taking the test comparing the mm and current CPU context page sizes to decide to flush SLBs out of the mmu_ci_restrictions test since that can also be triggered by _PAGE_4K_PFN pages - Due to the above being done all the time, demote_segment_4k doesn't need update the context and flush the SLB - demote_segment_4k can be static and doesn't need an EXPORT_SYMBOL - Making hash_preload ignore anything that has either _PAGE_4K_PFN or _PAGE_NO_CACHE set, thus avoiding duplication of the complicated logic in hash_page() (and possibly making hash_preload a little bit faster for the normal case). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-09 16:35:00 +10:00
Michael Ellerman	ed16669298	[POWERPC] Initialise spinlock in the DEBUG_PAGEALLOC code Fixes: BUG: spinlock bad magic on CPU#0, swapper/0 lock: c00000000064ec30, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 Call Trace: [c00000000062b980] [c00000000000f920] .show_stack+0x6c/0x1a0 (unreliable) [c00000000062ba20] [c0000000001c2b40] .spin_bug+0xb0/0xd4 [c00000000062bab0] [c0000000001c2ed0] ._raw_spin_lock+0x44/0x184 [c00000000062bb50] [c0000000003a42b4] ._spin_lock+0x10/0x24 [c00000000062bbd0] [c00000000002b4dc] .kernel_map_pages+0x198/0x278 [c00000000062bc90] [c000000000079720] .free_hot_cold_page+0x124/0x418 [c00000000062bd70] [c000000000530278] .free_all_bootmem_core+0x14c/0x224 [c00000000062be50] [c00000000052a178] .mem_init+0x68/0x170 [c00000000062bee0] [c00000000051d874] .start_kernel+0x2a0/0x37c [c00000000062bf90] [c0000000000084c8] .start_here_common+0x54/0x8c Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-02 20:04:29 +10:00
Benjamin Herrenschmidt	370a908db1	[POWERPC] DEBUG_PAGEALLOC for 64-bit Here's an implementation of DEBUG_PAGEALLOC for 64 bits powerpc. It applies on top of the 32 bits patch. Unlike Anton's previous attempt, I'm not using updatepp. I'm removing the hash entries from the bolted mapping (using a map in RAM of all the slots). Expensive but it doesn't really matter, does it ? :-) Memory hot-added doesn't benefit from this unless it's added at an address that is below end_of_DRAM() as calculated at boot time. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> arch/powerpc/Kconfig.debug \| 2 arch/powerpc/mm/hash_utils_64.c \| 84 ++++++++++++++++++++++++++++++++++++++-- 2 files changed, 82 insertions(+), 4 deletions(-) Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-04-13 04:09:39 +10:00
Paul Mackerras	721151d004	[POWERPC] Allow drivers to map individual 4k pages to userspace Some drivers have resources that they want to be able to map into userspace that are 4k in size. On a kernel configured with 64k pages we currently end up mapping the 4k we want plus another 60k of physical address space, which could contain anything. This can introduce security problems, for example in the case of an infiniband adaptor where the other 60k could contain registers that some other program is using for its communications. This patch adds a new function, remap_4k_pfn, which drivers can use to map a single 4k page to userspace regardless of whether the kernel is using a 4k or a 64k page size. Like remap_pfn_range, it would typically be called in a driver's mmap function. It only maps a single 4k page, which on a 64k page kernel appears replicated 16 times throughout a 64k page. On a 4k page kernel it reduces to a call to remap_pfn_range. The way this works on a 64k kernel is that a new bit, _PAGE_4K_PFN, gets set on the linux PTE. This alters the way that __hash_page_4K computes the real address to put in the HPTE. The RPN field of the linux PTE becomes the 4k RPN directly rather than being interpreted as a 64k RPN. Since the RPN field is 32 bits, this means that physical addresses being mapped with remap_4k_pfn have to be below 2^44, i.e. 0x100000000000. The patch also factors out the code in arch/powerpc/mm/hash_utils_64.c that deals with demoting a process to use 4k pages into one function that gets called in the various different places where we need to do that. There were some discrepancies between exactly what was done in the various places, such as a call to spu_flush_all_slbs in one case but not in others. Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-04-13 03:55:18 +10:00

1 2

80 Commits