linux-sg2042

Commit Graph

Author	SHA1	Message	Date
David S. Miller	048c9acca9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc Merge sparc bug fixes that didn't make it into v3.9 into sparc-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-04 18:34:13 -07:00
Johannes Weiner	0aad818b2d	sparse-vmemmap: specify vmemmap population range in bytes The sparse code, when asking the architecture to populate the vmemmap, specifies the section range as a starting page and a number of pages. This is an awkward interface, because none of the arch-specific code actually thinks of the range in terms of 'struct page' units and always translates it to bytes first. In addition, later patches mix huge page and regular page backing for the vmemmap. For this, they need to call vmemmap_populate_basepages() on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind. But these are not necessarily multiples of the 'struct page' size and so this unit is too coarse. Just translate the section range into bytes once in the generic sparse code, then pass byte ranges down the stack. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: David S. Miller <davem@davemloft.net> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:35 -07:00
Kirill Tkhai	07df841877	sparc64: Do not save/restore interrupts in get_new_mmu_context() get_new_mmu_context() is always called with interrupts disabled. So it's possible to do this micro optimization. (Also fix the comment to switch_mm, which is called in both cases) Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-08 22:50:47 -04:00
Tkhai Kirill	ce835e513a	sparc64: Do not change num_physpages during initmem freeing Common hibernation code looks at num_physpages during suspend and restore. Restore is able to be called from initcall, which is before initmem freeing. This case leads to restore fail. Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> CC: David Miller <davem@davemloft.net> CC: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 11:06:54 -07:00
Tang Chen	0197518cd3	memory-hotplug: remove memmap of sparse-vmemmap Introduce a new API vmemmap_free() to free and remove vmemmap pagetables. Since pagetable implements are different, each architecture has to provide its own version of vmemmap_free(), just like vmemmap_populate(). Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc. [mhocko@suse.cz: fix implicit declaration of remove_pagetable] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:12 -08:00
Yasuaki Ishimatsu	46723bfa54	memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap For removing memmap region of sparse-vmemmap which is allocated bootmem, memmap region of sparse-vmemmap needs to be registered by get_page_bootmem(). So the patch searches pages of virtual mapping and registers the pages by get_page_bootmem(). NOTE: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390, and sparc. So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node() when platform doesn't support it. It's implemented by adding a new Kconfig option named CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected by memory-hotplug feature fully supported archs(currently only on x86_64). Since we have 2 config options called MEMORY_HOTPLUG and MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately, and codes in function register_page_bootmem_info_node() are only used for collecting infomation for hot-remove, so reside it under MEMORY_HOTREMOVE. Besides page_isolation.c selected by MEMORY_ISOLATION under MEMORY_HOTPLUG is also such case, move it too. [mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE] [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()] [mhocko@suse.cz: remove the arch specific functions without any implementation] [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed] [rientjes@google.com: fix defined but not used warning] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wu Jianguo <wujianguo@huawei.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:12 -08:00
Linus Torvalds	2ef14f465b	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Peter Anvin: "This is a huge set of several partly interrelated (and concurrently developed) changes, which is why the branch history is messier than one would like. The really big items are two humonguous patchsets mostly developed by Yinghai Lu at my request, which completely revamps the way we create initial page tables. In particular, rather than estimating how much memory we will need for page tables and then build them into that memory -- a calculation that has shown to be incredibly fragile -- we now build them (on 64 bits) with the aid of a "pseudo-linear mode" -- a #PF handler which creates temporary page tables on demand. This has several advantages: 1. It makes it much easier to support things that need access to data very early (a followon patchset uses this to load microcode way early in the kernel startup). 2. It allows the kernel and all the kernel data objects to be invoked from above the 4 GB limit. This allows kdump to work on very large systems. 3. It greatly reduces the difference between Xen and native (Xen's equivalent of the #PF handler are the temporary page tables created by the domain builder), eliminating a bunch of fragile hooks. The patch series also gets us a bit closer to W^X. Additional work in this pull is the 64-bit get_user() work which you were also involved with, and a bunch of cleanups/speedups to __phys_addr()/__pa()." * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits) x86, mm: Move reserving low memory later in initialization x86, doc: Clarify the use of asm("%edx") in uaccess.h x86, mm: Redesign get_user with a __builtin_choose_expr hack x86: Be consistent with data size in getuser.S x86, mm: Use a bitfield to mask nuisance get_user() warnings x86/kvm: Fix compile warning in kvm_register_steal_time() x86-32: Add support for 64bit get_user() x86-32, mm: Remove reference to alloc_remap() x86-32, mm: Remove reference to resume_map_numa_kva() x86-32, mm: Rip out x86_32 NUMA remapping code x86/numa: Use __pa_nodebug() instead x86: Don't panic if can not alloc buffer for swiotlb mm: Add alloc_bootmem_low_pages_nopanic() x86, 64bit, mm: hibernate use generic mapping_init x86, 64bit, mm: Mark data/bss/brk to nx x86: Merge early kernel reserve for 32bit and 64bit x86: Add Crash kernel low reservation x86, kdump: Remove crashkernel range find limit for 64bit memblock: Add memblock_mem_size() x86, boot: Not need to check setup_header version for setup_data ...	2013-02-21 18:06:55 -08:00
David S. Miller	0fbebed682	sparc64: Fix tsb_grow() in atomic context. If our first THP installation for an MM is via the set_pmd_at() done during khugepaged's collapsing we'll end up in tsb_grow() trying to do a GFP_KERNEL allocation with several locks held. Simply using GFP_ATOMIC in this situation is not the best option because we really can't have this fail, so we'd really like to keep this an order 0 GFP_KERNEL allocation if possible. Also, doing the TSB allocation from khugepaged is a really bad idea because we'll allocate it potentially from the wrong NUMA node in that context. So what we do is defer the hugepage TSB allocation until the first TLB miss we take on a hugepage. This is slightly tricky because we have to handle two unusual cases: 1) Taking the first hugepage TLB miss in the window trap handler. We'll call the winfix_trampoline when that is detected. 2) An initial TSB allocation via TLB miss races with a hugetlb fault on another cpu running the same MM. We handle this by unconditionally loading the TSB we see into the current cpu even if it's non-NULL at hugetlb_setup time. Reported-by: Meelis Roos <mroos@ut.ee> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-20 09:46:08 -08:00
David S. Miller	bcd896bae0	sparc64: Handle hugepage TSB being NULL. Accomodate the possibility that the TSB might be NULL at the point that update_mmu_cache() is invoked. This is necessary because we will sometimes need to defer the TSB allocation to the first fault that happens in the 'mm'. Seperate out the hugepage PTE test into a seperate function so that the logic is clearer. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-20 09:46:08 -08:00
H. Peter Anvin	de65d816aa	Merge remote-tracking branch 'origin/x86/boot' into x86/mm2 Coming patches to x86/mm2 require the changes and advanced baseline in x86/boot. Resolved Conflicts: arch/x86/kernel/setup.c mm/nobootmem.c Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:10:15 -08:00
Greg Kroah-Hartman	7c9503b838	SPARC: drivers: remove __dev* attributes. CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, __devinitdata, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-03 15:57:04 -08:00
Yinghai Lu	961f8fa04b	sparc, mm: Remove calling of free_all_bootmem_node() Now NO_BOOTMEM version free_all_bootmem_node() does not really do free_bootmem at all, and it only call register_page_bootmem_info_node instead. That is confusing, try to kill that free_all_bootmem_node(). Before that, this patch will remove calling of free_all_bootmem_node() We add register_page_bootmem_info() to call register_page_bootmem_info_node directly. Also could use free_all_bootmem() for numa case, and it is just the same as free_low_memory_core_early(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-45-git-send-email-yinghai@kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: sparclinux@vger.kernel.org Acked-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:48 -08:00
Al Viro	dff933da76	sparc64: clear syscall_noerror on the entry to syscall, not on the exit Move that sucker to just before TI_FPDEPTH and replace stb with sth in etrap_save(). Take current_ds to its old place, so that we don't push wsaved into TI_... flags. That allows to lose clearing syscall_noerror on return from syscall. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-10-14 19:26:52 -04:00
David Miller	9e695d2ecc	sparc64: Support transparent huge pages. This is relatively easy since PMD's now cover exactly 4MB of memory. Our PMD entries are 32-bits each, so we use a special encoding. The lowest bit, PMD_ISHUGE, determines the interpretation. This is possible because sparc64's page tables are purely software entities so we can use whatever encoding scheme we want. We just have to make the TLB miss assembler page table walkers aware of the layout. set_pmd_at() works much like set_pte_at() but it has to operate in two page from a table of non-huge PTEs, so we have to queue up TLB flushes based upon what mappings are valid in the PTE table. In the second regime we are going from huge-page to non-huge-page, and in that case we need only queue up a single TLB flush to push out the huge page mapping. We still have 5 bits remaining in the huge PMD encoding so we can very likely support any new pieces of THP state tracking that might get added in the future. With lots of help from Johannes Weiner. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:23:06 +09:00
David Miller	c460bec78d	sparc64: Eliminate PTE table memory wastage. We've split up the PTE tables so that they take up half a page instead of a full page. This is in order to facilitate transparent huge page support, which works much better if our PMDs cover 4MB instead of 8MB. What we do is have a one-behind cache for PTE table allocations in the mm struct. This logic triggers only on allocations. For example, we don't try to keep track of free'd up page table blocks in the style that the s390 port does. There were only two slightly annoying aspects to this change: 1) Changing pgtable_t to be a "pte_t *". There's all of this special logic in the TLB free paths that needed adjustments, as did the PMD populate interfaces. 2) init_new_context() needs to zap the pointer, since the mm struct just gets copied from the parent on fork. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:23:05 +09:00
David Miller	15b9350a17	sparc64: Only support 4MB huge pages and 8KB base pages. Narrowing the scope of the page size configurations will make the transparent hugepage changes much simpler. In the end what we really want to do is have the kernel support multiple huge page sizes and use whatever is appropriate as the context dictactes. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:23:04 +09:00
Akinobu Mita	5da444aae5	sparc: fix format string argument for prom_printf() prom_printf() takes printf style arguments. Specifing GCC's format attribute reveals that there are several wrong usages of prom_printf(). This fixes those wrong format strings and arguments, and also leaves format attributes in order to detect similar mistakes at compile time. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-02 23:20:34 -04:00
David S. Miller	c69ad0a3f7	sparc64: Use cpu_pgsz_mask for linear kernel mapping config. This required a little bit of reordering of how we set up the memory management early on. We now only know the final values of kern_linear_pte_xor[] after we take over the trap table and start processing TLB misses ourselves. So once we fill those values in we re-clear the kernel's 4M TSB and flush the TLBs. That way if we find we support larger than 4M pages we won't have any stale smaller page size entries in the TSB. SUN4U Panther support for larger page sizes should now be extremely trivial but I have no hardware on which to test it and I believe that some of the sun4u TLB miss assembler needs to be audited first to make sure it really can handle larger than 4M PTEs properly. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-06 20:35:36 -07:00
David S. Miller	ce33fdc52a	sparc64: Probe cpu page size support more portably. On sun4v, interrogate the machine description. This code is extremely defensive in nature, and a lot of the checks can probably be removed. On sun4u things are a lot simpler. There are the page sizes all chips support, and then Panther adds 32MB and 256MB pages. Report the probed value in /proc/cpuinfo Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-06 19:01:25 -07:00
David S. Miller	4f93d21d25	sparc64: Support 2GB and 16GB page sizes for kernel linear mappings. SPARC-T4 supports 2GB pages. So convert kpte_linear_bitmap into an array of 2-bit values which index into kern_linear_pte_xor. Now kern_linear_pte_xor is used for 4 page size aligned regions, 4MB, 256MB, 2GB, and 16GB respectively. Enabling 2GB pages is currently hardcoded using a check against sun4v_chip_type. In the future this will be done more cleanly by interrogating the machine description which is the correct way to determine this kind of thing. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-06 18:13:58 -07:00
David S. Miller	2856cc2e4d	sparc64: Be less verbose during vmemmap population. On a 2-node machine with 256GB of ram we get 512 lines of console output, which is just too much. This mimicks Yinghai Lu's x86 commit `c2b91e2eec` (x86_64/mm: check and print vmemmap allocation continuous) except that we aren't ever going to get contiguous block pointers in between calls so just print when the virtual address or node changes. This decreases the output by an order of 16. Also demote this to KERN_DEBUG. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-15 00:37:29 -07:00
Paul Gortmaker	aa6f079075	sparc: fix build fail in mm/init_64.c when NEED_MULTIPLE_NODES is off Commit `625d693e97` (linux-next) "sparc64: Convert over to NO_BOOTMEM." causes the following compile failure for sparc64 allnoconfig: arch/sparc/mm/init_64.c:822:16: error: unused variable 'paddr' arch/sparc/mm/init_64.c:1759:7: error: unused variable 'node' arch/sparc/mm/init_64.c:809:12: error: 'memblock_nid_range' defined but not used The paddr decl can easily be shuffled within the ifdef. The memblock_nid_range is just a stub function for when NEED_MULTIPLE_NODES is off, but the only caller is within a NEED_MULTIPLE_NODES enabled section, so we can simply delete it. The unused "node" is slightly more interesting. In the case of "# CONFIG_NEED_MULTIPLE_NODES is not set" we no longer get the definition of: #define NODE_DATA(nid) (node_data[nid]) from arch/sparc/include/asm/mmzone.h - but instead we get: #define NODE_DATA(nid) (&contig_page_data) from include/linux/mmzone.h -- and since the arg is ignored, the thing really is unused. Rather than put in a confusing looking __maybe_unused, simply splitting the declaration from the assignment seemed to me to be the least offensive. Cc: Sam Ravnborg <sam@ravnborg.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-09 19:58:07 -07:00
David S. Miller	799d40cc10	sparc64: Do not set max_mapnr. There is no need, since nothing relevant to sparc64 makes use of this value. Noticed by Sam Ravnborg. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-27 10:55:08 -07:00
David S. Miller	5ed56f1ad9	sparc64: Use node local allocations for IRQ stacks. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-26 20:50:34 -07:00
David S. Miller	625d693e97	sparc64: Convert over to NO_BOOTMEM. With help from Sam Ravnborg. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-26 20:02:23 -07:00
David Howells	d550bbd40c	Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for Sparc. Signed-off-by: David Howells <dhowells@redhat.com> cc: sparclinux@vger.kernel.org	2012-03-28 18:30:03 +01:00
Tejun Heo	2a4814df54	sparc: Use HAVE_MEMBLOCK_NODE_MAP sparc doesn't access early_node_map[] directly and enabling HAVE_MEMBLOCK_NODE_MAP is trivial - replacing add_active_range() calls with memblock_set_node() and selecting HAVE_MEMBLOCK_NODE_MAP is enough. -v2: Use select in Kconfig instead as suggested by Sam Ravnborg. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: sparclinux@vger.kernel.org	2011-12-08 10:22:08 -08:00
Tejun Heo	1aadc0560f	memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users The only function of memblock_analyze() is now allowing resize of memblock region arrays. Rename it to memblock_allow_resize() and update its users. * The following users remain the same other than renaming. arm/mm/init.c::arm_memblock_init() microblaze/kernel/prom.c::early_init_devtree() powerpc/kernel/prom.c::early_init_devtree() openrisc/kernel/prom.c::early_init_devtree() sh/mm/init.c::paging_init() sparc/mm/init_64.c::paging_init() unicore32/mm/init.c::uc32_memblock_init() * In the following users, analyze was used to update total size which is no longer necessary. powerpc/kernel/machine_kexec.c::reserve_crashkernel() powerpc/kernel/prom.c::early_init_devtree() powerpc/mm/init_32.c::MMU_init() powerpc/mm/tlb_nohash.c::__early_init_mmu() powerpc/platforms/ps3/mm.c::ps3_mm_add_memory() powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups() sh/kernel/machine_kexec.c::reserve_crashkernel() * x86/kernel/e820.c::memblock_x86_fill() was directly setting memblock_can_resize before populating memblock and calling analyze afterwards. Call memblock_allow_resize() before start populating. memblock_can_resize is now static inside memblock.c. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-12-08 10:22:08 -08:00
Tejun Heo	fe091c208a	memblock: Kill memblock_init() memblock_init() initializes arrays for regions and memblock itself; however, all these can be done with struct initializers and memblock_init() can be removed. This patch kills memblock_init() and initializes memblock with struct initializer. The only difference is that the first dummy entries don't have .nid set to MAX_NUMNODES initially. This doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-12-08 10:22:07 -08:00
Tejun Heo	d4bbf7e775	Merge branch 'master' into x86/memblock Conflicts & resolutions: * arch/x86/xen/setup.c `dc91c728fd` "xen: allow extra memory to be in multiple regions" `24aa07882b` "memblock, x86: Replace memblock_x86_reserve/free..." conflicted on xen_add_extra_mem() updates. The resolution is trivial as the latter just want to replace memblock_x86_reserve_range() with memblock_reserve(). * drivers/pci/intel-iommu.c `166e9278a3` "x86/ia64: intel-iommu: move to drivers/iommu/" `5dfe8660a3` "bootmem: Replace work_with_active_regions() with..." conflicted as the former moved the file under drivers/iommu/. Resolved by applying the chnages from the latter on the moved file. * mm/Kconfig `6661672053` "memblock: add NO_BOOTMEM config symbol" `c378ddd53f` "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option" conflicted trivially. Both added config options. Just letting both add their own options resolves the conflict. * mm/memblock.c `d1f0ece6cd` "mm/memblock.c: small function definition fixes" `ed7b56a799` "memblock: Remove memblock_memory_can_coalesce()" confliected. The former updates function removed by the latter. Resolution is trivial. Signed-off-by: Tejun Heo <tj@kernel.org>	2011-11-28 09:46:22 -08:00
David S. Miller	f4142cba4e	sparc64: Force the execute bit in OpenFirmware's translation entries. In the OF 'translations' property, the template TTEs in the mappings never specify the executable bit. This is the case even though some of these mappings are for OF's code segment. Therefore, we need to force the execute bit on in every mapping. This problem can only really trigger on Niagara/sun4v machines and the history behind this is a little complicated. Previous to sun4v, the sun4u TTE entries lacked a hardware execute permission bit. So OF didn't have to ever worry about setting anything to handle executable pages. Any valid TTE loaded into the I-TLB would be respected by the chip. But sun4v Niagara chips have a real hardware enforced executable bit in their TTEs. So it has to be set or else the I-TLB throws an instruction access exception with type code 6 (protection violation). We've been extremely fortunate to not get bitten by this in the past. The best I can tell is that the OF's mappings for it's executable code were mapped using permanent locked mappings on sun4v in the past. Therefore, the fact that we didn't have the exec bit set in the OF translations we would use did not matter in practice. Thanks to Greg Onufer for helping me track this down. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-09-29 12:18:59 -07:00
David S. Miller	0785a8e87b	sparc: Fix build with DEBUG_PAGEALLOC enabled. arch/sparc/mm/init_64.c:1622:22: error: unused variable '__swapper_4m_tsb_phys_patch_end' [-Werror=unused-variable] arch/sparc/mm/init_64.c:1621:22: error: unused variable '__swapper_4m_tsb_phys_patch' [-Werror=unused-variable] Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-06 05:26:35 -07:00
David S. Miller	9076d0e7e0	sparc: Access kernel TSB using physical addressing when possible. On sun4v this is basically required since we point the hypervisor and the TSB walking hardware at these tables using physical addressing too. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-05 00:53:57 -07:00
Tejun Heo	f9b18db3b1	memblock: Don't allow archs to override memblock_nid_range() memblock_nid_range() is used to implement memblock_[try_]alloc_nid(). The generic version determines the range by walking early_node_map with for_each_mem_pfn_range(). The generic version is defined __weak to allow arch override. Currently, only sparc overrides it; however, with the previous update to the generic implementation, there isn't much to be gained with arch override. Sparc would behave exactly the same with the generic implementation. This patch disallows arch override for memblock_nid_range() and make both generic and sparc versions static. sparc is only compile tested. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310460395-30913-6-git-send-email-tj@kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:45:33 -07:00
Joe Perches	6cb79b3f3b	sparc: Remove unnecessary semicolons Semicolons are not necessary after switch/while/for/if braces so remove them. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-07 16:06:34 -07:00
KOSAKI Motohiro	fb1fece5da	sparc: convert old cpumask API into new one Adapt new API. Almost change is trivial, most important change are to remove following like =operator. cpumask_t cpu_mask = *mm_cpumask(mm); cpus_allowed = current->cpus_allowed; Because cpumask_var_t is =operator unsafe. These usage might prevent kernel core improvement. No functional change. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-16 13:38:07 -07:00
Linus Torvalds	51f00a471c	Merge branch 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6 * 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6: mtd/m25p80: add support to parse the partitions by OF node of/irq: of_irq.c needs to include linux/irq.h of/mips: Cleanup some include directives/files. of/mips: Add device tree support to MIPS of/flattree: Eliminate need to provide early_init_dt_scan_chosen_arch of/device: Rework to use common platform_device_alloc() for allocating devices of/xsysace: Fix OF probing on little-endian systems of: use __be32 types for big-endian device tree data of/irq: remove references to NO_IRQ in drivers/of/platform.c of/promtree: add package-to-path support to pdt of/promtree: add of_pdt namespace to pdt code of/promtree: no longer call prom_ functions directly; use an ops structure of/promtree: make drivers/of/pdt.c no longer sparc-only sparc: break out some PROM device-tree building code out into drivers/of of/sparc: convert various prom_* functions to use phandle sparc: stop exporting openprom.h header powerpc, of_serial: Endianness issues setting up the serial ports of: MTD: Fix OF probing on little-endian systems of: GPIO: Fix OF probing on little-endian systems	2010-10-25 08:19:14 -07:00
Yinghai Lu	c7fc2de0c8	memblock, bootmem: Round pfn properly for memory and reserved regions We need to round memory regions correctly -- specifically, we need to round reserved region in the more expansive direction (lower limit down, upper limit up) whereas usable memory regions need to be rounded in the more restrictive direction (lower limit up, upper limit down). This introduces two set of inlines: memblock_region_memory_base_pfn() memblock_region_memory_end_pfn() memblock_region_reserved_base_pfn() memblock_region_reserved_end_pfn() Although they are antisymmetric (and therefore are technically duplicates) the use of the different inlines explicitly documents the programmer's intention. The lack of proper rounding caused a bug on ARM, which was then found to also affect other architectures. Reported-by: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CB4CDFD.4020105@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-12 15:37:51 -07:00
Andres Salomon	8d1255627d	of/sparc: convert various prom_* functions to use phandle Rather than passing around ints everywhere, use the phandle type where appropriate for the various functions that talk to the PROM. Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2010-10-09 02:33:34 -06:00
Benjamin Herrenschmidt	9d1e24928e	memblock: Separate memblock_alloc_nid() and memblock_alloc_try_nid() The former is now strict, it will fail if it cannot honor the allocation within the node, while the later implements the previous semantic which falls back to allocating anywhere. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-05 12:56:24 +10:00
Benjamin Herrenschmidt	35a1f0bd07	memblock: Remove nid_range argument, arch provides memblock_nid_range() instead Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-05 12:56:04 +10:00
Benjamin Herrenschmidt	08b8479881	memblock/sparc: Use new accessors CC: David S. Miller <davem@davemloft.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-04 14:39:00 +10:00
Benjamin Herrenschmidt	e3239ff92a	memblock: Rename memblock_region to memblock_type and memblock_property to memblock_region Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-04 14:21:49 +10:00
Yinghai Lu	95f72d1ed4	lmb: rename to memblock via following scripts FILES=$(find * -type f \| grep -vE 'oprofile\|[^K]config') sed -i \ -e 's/lmb/memblock/g' \ -e 's/LMB/MEMBLOCK/g' \ $FILES for N in $(find . -name lmb.[ch]); do M=$(echo $N \| sed 's/lmb/memblock/g') mv $N $M done and remove some wrong change like lmbench and dlmb etc. also move memblock.c from lib/ to mm/ Suggested-by: Ingo Molnar <mingo@elte.hu> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-14 17:14:00 +10:00
Tejun Heo	336f5899d2	Merge branch 'master' into export-slabh	2010-04-05 11:37:28 +09:00
Ben Hutchings	33cd9dfa3a	sparc64: Fix array size reported by vmemmap_populate() vmemmap_populate() attempts to report the used index and total size of vmemmap_table, but it wrongly shifts the total size so that it is always shown as 0. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-03 13:58:45 -07:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Russell King	4b3073e1c5	MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself On VIVT ARM, when we have multiple shared mappings of the same file in the same MM, we need to ensure that we have coherency across all copies. We do this via make_coherent() by making the pages uncacheable. This used to work fine, until we allowed highmem with highpte - we now have a page table which is mapped as required, and is not available for modification via update_mmu_cache(). Ralf Beache suggested getting rid of the PTE value passed to update_mmu_cache(): On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables to construct a pointer to the pte again. Passing a pte_t * is much more elegant. Maybe we might even replace the pte argument with the pte_t? Ben Herrenschmidt would also like the pte pointer for PowerPC: Passing the ptep in there is exactly what I want. I want that -instead- of the PTE value, because I have issue on some ppc cases, for I$/D$ coherency, where set_pte_at() may decide to mask out the _PAGE_EXEC. So, pass in the mapped page table pointer into update_mmu_cache(), and remove the PTE value, updating all implementations and call sites to suit. Includes a fix from Stephen Rothwell: sparc: fix fallout from update_mmu_cache API change Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2010-02-20 16:41:46 +00:00
David S. Miller	1a78cedb99	sparc64: Fix D-cache flushing on swapin from SW devices. Thanks to tip form ARM folks and Russell King. If flush_dcache_page() occurs on a swapin it will have a mapping and we'll try to defer the flush by setting the dirty bit. But when it hits update_dcache_page() we won't flush because the page won't have a mapping any more. So remove the mapping requirement in flush_dcache(). Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-12 03:20:57 -07:00
David S. Miller	d8ed1d43e1	sparc64: Validate linear D-TLB misses. When page alloc debugging is not enabled, we essentially accept any virtual address for linear kernel TLB misses. But with kgdb, kernel address probing, and other facilities we can try to access arbitrary crap. So, make sure the address we miss on will translate to physical memory that actually exists. In order to make this work we have to embed the valid address bitmap into the kernel image. And in order to make that less expensive we make an adjustment, in that the max physical memory address is decreased to "1 << 41", even on the chips that support a 42-bit physical address space. We can do this because bit 41 indicates "I/O space" and thus covers non-memory ranges. The result of this is that: 1) kpte_linear_bitmap shrinks from 2K to 1K in size 2) we need 64K more for the valid address bitmap We can't let the valid address bitmap be dynamically allocated once we start using it to validate TLB misses, otherwise we have crazy issues to deal with wrt. recursive TLB misses and such. If we're in a TLB miss it could be the deepest trap level that's legal inside of the cpu. So if we TLB miss referencing the bitmap, the cpu will be out of trap levels and enter RED state. To guard against out-of-range accesses to the bitmap, we have to check to make sure no bits in the physical address above bit 40 are set. We could export and use last_valid_pfn for this check, but that's just an unnecessary extra memory reference. On the plus side of all this, since we load all of these translations into the special 4MB mapping TSB, and we check the TSB first for TLB misses, there should be absolutely no real cost for these new checks in the TLB miss path. Reported-by: heyongli@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>	2009-08-25 16:47:46 -07:00

1 2

63 Commits