Commit Graph

254878 Commits

Author SHA1 Message Date
Tejun Heo 474b881bf4 x86: Use absent_pages_in_range() instead of memblock_x86_hole_size()
memblock_x86_hole_size() calculates the total size of holes in a given
range according to memblock and is used by numa emulation code and
numa_meminfo_cover_memory().

Since conversion to MEMBLOCK_NODE_MAP, absent_pages_in_range() also
uses memblock and gives the same result.  This patch replaces
memblock_x86_hole_size() uses with absent_pages_in_range().  After the
conversion the x86 function doesn't have any user left and is killed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-12-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:47:51 -07:00
Tejun Heo 6b5d41a1b9 memblock, x86: Reimplement memblock_find_dma_reserve() using iterators
memblock_find_dma_reserve() wants to find out how much memory is
reserved under MAX_DMA_PFN.  memblock_x86_memory_[free_]in_range() are
used to find out the amounts of all available and free memory in the
area, which are then subtracted to find out the amount of reservation.

memblock_x86_memblock_[free_]in_range() are implemented using
__memblock_x86_memory_in_range() which builds ranges from memblock and
then count them, which is rather unnecessarily complex.

This patch open codes the counting logic directly in
memblock_find_dma_reserve() using memblock iterators and removes now
unused __memblock_x86_memory_in_range() and find_range_array().

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-11-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:47:50 -07:00
Tejun Heo 8a9ca34c11 memblock, x86: Replace __get_free_all_memory_range() with for_each_free_mem_range()
__get_free_all_memory_range() walks memblock, calculates free memory
areas and fills in the specified range.  It can be easily replaced
with for_each_free_mem_range().

Convert free_low_memory_core_early() and
add_highpages_with_active_regions() to for_each_free_mem_range().
This leaves __get_free_all_memory_range() without any user.  Kill it
and related functions.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-10-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:47:49 -07:00
Tejun Heo 64a02daacb memblock, x86: Make free_all_memory_core_early() explicitly free lowmem only
nomemblock is currently used only by x86 and on x86_32
free_all_memory_core_early() silently freed only the low mem because
get_free_all_memory_range() in arch/x86/mm/memblock.c implicitly
limited range to max_low_pfn.

Rename free_all_memory_core_early() to free_low_memory_core_early()
and make it call __get_free_all_memory_range() and limit the range to
max_low_pfn explicitly.  This makes things clearer and also is
consistent with the bootmem behavior.

This leaves get_free_all_memory_range() without any user.  Kill it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-9-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:47:49 -07:00
Tejun Heo 8d89ac8084 x86: Replace memblock_x86_find_in_range_size() with for_each_free_mem_range()
setup_bios_corruption_check() and memtest do_one_pass() open code
memblock free area iteration using memblock_x86_find_in_range_size().
Convert them to use for_each_free_mem_range() instead.

This leaves memblock_x86_find_in_range_size() and
memblock_x86_check_reserved_size() unused.  Kill them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-8-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:47:48 -07:00
Tejun Heo 35fd0808d7 memblock: Implement for_each_free_mem_range()
Implement for_each_free_mem_range() which iterates over free memory
areas according to memblock (memory && !reserved).  This will be used
to simplify memblock users.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-7-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:47:47 -07:00
Tejun Heo ab5d140b9e x86: Use __memblock_alloc_base() in early_reserve_e820()
early_reserve_e820() implements its own ad-hoc early allocator using
memblock_x86_find_in_range_size().  Use __memblock_alloc_base()
instead and remove the unnecessary @startt parameter (it's top-down
allocation anyway).

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-6-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:47:47 -07:00
Tejun Heo 0608f70c78 x86: Use HAVE_MEMBLOCK_NODE_MAP
From 5732e1247898d67cbf837585150fe9f68974671d Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Thu, 14 Jul 2011 11:22:16 +0200

Convert x86 to HAVE_MEMBLOCK_NODE_MAP.  The only difference in memory
handling is that allocations can't no longer cross node boundaries
whether they're node affine or not, which shouldn't matter at all.

This conversion will enable further simplification of boot memory
handling.

-v2: Fix build failure on !NUMA configurations discovered by hpa.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110714094423.GG3455@htj.dyndns.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:47:43 -07:00
Tejun Heo 7c0caeb866 memblock: Add optional region->nid
From 83103b92f3234ec830852bbc5c45911bd6cbdb20 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Thu, 14 Jul 2011 11:22:16 +0200

Add optional region->nid which can be enabled by arch using
CONFIG_HAVE_MEMBLOCK_NODE_MAP.  When enabled, memblock also carries
NUMA node information and replaces early_node_map[].

Newly added memblocks have MAX_NUMNODES as nid.  Arch can then call
memblock_set_node() to set node information.  memblock takes care of
merging and node affine allocations w.r.t. node information.

When MEMBLOCK_NODE_MAP is enabled, early_node_map[], related data
structures and functions to manipulate and iterate it are disabled.
memblock version of __next_mem_pfn_range() is provided such that
for_each_mem_pfn_range() behaves the same and its users don't have to
be updated.

-v2: Yinghai spotted section mismatch caused by missing
     __init_memblock in memblock_set_node().  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110714094342.GF3455@htj.dyndns.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:47:43 -07:00
Tejun Heo 67e24bcb72 memblock: Use __meminit[data] instead of __init[data]
From 19ab281ed67b87a6623d725237a7333ca79f1e75 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Thu, 14 Jul 2011 11:22:16 +0200

memblock will be extended to include early_node_map[], which is also
used during memory hotplug.  Make memblock use __meminit[data] instead
of __init[data] so that memory hotplug code can safely reference it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110714094203.GE3455@htj.dyndns.org
Reported-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:47:42 -07:00
Tejun Heo 784656f9c6 memblock: Reimplement memblock_add_region()
memblock_add_region() carefully checked for merge and overlap
conditions while adding a new region, which is complicated and makes
it difficult to allow arbitrary overlaps or add more merge conditions
(e.g. node ID).

This re-implements memblock_add_region() such that insertion is done
in two steps - all non-overlapping portions of new area are inserted
as separate regions first and then memblock_merge_regions() scan and
merge all neighbouring compatible regions.

This makes addition logic simpler and more versatile and enables
adding node information to memblock.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-3-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:47:41 -07:00
Tejun Heo ed7b56a799 memblock: Remove memblock_memory_can_coalesce()
Arch could implement memblock_memor_can_coalesce() to veto merging of
adjacent or overlapping memblock regions; however, no arch did and any
vetoing would trigger WARN_ON().  Memblock regions are supposed to
deal with proper memory anyway.  Remove the unused hook.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-2-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:47:40 -07:00
Tejun Heo eb40c4c27f memblock, x86: Replace memblock_x86_find_in_range_node() with generic memblock calls
With the previous changes, generic NUMA aware memblock API has feature
parity with memblock_x86_find_in_range_node().  There currently are
two users - x86 setup_node_data() and __alloc_memory_core_early() in
nobootmem.c.

This patch converts the former to use memblock_alloc_nid() and the
latter memblock_find_range_in_node(), and kills
memblock_x86_find_in_range_node() and related functions including
find_memory_early_core_early() in page_alloc.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310460395-30913-9-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:45:35 -07:00
Tejun Heo e64980405c memblock: Separate out memblock_find_in_range_node()
Node affine memblock allocation logic is currently implemented across
memblock_alloc_nid() and memblock_alloc_nid_region().  This
reorganizes it such that it resembles that of non-NUMA allocation API.

Area finding is collected and moved into new exported function
memblock_find_in_range_node() which is symmetrical to non-NUMA
counterpart - it handles @start/@end and understands ANYWHERE and
ACCESSIBLE.  memblock_alloc_nid() now simply calls
memblock_find_in_range_node() and reserves the returned area.

This makes memblock_alloc[_try]_nid() observe ACCESSIBLE limit on node
affine allocations too (again, this doesn't make any difference for
the current sole user - sparc64).

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310460395-30913-8-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:45:35 -07:00
Tejun Heo 34e1845548 memblock: Make memblock_alloc_[try_]nid() top-down
NUMA aware memblock alloc functions - memblock_alloc_[try_]nid() -
weren't properly top-down because memblock_nid_range() scanned
forward.  This patch reverses memblock_nid_range(), renames it to
memblock_nid_range_rev() and updates related functions to implement
proper top-down allocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310460395-30913-7-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:45:34 -07:00
Tejun Heo f9b18db3b1 memblock: Don't allow archs to override memblock_nid_range()
memblock_nid_range() is used to implement memblock_[try_]alloc_nid().
The generic version determines the range by walking early_node_map
with for_each_mem_pfn_range().  The generic version is defined __weak
to allow arch override.

Currently, only sparc overrides it; however, with the previous update
to the generic implementation, there isn't much to be gained with arch
override.  Sparc would behave exactly the same with the generic
implementation.

This patch disallows arch override for memblock_nid_range() and make
both generic and sparc versions static.

sparc is only compile tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310460395-30913-6-git-send-email-tj@kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:45:33 -07:00
Tejun Heo b2fea988f4 memblock: Improve generic memblock_nid_range() using for_each_mem_pfn_range()
Given an address range, memblock_nid_range() determines the node the
start of the range belongs to and upto where the range stays in the
same node.

It's implemented by calling get_pfn_range_for_nid(), which determines
min and max pfns for a given node, for each node and testing whether
start address falls in there.  This is not only inefficient but also
incorrect when nodes interleave as min-max ranges for nodes overlap.

This patch reimplements memblock_nid_range() using
for_each_mem_pfn_range().  It's simpler, walks the mem ranges once and
can find the exact range the start address falls in.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310460395-30913-5-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:45:32 -07:00
Tejun Heo c13291a536 bootmem: Use for_each_mem_pfn_range() in page_alloc.c
The previous patch added for_each_mem_pfn_range() which is more
versatile than for_each_active_range_index_in_nid().  This patch
replaces for_each_active_range_index_in_nid() and open coded
early_node_map[] walks with for_each_mem_pfn_range().

All conversions in this patch are straight-forward and shouldn't cause
any functional difference.  After the conversions,
for_each_active_range_index_in_nid() doesn't have any user left and is
removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310460395-30913-4-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:45:31 -07:00
Tejun Heo 96e907d136 bootmem: Reimplement __absent_pages_in_range() using for_each_mem_pfn_range()
__absent_pages_in_range() was needlessly complex.  Reimplement it
using for_each_mem_pfn_range().

Also, update zone_absent_pages_in_node() such that it doesn't call
__absent_pages_in_range() with @zone_start_pfn which is larger than
@zone_end_pfn.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310460395-30913-3-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:45:30 -07:00
Tejun Heo 5dfe8660a3 bootmem: Replace work_with_active_regions() with for_each_mem_pfn_range()
Callback based iteration is cumbersome and much less useful than
for_each_*() iterator.  This patch implements for_each_mem_pfn_range()
which replaces work_with_active_regions().  All the current users of
work_with_active_regions() are converted.

This simplifies walking over early_node_map and will allow converting
internal logics in page_alloc to use iterator instead of walking
early_node_map directly, which in turn will enable moving node
information to memblock.

powerpc change is only compile tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110714074610.GD3455@htj.dyndns.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:45:29 -07:00
Tejun Heo fc769a8e70 memblock: Replace memblock_find_base() with memblock_find_in_range()
memblock_find_base() is a static function with two callers in
memblock.c and memblock_find_in_range() is a wrapper around it which
just changes the types and order of parameters.

Make memblock_find_in_range() take phys_addr_t instead of u64 for
consistency and replace memblock_find_base() with it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310457490-3356-7-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13 16:36:02 -07:00
Tejun Heo 1f5026a7e2 memblock: Kill MEMBLOCK_ERROR
25818f0f28 (memblock: Make MEMBLOCK_ERROR be 0) thankfully made
MEMBLOCK_ERROR 0 and there already are codes which expect error return
to be 0.  There's no point in keeping MEMBLOCK_ERROR around.  End its
misery.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310457490-3356-6-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13 16:36:01 -07:00
Tejun Heo 348968eb15 memblock: Use round_up/down() instead of memblock_align_up/down()
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310457490-3356-5-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13 16:35:59 -07:00
Tejun Heo 15fb09722d memblock: Use MEMBLOCK_ALLOC_ACCESSIBLE instead of ANYWHERE in memblock_alloc_try_nid()
After node affine allocation fails, memblock_alloc_try_nid() calls
memblock_alloc_base() with @max_addr set to MEMBLOCK_ALLOC_ANYWHERE.
This is inconsistent with memblock_alloc() and what the function's
sole user - sparc/mm/init_64 - expects, although it doesn't make any
difference as sparc64 doesn't have highmem and ACCESSIBLE equals
ANYWHERE.

This patch makes memblock_alloc_try_nid() use ACCESSIBLE instead of
ANYWHERE.  This isn't complete as node affine allocation doesn't
consider memblock.current_limit.  It will be handled with future
changes.

This patch doesn't introduce any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310457490-3356-4-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13 16:35:58 -07:00
Tejun Heo 53348f2716 bootmem: Fix __free_pages_bootmem() to use @order properly
a226f6c899 (FRV: Clean up bootmem allocator's page freeing algorithm)
separated out __free_pages_bootmem() from free_all_bootmem_core().
__free_pages_bootmem() takes @order argument but it assumes @order is
either 0 or ilog2(BITS_PER_LONG).  Note that all the current users
match that assumption and this doesn't cause actual problems.

Fix it by using 1 << order instead of BITS_PER_LONG.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310457490-3356-3-git-send-email-tj@kernel.org
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13 16:35:56 -07:00
Tejun Heo bf61549a2d x86: Fix memblock_x86_check_reserved_size() use in efi_reserve_boot_services()
The return value of memblock_x86_check_reserved_size() doesn't
indicate whether there's an overlapping reservatoin or not.  It
indicates whether the caller needs to iterate further to discover all
reserved portions of the specified area.

efi_reserve_boot_esrvices() wants to check whether the boot services
area overlaps with other reservations but incorrectly used
membloc_x86_check_reserved_size().  Use memblock_is_region_reserved()
instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310457490-3356-2-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13 16:35:52 -07:00
Tejun Heo 1e01979c8f x86, numa: Implement pfn -> nid mapping granularity check
SPARSEMEM w/o VMEMMAP and DISCONTIGMEM, both used only on 32bit, use
sections array to map pfn to nid which is limited in granularity.  If
NUMA nodes are laid out such that the mapping cannot be accurate, boot
will fail triggering BUG_ON() in mminit_verify_page_links().

On 32bit, it's 512MiB w/ PAE and SPARSEMEM.  This seems to have been
granular enough until commit 2706a0bf7b (x86, NUMA: Enable
CONFIG_AMD_NUMA on 32bit too).  Apparently, there is a machine which
aligns NUMA nodes to 128MiB and has only AMD NUMA but not SRAT.  This
led to the following BUG_ON().

 On node 0 totalpages: 2096615
   DMA zone: 32 pages used for memmap
   DMA zone: 0 pages reserved
   DMA zone: 3927 pages, LIFO batch:0
   Normal zone: 1740 pages used for memmap
   Normal zone: 220978 pages, LIFO batch:31
   HighMem zone: 16405 pages used for memmap
   HighMem zone: 1853533 pages, LIFO batch:31
 BUG: Int 6: CR2   (null)
      EDI   (null)  ESI 00000002  EBP 00000002  ESP c1543ecc
      EBX f2400000  EDX 00000006  ECX   (null)  EAX 00000001
      err   (null)  EIP c16209aa   CS 00000060  flg 00010002
 Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
          (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe   (null)
        f7200b80 c16395f0 00200a02 f7200a80   (null) 000375fe 00000002   (null)
 Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17
 Call Trace:
  [<c136b1e5>] ? early_fault+0x2e/0x2e
  [<c16209aa>] ? mminit_verify_page_links+0x12/0x42
  [<c1620613>] ? memmap_init_zone+0xaf/0x10c
  [<c1620929>] ? free_area_init_node+0x2b9/0x2e3
  [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451
  [<c1601d80>] ? paging_init+0x112/0x118
  [<c15f578d>] ? setup_arch+0x791/0x82f
  [<c15f43d9>] ? start_kernel+0x6a/0x257

This patch implements node_map_pfn_alignment() which determines
maximum internode alignment and update numa_register_memblks() to
reject NUMA configuration if alignment exceeds the pfn -> nid mapping
granularity of the memory model as determined by PAGES_PER_SECTION.

This makes the problematic machine boot w/ flatmem by rejecting the
NUMA config and provides protection against crazy NUMA configurations.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110712074534.GB2872@htj.dyndns.org
LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com>
Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Conny Seidel <conny.seidel@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 21:58:29 -07:00
Tejun Heo d0ead15738 x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/
DISCONTIGMEM on x86-32 implements pfn -> nid mapping similarly to
SPARSEMEM; however, it calls each mapping unit ELEMENT instead of
SECTION.  This patch renames it to SECTION so that PAGES_PER_SECTION
is valid for both DISCONTIGMEM and SPARSEMEM.  This will be used by
the next patch to implement mapping granularity check.

This patch is trivial constant rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110712074422.GA2872@htj.dyndns.org
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 21:58:11 -07:00
Linus Torvalds 620917de59 Linux 3.0-rc7 2011-07-11 16:51:52 -07:00
Linus Torvalds 5adaf851d2 Documentation/Changes: remove some really obsolete text
That file harkens back to the days of the big 2.4 -> 2.6 version jump,
and was based even then on older versions.  Some of it is just obsolete,
and Jesper Juhl points out that it talks about kernel versions 2.6 and
should be updated to 3.0.

Remove some obsolete text, and re-phrase some other to not be 2.6-specific.

Reported-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-11 16:48:38 -07:00
Linus Torvalds c15000b40d Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
  [media] msp3400: fill in v4l2_tuner based on vt->type field
  [media] tuner-core.c: don't change type field in g_tuner or g_frequency
  [media] cx18/ivtv: fix g_tuner support
  [media] tuner-core: power up tuner when called with s_power(1)
  [media] v4l2-ioctl.c: check for valid tuner type in S_HW_FREQ_SEEK
  [media] tuner-core: simplify the standard fixup
  [media] tuner-core/v4l2-subdev: document that the type field has to be filled in
  [media] v4l2-subdev.h: remove unused s_mode tuner op
  [media] feature-removal-schedule: change in how radio device nodes are handled
  [media] bttv: fix s_tuner for radio
  [media] pvrusb2: fix g/s_tuner support
  [media] v4l2-ioctl.c: prefill tuner type for g_frequency and g/s_tuner
  [media] tuner-core: fix tuner_resume: use t->mode instead of t->type
  [media] tuner-core: fix s_std and s_tuner
2011-07-11 16:43:27 -07:00
Linus Torvalds 9ddf7f5058 Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: Reintroduce dropped call to check_wakeup_irqs
2011-07-11 12:49:03 -07:00
Linus Torvalds 71a1b44b03 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: drop spinlock before calling cifs_put_tlink
  cifs: fix expand_dfs_referral
  cifs: move bdi_setup_and_register outside of CONFIG_CIFS_DFS_UPCALL
  cifs: factor smb_vol allocation out of cifs_setup_volume_info
  cifs: have cifs_cleanup_volume_info not take a double pointer
  cifs: fix build_unc_path_to_root to account for a prefixpath
  cifs: remove bogus call to cifs_cleanup_volume_info
2011-07-11 12:48:24 -07:00
Linus Torvalds c891f2cd89 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] fix cpumask memory leak in acpi-cpufreq on cpu hotplug.
2011-07-11 12:47:53 -07:00
Linus Torvalds 145628130b Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86:
  hp-wmi: fix use after free
  dell-laptop - using buffer without mutex_lock
  Revert: "dell-laptop: Toggle the unsupported hardware killswitch"
  platform-drivers-x86: set backlight type to BACKLIGHT_PLATFORM
  thinkpad-acpi: handle HKEY 0x4010, 0x4011 events
  drivers/platform/x86: Fix memory leak
  thinkpad-acpi: handle some new HKEY 0x60xx events
  acer-wmi: fix bitwise bug when set device state
  acer-wmi: Only update rfkill status for associated hotkey events
2011-07-11 12:47:09 -07:00
Linus Torvalds 83e9569714 Merge branch 'movieboard' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
* 'movieboard' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  firewire: ohci: do not bind to Pinnacle cards, avert panic
2011-07-11 12:46:39 -07:00
Joe Perches 404ba3f029 ath5k: Add missing breaks in switch/case
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-11 12:46:02 -07:00
Muthu Kumar 0580181784 Documentation/spinlocks.txt: Remove reference to sti()/cli()
Since we removed sti()/cli() and related, how about removing it from
Documentation/spinlocks.txt?

Signed-off-by: Muthukumar R <muthur@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-11 12:45:04 -07:00
Jeff Layton f484b5d001 cifs: drop spinlock before calling cifs_put_tlink
...as that function can sleep.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-11 18:40:52 +00:00
Eric Dumazet 0401846c33 hp-wmi: fix use after free
[  191.310008] WARNING: kmemcheck: Caught 32-bit read from freed memory (f0d25f14)
[  191.310011] c056d2f088000000105fd2f00000000050415353040000000000000000000000
[  191.310020]  i i i i f f f f f f f f f f f f f f f f f f f f f f f f f f f f
[  191.310027]                                          ^
[  191.310029]
[  191.310032] Pid: 737, comm: modprobe Not tainted 3.0.0-rc5+ #268 Hewlett-Packard HP Compaq 6005 Pro SFF PC/3047h
[  191.310036] EIP: 0060:[<f80b3104>] EFLAGS: 00010286 CPU: 0
[  191.310039] EIP is at hp_wmi_perform_query+0x104/0x150 [hp_wmi]
[  191.310041] EAX: f0d25601 EBX: f0d25f00 ECX: 000121cf EDX: 000121ce
[  191.310043] ESI: f0d25f10 EDI: f0f97ea8 EBP: f0f97ec4 ESP: c173f34c
[  191.310045]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  191.310046] CR0: 8005003b CR2: f540c000 CR3: 30f30000 CR4: 000006d0
[  191.310048] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  191.310050] DR6: ffff4ff0 DR7: 00000400
[  191.310051]  [<f80b317b>] hp_wmi_dock_state+0x2b/0x40 [hp_wmi]
[  191.310054]  [<f80b6093>] hp_wmi_init+0x93/0x1a8 [hp_wmi]
[  191.310057]  [<c10011f0>] do_one_initcall+0x30/0x170
[  191.310061]  [<c107ab9f>] sys_init_module+0xef/0x1a60
[  191.310064]  [<c149f998>] sysenter_do_call+0x12/0x28
[  191.310067]  [<ffffffff>] 0xffffffff

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
2011-07-11 09:52:35 -04:00
Jose Alonso b486742a12 dell-laptop - using buffer without mutex_lock
Using buffer->output[1] without mutex_lock()

Signed-off-by: Jose Alonso <joalonsof@gmail.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
2011-07-11 09:52:31 -04:00
Keng-Yu Lin be65dde82a Revert: "dell-laptop: Toggle the unsupported hardware killswitch"
This reverts commit a3d77411e8,

as it causes a mess in the wireless rfkill status on some models.
It is probably a bad idea to toggle the rfkill for all dell models
without the respect to the claim that it is hardware-controlled.

Cc: stable@kernel.org
Signed-off-by: Keng-Yu Lin <kengyu@canonical.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
2011-07-11 09:52:19 -04:00
Colin Cross 887596224c PM: Reintroduce dropped call to check_wakeup_irqs
Patch 2e711c04db
(PM: Remove sysdev suspend, resume and shutdown operations)
deleted sysdev_suspend(), which was being relied on to call
check_wakeup_irqs() in suspend.  If check_wakeup_irqs() is not
called, wake interrupts that are pending when suspend is
entered may be lost.  It also breaks IRQCHIP_MASK_ON_SUSPEND,
which is handled in check_wakeup_irqs().

This patch adds a call to check_wakeup_irqs() in syscore_suspend(),
similar to what was deleted in sysdev_suspend().

Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-11 10:51:49 +02:00
Luming Yu 50f4ddd4ff [CPUFREQ] fix cpumask memory leak in acpi-cpufreq on cpu hotplug.
I came across a memory leak during a cyclic cpu-online-offline test.

Signed-off-by: Yu Luming <luming.yu@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2011-07-10 17:03:04 -04:00
Linus Torvalds e3bbfa78ba Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging:
  hwmon: (pmbus) Improve auto-detection of temperature status register
  hwmon: (lm95241) Fix negative temperature results
  hwmon: (lm95241) Fix chip detection code
2011-07-10 10:24:47 -07:00
Guenter Roeck 22e6b2312d hwmon: (pmbus) Improve auto-detection of temperature status register
It is possible that a PMBus device supports the READ_TEMPERATURE2 and/or
READ_TEMPERATURE3 registers but does not support READ_TEMPERATURE1.
Improve temperature status register detection to address this condition.

Reported-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@kernel.org # 2.6.39+
2011-07-10 08:54:29 -07:00
Guenter Roeck 0c2a40e2fe hwmon: (lm95241) Fix negative temperature results
Negative temperatures were returned in degrees C instead of milli-Degrees C.
Also, negative temperatures were reported for remote temperature sensors even
if the chip was configured for positive-only results.

Fix by detecting temperature modes, and by treating negative temperatures
similar to positive temperatures, with appropriate sign extension.

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@kernel.org # 2.6.30+
2011-07-10 08:54:15 -07:00
Linus Torvalds aa4c495e3d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Fix a copmile warning
  ASoC: ak4642: fixup snd_soc_update_bits mask for PW_MGMT2
  ALSA: hda - Change all ADCs for dual-adc switching mode for Realtek
  ASoC: Manage WM8731 ACTIVE bit as a supply widget
  ASoC: Don't set invalid name string to snd_card->driver field
  ASoC: Ensure we delay long enough for WM8994 FLL to lock when starting
  ASoC: Tegra: I2S: Ensure clock is enabled when writing regs
  ASoC: Fix Blackfin I2S _pointer() implementation return in bounds values
  ASoC: tlv320aic3x: Do soft reset to codec when going to bias off state
  ASoC: tlv320aic3x: Don't sync first two registers from register cache
  audio: tlv320aic26: fix PLL register configuration
2011-07-10 07:29:22 -07:00
Linus Torvalds 2169ce92ca Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: conditional resource-reallocation through kernel parameter pci=realloc
2011-07-10 07:28:51 -07:00
Linus Torvalds 7fc7693627 Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: 6994/1: smp_twd: Fix typo in 'twd_timer_rate' printing
  ARM: 6987/1: l2x0: fix disabling function to avoid deadlock
  ARM: 6966/1: ep93xx: fix inverted RTS/DTR signals on uart1
  ARM: 6980/1: mmci: use StartBitErr to detect bad connections
  ARM: 6979/1: mach-vt8500: add forgotten irq_data conversion
  ARM: move memory layout sanity checking before meminfo initialization
  ARM: 6990/1: MAINTAINERS: add entry for ARM PMU profiling and debugging
  ARM: 6989/1: perf: do not start the PMU when no events are present
  ARM: dmabounce: fix map_single() error return value
2011-07-10 07:28:30 -07:00