x86, ACPI, mm: Revert movablemem_map support

Tim found:

  WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
  Hardware name: S2600CP
  sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
  smpboot: Booting Node   1, Processors  #1
  Modules linked in:
  Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
  Call Trace:
    set_cpu_sibling_map+0x279/0x449
    start_secondary+0x11d/0x1e5

Don Morris reproduced it on an HP z620 workstation, and bisected it to
commit e8d1955258 ("acpi, memory-hotplug: parse SRAT before memblock
is ready").

It turns out movablemem_map has some problems, and it breaks several things:

1. numa_init is called several times, NOT just for SRAT, so
	nodes_clear(numa_nodes_parsed)
	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
   can not simply be removed.  The call sequence is: numaq, srat, amd, dummy,
   and the fallback path has to keep working.

2. acpi_numa_init was simply split to create early_parse_srat.
   a. early_parse_srat is NOT called for ia64, so ia64 is broken.
   b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
	     set_apicid_to_node(i, NUMA_NO_NODE)
     is still left in numa_init, so it just clears the result from
     early_parse_srat.  It should have been moved before that call.
   c. it breaks ACPI_INITRD_TABLE_OVERRIDE, as the ACPI table scan is moved
      early, before the override from the initrd is settled.

3. that patch TITLE is totally misleading: there is NO x86 in the title,
   but it changes critical x86 code.  As a result the x86 maintainers did
   not pay attention and the problem was not found early.  Those patches
   really should be routed via tip/x86/mm.

4. after that commit, the following can not use movable RAM anymore:
  a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
  b. initrd... it will be freed after booting, so it could be on movable...
  c. crashkernel for kdump...: looks like we can not put the kdump kernel
	above 4G anymore.
  d. init_mem_mapping: can not put the page table high anymore.
  e. initmem_init: vmemmap can not be high on the local node anymore. That is
     not good.
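
Points 1 and 2b come down to the same rule: every attempt in a fallback
chain has to start from clean shared state, and nothing parsed earlier
survives that reset.  A minimal userspace sketch of the pattern follows.
All names in it (numa_state, try_method, init_with_fallback and the two
methods) are hypothetical stand-ins for illustration, not the kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 4

/* Hypothetical stand-in for the state every NUMA init method fills in. */
struct numa_state {
	int nr_parsed;			/* how many nodes were recorded */
	int nodes[MAX_NODES];
};

static struct numa_state state;

/* An init method returns 0 on success and a negative value on failure. */
typedef int (*numa_init_fn)(void);

/* A method that records partial results and then fails. */
static int failing_method(void)
{
	state.nodes[state.nr_parsed++] = 7;	/* stale data left behind */
	return -1;
}

/* The last-resort method: a single node covering everything. */
static int dummy_method(void)
{
	state.nodes[state.nr_parsed++] = 0;
	return 0;
}

/* Reset the shared state first, then run exactly one method. */
static int try_method(numa_init_fn fn)
{
	assert(fn != NULL);
	memset(&state, 0, sizeof(state));
	return fn();
}

/* Walk the chain in order; return the index of the method that worked. */
static int init_with_fallback(const numa_init_fn *chain, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (try_method(chain[i]) == 0)
			return (int)i;
	return -1;
}
```

With the chain { failing_method, dummy_method } the dummy method wins and
only its single node remains in the state.  Dropping the memset() from
try_method() would leave the stale entry from the failed attempt in place,
and anything parsed before the chain starts is wiped by the first reset.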

If a node is hotpluggable, memory-related ranges such as the page table
and vmemmap could be on that node without problem, and should be on that
node.

We have workaround patches that could fix some of the problems, but some
can not be fixed.

So just remove that offending commit and related ones including:

 f7210e6c4a ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
    protect movablecore_map in memblock_overlaps_region().")

 01a178a94e ("acpi, memory-hotplug: support getting hotplug info from
    SRAT")

 27168d38fa ("acpi, memory-hotplug: extend movablemem_map ranges to
    the end of node")

 e8d1955258 ("acpi, memory-hotplug: parse SRAT before memblock is
    ready")

 fb06bc8e5f ("page_alloc: bootmem limit with movablecore_map")

 42f47e27e7 ("page_alloc: make movablemem_map have higher priority")

 6981ec3114 ("page_alloc: introduce zone_movable_limit[] to keep
    movable limit for nodes")

 34b71f1e04 ("page_alloc: add movable_memmap kernel parameter")

 4d59a75125 ("x86: get pg_data_t's memory from other node")

Later we should have patches that make sure the kernel puts the page table
and vmemmap on local node RAM instead of pushing them down to node0.  We
also need to find a way to put other kernel-used RAM on local node RAM.

Reported-by: Tim Gardner <tim.gardner@canonical.com>
Reported-by: Don Morris <don.morris@hp.com>
Bisected-by: Don Morris <don.morris@hp.com>
Tested-by: Don Morris <don.morris@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yinghai Lu 2013-03-01 14:51:27 -08:00 committed by Linus Torvalds
parent 14cc0b55b7
commit 20e6926dcb
10 changed files with 27 additions and 544 deletions


@ -1645,42 +1645,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
that the amount of memory usable for all allocations
is not too small.
movablemem_map=acpi
[KNL,X86,IA-64,PPC] This parameter is similar to
memmap except it specifies the memory map of
ZONE_MOVABLE.
This option inform the kernel to use Hot Pluggable bit
in flags from SRAT from ACPI BIOS to determine which
memory devices could be hotplugged. The corresponding
memory ranges will be set as ZONE_MOVABLE.
NOTE: Whatever node the kernel resides in will always
be un-hotpluggable.
movablemem_map=nn[KMG]@ss[KMG]
[KNL,X86,IA-64,PPC] This parameter is similar to
memmap except it specifies the memory map of
ZONE_MOVABLE.
If user specifies memory ranges, the info in SRAT will
be ingored. And it works like the following:
- If more ranges are all within one node, then from
lowest ss to the end of the node will be ZONE_MOVABLE.
- If a range is within a node, then from ss to the end
of the node will be ZONE_MOVABLE.
- If a range covers two or more nodes, then from ss to
the end of the 1st node will be ZONE_MOVABLE, and all
the rest nodes will only have ZONE_MOVABLE.
If memmap is specified at the same time, the
movablemem_map will be limited within the memmap
areas. If kernelcore or movablecore is also specified,
movablemem_map will have higher priority to be
satisfied. So the administrator should be careful that
the amount of movablemem_map areas are not too large.
Otherwise kernel won't have enough memory to start.
NOTE: We don't stop users specifying the node the
kernel resides in as hotpluggable so that this
option can be used as a workaround of firmware
bugs.
MTD_Partition= [MTD]
Format: <name>,<region-number>,<size>,<offset>
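
The nn[KMG]@ss[KMG] form described in the removed documentation above is a
size followed by a start address.  A rough userspace sketch of parsing it
is shown here, assuming a simplified stand-in for the kernel's memparse();
parse_size() and parse_range() are hypothetical names, and only the K, M
and G suffixes are handled:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for memparse(): a number plus an optional K/M/G. */
static uint64_t parse_size(const char *s, const char **end)
{
	char *e;
	uint64_t v = strtoull(s, &e, 0);

	switch (*e) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		e++;		/* consume the suffix */
		break;
	default:
		break;
	}
	*end = e;
	return v;
}

/* Parse "nn[KMG]@ss[KMG]"; return 0 on success and -1 on a malformed string. */
static int parse_range(const char *p, uint64_t *size, uint64_t *start)
{
	const char *q;

	assert(p && size && start);

	*size = parse_size(p, &q);
	if (q == p || *q != '@')	/* no number, or no '@' separator */
		return -1;

	p = q + 1;
	*start = parse_size(p, &q);
	if (q == p || *q != '\0')	/* no number, or trailing garbage */
		return -1;

	return 0;
}
```

Under these assumptions "4G@8G" yields a size of 4 GiB starting at 8 GiB,
while "4G" (no start) and "@1M" (no size) are rejected.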


@ -1056,15 +1056,6 @@ void __init setup_arch(char **cmdline_p)
setup_bios_corruption_check();
#endif
/*
* In the memory hotplug case, the kernel needs info from SRAT to
* determine which memory is hotpluggable before allocating memory
* using memblock.
*/
acpi_boot_table_init();
early_acpi_boot_init();
early_parse_srat();
#ifdef CONFIG_X86_32
printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
(max_pfn_mapped<<PAGE_SHIFT) - 1);
@ -1110,6 +1101,10 @@ void __init setup_arch(char **cmdline_p)
/*
* Parse the ACPI tables for possible boot-time SMP configuration.
*/
acpi_boot_table_init();
early_acpi_boot_init();
initmem_init();
memblock_find_dma_reserve();


@ -212,9 +212,10 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
* Allocate node data. Try node-local memory and then any node.
* Never allocate in DMA zone.
*/
nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
if (!nd_pa) {
pr_err("Cannot find %zu bytes in any node\n", nd_size);
pr_err("Cannot find %zu bytes in node %d\n",
nd_size, nid);
return;
}
nd = __va(nd_pa);
@ -559,12 +560,10 @@ static int __init numa_init(int (*init_func)(void))
for (i = 0; i < MAX_LOCAL_APIC; i++)
set_apicid_to_node(i, NUMA_NO_NODE);
/*
* Do not clear numa_nodes_parsed or zero numa_meminfo here, because
* SRAT was parsed earlier in early_parse_srat().
*/
nodes_clear(numa_nodes_parsed);
nodes_clear(node_possible_map);
nodes_clear(node_online_map);
memset(&numa_meminfo, 0, sizeof(numa_meminfo));
WARN_ON(memblock_set_node(0, ULLONG_MAX, MAX_NUMNODES));
numa_reset_distance();


@ -141,126 +141,11 @@ static inline int save_add_info(void) {return 1;}
static inline int save_add_info(void) {return 0;}
#endif
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
static void __init
handle_movablemem(int node, u64 start, u64 end, u32 hotpluggable)
{
int overlap, i;
unsigned long start_pfn, end_pfn;
start_pfn = PFN_DOWN(start);
end_pfn = PFN_UP(end);
/*
* For movablemem_map=acpi:
*
* SRAT: |_____| |_____| |_________| |_________| ......
* node id: 0 1 1 2
* hotpluggable: n y y n
* movablemem_map: |_____| |_________|
*
* Using movablemem_map, we can prevent memblock from allocating memory
* on ZONE_MOVABLE at boot time.
*
* Before parsing SRAT, memblock has already reserve some memory ranges
* for other purposes, such as for kernel image. We cannot prevent
* kernel from using these memory, so we need to exclude these memory
* even if it is hotpluggable.
* Furthermore, to ensure the kernel has enough memory to boot, we make
* all the memory on the node which the kernel resides in
* un-hotpluggable.
*/
if (hotpluggable && movablemem_map.acpi) {
/* Exclude ranges reserved by memblock. */
struct memblock_type *rgn = &memblock.reserved;
for (i = 0; i < rgn->cnt; i++) {
if (end <= rgn->regions[i].base ||
start >= rgn->regions[i].base +
rgn->regions[i].size)
continue;
/*
* If the memory range overlaps the memory reserved by
* memblock, then the kernel resides in this node.
*/
node_set(node, movablemem_map.numa_nodes_kernel);
goto out;
}
/*
* If the kernel resides in this node, then the whole node
* should not be hotpluggable.
*/
if (node_isset(node, movablemem_map.numa_nodes_kernel))
goto out;
insert_movablemem_map(start_pfn, end_pfn);
/*
* numa_nodes_hotplug nodemask represents which nodes are put
* into movablemem_map.map[].
*/
node_set(node, movablemem_map.numa_nodes_hotplug);
goto out;
}
/*
* For movablemem_map=nn[KMG]@ss[KMG]:
*
* SRAT: |_____| |_____| |_________| |_________| ......
* node id: 0 1 1 2
* user specified: |__| |___|
* movablemem_map: |___| |_________| |______| ......
*
* Using movablemem_map, we can prevent memblock from allocating memory
* on ZONE_MOVABLE at boot time.
*
* NOTE: In this case, SRAT info will be ingored.
*/
overlap = movablemem_map_overlap(start_pfn, end_pfn);
if (overlap >= 0) {
/*
* If part of this range is in movablemem_map, we need to
* add the range after it to extend the range to the end
* of the node, because from the min address specified to
* the end of the node will be ZONE_MOVABLE.
*/
start_pfn = max(start_pfn,
movablemem_map.map[overlap].start_pfn);
insert_movablemem_map(start_pfn, end_pfn);
/*
* Set the nodemask, so that if the address range on one node
* is not continuse, we can add the subsequent ranges on the
* same node into movablemem_map.
*/
node_set(node, movablemem_map.numa_nodes_hotplug);
} else {
if (node_isset(node, movablemem_map.numa_nodes_hotplug))
/*
* Insert the range if we already have movable ranges
* on the same node.
*/
insert_movablemem_map(start_pfn, end_pfn);
}
out:
return;
}
#else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
static inline void
handle_movablemem(int node, u64 start, u64 end, u32 hotpluggable)
{
}
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
int __init
acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
{
u64 start, end;
u32 hotpluggable;
int node, pxm;
if (srat_disabled())
@ -269,8 +154,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
goto out_err_bad_srat;
if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
goto out_err;
hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
if (hotpluggable && !save_add_info())
if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
goto out_err;
start = ma->base_address;
@ -290,12 +174,9 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
node_set(node, numa_nodes_parsed);
printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
node, pxm,
(unsigned long long) start, (unsigned long long) end - 1,
hotpluggable ? "Hot Pluggable": "");
handle_movablemem(node, start, end, hotpluggable);
(unsigned long long) start, (unsigned long long) end - 1);
return 0;
out_err_bad_srat:


@ -282,10 +282,10 @@ acpi_table_parse_srat(enum acpi_srat_type id,
handler, max_entries);
}
static int srat_mem_cnt;
void __init early_parse_srat(void)
int __init acpi_numa_init(void)
{
int cnt = 0;
/*
* Should not limit number with cpu num that is from NR_CPUS or nr_cpus=
* SRAT cpu entries could have different order with that in MADT.
@ -298,21 +298,18 @@ void __init early_parse_srat(void)
acpi_parse_x2apic_affinity, 0);
acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
acpi_parse_processor_affinity, 0);
srat_mem_cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
acpi_parse_memory_affinity,
NR_NODE_MEMBLKS);
}
}
int __init acpi_numa_init(void)
{
/* SLIT: System Locality Information Table */
acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
acpi_numa_arch_fixup();
if (srat_mem_cnt < 0)
return srat_mem_cnt;
if (cnt < 0)
return cnt;
else if (!parsed_numa_memblks)
return -ENOENT;
return 0;


@ -485,14 +485,6 @@ static inline bool acpi_driver_match_device(struct device *dev,
#endif /* !CONFIG_ACPI */
#ifdef CONFIG_ACPI_NUMA
void __init early_parse_srat(void);
#else
static inline void early_parse_srat(void)
{
}
#endif
#ifdef CONFIG_ACPI
void acpi_os_set_prepare_sleep(int (*func)(u8 sleep_state,
u32 pm1a_ctrl, u32 pm1b_ctrl));


@ -42,7 +42,6 @@ struct memblock {
extern struct memblock memblock;
extern int memblock_debug;
extern struct movablemem_map movablemem_map;
#define memblock_dbg(fmt, ...) \
if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
@ -61,7 +60,6 @@ int memblock_reserve(phys_addr_t base, phys_addr_t size);
void memblock_trim_memory(phys_addr_t align);
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
unsigned long *out_end_pfn, int *out_nid);


@ -1333,24 +1333,6 @@ extern void free_bootmem_with_active_regions(int nid,
unsigned long max_low_pfn);
extern void sparse_memory_present_with_active_regions(int nid);
#define MOVABLEMEM_MAP_MAX MAX_NUMNODES
struct movablemem_entry {
unsigned long start_pfn; /* start pfn of memory segment */
unsigned long end_pfn; /* end pfn of memory segment (exclusive) */
};
struct movablemem_map {
bool acpi; /* true if using SRAT info */
int nr_map;
struct movablemem_entry map[MOVABLEMEM_MAP_MAX];
nodemask_t numa_nodes_hotplug; /* on which nodes we specify memory */
nodemask_t numa_nodes_kernel; /* on which nodes kernel resides in */
};
extern void __init insert_movablemem_map(unsigned long start_pfn,
unsigned long end_pfn);
extern int __init movablemem_map_overlap(unsigned long start_pfn,
unsigned long end_pfn);
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
#if !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && \


@ -92,58 +92,9 @@ static long __init_memblock memblock_overlaps_region(struct memblock_type *type,
*
* Find @size free area aligned to @align in the specified range and node.
*
* If we have CONFIG_HAVE_MEMBLOCK_NODE_MAP defined, we need to check if the
* memory we found if not in hotpluggable ranges.
*
* RETURNS:
* Found address on success, %0 on failure.
*/
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
phys_addr_t end, phys_addr_t size,
phys_addr_t align, int nid)
{
phys_addr_t this_start, this_end, cand;
u64 i;
int curr = movablemem_map.nr_map - 1;
/* pump up @end */
if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
end = memblock.current_limit;
/* avoid allocating the first page */
start = max_t(phys_addr_t, start, PAGE_SIZE);
end = max(start, end);
for_each_free_mem_range_reverse(i, nid, &this_start, &this_end, NULL) {
this_start = clamp(this_start, start, end);
this_end = clamp(this_end, start, end);
restart:
if (this_end <= this_start || this_end < size)
continue;
for (; curr >= 0; curr--) {
if ((movablemem_map.map[curr].start_pfn << PAGE_SHIFT)
< this_end)
break;
}
cand = round_down(this_end - size, align);
if (curr >= 0 &&
cand < movablemem_map.map[curr].end_pfn << PAGE_SHIFT) {
this_end = movablemem_map.map[curr].start_pfn
<< PAGE_SHIFT;
goto restart;
}
if (cand >= this_start)
return cand;
}
return 0;
}
#else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
phys_addr_t end, phys_addr_t size,
phys_addr_t align, int nid)
@ -172,7 +123,6 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
}
return 0;
}
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
/**
* memblock_find_in_range - find free area in given range


@ -202,18 +202,11 @@ static unsigned long __meminitdata nr_all_pages;
static unsigned long __meminitdata dma_reserve;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
/* Movable memory ranges, will also be used by memblock subsystem. */
struct movablemem_map movablemem_map = {
.acpi = false,
.nr_map = 0,
};
static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
static unsigned long __initdata required_kernelcore;
static unsigned long __initdata required_movablecore;
static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
/* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
int movable_zone;
@ -4412,77 +4405,6 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
}
/**
* sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
*
* zone_movable_limit is initialized as 0. This function will try to get
* the first ZONE_MOVABLE pfn of each node from movablemem_map, and
* assigne them to zone_movable_limit.
* zone_movable_limit[nid] == 0 means no limit for the node.
*
* Note: Each range is represented as [start_pfn, end_pfn)
*/
static void __meminit sanitize_zone_movable_limit(void)
{
int map_pos = 0, i, nid;
unsigned long start_pfn, end_pfn;
if (!movablemem_map.nr_map)
return;
/* Iterate all ranges from minimum to maximum */
for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
/*
* If we have found lowest pfn of ZONE_MOVABLE of the node
* specified by user, just go on to check next range.
*/
if (zone_movable_limit[nid])
continue;
#ifdef CONFIG_ZONE_DMA
/* Skip DMA memory. */
if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
#endif
#ifdef CONFIG_ZONE_DMA32
/* Skip DMA32 memory. */
if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
#endif
#ifdef CONFIG_HIGHMEM
/* Skip lowmem if ZONE_MOVABLE is highmem. */
if (zone_movable_is_highmem() &&
start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
#endif
if (start_pfn >= end_pfn)
continue;
while (map_pos < movablemem_map.nr_map) {
if (end_pfn <= movablemem_map.map[map_pos].start_pfn)
break;
if (start_pfn >= movablemem_map.map[map_pos].end_pfn) {
map_pos++;
continue;
}
/*
* The start_pfn of ZONE_MOVABLE is either the minimum
* pfn specified by movablemem_map, or 0, which means
* the node has no ZONE_MOVABLE.
*/
zone_movable_limit[nid] = max(start_pfn,
movablemem_map.map[map_pos].start_pfn);
break;
}
}
}
#else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
unsigned long zone_type,
@ -4500,6 +4422,7 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
return zholes_size[zone_type];
}
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
@ -4941,19 +4864,12 @@ static void __init find_zone_movable_pfns_for_nodes(void)
required_kernelcore = max(required_kernelcore, corepages);
}
/*
* If neither kernelcore/movablecore nor movablemem_map is specified,
* there is no ZONE_MOVABLE. But if movablemem_map is specified, the
* start pfn of ZONE_MOVABLE has been stored in zone_movable_limit[].
*/
if (!required_kernelcore) {
if (movablemem_map.nr_map)
memcpy(zone_movable_pfn, zone_movable_limit,
sizeof(zone_movable_pfn));
/* If kernelcore was not specified, there is no ZONE_MOVABLE */
if (!required_kernelcore)
goto out;
}
/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
find_usable_zone_for_movable();
usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
restart:
@ -4981,24 +4897,10 @@ restart:
for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
unsigned long size_pages;
/*
* Find more memory for kernelcore in
* [zone_movable_pfn[nid], zone_movable_limit[nid]).
*/
start_pfn = max(start_pfn, zone_movable_pfn[nid]);
if (start_pfn >= end_pfn)
continue;
if (zone_movable_limit[nid]) {
end_pfn = min(end_pfn, zone_movable_limit[nid]);
/* No range left for kernelcore in this node */
if (start_pfn >= end_pfn) {
zone_movable_pfn[nid] =
zone_movable_limit[nid];
break;
}
}
/* Account for what is only usable for kernelcore */
if (start_pfn < usable_startpfn) {
unsigned long kernel_pages;
@ -5058,12 +4960,12 @@ restart:
if (usable_nodes && required_kernelcore > usable_nodes)
goto restart;
out:
/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
for (nid = 0; nid < MAX_NUMNODES; nid++)
zone_movable_pfn[nid] =
roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
out:
/* restore the node_state */
node_states[N_MEMORY] = saved_node_state;
}
@ -5126,8 +5028,6 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
/* Find the PFNs that ZONE_MOVABLE begins at in each node */
memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
find_usable_zone_for_movable();
sanitize_zone_movable_limit();
find_zone_movable_pfns_for_nodes();
/* Print out the zone ranges */
@ -5211,181 +5111,6 @@ static int __init cmdline_parse_movablecore(char *p)
early_param("kernelcore", cmdline_parse_kernelcore);
early_param("movablecore", cmdline_parse_movablecore);
/**
* movablemem_map_overlap() - Check if a range overlaps movablemem_map.map[].
* @start_pfn: start pfn of the range to be checked
* @end_pfn: end pfn of the range to be checked (exclusive)
*
* This function checks if a given memory range [start_pfn, end_pfn) overlaps
* the movablemem_map.map[] array.
*
* Return: index of the first overlapped element in movablemem_map.map[]
* or -1 if they don't overlap each other.
*/
int __init movablemem_map_overlap(unsigned long start_pfn,
unsigned long end_pfn)
{
int overlap;
if (!movablemem_map.nr_map)
return -1;
for (overlap = 0; overlap < movablemem_map.nr_map; overlap++)
if (start_pfn < movablemem_map.map[overlap].end_pfn)
break;
if (overlap == movablemem_map.nr_map ||
end_pfn <= movablemem_map.map[overlap].start_pfn)
return -1;
return overlap;
}
/**
* insert_movablemem_map - Insert a memory range in to movablemem_map.map.
* @start_pfn: start pfn of the range
* @end_pfn: end pfn of the range
*
* This function will also merge the overlapped ranges, and sort the array
* by start_pfn in monotonic increasing order.
*/
void __init insert_movablemem_map(unsigned long start_pfn,
unsigned long end_pfn)
{
int pos, overlap;
/*
* pos will be at the 1st overlapped range, or the position
* where the element should be inserted.
*/
for (pos = 0; pos < movablemem_map.nr_map; pos++)
if (start_pfn <= movablemem_map.map[pos].end_pfn)
break;
/* If there is no overlapped range, just insert the element. */
if (pos == movablemem_map.nr_map ||
end_pfn < movablemem_map.map[pos].start_pfn) {
/*
* If pos is not the end of array, we need to move all
* the rest elements backward.
*/
if (pos < movablemem_map.nr_map)
memmove(&movablemem_map.map[pos+1],
&movablemem_map.map[pos],
sizeof(struct movablemem_entry) *
(movablemem_map.nr_map - pos));
movablemem_map.map[pos].start_pfn = start_pfn;
movablemem_map.map[pos].end_pfn = end_pfn;
movablemem_map.nr_map++;
return;
}
/* overlap will be at the last overlapped range */
for (overlap = pos + 1; overlap < movablemem_map.nr_map; overlap++)
if (end_pfn < movablemem_map.map[overlap].start_pfn)
break;
/*
* If there are more ranges overlapped, we need to merge them,
* and move the rest elements forward.
*/
overlap--;
movablemem_map.map[pos].start_pfn = min(start_pfn,
movablemem_map.map[pos].start_pfn);
movablemem_map.map[pos].end_pfn = max(end_pfn,
movablemem_map.map[overlap].end_pfn);
if (pos != overlap && overlap + 1 != movablemem_map.nr_map)
memmove(&movablemem_map.map[pos+1],
&movablemem_map.map[overlap+1],
sizeof(struct movablemem_entry) *
(movablemem_map.nr_map - overlap - 1));
movablemem_map.nr_map -= overlap - pos;
}
/**
* movablemem_map_add_region - Add a memory range into movablemem_map.
* @start: physical start address of range
* @end: physical end address of range
*
* This function transform the physical address into pfn, and then add the
* range into movablemem_map by calling insert_movablemem_map().
*/
static void __init movablemem_map_add_region(u64 start, u64 size)
{
unsigned long start_pfn, end_pfn;
/* In case size == 0 or start + size overflows */
if (start + size <= start)
return;
if (movablemem_map.nr_map >= ARRAY_SIZE(movablemem_map.map)) {
pr_err("movablemem_map: too many entries;"
" ignoring [mem %#010llx-%#010llx]\n",
(unsigned long long) start,
(unsigned long long) (start + size - 1));
return;
}
start_pfn = PFN_DOWN(start);
end_pfn = PFN_UP(start + size);
insert_movablemem_map(start_pfn, end_pfn);
}
/*
* cmdline_parse_movablemem_map - Parse boot option movablemem_map.
* @p: The boot option of the following format:
* movablemem_map=nn[KMG]@ss[KMG]
*
* This option sets the memory range [ss, ss+nn) to be used as movable memory.
*
* Return: 0 on success or -EINVAL on failure.
*/
static int __init cmdline_parse_movablemem_map(char *p)
{
char *oldp;
u64 start_at, mem_size;
if (!p)
goto err;
if (!strcmp(p, "acpi"))
movablemem_map.acpi = true;
/*
* If user decide to use info from BIOS, all the other user specified
* ranges will be ingored.
*/
if (movablemem_map.acpi) {
if (movablemem_map.nr_map) {
memset(movablemem_map.map, 0,
sizeof(struct movablemem_entry)
* movablemem_map.nr_map);
movablemem_map.nr_map = 0;
}
return 0;
}
oldp = p;
mem_size = memparse(p, &p);
if (p == oldp)
goto err;
if (*p == '@') {
oldp = ++p;
start_at = memparse(p, &p);
if (p == oldp || *p != '\0')
goto err;
movablemem_map_add_region(start_at, mem_size);
return 0;
}
err:
return -EINVAL;
}
early_param("movablemem_map", cmdline_parse_movablemem_map);
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
/**