It is required to have an early static cpu to node mapping. This patch
pins all possible cpus to nodes for which no topology information is
present. Since there is no interface available that would tell where a
non-present cpu would appear topology-wise, simply use a round robin
algorithm.
Right now this makes sure that the cpu_to_node() function will return
the same value for a cpu during the lifetime of the system.
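A minimal userspace sketch of the round robin idea, not the kernel code
itself. The function name, the NODE_NONE marker and the array layout are
assumptions made for illustration only.

```c
#include <assert.h>

#define NODE_NONE (-1)	/* hypothetical marker: no topology information */

/*
 * Pin every cpu that has no node yet to a node, round robin.
 * Cpus that already have a node keep it, so the mapping stays stable.
 */
static void pin_unmapped_cpus(int *cpu_node, int nr_cpus, int nr_nodes)
{
	int next = 0;

	assert(nr_nodes > 0);
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu_node[cpu] != NODE_NONE)
			continue;	/* mapping derived from topology */
		cpu_node[cpu] = next;
		next = (next + 1) % nr_nodes;
	}
}
```

Calling the function a second time changes nothing, because every cpu
has a node after the first pass.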
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
CPU topology information like the cpu to node mapping must already be
set up in setup_arch(). Topology information is currently made available
with a per cpu variable; this however will not work once the
initialization is moved to setup_arch(), since the generic percpu
setup is done much later.
Therefore convert back to a cpu_topology array.
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The toptree algorithm uses the physical core ids to create a mapping
between cores and nodes (to_node_id array within emu_cores structure).
The core ids are used as an index into an array whose size depends on
CONFIG_NR_CPUS. If a physical core id is larger than that size, this
will result in out-of-bounds write accesses.
Generate logical core ids instead to avoid this.
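The idea can be sketched in plain C as follows. This is an illustration
under assumptions, not the toptree code: struct core_map, NR_CORES and
logical_core_id() are hypothetical names, and the linear lookup is chosen
only for brevity.

```c
#include <assert.h>

#define NR_CORES 4	/* hypothetical stand-in for the CONFIG_NR_CPUS bound */

struct core_map {
	int phys_id[NR_CORES];	/* physical id stored per logical id */
	int used;		/* number of logical ids handed out */
};

/*
 * Return a dense logical id (0..NR_CORES-1) for a physical core id,
 * allocating a new one on first use. Returns -1 when the table is full,
 * so the result is always safe to use as an array index after a check.
 */
static int logical_core_id(struct core_map *map, int phys_id)
{
	assert(map->used >= 0 && map->used <= NR_CORES);
	for (int i = 0; i < map->used; i++)
		if (map->phys_id[i] == phys_id)
			return i;
	if (map->used == NR_CORES)
		return -1;
	map->phys_id[map->used] = phys_id;
	return map->used++;
}
```

A physical id such as 900 then maps to a small index instead of being
used as an index directly.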
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make sure that only those nodes appear in the node_possible_map that
may actually be used. Usually that means that the node online and
possible maps are identical. For mode "plain" we only have one node,
for mode "emu" we have "emu_nodes" nodes.
Before this the possible map included (with default config) 16 nodes
while usually only one was used. That made a couple of loops that
iterated over all possible nodes do more work than necessary.
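A small sketch of the effect, assuming a plain bitmask as a stand-in for
the kernel's node map type. The enum, the function names and the limit of
16 are hypothetical and only mirror the numbers mentioned above.

```c
#include <assert.h>

#define MAX_NODES_SKETCH 16	/* hypothetical, mirrors the default of 16 */

enum numa_mode_sketch { MODE_PLAIN, MODE_EMU };

/* Build a possible map with one bit per node that may actually be used. */
static unsigned int possible_nodes(enum numa_mode_sketch mode, int emu_nodes)
{
	int nr = (mode == MODE_EMU) ? emu_nodes : 1;

	assert(nr >= 1 && nr <= MAX_NODES_SKETCH);
	return (1u << nr) - 1;
}

/* Number of iterations a loop over all possible nodes would make. */
static int count_nodes(unsigned int map)
{
	int n = 0;

	for (; map; map >>= 1)
		n += map & 1;
	return n;
}
```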
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The z13 machine added a fourth level to the cpu topology
information. The new top level is called drawer.
A drawer contains two books, which used to be the top level.
Adding this additional scheduling domain did show performance
improvements for some workloads of up to 8%, while there don't
seem to be any workloads impacted in a negative way.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit 3e89e1c5ea ("hugetlb: make mm and fs code explicitly non-modular")
moves hugetlb_init() from module_init to subsys_initcall.
The hugetlb_init()->hugetlb_register_node() code accesses "node->dev.kobj"
which is initialized in numa_init_late().
Since numa_init_late() is a device_initcall which is called *after*
subsys_initcall, the above mentioned commit breaks NUMA on s390.
Fix this by moving numa_init_late() to arch_initcall.
Fixes: 3e89e1c5ea ("hugetlb: make mm and fs code explicitly non-modular")
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
memblock_alloc() and memblock_alloc_base() will panic on their own if
they can't find free memory. Therefore remove some pointless checks.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Allocating memory with a requested minimum alignment of 1 is wrong
since pg_data_t contains a spinlock which requires an alignment of 4
bytes.
Therefore fix this and ask for an alignment of 8 bytes, as is
guaranteed for all kmalloc requests.
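The alignment requirement can be illustrated in userspace with C11
aligned_alloc(). This is only a sketch: struct fake_pgdat and
alloc_pgdat() are hypothetical stand-ins, and the kernel uses memblock
rather than aligned_alloc().

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

#define PGDAT_ALIGN 8	/* requested minimum alignment in bytes */

/* Hypothetical stand-in for a structure embedding a 4-byte aligned lock. */
struct fake_lock { alignas(4) uint32_t val; };
struct fake_pgdat { char tag; struct fake_lock lock; };

static struct fake_pgdat *alloc_pgdat(void)
{
	/* aligned_alloc() wants a size that is a multiple of the alignment */
	size_t size = (sizeof(struct fake_pgdat) + PGDAT_ALIGN - 1) &
		      ~(size_t)(PGDAT_ALIGN - 1);
	struct fake_pgdat *p = aligned_alloc(PGDAT_ALIGN, size);

	assert(p == NULL || (uintptr_t)p % alignof(struct fake_pgdat) == 0);
	return p;
}
```

With an alignment of 1 the allocator would be free to return an odd
address, and the embedded lock would then be misaligned.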
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With CONFIG_CPUMASK_OFFSTACK=y cpumask_var_t is a pointer to a CPU mask.
Replace the incorrect type for node_to_cpumask_map with cpumask_t.
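The difference between the two types can be modelled in userspace as
below. All names carry a _sketch suffix because they are stand-ins, not
the kernel definitions; only the shape (pointer versus embedded bitmap)
is taken from the description above.

```c
#include <assert.h>

#define NR_CPUS_SKETCH   256
#define MAX_NODES_SKETCH 16
#define BITS_PER_LONG_SKETCH (8 * sizeof(unsigned long))

struct mask_sketch {
	unsigned long bits[NR_CPUS_SKETCH / BITS_PER_LONG_SKETCH];
};

/* Models cpumask_var_t with CONFIG_CPUMASK_OFFSTACK=y: just a pointer. */
typedef struct mask_sketch *mask_var_sketch_t;

/* Wrong: an array of pointers, no storage for the masks themselves. */
static mask_var_sketch_t wrong_map[MAX_NODES_SKETCH];

/* Right: an array of real masks, usable without any allocation. */
static struct mask_sketch right_map[MAX_NODES_SKETCH];

static struct mask_sketch *node_mask(int node)
{
	assert(node >= 0 && node < MAX_NODES_SKETCH);
	return &right_map[node];
}
```

Each entry of wrong_map is a null pointer until somebody allocates a
mask for it, whereas node_mask() can be used right away.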
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The core to node mapping data consumes about 2 KB bss data. To save memory
for the non-NUMA case, make the data dynamic. In addition change the
"core_to_node" array from "int" to "s32" which saves 1 KB also for the
NUMA case.
Suggested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since we are already protected by the "sched_domains_mutex" lock, we can
safely remove the topology lock.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
NUMA emulation (aka fake NUMA) distributes the available memory to nodes
without using real topology information about the physical memory of the
machine.
Splitting the system memory into nodes replicates the memory management
structures for each node. Particularly each node has its own "mm locks"
and its own "kswapd" task.
For large systems, under certain conditions, this results in improved
system performance and/or latency based on reduced pressure on the mm
locks and the kswapd tasks.
NUMA emulation distributes CPUs to nodes while respecting the original
machine topology information. This is done by trying to avoid separating
CPUs which reside on the same book or even on the same MC. Because the
current Linux scheduler code requires a stable cpu to node mapping, cores
are pinned to nodes when the first CPU thread is set online.
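The pinning rule can be sketched as follows. This is a simplified
illustration, not the emulation code: the fixed thread count per core,
the array and the candidate_node parameter (standing in for whatever the
topology-aware placement would choose) are all assumptions.

```c
#include <assert.h>

#define MAX_CORES        4	/* hypothetical */
#define THREADS_PER_CORE 2	/* hypothetical */
#define NODE_NONE        (-1)

static int core_to_node[MAX_CORES] = {
	NODE_NONE, NODE_NONE, NODE_NONE, NODE_NONE
};

/*
 * Called when a CPU thread comes online. The first thread of a core pins
 * the core to candidate_node; later threads, and later online operations,
 * get the already pinned node, so the cpu to node mapping stays stable.
 */
static int pin_on_online(int cpu, int candidate_node)
{
	int core = cpu / THREADS_PER_CORE;

	assert(cpu >= 0 && core < MAX_CORES);
	if (core_to_node[core] == NODE_NONE)
		core_to_node[core] = candidate_node;
	return core_to_node[core];
}
```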
This patch is based on the initial implementation from Philipp Hachtmann.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
NUMA emulation needs proper means to mangle the book/mc/core topology
of the machine. The topology tree (toptree) consistently maintains cpu
masks for the root, each node, and all leaves of the tree while the
user may use the toptree functions to rearrange the tree in various
ways.
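How the masks can be kept consistent while the tree is rearranged is
sketched below. This is a reduced model under assumptions: struct tnode
and the tnode_*() helpers are hypothetical names, a single unsigned long
stands in for a cpu mask, and leaf masks are set by the caller.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

struct tnode {
	struct tnode *parent;
	struct tnode *child[MAX_CHILDREN];
	int nr_children;
	unsigned long mask;	/* one bit per CPU below this node */
};

/* Recompute the masks of n and all its ancestors from their children. */
static void update_up(struct tnode *n)
{
	for (; n; n = n->parent) {
		unsigned long m = 0;

		for (int i = 0; i < n->nr_children; i++)
			m |= n->child[i]->mask;
		n->mask = m;
	}
}

static void tnode_attach(struct tnode *parent, struct tnode *n)
{
	assert(parent->nr_children < MAX_CHILDREN);
	parent->child[parent->nr_children++] = n;
	n->parent = parent;
	update_up(parent);
}

static void tnode_detach(struct tnode *n)
{
	struct tnode *p = n->parent;

	if (!p)
		return;
	for (int i = 0; i < p->nr_children; i++) {
		if (p->child[i] == n) {
			/* fill the gap with the last child */
			p->child[i] = p->child[--p->nr_children];
			break;
		}
	}
	n->parent = NULL;
	update_up(p);
}

/* Move a subtree; masks on both the old and the new path are updated. */
static void tnode_move(struct tnode *n, struct tnode *new_parent)
{
	tnode_detach(n);
	tnode_attach(new_parent, n);
}
```

Because every attach and detach walks up to the root, a caller never has
to fix up masks by hand after rearranging the tree.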
This patch contains several changes from Michael Holzheu.
Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Enable core NUMA support for s390 and add one simple default mode "plain"
that creates one single NUMA node.
This patch contains several changes from Michael Holzheu.
Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>