linux-sg2042/include
Martin KaFai Lau 96eabe7a40 bpf: Allow selecting numa node during map creation
The current map creation API does not allow to provide the numa-node
preference.  The memory usually comes from where the map-creation-process
is running.  The performance is not ideal if the bpf_prog is known to
always run in a numa node different from the map-creation-process.

One of the use case is sharding on CPU to different LRU maps (i.e.
an array of LRU maps).  Here is the test result of map_perf_test on
the INNER_LRU_HASH_PREALLOC test if we force the lru map used by
CPU0 to be allocated from a remote numa node:

[ The machine has 20 cores. CPU0-9 at node 0. CPU10-19 at node 1 ]

># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1628380 events per sec
4:inner_lru_hash_map_perf pre-alloc 1626396 events per sec
3:inner_lru_hash_map_perf pre-alloc 1626144 events per sec
6:inner_lru_hash_map_perf pre-alloc 1621657 events per sec
2:inner_lru_hash_map_perf pre-alloc 1621534 events per sec
1:inner_lru_hash_map_perf pre-alloc 1620292 events per sec
7:inner_lru_hash_map_perf pre-alloc 1613305 events per sec
0:inner_lru_hash_map_perf pre-alloc 1239150 events per sec  #<<<

After specifying numa node:
># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1629627 events per sec
3:inner_lru_hash_map_perf pre-alloc 1628057 events per sec
1:inner_lru_hash_map_perf pre-alloc 1623054 events per sec
6:inner_lru_hash_map_perf pre-alloc 1616033 events per sec
2:inner_lru_hash_map_perf pre-alloc 1614630 events per sec
4:inner_lru_hash_map_perf pre-alloc 1612651 events per sec
7:inner_lru_hash_map_perf pre-alloc 1609337 events per sec
0:inner_lru_hash_map_perf pre-alloc 1619340 events per sec #<<<

This patch adds one field, numa_node, to the bpf_attr.  Since numa node 0
is a valid node, a new flag BPF_F_NUMA_NODE is also added.  The numa_node
field is honored if and only if the BPF_F_NUMA_NODE flag is set.

Numa node selection is not supported for percpu map.

This patch does not change all the kmalloc.  F.e.
'htab = kzalloc()' is not changed since the object
is small enough to stay in the cache.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-19 21:35:43 -07:00
..
acpi ACPI: NUMA: add missing include in acpi_numa.h 2017-07-24 22:27:43 +02:00
asm-generic mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem 2017-08-10 15:54:07 -07:00
clocksource
crypto
drm i915, amd and some core fixes + mediatek color support 2017-07-13 11:26:18 -07:00
dt-bindings Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-07-15 10:59:54 -07:00
keys
kvm KVM: arm/arm64: PMU: Fix overflow interrupt injection 2017-07-25 14:18:01 +01:00
linux bpf: Allow selecting numa node during map creation 2017-08-19 21:35:43 -07:00
math-emu
media media: platform: davinci: drop VPFE_CMD_S_CCDC_RAW_PARAMS 2017-07-26 06:14:33 -04:00
memory
misc cxl: Export library to support IBM XSL 2017-07-03 23:07:03 +10:00
net ipv4: convert dst_metrics.refcnt from atomic_t to refcount_t 2017-08-18 15:14:07 -07:00
pcmcia
ras
rdma IB/cma: Fix reference count leak when no ipv4 addresses are set 2017-07-20 11:24:13 -04:00
scsi Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2017-07-13 14:27:32 -07:00
soc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-07-05 12:31:59 -07:00
sound ASoC: fix pcm-creation regression 2017-07-17 15:50:32 +01:00
target iscsi-target: Fix iscsi_np reset hung task during parallel delete 2017-08-06 14:41:41 -07:00
trace xdp: adjust xdp redirect tracepoint to include return error code 2017-08-18 16:18:40 -07:00
uapi bpf: Allow selecting numa node during map creation 2017-08-19 21:35:43 -07:00
video
xen xen/balloon: don't online new memory initially 2017-07-23 08:13:18 +02:00