OpenCloudOS-Kernel/kernel/bpf
Martin KaFai Lau 695ba2651a bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4
After running map_perf_test with a much bigger
BPF_F_NO_COMMON_LRU map, the perf report shows a
lot of time spent rotating the inactive list (i.e.
__bpf_lru_list_rotate_inactive):
> map_perf_test 32 8 10000 1000000 | awk '{sum += $3}END{print sum}'
19644783 (19M/s)
> map_perf_test 32 8 10000000 10000000 |  awk '{sum += $3}END{print sum}'
6283930 (6.28M/s)

Inactive usually means that the element is not in cache.  Hence,
the PERCPU_NR_SCANS value needs tuning.

This patch finds a better number of elements to
scan during each list rotation.  The PERCPU_NR_SCANS (which
is defined the same as PERCPU_FREE_TARGET) decreases
from 16 elements to 4 elements.  This change only
affects the BPF_F_NO_COMMON_LRU map.
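
To illustrate why the scan count bounds the cost of each rotation, here is a minimal user-space sketch. It is not the kernel's __bpf_lru_list_rotate_inactive: the names toy_node, toy_list and toy_rotate_inactive are hypothetical, and the list is modelled as a flat array with a cursor rather than a linked list. The only point it makes is that a rotation visits at most nr_scans nodes, so lowering the value from 16 to 4 caps the work per rotation at a quarter of what it was.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an LRU node; not the kernel's struct bpf_lru_node. */
struct toy_node {
	bool referenced; /* set when the element was recently looked up */
	bool active;     /* true once promoted to the active list */
};

/* Hypothetical per-CPU list state: a flat array plus a rotation cursor. */
struct toy_list {
	struct toy_node *nodes;
	size_t len;
	size_t cursor; /* where the next rotation resumes */
};

/*
 * Visit at most nr_scans nodes, starting at the cursor. Inactive nodes whose
 * reference bit is set are promoted; all other nodes are left in place.
 * Returns the number of nodes visited, i.e. the work done by this rotation.
 */
size_t toy_rotate_inactive(struct toy_list *l, unsigned int nr_scans)
{
	size_t visited = 0;

	assert(nr_scans > 0);
	while (visited < nr_scans && visited < l->len) {
		struct toy_node *n = &l->nodes[l->cursor];

		if (!n->active && n->referenced) {
			n->referenced = false;
			n->active = true;
		}
		l->cursor = (l->cursor + 1) % l->len;
		visited++;
	}
	return visited;
}
```

In this model, a list of 10 nodes rotated with nr_scans = 4 touches 4 nodes per call, while nr_scans = 16 touches all 10. Nodes that are not visited in one rotation are picked up by a later one, because the cursor persists between calls.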

test_lru_dist shows no meaningful difference
between 16 and 4.  Our production L4 load balancer, which uses
the LRU map for connection tracking, also shows little change in cache
hit rate.  Since both benchmark and production data show no
cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4.
We can consider making it configurable, and/or using
a different rotation strategy, if we later find a use case
where another value works better.

After this change:
> map_perf_test 32 8 10000000 10000000 |  awk '{sum += $3}END{print sum}'
9240324 (9.2M/s)

i.e. 6.28M/s -> 9.2M/s

test_lru_dist shows no meaningful difference:
> test_lru_dist zipf.100k.a1_01.out 4000 1
nr_misses: 31575 (Before) vs 31566 (After)

> test_lru_dist zipf.100k.a0_01.out 40000 1
nr_misses: 67036 (Before) vs 67031 (After)
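
As a quick check on the figures quoted in this message, the relative changes can be computed with a small helper. The function name percent_change is hypothetical and only serves this illustration; the inputs are the numbers reported above.

```c
#include <assert.h>

/* Relative change from `before` to `after`, expressed in percent. */
double percent_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

With the reported values, percent_change(6283930, 9240324) gives a throughput gain of roughly 47%, while the nr_misses counts move by about -0.03% (31575 to 31566) and by less than -0.01% (67036 to 67031).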

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 13:55:52 -04:00
File               Last commit                                                    Date
Makefile           bpf: Add array of maps support                                 2017-03-22 15:45:45 -07:00
arraymap.c         bpf: remove struct bpf_map_type_list                           2017-04-11 14:38:43 -04:00
bpf_lru_list.c     bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4               2017-04-17 13:55:52 -04:00
bpf_lru_list.h     bpf: Add percpu LRU list                                       2016-11-15 11:50:20 -05:00
cgroup.c           bpf: pass sk to helper functions                               2017-04-11 14:54:19 -04:00
core.c             bpf: reference may_access_skb() from __bpf_prog_run()          2017-04-11 10:54:27 -04:00
hashtab.c          bpf: remove struct bpf_map_type_list                           2017-04-11 14:38:43 -04:00
helpers.c          bpf: rename ARG_PTR_TO_STACK                                   2017-01-09 16:56:27 -05:00
inode.c            bpf: add initial bpf tracepoints                               2017-01-25 13:17:47 -05:00
lpm_trie.c         bpf: remove struct bpf_map_type_list                           2017-04-11 14:38:43 -04:00
map_in_map.c       bpf: Add array of maps support                                 2017-03-22 15:45:45 -07:00
map_in_map.h       bpf: Add array of maps support                                 2017-03-22 15:45:45 -07:00
percpu_freelist.c  bpf: introduce percpu_freelist                                 2016-03-08 15:28:31 -05:00
percpu_freelist.h  bpf: introduce percpu_freelist                                 2016-03-08 15:28:31 -05:00
stackmap.c         bpf: remove struct bpf_map_type_list                           2017-04-11 14:38:43 -04:00
syscall.c          bpf: remove struct bpf_map_type_list                           2017-04-11 14:38:43 -04:00
verifier.c         Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  2017-04-06 08:24:51 -07:00