
Hyeonggon Yoo 74e63b397d mm, slub: use prefetchw instead of prefetch		2024-06-11 20:41:15 +08:00

upstream commit id: `04b4b00613`

Commit `0ad9500e16` ("slub: prefetch next freelist pointer in slab_alloc()") introduced prefetch_freepointer() because when other cpu(s) freed objects into a page that current cpu owns, the freelist link is hot on cpu(s) which freed objects and possibly very cold on current cpu.

But if freelist link chain is hot on cpu(s) which freed objects, it's better to invalidate that chain because they're not going to access again within a short time.

So use prefetchw instead of prefetch. On supported architectures like x86 and arm, it invalidates other copied instances of a cache line when prefetching it.

Before:

Time: 91.677

 Performance counter stats for 'hackbench -g 100 -l 10000':
        1462938.07 msec cpu-clock                 #   15.908 CPUs utilized
          18072550      context-switches          #   12.354 K/sec
           1018814      cpu-migrations            #  696.416 /sec
            104558      page-faults               #   71.471 /sec
     1580035699271      cycles                    #    1.080 GHz                      (54.51%)
     2003670016013      instructions              #    1.27  insn per cycle           (54.31%)
        5702204863      branch-misses                                                 (54.28%)
      643368500985      cache-references          #  439.778 M/sec                    (54.26%)
       18475582235      cache-misses              #    2.872 % of all cache refs      (54.28%)
      642206796636      L1-dcache-loads           #  438.984 M/sec                    (46.87%)
       18215813147      L1-dcache-load-misses     #    2.84% of all L1-dcache accesses  (46.83%)
      653842996501      dTLB-loads                #  446.938 M/sec                    (46.63%)
        3227179675      dTLB-load-misses          #    0.49% of all dTLB cache accesses  (46.85%)
      537531951350      iTLB-loads                #  367.433 M/sec                    (54.33%)
         114750630      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.37%)
      630135543177      L1-icache-loads           #  430.733 M/sec                    (46.80%)
       22923237620      L1-icache-load-misses     #    3.64% of all L1-icache accesses  (46.76%)

      91.964452802 seconds time elapsed

      43.416742000 seconds user
    1422.441123000 seconds sys

After:

Time: 90.220

 Performance counter stats for 'hackbench -g 100 -l 10000':
        1437418.48 msec cpu-clock                 #   15.880 CPUs utilized
          17694068      context-switches          #   12.310 K/sec
            958257      cpu-migrations            #  666.651 /sec
            100604      page-faults               #   69.989 /sec
     1583259429428      cycles                    #    1.101 GHz                      (54.57%)
     2004002484935      instructions              #    1.27  insn per cycle           (54.37%)
        5594202389      branch-misses                                                 (54.36%)
      643113574524      cache-references          #  447.409 M/sec                    (54.39%)
       18233791870      cache-misses              #    2.835 % of all cache refs      (54.37%)
      640205852062      L1-dcache-loads           #  445.386 M/sec                    (46.75%)
       17968160377      L1-dcache-load-misses     #    2.81% of all L1-dcache accesses  (46.79%)
      651747432274      dTLB-loads                #  453.415 M/sec                    (46.59%)
        3127124271      dTLB-load-misses          #    0.48% of all dTLB cache accesses  (46.75%)
      535395273064      iTLB-loads                #  372.470 M/sec                    (54.38%)
         113500056      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.35%)
      628871845924      L1-icache-loads           #  437.501 M/sec                    (46.80%)
       22585641203      L1-icache-load-misses     #    3.59% of all L1-icache accesses  (46.79%)

      90.514819303 seconds time elapsed

      43.877656000 seconds user
    1397.176001000 seconds sys

[xuyu]: The latency of 'hackbench -g 100 -l 10000' in an x86 guest with 8 CPUs and 16G memory is 92.81 (w/o this patch) and 91.2018 (w/ this patch), respectively.

Link: https://lkml.org/lkml/2021/10/8/598=20
Link: https://lkml.kernel.org/r/20211011144331.70084-1-42.hyeyoo@gmail.com
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Reviewed-by: Gang Deng <gavin.dg@linux.alibaba.com>
Signed-off-by: johnnyaiai <johnnyaiai@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
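
The idea in the commit message above can be illustrated with a small userspace sketch. This is not the kernel patch itself: `toy_cache`, `toy_alloc` and `prefetch_freepointer_w` are hypothetical names, and the GCC/Clang builtin `__builtin_prefetch(addr, 1)` (write-intent hint) is assumed here as a stand-in for the kernel's `prefetchw()`. The sketch pops an object off an embedded freelist and issues a write-intent prefetch for the next object's free pointer, since the allocator is about to own that line anyway.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a slab cache; only the free-pointer offset matters. */
struct toy_cache {
    size_t offset;  /* byte offset of the free pointer inside each object */
};

/* Read the "next free object" link stored inside a free object. */
static void *get_freepointer(const struct toy_cache *s, void *object)
{
    return *(void **)((char *)object + s->offset);
}

/* Store the "next free object" link inside a free object. */
static void set_freepointer(const struct toy_cache *s, void *object, void *next)
{
    *(void **)((char *)object + s->offset) = next;
}

/*
 * Write-intent prefetch of an object's free pointer.
 * Assumption: __builtin_prefetch(addr, 1) plays the role of prefetchw();
 * it is only a hint and may compile to nothing on some targets.
 */
static void prefetch_freepointer_w(const struct toy_cache *s, void *object)
{
    if (object)
        __builtin_prefetch((char *)object + s->offset, 1);
}

/* Pop the head of the freelist and warm the link of the new head. */
static void *toy_alloc(const struct toy_cache *s, void **freelist)
{
    assert(s != NULL && freelist != NULL);

    void *object = *freelist;
    if (!object)
        return NULL;            /* freelist exhausted */

    void *next = get_freepointer(s, object);
    *freelist = next;
    prefetch_freepointer_w(s, next);
    return object;
}
```

Whether the write-intent hint helps depends on the architecture and workload; the hackbench numbers quoted in the commit message are the only measurements given here.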
Documentation	SPI platform driver support for Phytium desktop CPUS	2024-06-11 20:40:56 +08:00
LICENSES	LICENSES: Rename other to deprecated	2019-05-03 06:34:32 -06:00
arch	xen/arm: Fix race in RB-tree based P2M accounting	2024-06-11 20:41:09 +08:00
block	block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern	2024-06-11 20:41:13 +08:00
certs	ock: sync codes to ock 5.4.119-20.0009.21	2024-06-11 20:27:38 +08:00
crypto	ock: sync codes to ock 5.4.119-20.0009.21	2024-06-11 20:27:38 +08:00
dist	dist: remove leds from filter out directory	2024-06-11 20:41:06 +08:00
drivers	Huawei TM280 ethernet card can't resolve ipip internal packet, this	2024-06-11 20:41:14 +08:00
fs	ock: sync codes to ock 5.4.119-20.0009.21	2024-06-11 20:27:38 +08:00
include	Update pci_ids.h with the vendor ID for Phytium.	2024-06-11 20:40:55 +08:00
init	irqlatency: add irq latency monitor support	2024-06-11 20:40:51 +08:00
ipc	ock: sync codes to ock 5.4.119-20.0009.21	2024-06-11 20:27:38 +08:00
kernel	cgroup: Use open-time credentials for process migration perm checks	2024-06-11 20:41:13 +08:00
lib	ock: sync codes to ock 5.4.119-20.0009.21	2024-06-11 20:27:38 +08:00
mm	mm, slub: use prefetchw instead of prefetch	2024-06-11 20:41:15 +08:00
net	net/proc: added sockets details statistics	2024-06-11 20:41:14 +08:00
package	config/ARM64: Add config.performance for highest performing version	2024-06-11 20:41:15 +08:00
samples	ock: sync codes to ock 5.4.119-20.0009.21	2024-06-11 20:27:38 +08:00
scripts	ock: sync codes to ock 5.4.119-20.0009.21	2024-06-11 20:27:38 +08:00
security	ock: sync codes to ock 5.4.119-20.0009.21	2024-06-11 20:27:38 +08:00
sound	I2S driver support for Phytium CPUs	2024-06-11 20:41:05 +08:00
tools	ock: sync codes to ock 5.4.119-20.0009.21	2024-06-11 20:27:38 +08:00
usr	tkernel: add base tlinux kernel interfaces	2024-06-11 20:09:33 +08:00
virt	ock: sync codes to ock 5.4.119-20.0009.21	2024-06-11 20:27:38 +08:00
.clang-format	clang-format: Update with the latest for_each macro list	2019-08-31 10:00:51 +02:00
.cocciconfig	…
.get_maintainer.ignore	Opt out of scripts/get_maintainer.pl	2019-05-16 10:53:40 -07:00
.gitattributes	ock: sync codes to ock 5.4.119-20.0009.21	2024-06-11 20:27:38 +08:00
.gitignore	ock: sync codes to ock 5.4.119-20.0009.21	2024-06-11 20:27:38 +08:00
COPYING	COPYING: use the new text with points to the license files	2018-03-23 12:41:45 -06:00
CREDITS	MAINTAINERS: Remove Simon as Renesas SoC Co-Maintainer	2019-10-10 08:12:51 -07:00
Kbuild	kbuild: do not descend to ./Kbuild when cleaning	2019-08-21 21:03:58 +09:00
Kconfig	docs: kbuild: convert docs to ReST and rename to *.rst	2019-06-14 14:21:21 -06:00
MAINTAINERS	Phytium JPEG Encoder driver	2024-06-11 20:41:01 +08:00
Makefile	ock: sync codes to ock 5.4.119-20.0009.21	2024-06-11 20:27:38 +08:00
README	Drop all 00-INDEX files from Documentation/	2018-09-09 15:08:58 -06:00
README.md	tkernel: add base tlinux kernel interfaces	2024-06-11 20:09:33 +08:00
backport_remove_lists.txt	tkernel: add base tlinux kernel interfaces	2024-06-11 20:09:33 +08:00
tools_key.pub	tkernel: add base tlinux kernel interfaces	2024-06-11 20:09:33 +08:00

README.md

Tencent Linux Kernel 4.0