Go to file
Hyeonggon Yoo 74e63b397d mm, slub: use prefetchw instead of prefetch
upstream commit id: 04b4b00613

Commit 0ad9500e16 ("slub: prefetch next freelist pointer in
slab_alloc()") introduced prefetch_freepointer() because when other
cpu(s) freed objects into a page that current cpu owns, the freelist
link is hot on cpu(s) which freed objects and possibly very cold on
current cpu.

But if freelist link chain is hot on cpu(s) which freed objects, it's
better to invalidate that chain because they're not going to access
again within a short time.

So use prefetchw instead of prefetch.  On supported architectures like
x86 and arm, it invalidates other copied instances of a cache line when
prefetching it.

Before:

Time: 91.677

 Performance counter stats for 'hackbench -g 100 -l 10000':
        1462938.07 msec cpu-clock                 #   15.908 CPUs utilized
          18072550      context-switches          #   12.354 K/sec
           1018814      cpu-migrations            #  696.416 /sec
            104558      page-faults               #   71.471 /sec
     1580035699271      cycles                    #    1.080 GHz                      (54.51%)
     2003670016013      instructions              #    1.27  insn per cycle           (54.31%)
        5702204863      branch-misses                                                 (54.28%)
      643368500985      cache-references          #  439.778 M/sec                    (54.26%)
       18475582235      cache-misses              #    2.872 % of all cache refs      (54.28%)
      642206796636      L1-dcache-loads           #  438.984 M/sec                    (46.87%)
       18215813147      L1-dcache-load-misses     #    2.84% of all L1-dcache accesses  (46.83%)
      653842996501      dTLB-loads                #  446.938 M/sec                    (46.63%)
        3227179675      dTLB-load-misses          #    0.49% of all dTLB cache accesses  (46.85%)
      537531951350      iTLB-loads                #  367.433 M/sec                    (54.33%)
         114750630      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.37%)
      630135543177      L1-icache-loads           #  430.733 M/sec                    (46.80%)
       22923237620      L1-icache-load-misses     #    3.64% of all L1-icache accesses  (46.76%)

      91.964452802 seconds time elapsed

      43.416742000 seconds user
    1422.441123000 seconds sys

After:

Time: 90.220

 Performance counter stats for 'hackbench -g 100 -l 10000':
        1437418.48 msec cpu-clock                 #   15.880 CPUs utilized
          17694068      context-switches          #   12.310 K/sec
            958257      cpu-migrations            #  666.651 /sec
            100604      page-faults               #   69.989 /sec
     1583259429428      cycles                    #    1.101 GHz                      (54.57%)
     2004002484935      instructions              #    1.27  insn per cycle           (54.37%)
        5594202389      branch-misses                                                 (54.36%)
      643113574524      cache-references          #  447.409 M/sec                    (54.39%)
       18233791870      cache-misses              #    2.835 % of all cache refs      (54.37%)
      640205852062      L1-dcache-loads           #  445.386 M/sec                    (46.75%)
       17968160377      L1-dcache-load-misses     #    2.81% of all L1-dcache accesses  (46.79%)
      651747432274      dTLB-loads                #  453.415 M/sec                    (46.59%)
        3127124271      dTLB-load-misses          #    0.48% of all dTLB cache accesses  (46.75%)
      535395273064      iTLB-loads                #  372.470 M/sec                    (54.38%)
         113500056      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.35%)
      628871845924      L1-icache-loads           #  437.501 M/sec                    (46.80%)
       22585641203      L1-icache-load-misses     #    3.59% of all L1-icache accesses  (46.79%)

      90.514819303 seconds time elapsed

      43.877656000 seconds user
    1397.176001000 seconds sys

[xuyu]:
The latency of 'hackbench -g 100 -l 10000' in a x86 guest with 8 CPU
and 16G memory is 92.81 (w/o this path) and 91.2018 (w/ this patch),
respectively.

Link: https://lkml.org/lkml/2021/10/8/598=20
Link: https://lkml.kernel.org/r/20211011144331.70084-1-42.hyeyoo@gmail.com
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Reviewed-by: Gang Deng <gavin.dg@linux.alibaba.com>
Signed-off-by: johnnyaiai <johnnyaiai@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:15 +08:00
Documentation SPI platform driver support for Phytium desktop CPUS 2024-06-11 20:40:56 +08:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
arch xen/arm: Fix race in RB-tree based P2M accounting 2024-06-11 20:41:09 +08:00
block block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern 2024-06-11 20:41:13 +08:00
certs ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
crypto ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
dist dist: remove leds from filter out directory 2024-06-11 20:41:06 +08:00
drivers Hawei TM280 ethernet card can't resolve ipip internal packet, this 2024-06-11 20:41:14 +08:00
fs ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
include Update pci_ids.h with the vendor ID for Phytium. 2024-06-11 20:40:55 +08:00
init irqlatency: add irq latency monitor support 2024-06-11 20:40:51 +08:00
ipc ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
kernel cgroup: Use open-time credentials for process migraton perm checks 2024-06-11 20:41:13 +08:00
lib ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
mm mm, slub: use prefetchw instead of prefetch 2024-06-11 20:41:15 +08:00
net net/proc: added sockets details statistics 2024-06-11 20:41:14 +08:00
package config/ARM64: Add config.performance for highest performing version 2024-06-11 20:41:15 +08:00
samples ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
scripts ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
security ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
sound I2S driver support for Phytium CPUs 2024-06-11 20:41:05 +08:00
tools ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
usr tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
virt ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
.clang-format clang-format: Update with the latest for_each macro list 2019-08-31 10:00:51 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
.gitignore ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS MAINTAINERS: Remove Simon as Renesas SoC Co-Maintainer 2019-10-10 08:12:51 -07:00
Kbuild kbuild: do not descend to ./Kbuild when cleaning 2019-08-21 21:03:58 +09:00
Kconfig docs: kbuild: convert docs to ReST and rename to *.rst 2019-06-14 14:21:21 -06:00
MAINTAINERS Phytium JPEG Encoder driver 2024-06-11 20:41:01 +08:00
Makefile ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
README.md tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
backport_remove_lists.txt tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
tools_key.pub tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00

README.md

Tencent Linux Kernel 4.0