llvm-project

Commit Graph

Author	SHA1	Message	Date
Clement Courbet	0ddf81c43d	[llvm-exegesis] Teach llvm-exegesis to handle instructions with multiple tied variables. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58285 llvm-svn: 354862	2019-02-26 10:54:45 +00:00
Roman Lebedev	542e5d7bb5	[llvm-exegesis] Split Epsilon param into two (PR40787) Summary: This eps param is used for two distinct things: * initial point clusterization * checking clusters against the llvm values What if one wants to only look at highly different clusters, without changing the clustering itself? In particular, this helps to weed out noisy measurements (since the clusterization epsilon is still small, so there is a better chance that noisy measurements from the same opcode will go into different clusters) By splitting it into two params it is now possible. This is nearly-free performance-wise: Old: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% ) 12 context-switches # 31.735 M/sec ( +- 27.38% ) 0 cpu-migrations # 0.000 K/sec 4745 page-faults # 12183.732 M/sec ( +- 0.54% ) 1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%) 185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%) 392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%) 1839236666 instructions # 1.18 insn per cycle # 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%) 407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%) 10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%) 0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs): 6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% ) 262 context-switches # 38.546 M/sec ( +- 23.06% ) 0 cpu-migrations # 0.065 M/sec ( +- 76.03% ) 13287 page-faults # 1953.206 M/sec ( +- 0.32% ) 27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%) 1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%) 16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%) 17611143370 instructions # 0.65 insn per cycle # 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%) 3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%) 116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%) 6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%) ``` New: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs): 400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% ) 12 context-switches # 29.429 M/sec ( +- 25.95% ) 0 cpu-migrations # 0.100 M/sec ( +-100.00% ) 4714 page-faults # 11796.496 M/sec ( +- 0.55% ) 1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%) 199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%) 402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%) 1847783963 instructions # 1.15 insn per cycle # 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%) 407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%) 10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%) 0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% ) lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs): 6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% ) 217 context-switches # 31.236 M/sec ( +- 36.16% ) 1 cpu-migrations # 0.096 M/sec ( +- 50.00% ) 13258 page-faults # 1908.389 M/sec ( +- 0.34% ) 27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%) 1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%) 16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%) 17755545931 instructions # 0.64 insn per cycle # 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%) 3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%) 117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%) 6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% ) ``` I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps. Within noise i'd say. Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 \| PR40787 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58476 llvm-svn: 354767	2019-02-25 09:36:12 +00:00
Fangrui Song	990061b6d6	Fix file header issues in fuzzers. NFC llvm-svn: 354551	2019-02-21 07:57:14 +00:00
Hans Wennborg	14e15ec18d	Fix the build with gcc/libstdc++ 4.8.2 after r354441 llvm-svn: 354469	2019-02-20 14:50:08 +00:00
Roman Lebedev	69716394f3	[llvm-exegesis] Opcode stabilization / reclusterization (PR40715) Summary: Given an instruction `Opcode`, we can make benchmarks (measurements) of the instruction characteristics/performance. Then, to facilitate further analysis we group the benchmarks with similar characteristics into clusters. Now, this is all not entirely deterministic. Some instructions have variable characteristics, depending on their arguments. And thus, if we do several benchmarks of the same instruction `Opcode`, we may end up with different performance characteristics measurements. And when we then do clustering, these several benchmarks of the same instruction `Opcode` may end up being clustered into different clusters. This is not great for further analysis. We shall find every `Opcode` with benchmarks not in just one cluster, and move all the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`. I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit, and introducing `-analysis-display-unstable-clusters` switch to toggle between displaying stable-only clusters and unstable-only clusters. The reclusterization is deterministically stable, produces identical reports between runs. (Or at least that is what i'm seeing, maybe it isn't) Timings/comparisons: old (current trunk/head) {F8303582} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% ) 172 context-switches # 25.965 M/sec ( +- 29.89% ) 0 cpu-migrations # 0.042 M/sec ( +- 56.54% ) 31073 page-faults # 4690.754 M/sec ( +- 0.08% ) 26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%) 2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%) 13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%) 19770706799 instructions # 0.74 insn per cycle # 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%) 4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%) 121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%) 6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% ) ``` patch, with reclustering but without filtering (i.e. outputting all the stable and unstable clusters) {F8303586} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs): 6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% ) 213 context-switches # 32.952 M/sec ( +- 23.81% ) 1 cpu-migrations # 0.130 M/sec ( +- 43.84% ) 31287 page-faults # 4832.057 M/sec ( +- 0.08% ) 25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%) 1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%) 13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%) 19752995402 instructions # 0.76 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%) 4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%) 121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%) 6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% ) ``` Funnily, this measurement shows that said reclustering actually improved performance. patch, with reclustering, only the stable clusters {F8303594} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs): 6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% ) 133 context-switches # 20.792 M/sec ( +- 23.39% ) 0 cpu-migrations # 0.063 M/sec ( +- 61.24% ) 31318 page-faults # 4903.256 M/sec ( +- 0.08% ) 25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%) 1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%) 13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%) 19767554347 instructions # 0.77 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%) 4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%) 118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%) 6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% ) ``` Performance improved even further?! Makes sense i guess, less clusters to print. patch, with reclustering, only the unstable clusters {F8303601} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs): 6124.96 msec task-clock # 1.000 CPUs utilized ( +- 0.20% ) 194 context-switches # 31.709 M/sec ( +- 20.46% ) 0 cpu-migrations # 0.039 M/sec ( +- 49.77% ) 31413 page-faults # 5129.261 M/sec ( +- 0.06% ) 24536794267 cycles # 4006425.858 GHz ( +- 0.19% ) (83.31%) 1676085087 stalled-cycles-frontend # 6.83% frontend cycles idle ( +- 0.46% ) (83.32%) 13035595603 stalled-cycles-backend # 53.13% backend cycles idle ( +- 0.16% ) (33.36%) 18260877653 instructions # 0.74 insn per cycle # 0.71 stalled cycles per insn ( +- 0.05% ) (50.03%) 4112411983 branches # 671484364.603 M/sec ( +- 0.03% ) (66.68%) 114066929 branch-misses # 2.77% of all branches ( +- 0.11% ) (83.32%) 6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% ) ``` This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps) (Also, wow this is fast, it used to take several minutes originally) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 \| PR40715 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon Tags: #llvm Differential Revision: https://reviews.llvm.org/D58355 llvm-svn: 354441	2019-02-20 09:14:04 +00:00
Andrea Di Biagio	edbf06a767	[AsmPrinter] Remove hidden flag -print-schedule. This patch removes hidden codegen flag -print-schedule effectively reverting the logic originally committed as r300311 (https://llvm.org/viewvc/llvm-project?view=revision&revision=300311). Flag -print-schedule was originally introduced by r300311 to address PR32216 (https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better testing of schedule model instruction latencies/throughputs". These days, we can use llvm-mca to test scheduling models. So there is no longer a need for flag -print-schedule in LLVM. The main use case for PR32216 is now addressed by llvm-mca. Flag -print-schedule is mainly used for debugging purposes, and it is only actually used by x86 specific tests. We already have extensive (latency and throughput) tests under "test/tools/llvm-mca" for X86 processor models. That means, most (if not all) existing -print-schedule tests for X86 are redundant. When flag -print-schedule was first added to LLVM, several files had to be modified; a few APIs gained new arguments (see for example method MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained a couple of getSchedInfoStr() methods. Method getSchedInfoStr() had to originally work for both MCInst and MachineInstr. The original implmentation of getSchedInfoStr() introduced a subtle layering violation (reported as PR37160 and then fixed/worked-around by r330615). In retrospect, that new API could have been designed more optimally. We can always query MCSchedModel to get the latency and throughput. More importantly, the "sched-info" string should not have been generated by the subtarget. Note, r317782 fixed an issue where "print-schedule" didn't work very well in the presence of inline assembly. That commit is also reverted by this change. Differential Revision: https://reviews.llvm.org/D57244 llvm-svn: 353043	2019-02-04 12:51:26 +00:00
Roman Lebedev	bd84b139b0	[llvm-exegesis] Cut run time of analysis mode by another -35% (sic) (YamlContext::getRegNo()) Summary: Together with the previous patch, it's an -90% improvement, or roughly -96% improvement if you look starting with rL347204 ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-bew.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-bew.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-bew.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-bew.html' (9 runs): 1483.18 msec task-clock # 0.999 CPUs utilized ( +- 0.10% ) 68 context-switches # 46.085 M/sec ( +- 22.62% ) 0 cpu-migrations # 0.000 K/sec 11641 page-faults # 7850.880 M/sec ( +- 0.62% ) 5943246799 cycles # 4008184.428 GHz ( +- 0.10% ) (83.28%) 442869514 stalled-cycles-frontend # 7.45% frontend cycles idle ( +- 0.41% ) (83.29%) 1443375663 stalled-cycles-backend # 24.29% backend cycles idle ( +- 0.47% ) (33.43%) 7714006752 instructions # 1.30 insn per cycle # 0.19 stalled cycles per insn ( +- 0.07% ) (50.17%) 1977242936 branches # 1333472193.855 M/sec ( +- 0.07% ) (66.79%) 32624220 branch-misses # 1.65% of all branches ( +- 0.18% ) (83.34%) 1.48438 +- 0.00143 seconds time elapsed ( +- 0.10% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-newer.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-newer.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-newer.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-newer.html' (9 runs): 963.28 msec task-clock # 0.999 CPUs utilized ( +- 0.37% ) 12 context-switches # 12.695 M/sec ( +- 52.79% ) 0 cpu-migrations # 0.000 K/sec 11599 page-faults # 12046.971 M/sec ( +- 0.59% ) 3860122322 cycles # 4009359.596 GHz ( +- 0.37% ) (83.19%) 380300669 stalled-cycles-frontend # 9.85% frontend cycles idle ( +- 0.34% ) (83.30%) 1071910340 stalled-cycles-backend # 27.77% backend cycles idle ( +- 1.30% ) (33.51%) 4773418224 instructions # 1.24 insn per cycle # 0.22 stalled cycles per insn ( +- 0.15% ) (50.17%) 1106990316 branches # 1149787979.919 M/sec ( +- 0.11% ) (66.80%) 23632231 branch-misses # 2.13% of all branches ( +- 0.18% ) (83.33%) 0.96389 +- 0.00356 seconds time elapsed ( +- 0.37% ) ``` ``` $ sha512sum /tmp/clusters-* db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-bew.html db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-newer.html db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-old.html ``` Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57658 llvm-svn: 353025	2019-02-04 09:12:25 +00:00
Roman Lebedev	5b94fe9623	[llvm-exegesis] Cut run time of analysis mode by -84% (sic) (YamlContext::getInstrOpcode()) Summary: ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs): 9465.46 msec task-clock # 1.000 CPUs utilized ( +- 0.05% ) 60 context-switches # 6.363 M/sec ( +- 79.45% ) 0 cpu-migrations # 0.000 K/sec 11364 page-faults # 1200.697 M/sec ( +- 0.60% ) 37935623543 cycles # 4008083.912 GHz ( +- 0.05% ) (83.32%) 2371625356 stalled-cycles-frontend # 6.25% frontend cycles idle ( +- 0.37% ) (83.32%) 8476077875 stalled-cycles-backend # 22.34% backend cycles idle ( +- 0.18% ) (33.36%) 41822439158 instructions # 1.10 insn per cycle # 0.20 stalled cycles per insn ( +- 0.02% ) (50.03%) 11607658944 branches # 1226405861.486 M/sec ( +- 0.01% ) (66.69%) 210864633 branch-misses # 1.82% of all branches ( +- 0.06% ) (83.34%) 9.46636 +- 0.00441 seconds time elapsed ( +- 0.05% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-bew.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-bew.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-bew.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-bew.html' (9 runs): 1480.66 msec task-clock # 1.000 CPUs utilized ( +- 0.19% ) 13 context-switches # 8.483 M/sec ( +- 83.10% ) 0 cpu-migrations # 0.075 M/sec ( +-100.00% ) 11596 page-faults # 7834.247 M/sec ( +- 0.59% ) 5933732194 cycles # 4008977.535 GHz ( +- 0.19% ) (83.22%) 438111928 stalled-cycles-frontend # 7.38% frontend cycles idle ( +- 0.37% ) (83.25%) 1454969705 stalled-cycles-backend # 24.52% backend cycles idle ( +- 0.94% ) (33.53%) 7724218604 instructions # 1.30 insn per cycle # 0.19 stalled cycles per insn ( +- 0.07% ) (50.14%) 1979796413 branches # 1337599858.945 M/sec ( +- 0.06% ) (66.74%) 32641638 branch-misses # 1.65% of all branches ( +- 0.18% ) (83.31%) 1.48128 +- 0.00284 seconds time elapsed ( +- 0.19% ) $ sha512sum /tmp/clusters-* db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-bew.html db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-old.html ``` Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, llvm-commits, RKSimon Tags: #llvm Differential Revision: https://reviews.llvm.org/D57657 llvm-svn: 353024	2019-02-04 09:12:21 +00:00
Roman Lebedev	1a0d595f15	[llvm-exegesis] Throughput support in analysis mode Summary: D57000 / [[ https://bugs.llvm.org/show_bug.cgi?id=37698 \| PR37698 ]] added support for measuring of the inverse throughput. But the support for the analysis was not added. This attempts to fix that. (analysis done o bdver2 / piledriver) First, small-scale experiment: ``` $ ./bin/llvm-exegesis -num-repetitions=10000 -mode=inverse_throughput -opcode-name=BSF64rr Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-d0acdd.o --- mode: inverse_throughput key: instructions: - 'BSF64rr RAX RDX' config: '' register_initial_values: - 'RDX=0x0' cpu_name: bdver2 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: inverse_throughput, value: 3.0278, per_snippet_value: 3.0278 } error: '' info: instruction has no tied variables picking Uses different from defs assembled_snippet: 48BA0000000000000000480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2C3 ... ``` If we plug `bsfq %r12, %r10` into llvm-mca: https://godbolt.org/z/ZtOyhJ ``` Dispatch Width: 4 uOps Per Cycle: 3.00 IPC: 0.50 Block RThroughput: 2.0 ``` So RThroughput mismatch exists. Now, let's upscale and analyse: {F8207148} `$ ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html`: {F8207172} {F8207197} And if we now look at https://www.agner.org/optimize/instruction_tables.pdf, `Reciprocal throughput` for `BSF r,r` is listed as `3`. Yay? Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57647 llvm-svn: 353023	2019-02-04 09:12:17 +00:00
Roman Lebedev	dc78bc277d	[llvm-exegesis] deserializeMCInst(): bump SmallVector small size up to 16 Summary: ... from 8. `VALIGNDZ128rmbik XMM0 XMM0 K1 XMM3 RDI i_0x1 i_0x0 i_0x1` instruction already has 9 components. It does not matter much in terms of performance, but avoiding allocation seems to come with low cost here.. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57654 llvm-svn: 353022	2019-02-04 09:12:13 +00:00
Roman Lebedev	21193f4b7e	[llvm-exegesis] Don't default to running&dumping all analyses to '-' Summary: Up until the point i have looked in the source, i didn't even understood that i can disable 'cluster' output. I have always silenced it via ` &> /dev/null`. (And hoped it wasn't contributing much of the run time.) While i expect that it has it's use-cases i never once needed it so far. If i forget to silence it, console is completely flooded with that output. How about not expecting users to opt-out of analyses, but to explicitly specify the analyses that should be performed? Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57648 llvm-svn: 353021	2019-02-04 09:12:08 +00:00
Clement Courbet	362653f7af	[llvm-exegesis] Add throughput mode. Summary: This just uses the latency benchmark runner on the parallel uops snippet generator. Fixes PR37698. Reviewers: gchatelet Subscribers: tschuett, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D57000 llvm-svn: 352632	2019-01-30 16:02:20 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Clement Courbet	176388c973	Revert rL350035 "[llvm-exegesis] Clustering: don't enqueue a point multiple times" Let's discuss this on the review thread before submitting. llvm-svn: 350207	2019-01-02 09:21:00 +00:00
Fangrui Song	cd93d7ef43	[llvm-exegesis] Clustering: don't enqueue a point multiple times Summary: SetVector uses both DenseSet and vector, which is time/memory inefficient. The points are represented as natural numbers so we can replace the DenseSet part by indexing into a vector<char> instead. Don't cargo cult the pseudocode on the wikipedia DBSCAN page. This is a standard BFS style algorithm (the similar loops have been used several times in other LLVM components): every point is processed at most once, thus the queue has at most NumPoints elements. We represent it with a vector and allocate it outside of the loop to avoid allocation in the loop body. We check `Processed[P]` to avoid enqueueing a point more than once, which also nicely saves us a `ClusterIdForPoint_[Q].isUndef()` check. Many people hate the oneshot abstraction but some favor it, therefore we make a compromise, use a lambda to abstract away the neighbor adding process. Delete the comment `assert(Neighbors.capacity() == (Points_.size() - 1));` as it is wrong. llvm-svn: 350035	2018-12-23 20:48:52 +00:00
Simon Pilgrim	96408bb04a	Revert rL349136: [llvm-exegesis] Optimize ToProcess in dbScan Summary: Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess` We also check `Added[P]` to enqueueing a point more than once, which also saves us a `ClusterIdForPoint_[Q].isUndef()` check. Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54442 ........ Patch wasn't approved and breaks buildbots llvm-svn: 349139	2018-12-14 09:25:08 +00:00
Fangrui Song	92537ccc7e	[llvm-exegesis] Optimize ToProcess in dbScan Summary: Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess` We also check `Added[P]` to enqueueing a point more than once, which also saves us a `ClusterIdForPoint_[Q].isUndef()` check. Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54442 llvm-svn: 349136	2018-12-14 08:27:35 +00:00
Jinsong Ji	56c74cff70	[llvm-exegesis][NFC] Some code style cleanup Apply review comments of https://reviews.llvm.org/D54185 to other target as well, specifically: 1. make anonymous namespaces as small as possible, avoid using static inside anonymous namespaces 2. Add missing header to some files 3. GetLoadImmediateOpcodem-> getLoadImmediateOpcode 4. Fix typo Differential Revision: https://reviews.llvm.org/D54343 llvm-svn: 347309	2018-11-20 14:41:59 +00:00
Clement Courbet	bbab546a71	[llvm-exegesis][NFC] More tests for ExegesisTarget::fillMemoryOperands(). Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54304 llvm-svn: 347209	2018-11-19 14:31:43 +00:00
Roman Lebedev	71fdb57640	[llvm-exegesis] (+final perf overview) InstructionBenchmarkClustering::rangeQuery(): reserve for the upper bound of Neighbors Summary: As it was pointed out in D54388+D54390, the maximal size of `Neighbors` is known, it will contain at most Points_.size() minus one (the center of the cluster) While that is the upper bound, meaning in the most cases, the actual count will be much smaller, since D54390 made the allocation persistent, we no longer have to worry about overly-optimistically `reserve()`ing. Old: (D54393) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 6553.167456 task-clock (msec) # 1.000 CPUs utilized ( +- 0.21% ) ... 6.5547 +- 0.0134 seconds time elapsed ( +- 0.20% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 6315.057872 task-clock (msec) # 0.999 CPUs utilized ( +- 0.24% ) ... 6.3187 +- 0.0160 seconds time elapsed ( +- 0.25% ) ``` And that is another -~4%. Since this is the last (as of this moment) patch in this patch series, it is a good time to summarize: Old: (svn trunk, as stated in D54381) ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m24.884s user 0m24.099s sys 0m0.785s ``` So these patches, on a given benchmark, has decreased llvm-exegesis analysis time by 74.62%. There surely is more room for further improvements. D54514 may improve thins by -11.5% more (relative to this patch). Parallelization may improve things further significantly, too. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54415 llvm-svn: 347204	2018-11-19 13:28:41 +00:00
Roman Lebedev	8e315b66c2	[llvm-exegesis] Move InstructionBenchmarkClustering::isNeighbour() into header Summary: Old: (D54390) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7432.421721 task-clock (msec) # 1.000 CPUs utilized ( +- 0.15% ) ... 7.4336 +- 0.0115 seconds time elapsed ( +- 0.15% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 6569.936144 task-clock (msec) # 1.000 CPUs utilized ( +- 0.22% ) ... 6.5711 +- 0.0143 seconds time elapsed ( +- 0.22% ) ``` And another -12%. You'd think it would be `inline`d anyway, but no! :) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54393 llvm-svn: 347203	2018-11-19 13:28:36 +00:00
Roman Lebedev	666d855fbb	[llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): write into llvm::SmallVectorImpl& output parameter Summary: I do believe this is the correct fix. We call `rangeQuery()` very often. And many times it's output vector is large (tens of thousands entries), so small-size-opt won't help. Old: (D54389) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7934.528363 task-clock (msec) # 1.000 CPUs utilized ( +- 0.19% ) ... 7.9354 +- 0.0148 seconds time elapsed ( +- 0.19% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7383.793440 task-clock (msec) # 1.000 CPUs utilized ( +- 0.47% ) ... 7.3868 +- 0.0340 seconds time elapsed ( +- 0.46% ) ``` And another -7%. And that isn't even the good bit yet. Old: * calls to allocation functions: 2081419 * temporary allocations: 219658 (10.55%) * bytes allocated in total (ignoring deallocations): 4.31 GB New: * calls to allocation functions: 1880295 (-10%) * temporary allocations: 18758 (1%) (-91% sic) * bytes allocated in total (ignoring deallocations): 545.15 MB (-88% sic) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54390 llvm-svn: 347202	2018-11-19 13:28:31 +00:00
Roman Lebedev	5c5b1ea725	[llvm-exegesis] InstructionBenchmarkClustering::dbScan(): replace std::vector<> with std::deque<> in llvm::SetVector<> Summary: Old: (D54388) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 8606.323981 task-clock (msec) # 1.000 CPUs utilized ( +- 0.11% ) ... 8.60773 +- 0.00978 seconds time elapsed ( +- 0.11% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7971.403653 task-clock (msec) # 1.000 CPUs utilized ( +- 0.14% ) ... 7.9728 +- 0.0113 seconds time elapsed ( +- 0.14% ) ``` Another -~7%. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, RKSimon Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54389 llvm-svn: 347201	2018-11-19 13:28:26 +00:00
Roman Lebedev	8aecb0c489	[llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): use llvm::SmallVector<size_t, 0> for storage. Summary: Old: (D54383) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 9098.781978 task-clock (msec) # 1.000 CPUs utilized ( +- 0.16% ) ... 9.1015 +- 0.0148 seconds time elapsed ( +- 0.16% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 8553.352480 task-clock (msec) # 1.000 CPUs utilized ( +- 0.12% ) ... 8.5539 +- 0.0105 seconds time elapsed ( +- 0.12% ) ``` So another -6%. That is because the `SmallVector` doubles it size when reallocating, which is great here, since we can't `reserve()` since we can't know how many `Neighbors` we will have. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54388 llvm-svn: 347200	2018-11-19 13:28:22 +00:00
Roman Lebedev	b311c1d6b8	[llvm-exegesis] Analysis: writeMeasurementValue(): don't alloc string for double each time. Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: (D54382) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 9024.354355 task-clock (msec) # 1.000 CPUs utilized ( +- 0.18% ) ... 9.0262 +- 0.0161 seconds time elapsed ( +- 0.18% ) ``` New time: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 8996.541057 task-clock (msec) # 0.999 CPUs utilized ( +- 0.19% ) ... 9.0045 +- 0.0172 seconds time elapsed ( +- 0.19% ) ``` -~0.3%, not that much. But this isn't the important part. Old: * calls to allocation functions: 2109712 * temporary allocations: 33112 * bytes allocated in total (ignoring deallocations): 4.43 GB New: * calls to allocation functions: 2095345 (-0.68%) * temporary allocations: 18745 (-43.39% !!!) * bytes allocated in total (ignoring deallocations): 4.31 GB (-2.71%) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54383 llvm-svn: 347199	2018-11-19 13:28:17 +00:00
Roman Lebedev	f8b28e9bf4	[llvm-exegesis] Analysis::writeSnippet(): be smarter about memory allocations. Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: (D54381) ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m10.487s user 0m9.745s sys 0m0.740s ``` New time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m9.599s user 0m8.824s sys 0m0.772s ``` Not that much, around -9%. But that is not the good part yet, again. Old: * calls to allocation functions: 3347676 * temporary allocations: 277818 * bytes allocated in total (ignoring deallocations): 10.52 GB New: * calls to allocation functions: 2109712 (-36%) * temporary allocations: 33112 (-88%) * bytes allocated in total (ignoring deallocations): 4.43 GB (-58% sic) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54382 llvm-svn: 347198	2018-11-19 13:28:14 +00:00
Roman Lebedev	0b4b512826	[llvm-exegesis] InstructionBenchmarkClustering::dbScan(): use llvm::SetVector<> instead of ILLEGAL std::unordered_set<> Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m24.884s user 0m24.099s sys 0m0.785s ``` New time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m10.469s user 0m9.797s sys 0m0.672s ``` So -60%. And that isn't the good bit yet. Old: * calls to allocation functions: 106560180 (yes, 107 million allocations.) * bytes allocated in total (ignoring deallocations): 12.17 GB New: * calls to allocation functions: 3347676 (-96.86%) (just 3 mil) * bytes allocated in total (ignoring deallocations): 10.52 GB (~2GB less) --- Two points i want to raise: * `std::unordered_set<>` should not have been used there in the first place. It is banned by the https://llvm.org/docs/ProgrammersManual.html#other-set-like-container-options * There is no tests, so i'm not fully sure this is correct. Since it was unordered set, i guess there are zero restrictions on the order, and anything will be ok? * I tried other containers suggested in https://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc, this `llvm::SetVector<>` seems to be best here. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet Subscribers: kristina, bobsayshilol, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54381 llvm-svn: 347197	2018-11-19 13:28:09 +00:00
Clement Courbet	eee2e06e2a	[llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target. Summary: This simplifies the code and moves everything to tablegen for consistency. This also prepares the ground for adding issue counters. Reviewers: gchatelet, john.brawn, jsji Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54297 llvm-svn: 346489	2018-11-09 13:15:32 +00:00
Jinsong Ji	5fd3e75478	[PowerPC][llvm-exegesis] Add a PowerPC target This is patch to add PowerPC target to llvm-exegesis. The target does just enough to be able to run llvm-exegesis in latency mode for at least some opcodes. Differential Revision: https://reviews.llvm.org/D54185 llvm-svn: 346411	2018-11-08 16:51:42 +00:00
Clement Courbet	54c2fa1202	[llvm-exegesis][NFC] Add missing header guard + cosmetics. Reviewers: gchatelet Reviewed By: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54252 llvm-svn: 346400	2018-11-08 12:37:56 +00:00
Clement Courbet	0d79aaf1a7	Revert "[llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes." This reverts accidental commit rL346394. llvm-svn: 346398	2018-11-08 12:09:45 +00:00
Clement Courbet	c0950ae990	[llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes. llvm-svn: 346394	2018-11-08 11:45:14 +00:00
Clement Courbet	5b0d783078	[llvm-exegesis] Remove superfluous move. /Users/buildslave/as-bldslv9_new/lld-x86_64-darwin13/llvm.src/tools/llvm-exegesis/lib/X86/Target.cpp:155:12: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move] return std::move(Error); ^ /Users/buildslave/as-bldslv9_new/lld-x86_64-darwin13/llvm.src/tools/llvm-exegesis/lib/X86/Target.cpp:155:12: note: remove std::move call here return std::move(Error); ^~~~~~~~~~ ~ llvm-svn: 346333	2018-11-07 16:52:50 +00:00
Clement Courbet	c544838f87	[llvm-exegesis] Correclty handle all X86 memory encoding formats. Summary: Add unit tests to check the support for each supported format to avoid regressions such as the one in PR36906. Reviewers: gchatelet Subscribers: tschuett, lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D54144 llvm-svn: 346330	2018-11-07 16:14:55 +00:00
Clement Courbet	7066769223	[llvm-exegesis] Increasing wrapping limit. Summary: Fixes PR39097. Reviewers: gchatelet Subscribers: llvm-commits, tschuett Differential Revision: https://reviews.llvm.org/D54151 llvm-svn: 346328	2018-11-07 15:46:45 +00:00
Clement Courbet	003e08ff28	[llvm-exegesis] Ignore X86 pseudo instructions. Summary: They do not lower to actual MCInsts and have no scheduling info. Reviewers: gchatelet Subscribers: llvm-commits, tschuett Differential Revision: https://reviews.llvm.org/D54147 llvm-svn: 346227	2018-11-06 14:11:58 +00:00
Matthias Braun	3d849f67cb	MachineModuleInfo: Store more specific reference to LLVMTargetMachine; NFC MachineModuleInfo can only be used in code using lib/CodeGen, hence we can keep a more specific reference to LLVMTargetMachine rather than just TargetMachine around. llvm-svn: 346182	2018-11-05 23:49:13 +00:00
Clement Courbet	4d837fce88	[llvm-exegesis] Fix SNB counter definition and handling. Summary: SNB is the only one that has P23 as a single proc res. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53766 llvm-svn: 345480	2018-10-28 19:09:14 +00:00
Simon Pilgrim	2a9c728088	Fix MSVC llvm-exegesis build. NFCI. MSVC is a bit funny about is_pod..... llvm-svn: 345252	2018-10-25 10:45:38 +00:00
Clement Courbet	b4b6ec01c6	[llvm-exegesis] Add missing initializer. This is a better fix than rL345245. llvm-svn: 345246	2018-10-25 08:11:35 +00:00
Clement Courbet	fa99b36e4d	[llvm-exegesis] Fix VC build of r345243. "const members cannot be default initialized unless their type has a user defined default constructor" Make members non-const. llvm-svn: 345245	2018-10-25 08:08:58 +00:00
Clement Courbet	8902c885d6	[llvm-exegesis] Fix warning in r345243. warning C4099: 'llvm::exegesis::PfmCountersInfo': type name first seen using 'class' now seen using 'struct' llvm-svn: 345244	2018-10-25 08:06:35 +00:00
Clement Courbet	41c8af3924	[MCSched] Bind PFM Counters to the CPUs instead of the SchedModel. Summary: The pfm counters are now in the ExegesisTarget rather than the MCSchedModel (PR39165). This also compresses the pfm counter tables (PR37068). Reviewers: RKSimon, gchatelet Subscribers: mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D52932 llvm-svn: 345243	2018-10-25 07:44:01 +00:00
Guillaume Chatelet	da11b85606	[llvm-exegesis] Implements a cache of Instruction objects. llvm-svn: 345130	2018-10-24 11:55:06 +00:00
Fangrui Song	a342834b24	[llvm-exegesis] Fix name lookup ambiguity in MSVC after 344922 llvm-svn: 344927	2018-10-22 17:52:31 +00:00
Fangrui Song	32401afd8c	[llvm-exegesis] Move namespace exegesis inside llvm:: Summary: This allows simplifying references of llvm::foo with foo when the needs come in the future. Reviewers: courbet, gchatelet Reviewed By: gchatelet Subscribers: javed.absar, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53455 llvm-svn: 344922	2018-10-22 17:10:47 +00:00
Guillaume Chatelet	18ef4a4a0d	[llvm-exegesis] Crash when assembling invalid Operand llvm-svn: 344907	2018-10-22 15:06:10 +00:00
Guillaume Chatelet	02f70a3fde	[llvm-exegesis] Mark x86 segment register instructions as unsupported. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53499 llvm-svn: 344906	2018-10-22 14:55:43 +00:00
Guillaume Chatelet	3c639f33b4	[llvm-exegesis] Reject x86 instructions that use non uniform memory accesses Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53438 llvm-svn: 344905	2018-10-22 14:46:08 +00:00
Clement Courbet	8d0dd0ba0e	[llvm-exegesis] Mark second-form X87 instructions as unsupported. Summary: We only support the first form because we rely on information that is only available there. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53430 llvm-svn: 344782	2018-10-19 12:24:49 +00:00

1 2 3 4 5

213 Commits