Clement Courbet
0ddf81c43d
[llvm-exegesis] Teach llvm-exegesis to handle instructions with multiple tied variables.
...
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58285
llvm-svn: 354862
2019-02-26 10:54:45 +00:00
Roman Lebedev
542e5d7bb5
[llvm-exegesis] Split Epsilon param into two (PR40787)
...
Summary:
This eps param is used for two distinct things:
* initial point clusterization
* checking clusters against the llvm values
What if one wants to only look at highly different clusters, without changing
the clustering itself? In particular, this helps to weed out noisy measurements
(since the clusterization epsilon is still small, so there is a better chance
that noisy measurements from the same opcode will go into different clusters)
By splitting it into two params it is now possible.
This is nearly-free performance-wise:
Old:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):
390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% )
12 context-switches # 31.735 M/sec ( +- 27.38% )
0 cpu-migrations # 0.000 K/sec
4745 page-faults # 12183.732 M/sec ( +- 0.54% )
1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%)
185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%)
392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%)
1839236666 instructions # 1.18 insn per cycle
# 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%)
407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%)
10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%)
0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):
6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% )
262 context-switches # 38.546 M/sec ( +- 23.06% )
0 cpu-migrations # 0.065 M/sec ( +- 76.03% )
13287 page-faults # 1953.206 M/sec ( +- 0.32% )
27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%)
1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%)
16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%)
17611143370 instructions # 0.65 insn per cycle
# 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%)
3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%)
116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%)
6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%)
```
New:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):
400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% )
12 context-switches # 29.429 M/sec ( +- 25.95% )
0 cpu-migrations # 0.100 M/sec ( +-100.00% )
4714 page-faults # 11796.496 M/sec ( +- 0.55% )
1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%)
199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%)
402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%)
1847783963 instructions # 1.15 insn per cycle
# 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%)
407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%)
10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%)
0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% )
lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):
6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% )
217 context-switches # 31.236 M/sec ( +- 36.16% )
1 cpu-migrations # 0.096 M/sec ( +- 50.00% )
13258 page-faults # 1908.389 M/sec ( +- 0.34% )
27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%)
1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%)
16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%)
17755545931 instructions # 0.64 insn per cycle
# 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%)
3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%)
117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%)
6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% )
```
I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps.
Within noise i'd say.
Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58476
llvm-svn: 354767
2019-02-25 09:36:12 +00:00
Fangrui Song
990061b6d6
Fix file header issues in fuzzers. NFC
...
llvm-svn: 354551
2019-02-21 07:57:14 +00:00
Hans Wennborg
14e15ec18d
Fix the build with gcc/libstdc++ 4.8.2 after r354441
...
llvm-svn: 354469
2019-02-20 14:50:08 +00:00
Roman Lebedev
69716394f3
[llvm-exegesis] Opcode stabilization / reclusterization (PR40715)
...
Summary:
Given an instruction `Opcode`, we can make benchmarks (measurements) of the
instruction characteristics/performance. Then, to facilitate further analysis
we group the benchmarks with *similar* characteristics into clusters.
Now, this is all not entirely deterministic. Some instructions have variable
characteristics, depending on their arguments. And thus, if we do several
benchmarks of the same instruction `Opcode`, we may end up with *different*
performance characteristics measurements. And when we then do clustering,
these several benchmarks of the same instruction `Opcode` may end up being
clustered into *different* clusters. This is not great for further analysis.
We shall find every `Opcode` with benchmarks not in just one cluster, and move
*all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`.
I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit,
and introducing `-analysis-display-unstable-clusters` switch to toggle between
displaying stable-only clusters and unstable-only clusters.
The reclusterization is deterministically stable, produces identical reports
between runs. (Or at least that is what i'm seeing, maybe it isn't)
Timings/comparisons:
old (current trunk/head) {F8303582}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):
6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% )
172 context-switches # 25.965 M/sec ( +- 29.89% )
0 cpu-migrations # 0.042 M/sec ( +- 56.54% )
31073 page-faults # 4690.754 M/sec ( +- 0.08% )
26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%)
2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%)
13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%)
19770706799 instructions # 0.74 insn per cycle
# 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%)
4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%)
121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%)
6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% )
```
patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs):
6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% )
213 context-switches # 32.952 M/sec ( +- 23.81% )
1 cpu-migrations # 0.130 M/sec ( +- 43.84% )
31287 page-faults # 4832.057 M/sec ( +- 0.08% )
25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%)
1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%)
13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%)
19752995402 instructions # 0.76 insn per cycle
# 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%)
4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%)
121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%)
6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% )
```
Funnily, *this* measurement shows that said reclustering actually improved performance.
patch, with reclustering, only the stable clusters {F8303594}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs):
6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% )
133 context-switches # 20.792 M/sec ( +- 23.39% )
0 cpu-migrations # 0.063 M/sec ( +- 61.24% )
31318 page-faults # 4903.256 M/sec ( +- 0.08% )
25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%)
1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%)
13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%)
19767554347 instructions # 0.77 insn per cycle
# 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%)
4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%)
118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%)
6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% )
```
Performance improved even further?! Makes sense i guess, less clusters to print.
patch, with reclustering, only the unstable clusters {F8303601}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):
6124.96 msec task-clock # 1.000 CPUs utilized ( +- 0.20% )
194 context-switches # 31.709 M/sec ( +- 20.46% )
0 cpu-migrations # 0.039 M/sec ( +- 49.77% )
31413 page-faults # 5129.261 M/sec ( +- 0.06% )
24536794267 cycles # 4006425.858 GHz ( +- 0.19% ) (83.31%)
1676085087 stalled-cycles-frontend # 6.83% frontend cycles idle ( +- 0.46% ) (83.32%)
13035595603 stalled-cycles-backend # 53.13% backend cycles idle ( +- 0.16% ) (33.36%)
18260877653 instructions # 0.74 insn per cycle
# 0.71 stalled cycles per insn ( +- 0.05% ) (50.03%)
4112411983 branches # 671484364.603 M/sec ( +- 0.03% ) (66.68%)
114066929 branch-misses # 2.77% of all branches ( +- 0.11% ) (83.32%)
6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% )
```
This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps)
(Also, wow this is fast, it used to take several minutes originally)
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58355
llvm-svn: 354441
2019-02-20 09:14:04 +00:00
Andrea Di Biagio
edbf06a767
[AsmPrinter] Remove hidden flag -print-schedule.
...
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311 ).
Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216 ). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".
These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.
When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.
Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implmentation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.
Differential Revision: https://reviews.llvm.org/D57244
llvm-svn: 353043
2019-02-04 12:51:26 +00:00
Roman Lebedev
bd84b139b0
[llvm-exegesis] Cut run time of analysis mode by another -35% (*sic*) (YamlContext::getRegNo())
...
Summary:
Together with the previous patch, it's an -90% improvement,
or roughly -96% improvement if you look starting with rL347204
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-bew.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-bew.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-bew.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-bew.html' (9 runs):
1483.18 msec task-clock # 0.999 CPUs utilized ( +- 0.10% )
68 context-switches # 46.085 M/sec ( +- 22.62% )
0 cpu-migrations # 0.000 K/sec
11641 page-faults # 7850.880 M/sec ( +- 0.62% )
5943246799 cycles # 4008184.428 GHz ( +- 0.10% ) (83.28%)
442869514 stalled-cycles-frontend # 7.45% frontend cycles idle ( +- 0.41% ) (83.29%)
1443375663 stalled-cycles-backend # 24.29% backend cycles idle ( +- 0.47% ) (33.43%)
7714006752 instructions # 1.30 insn per cycle
# 0.19 stalled cycles per insn ( +- 0.07% ) (50.17%)
1977242936 branches # 1333472193.855 M/sec ( +- 0.07% ) (66.79%)
32624220 branch-misses # 1.65% of all branches ( +- 0.18% ) (83.34%)
1.48438 +- 0.00143 seconds time elapsed ( +- 0.10% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-newer.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-newer.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-newer.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-newer.html' (9 runs):
963.28 msec task-clock # 0.999 CPUs utilized ( +- 0.37% )
12 context-switches # 12.695 M/sec ( +- 52.79% )
0 cpu-migrations # 0.000 K/sec
11599 page-faults # 12046.971 M/sec ( +- 0.59% )
3860122322 cycles # 4009359.596 GHz ( +- 0.37% ) (83.19%)
380300669 stalled-cycles-frontend # 9.85% frontend cycles idle ( +- 0.34% ) (83.30%)
1071910340 stalled-cycles-backend # 27.77% backend cycles idle ( +- 1.30% ) (33.51%)
4773418224 instructions # 1.24 insn per cycle
# 0.22 stalled cycles per insn ( +- 0.15% ) (50.17%)
1106990316 branches # 1149787979.919 M/sec ( +- 0.11% ) (66.80%)
23632231 branch-misses # 2.13% of all branches ( +- 0.18% ) (83.33%)
0.96389 +- 0.00356 seconds time elapsed ( +- 0.37% )
```
```
$ sha512sum /tmp/clusters-*
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-bew.html
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-newer.html
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-old.html
```
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57658
llvm-svn: 353025
2019-02-04 09:12:25 +00:00
Roman Lebedev
5b94fe9623
[llvm-exegesis] Cut run time of analysis mode by -84% (*sic*) (YamlContext::getInstrOpcode())
...
Summary:
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):
9465.46 msec task-clock # 1.000 CPUs utilized ( +- 0.05% )
60 context-switches # 6.363 M/sec ( +- 79.45% )
0 cpu-migrations # 0.000 K/sec
11364 page-faults # 1200.697 M/sec ( +- 0.60% )
37935623543 cycles # 4008083.912 GHz ( +- 0.05% ) (83.32%)
2371625356 stalled-cycles-frontend # 6.25% frontend cycles idle ( +- 0.37% ) (83.32%)
8476077875 stalled-cycles-backend # 22.34% backend cycles idle ( +- 0.18% ) (33.36%)
41822439158 instructions # 1.10 insn per cycle
# 0.20 stalled cycles per insn ( +- 0.02% ) (50.03%)
11607658944 branches # 1226405861.486 M/sec ( +- 0.01% ) (66.69%)
210864633 branch-misses # 1.82% of all branches ( +- 0.06% ) (83.34%)
9.46636 +- 0.00441 seconds time elapsed ( +- 0.05% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-bew.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-bew.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-bew.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-bew.html' (9 runs):
1480.66 msec task-clock # 1.000 CPUs utilized ( +- 0.19% )
13 context-switches # 8.483 M/sec ( +- 83.10% )
0 cpu-migrations # 0.075 M/sec ( +-100.00% )
11596 page-faults # 7834.247 M/sec ( +- 0.59% )
5933732194 cycles # 4008977.535 GHz ( +- 0.19% ) (83.22%)
438111928 stalled-cycles-frontend # 7.38% frontend cycles idle ( +- 0.37% ) (83.25%)
1454969705 stalled-cycles-backend # 24.52% backend cycles idle ( +- 0.94% ) (33.53%)
7724218604 instructions # 1.30 insn per cycle
# 0.19 stalled cycles per insn ( +- 0.07% ) (50.14%)
1979796413 branches # 1337599858.945 M/sec ( +- 0.06% ) (66.74%)
32641638 branch-misses # 1.65% of all branches ( +- 0.18% ) (83.31%)
1.48128 +- 0.00284 seconds time elapsed ( +- 0.19% )
$ sha512sum /tmp/clusters-*
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-bew.html
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-old.html
```
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, llvm-commits, RKSimon
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57657
llvm-svn: 353024
2019-02-04 09:12:21 +00:00
Roman Lebedev
1a0d595f15
[llvm-exegesis] Throughput support in analysis mode
...
Summary:
D57000 / [[ https://bugs.llvm.org/show_bug.cgi?id=37698 | PR37698 ]] added support for measuring of the inverse throughput.
But the support for the analysis was not added.
This attempts to fix that. (analysis done o bdver2 / piledriver)
First, small-scale experiment:
```
$ ./bin/llvm-exegesis -num-repetitions=10000 -mode=inverse_throughput -opcode-name=BSF64rr
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-d0acdd.o
---
mode: inverse_throughput
key:
instructions:
- 'BSF64rr RAX RDX'
config: ''
register_initial_values:
- 'RDX=0x0'
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: inverse_throughput, value: 3.0278, per_snippet_value: 3.0278 }
error: ''
info: instruction has no tied variables picking Uses different from defs
assembled_snippet: 48BA0000000000000000480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2C3
...
```
If we plug `bsfq %r12, %r10` into llvm-mca:
https://godbolt.org/z/ZtOyhJ
```
Dispatch Width: 4
uOps Per Cycle: 3.00
IPC: 0.50
Block RThroughput: 2.0
```
So RThroughput mismatch exists.
Now, let's upscale and analyse:
{F8207148}
`$ ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html`:
{F8207172}
{F8207197}
And if we now look at https://www.agner.org/optimize/instruction_tables.pdf ,
`Reciprocal throughput` for `BSF r,r` is listed as `3`.
Yay?
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57647
llvm-svn: 353023
2019-02-04 09:12:17 +00:00
Roman Lebedev
dc78bc277d
[llvm-exegesis] deserializeMCInst(): bump SmallVector small size up to 16
...
Summary:
... from 8.
`VALIGNDZ128rmbik XMM0 XMM0 K1 XMM3 RDI i_0x1 i_0x0 i_0x1` instruction already has 9 components.
It does not matter much in terms of performance, but avoiding allocation seems to come with low cost here..
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57654
llvm-svn: 353022
2019-02-04 09:12:13 +00:00
Roman Lebedev
21193f4b7e
[llvm-exegesis] Don't default to running&dumping all analyses to '-'
...
Summary:
Up until the point i have looked in the source, i didn't even understood that
i can disable 'cluster' output. I have always silenced it via ` &> /dev/null`.
(And hoped it wasn't contributing much of the run time.)
While i expect that it has it's use-cases i never once needed it so far.
If i forget to silence it, console is completely flooded with that output.
How about not expecting users to opt-out of analyses,
but to explicitly specify the analyses that should be performed?
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57648
llvm-svn: 353021
2019-02-04 09:12:08 +00:00
Clement Courbet
362653f7af
[llvm-exegesis] Add throughput mode.
...
Summary:
This just uses the latency benchmark runner on the parallel uops snippet
generator.
Fixes PR37698.
Reviewers: gchatelet
Subscribers: tschuett, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D57000
llvm-svn: 352632
2019-01-30 16:02:20 +00:00
Chandler Carruth
2946cd7010
Update the file headers across all of the LLVM projects in the monorepo
...
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Clement Courbet
176388c973
Revert rL350035 "[llvm-exegesis] Clustering: don't enqueue a point multiple times"
...
Let's discuss this on the review thread before submitting.
llvm-svn: 350207
2019-01-02 09:21:00 +00:00
Fangrui Song
cd93d7ef43
[llvm-exegesis] Clustering: don't enqueue a point multiple times
...
Summary:
SetVector uses both DenseSet and vector, which is time/memory inefficient. The points are represented as natural numbers so we can replace the DenseSet part by indexing into a vector<char> instead.
Don't cargo cult the pseudocode on the wikipedia DBSCAN page. This is a standard BFS style algorithm (the similar loops have been used several times in other LLVM components): every point is processed at most once, thus the queue has at most NumPoints elements. We represent it with a vector and allocate it outside of the loop to avoid allocation in the loop body.
We check `Processed[P]` to avoid enqueueing a point more than once, which also nicely saves us a `ClusterIdForPoint_[Q].isUndef()` check.
Many people hate the oneshot abstraction but some favor it, therefore we make a compromise, use a lambda to abstract away the neighbor adding process.
Delete the comment `assert(Neighbors.capacity() == (Points_.size() - 1));` as it is wrong.
llvm-svn: 350035
2018-12-23 20:48:52 +00:00
Simon Pilgrim
96408bb04a
Revert rL349136: [llvm-exegesis] Optimize ToProcess in dbScan
...
Summary:
Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess`
We also check `Added[P]` to enqueueing a point more than once, which
also saves us a `ClusterIdForPoint_[Q].isUndef()` check.
Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54442
........
Patch wasn't approved and breaks buildbots
llvm-svn: 349139
2018-12-14 09:25:08 +00:00
Fangrui Song
92537ccc7e
[llvm-exegesis] Optimize ToProcess in dbScan
...
Summary:
Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess`
We also check `Added[P]` to enqueueing a point more than once, which
also saves us a `ClusterIdForPoint_[Q].isUndef()` check.
Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54442
llvm-svn: 349136
2018-12-14 08:27:35 +00:00
Jinsong Ji
56c74cff70
[llvm-exegesis][NFC] Some code style cleanup
...
Apply review comments of https://reviews.llvm.org/D54185 to other target as well, specifically:
1. make anonymous namespaces as small as possible, avoid using static inside anonymous namespaces
2. Add missing header to some files
3. GetLoadImmediateOpcodem-> getLoadImmediateOpcode
4. Fix typo
Differential Revision: https://reviews.llvm.org/D54343
llvm-svn: 347309
2018-11-20 14:41:59 +00:00
Clement Courbet
bbab546a71
[llvm-exegesis][NFC] More tests for ExegesisTarget::fillMemoryOperands().
...
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54304
llvm-svn: 347209
2018-11-19 14:31:43 +00:00
Roman Lebedev
71fdb57640
[llvm-exegesis] (+final perf overview) InstructionBenchmarkClustering::rangeQuery(): reserve for the upper bound of Neighbors
...
Summary:
As it was pointed out in D54388+D54390, the maximal size of `Neighbors` is known,
it will contain at most Points_.size() minus one (the center of the cluster)
While that is the upper bound, meaning in the most cases, the actual count
will be much smaller, since D54390 made the allocation persistent,
we no longer have to worry about overly-optimistically `reserve()`ing.
Old: (D54393)
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
6553.167456 task-clock (msec) # 1.000 CPUs utilized ( +- 0.21% )
...
6.5547 +- 0.0134 seconds time elapsed ( +- 0.20% )
```
New:
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
6315.057872 task-clock (msec) # 0.999 CPUs utilized ( +- 0.24% )
...
6.3187 +- 0.0160 seconds time elapsed ( +- 0.25% )
```
And that is another -~4%.
Since this is the last (as of this moment) patch in this patch series,
it is a good time to summarize:
Old: (svn trunk, as stated in D54381)
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m24.884s
user 0m24.099s
sys 0m0.785s
```
So these patches, on a given benchmark,
has decreased llvm-exegesis analysis time by 74.62%.
There surely is more room for further improvements.
D54514 may improve thins by -11.5% more (relative to this patch).
Parallelization may improve things further significantly, too.
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, MaskRay
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54415
llvm-svn: 347204
2018-11-19 13:28:41 +00:00
Roman Lebedev
8e315b66c2
[llvm-exegesis] Move InstructionBenchmarkClustering::isNeighbour() into header
...
Summary:
Old: (D54390)
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
7432.421721 task-clock (msec) # 1.000 CPUs utilized ( +- 0.15% )
...
7.4336 +- 0.0115 seconds time elapsed ( +- 0.15% )
```
New:
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
6569.936144 task-clock (msec) # 1.000 CPUs utilized ( +- 0.22% )
...
6.5711 +- 0.0143 seconds time elapsed ( +- 0.22% )
```
And another -12%. You'd think it would be `inline`d anyway, but no! :)
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, MaskRay
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54393
llvm-svn: 347203
2018-11-19 13:28:36 +00:00
Roman Lebedev
666d855fbb
[llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): write into llvm::SmallVectorImpl& output parameter
...
Summary:
I do believe this is the correct fix.
We call `rangeQuery()` *very* often. And many times it's output vector is large (tens of thousands entries), so small-size-opt won't help.
Old: (D54389)
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
7934.528363 task-clock (msec) # 1.000 CPUs utilized ( +- 0.19% )
...
7.9354 +- 0.0148 seconds time elapsed ( +- 0.19% )
```
New:
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
7383.793440 task-clock (msec) # 1.000 CPUs utilized ( +- 0.47% )
...
7.3868 +- 0.0340 seconds time elapsed ( +- 0.46% )
```
And another -7%. And that isn't even the good bit yet.
Old:
* calls to allocation functions: 2081419
* temporary allocations: 219658 (10.55%)
* bytes allocated in total (ignoring deallocations): 4.31 GB
New:
* calls to allocation functions: 1880295 (-10%)
* temporary allocations: 18758 (1%) (-91% *sic*)
* bytes allocated in total (ignoring deallocations): 545.15 MB (-88% *sic*)
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, MaskRay
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54390
llvm-svn: 347202
2018-11-19 13:28:31 +00:00
Roman Lebedev
5c5b1ea725
[llvm-exegesis] InstructionBenchmarkClustering::dbScan(): replace std::vector<> with std::deque<> in llvm::SetVector<>
...
Summary:
Old: (D54388)
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
8606.323981 task-clock (msec) # 1.000 CPUs utilized ( +- 0.11% )
...
8.60773 +- 0.00978 seconds time elapsed ( +- 0.11% )
```
New:
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
7971.403653 task-clock (msec) # 1.000 CPUs utilized ( +- 0.14% )
...
7.9728 +- 0.0113 seconds time elapsed ( +- 0.14% )
```
Another -~7%.
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, RKSimon
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54389
llvm-svn: 347201
2018-11-19 13:28:26 +00:00
Roman Lebedev
8aecb0c489
[llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): use llvm::SmallVector<size_t, 0> for storage.
...
Summary:
Old: (D54383)
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
9098.781978 task-clock (msec) # 1.000 CPUs utilized ( +- 0.16% )
...
9.1015 +- 0.0148 seconds time elapsed ( +- 0.16% )
```
New:
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
8553.352480 task-clock (msec) # 1.000 CPUs utilized ( +- 0.12% )
...
8.5539 +- 0.0105 seconds time elapsed ( +- 0.12% )
```
So another -6%.
That is because the `SmallVector` **doubles** it size when reallocating, which is great here,
since we can't `reserve()` since we can't know how many `Neighbors` we will have.
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54388
llvm-svn: 347200
2018-11-19 13:28:22 +00:00
Roman Lebedev
b311c1d6b8
[llvm-exegesis] Analysis: writeMeasurementValue(): don't alloc string for double each time.
...
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time: (D54382)
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
9024.354355 task-clock (msec) # 1.000 CPUs utilized ( +- 0.18% )
...
9.0262 +- 0.0161 seconds time elapsed ( +- 0.18% )
```
New time:
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
8996.541057 task-clock (msec) # 0.999 CPUs utilized ( +- 0.19% )
...
9.0045 +- 0.0172 seconds time elapsed ( +- 0.19% )
```
-~0.3%, not that much. But this isn't the important part.
Old:
* calls to allocation functions: 2109712
* temporary allocations: 33112
* bytes allocated in total (ignoring deallocations): 4.43 GB
New:
* calls to allocation functions: 2095345 (-0.68%)
* temporary allocations: 18745 (-43.39% !!!)
* bytes allocated in total (ignoring deallocations): 4.31 GB (-2.71%)
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54383
llvm-svn: 347199
2018-11-19 13:28:17 +00:00
Roman Lebedev
f8b28e9bf4
[llvm-exegesis] Analysis::writeSnippet(): be smarter about memory allocations.
...
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time: (D54381)
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m10.487s
user 0m9.745s
sys 0m0.740s
```
New time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m9.599s
user 0m8.824s
sys 0m0.772s
```
Not that much, around -9%. But that is not the good part yet, again.
Old:
* calls to allocation functions: 3347676
* temporary allocations: 277818
* bytes allocated in total (ignoring deallocations): 10.52 GB
New:
* calls to allocation functions: 2109712 (-36%)
* temporary allocations: 33112 (-88%)
* bytes allocated in total (ignoring deallocations): 4.43 GB (-58% *sic*)
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, MaskRay
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54382
llvm-svn: 347198
2018-11-19 13:28:14 +00:00
Roman Lebedev
0b4b512826
[llvm-exegesis] InstructionBenchmarkClustering::dbScan(): use llvm::SetVector<> instead of ILLEGAL std::unordered_set<>
...
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m24.884s
user 0m24.099s
sys 0m0.785s
```
New time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m10.469s
user 0m9.797s
sys 0m0.672s
```
So -60%. And that isn't the good bit yet.
Old:
* calls to allocation functions: 106560180 (yes, 107 *million* allocations.)
* bytes allocated in total (ignoring deallocations): 12.17 GB
New:
* calls to allocation functions: 3347676 (-96.86%) (just 3 mil)
* bytes allocated in total (ignoring deallocations): 10.52 GB (~2GB less)
---
Two points i want to raise:
* `std::unordered_set<>` should not have been used there in the first place.
It is banned by the https://llvm.org/docs/ProgrammersManual.html#other-set-like-container-options
* There is no tests, so i'm not fully sure this is correct.
Since it was unordered set, i guess there are zero restrictions on the order, and anything will be ok?
* I tried other containers suggested in https://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc ,
this `llvm::SetVector<>` seems to be best here.
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet
Subscribers: kristina, bobsayshilol, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54381
llvm-svn: 347197
2018-11-19 13:28:09 +00:00
Clement Courbet
eee2e06e2a
[llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target.
...
Summary:
This simplifies the code and moves everything to tablegen for consistency. This
also prepares the ground for adding issue counters.
Reviewers: gchatelet, john.brawn, jsji
Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54297
llvm-svn: 346489
2018-11-09 13:15:32 +00:00
Jinsong Ji
5fd3e75478
[PowerPC][llvm-exegesis] Add a PowerPC target
...
This is patch to add PowerPC target to llvm-exegesis.
The target does just enough to be able to run llvm-exegesis in latency mode for at least some opcodes.
Differential Revision: https://reviews.llvm.org/D54185
llvm-svn: 346411
2018-11-08 16:51:42 +00:00
Clement Courbet
54c2fa1202
[llvm-exegesis][NFC] Add missing header guard + cosmetics.
...
Reviewers: gchatelet
Reviewed By: gchatelet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54252
llvm-svn: 346400
2018-11-08 12:37:56 +00:00
Clement Courbet
0d79aaf1a7
Revert "[llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes."
...
This reverts accidental commit rL346394.
llvm-svn: 346398
2018-11-08 12:09:45 +00:00
Clement Courbet
c0950ae990
[llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes.
...
llvm-svn: 346394
2018-11-08 11:45:14 +00:00
Clement Courbet
5b0d783078
[llvm-exegesis] Remove superfluous move.
...
/Users/buildslave/as-bldslv9_new/lld-x86_64-darwin13/llvm.src/tools/llvm-exegesis/lib/X86/Target.cpp:155:12: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
return std::move(Error);
^
/Users/buildslave/as-bldslv9_new/lld-x86_64-darwin13/llvm.src/tools/llvm-exegesis/lib/X86/Target.cpp:155:12: note: remove std::move call here
return std::move(Error);
^~~~~~~~~~ ~
llvm-svn: 346333
2018-11-07 16:52:50 +00:00
Clement Courbet
c544838f87
[llvm-exegesis] Correclty handle all X86 memory encoding formats.
...
Summary:
Add unit tests to check the support for each supported format to avoid
regressions such as the one in PR36906.
Reviewers: gchatelet
Subscribers: tschuett, lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D54144
llvm-svn: 346330
2018-11-07 16:14:55 +00:00
Clement Courbet
7066769223
[llvm-exegesis] Increasing wrapping limit.
...
Summary: Fixes PR39097.
Reviewers: gchatelet
Subscribers: llvm-commits, tschuett
Differential Revision: https://reviews.llvm.org/D54151
llvm-svn: 346328
2018-11-07 15:46:45 +00:00
Clement Courbet
003e08ff28
[llvm-exegesis] Ignore X86 pseudo instructions.
...
Summary: They do not lower to actual MCInsts and have no scheduling info.
Reviewers: gchatelet
Subscribers: llvm-commits, tschuett
Differential Revision: https://reviews.llvm.org/D54147
llvm-svn: 346227
2018-11-06 14:11:58 +00:00
Matthias Braun
3d849f67cb
MachineModuleInfo: Store more specific reference to LLVMTargetMachine; NFC
...
MachineModuleInfo can only be used in code using lib/CodeGen, hence we
can keep a more specific reference to LLVMTargetMachine rather than just
TargetMachine around.
llvm-svn: 346182
2018-11-05 23:49:13 +00:00
Clement Courbet
4d837fce88
[llvm-exegesis] Fix SNB counter definition and handling.
...
Summary: SNB is the only one that has P23 as a single proc res.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D53766
llvm-svn: 345480
2018-10-28 19:09:14 +00:00
Simon Pilgrim
2a9c728088
Fix MSVC llvm-exegesis build. NFCI.
...
MSVC is a bit funny about is_pod.....
llvm-svn: 345252
2018-10-25 10:45:38 +00:00
Clement Courbet
b4b6ec01c6
[llvm-exegesis] Add missing initializer.
...
This is a better fix than rL345245.
llvm-svn: 345246
2018-10-25 08:11:35 +00:00
Clement Courbet
fa99b36e4d
[llvm-exegesis] Fix VC build of r345243.
...
"const members cannot be default initialized unless their type has a user defined default constructor"
Make members non-const.
llvm-svn: 345245
2018-10-25 08:08:58 +00:00
Clement Courbet
8902c885d6
[llvm-exegesis] Fix warning in r345243.
...
warning C4099: 'llvm::exegesis::PfmCountersInfo': type name first seen using 'class' now seen using 'struct'
llvm-svn: 345244
2018-10-25 08:06:35 +00:00
Clement Courbet
41c8af3924
[MCSched] Bind PFM Counters to the CPUs instead of the SchedModel.
...
Summary:
The pfm counters are now in the ExegesisTarget rather than the
MCSchedModel (PR39165).
This also compresses the pfm counter tables (PR37068).
Reviewers: RKSimon, gchatelet
Subscribers: mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D52932
llvm-svn: 345243
2018-10-25 07:44:01 +00:00
Guillaume Chatelet
da11b85606
[llvm-exegesis] Implements a cache of Instruction objects.
...
llvm-svn: 345130
2018-10-24 11:55:06 +00:00
Fangrui Song
a342834b24
[llvm-exegesis] Fix name lookup ambiguity in MSVC after 344922
...
llvm-svn: 344927
2018-10-22 17:52:31 +00:00
Fangrui Song
32401afd8c
[llvm-exegesis] Move namespace exegesis inside llvm::
...
Summary:
This allows simplifying references of llvm::foo with foo when the needs
come in the future.
Reviewers: courbet, gchatelet
Reviewed By: gchatelet
Subscribers: javed.absar, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D53455
llvm-svn: 344922
2018-10-22 17:10:47 +00:00
Guillaume Chatelet
18ef4a4a0d
[llvm-exegesis] Crash when assembling invalid Operand
...
llvm-svn: 344907
2018-10-22 15:06:10 +00:00
Guillaume Chatelet
02f70a3fde
[llvm-exegesis] Mark x86 segment register instructions as unsupported.
...
Reviewers: courbet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D53499
llvm-svn: 344906
2018-10-22 14:55:43 +00:00
Guillaume Chatelet
3c639f33b4
[llvm-exegesis] Reject x86 instructions that use non uniform memory accesses
...
Reviewers: courbet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D53438
llvm-svn: 344905
2018-10-22 14:46:08 +00:00
Clement Courbet
8d0dd0ba0e
[llvm-exegesis] Mark second-form X87 instructions as unsupported.
...
Summary:
We only support the first form because we rely on information that is
only available there.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D53430
llvm-svn: 344782
2018-10-19 12:24:49 +00:00