Commit Graph

49 Commits

Author SHA1 Message Date
Kazu Hirata 902cbcd59e Use llvm::is_contained where appropriate (NFC)
Summary:
This patch replaces std::find with llvm::is_contained where
appropriate.

Reviewers: efriedma, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84489
2020-07-27 10:20:44 -07:00
Miloš Stojanović 24b7b99b7d [llvm-exegesis][NFC] Disassociate snippet generators from benchmark runners
The addition of `inverse_throughput` mode highlighted the disjointedness
of snippet generators and benchmark runners because it used the
`UopsSnippetGenerator` with the  `LatencyBenchmarkRunner`.
To keep the code consistent tie the snippet generators to
parallelization/serialization rather than their benchmark runners.

Renaming `LatencySnippetGenerator` -> `SerialSnippetGenerator`.
Renaming `UopsSnippetGenerator` -> `ParallelSnippetGenerator`.

Differential Revision: https://reviews.llvm.org/D72928
2020-01-20 16:19:13 +01:00
Fangrui Song 6fdd6a7b3f [Disassembler] Delete the VStream parameter of MCDisassembler::getInstruction()
The argument is llvm::null() everywhere except llvm::errs() in
llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no
target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds.

If we ever have the needs to add verbose log to disassemblers, we can
record log with a member function, instead of passing it around as an
argument.
2020-01-11 13:34:52 -08:00
Fangrui Song aa708763d3 [MC] Add parameter `Address` to MCInstPrinter::printInst
printInst prints a branch/call instruction as `b offset` (there are many
variants on various targets) instead of `b address`.

It is a convention to use address instead of offset in most external
symbolizers/disassemblers. This difference makes `llvm-objdump -d`
output unsatisfactory.

Add `uint64_t Address` to printInst(), so that it can pass the argument to
printInstruction(). `raw_ostream &OS` is moved to the last to be
consistent with other print* methods.

The next step is to pass `Address` to printInstruction() (generated by
tablegen from the instruction set description). We can gradually migrate
targets to print addresses instead of offsets.

In any case, downstream projects which don't know `Address` can pass 0 as
the argument.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72172
2020-01-06 20:42:22 -08:00
Mirko Brkusanin 4b63ca1379 [Mips] Use appropriate private label prefix based on Mips ABI
MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64
regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo
we can find out Mips ABI and pick appropriate prefix.

Tags: #llvm, #clang, #lldb

Differential Revision: https://reviews.llvm.org/D66795
2019-10-23 12:24:35 +02:00
Clement Courbet c8eb0547ef [llvm-exegesis] Show noise cluster in analysis output.
Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68780

llvm-svn: 374533
2019-10-11 11:33:18 +00:00
Clement Courbet 50cdd56beb [llvm-exegesis][NFC] Remove extra `llvm::` qualifications.
Summary: Second patch: in the lib.

Reviewers: gchatelet

Subscribers: nemanjai, tschuett, MaskRay, mgrang, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68692

llvm-svn: 374158
2019-10-09 11:58:42 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Roman Lebedev 4d81e87765 [NFC][llvm-exegesis] Also promote getSchedClassPoint() into ResolvedSchedClass.
Summary:
It doesn't need anything from Analysis::SchedClassCluster class,
and takes ResolvedSchedClass as param, so this seems rather fitting.

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59994

llvm-svn: 357263
2019-03-29 14:58:01 +00:00
Roman Lebedev 1d1330c546 [NFC][llvm-exegesis] Refactor ResolvedSchedClass & friends
Summary:
`ResolvedSchedClass` will need to be used outside of `Analysis`
(before `InstructionBenchmarkClustering` even), therefore promote
it into a non-private top-level class, and while there also
move all of the functions that are only called by `ResolvedSchedClass`
into that same new file.

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: mgorny, tschuett, mgrang, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59993

llvm-svn: 357259
2019-03-29 14:24:27 +00:00
Roman Lebedev b8fb15d412 [NFC][llvm-exegesis] Refactor Analysis::SchedClassCluster::measurementsMatch()
Summary:
The diff looks scary but it really isn't:
1. I moved the check for the number of measurements into `SchedClassClusterCentroid::validate()`
2. While there, added a check that we can only have a single inverse throughput measurement. I missed that when adding it initially.
3. In `Analysis::SchedClassCluster::measurementsMatch()` is called with the current LLVM values from schedule class and the values from Centroid.
3.1. The values from centroid we can already get from `SchedClassClusterCentroid::getAsPoint()`.
     This isn't 100% a NFC, because previously for inverse throughput we used `min()`. I have asked whether i have done that correctly in
     https://reviews.llvm.org/D57647?id=184939#inline-510384 but did not hear back. I think `avg()` should be used too, thus it is a fix.
3.2. Finally, refactor the computation of the LLVM-specified values into `Analysis::SchedClassCluster::getSchedClassPoint()`
     I will need that function for [[ https://bugs.llvm.org/show_bug.cgi?id=41275 | PR41275 ]]

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59951

llvm-svn: 357245
2019-03-29 11:36:08 +00:00
Roman Lebedev c2423fe689 [llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880)
Summary:
This is an alternative to D59539.

Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`.
Let's suppose we are using `-analysis-clustering-epsilon=0.5`.
By default now we will start processing the `0.5` point, find that `1.0` is it's neighbor, add them to a new cluster.
Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster.
Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster.
So all these points ended up in the same cluster.
This may or may not be a correct implementation of dbscan clustering algorithm.

But this is rather horribly broken for the reasons of comparing the clusters with the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter
the LLVM values this cluster will **never** match the LLVM values,
and thus this cluster will **always** be displayed as inconsistent.

The solution is obviously to split off some of these opcodes into different sched cluster.
But how do i do that? Out of 4 opcodes displayed in the inconsistency report,
which ones are the "bad ones"? Which ones are the most different from the checked-in data?
I'd need to go in to the `.yaml` and look it up manually.

The trivial solution is to, when creating clusters, don't use the full dbscan algorithm,
but instead "pick some unclustered point, pick all unclustered points that are it's neighbor,
put them all into a new cluster, repeat". And just so as it happens, we can arrive
at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.

But that won't work well once we teach analyze mode to operate in on-1D mode
(i.e. on more than a single measurement type at a time), because the clustering would
depend on the order of the measurements.

Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster,
and if they are not, the cluster (==opcode) is unstable.

This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model..

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59820

llvm-svn: 357152
2019-03-28 08:55:01 +00:00
Roman Lebedev 542e5d7bb5 [llvm-exegesis] Split Epsilon param into two (PR40787)
Summary:
This eps param is used for two distinct things:
* initial point clusterization
* checking clusters against the llvm values

What if one wants to only look at highly different clusters, without changing
the clustering itself? In particular, this helps to weed out noisy measurements
(since the clusterization epsilon is still small, so there is a better chance
that noisy measurements from the same opcode will go into different clusters)

By splitting it into two params it is now possible.

This is nearly-free performance-wise:
Old:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

            390.01 msec task-clock                #    0.998 CPUs utilized            ( +-  0.25% )
                12      context-switches          #   31.735 M/sec                    ( +- 27.38% )
                 0      cpu-migrations            #    0.000 K/sec
              4745      page-faults               # 12183.732 M/sec                   ( +-  0.54% )
        1562711900      cycles                    # 4012303.327 GHz                   ( +-  0.24% )  (82.90%)
         185567822      stalled-cycles-frontend   #   11.87% frontend cycles idle     ( +-  0.52% )  (83.30%)
         392106234      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.31% )  (33.79%)
        1839236666      instructions              #    1.18  insn per cycle
                                                  #    0.21  stalled cycles per insn  ( +-  0.15% )  (50.37%)
         407035764      branches                  # 1045074878.710 M/sec              ( +-  0.12% )  (66.80%)
          10896459      branch-misses             #    2.68% of all branches          ( +-  0.17% )  (83.20%)

          0.390629 +- 0.000972 seconds time elapsed  ( +-  0.25% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):

           6803.36 msec task-clock                #    0.999 CPUs utilized            ( +-  0.96% )
               262      context-switches          #   38.546 M/sec                    ( +- 23.06% )
                 0      cpu-migrations            #    0.065 M/sec                    ( +- 76.03% )
             13287      page-faults               # 1953.206 M/sec                    ( +-  0.32% )
       27252537904      cycles                    # 4006024.257 GHz                   ( +-  0.95% )  (83.31%)
        1496314935      stalled-cycles-frontend   #    5.49% frontend cycles idle     ( +-  0.97% )  (83.32%)
       16128404524      stalled-cycles-backend    #   59.18% backend cycles idle      ( +-  0.30% )  (33.37%)
       17611143370      instructions              #    0.65  insn per cycle
                                                  #    0.92  stalled cycles per insn  ( +-  0.05% )  (50.04%)
        3894906599      branches                  # 572537147.437 M/sec               ( +-  0.03% )  (66.69%)
         116314514      branch-misses             #    2.99% of all branches          ( +-  0.20% )  (83.35%)

            6.8118 +- 0.0689 seconds time elapsed  ( +-  1.01%)
```
New:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):

            400.14 msec task-clock                #    0.998 CPUs utilized            ( +-  0.66% )
                12      context-switches          #   29.429 M/sec                    ( +- 25.95% )
                 0      cpu-migrations            #    0.100 M/sec                    ( +-100.00% )
              4714      page-faults               # 11796.496 M/sec                   ( +-  0.55% )
        1603131306      cycles                    # 4011840.105 GHz                   ( +-  0.66% )  (82.85%)
         199538509      stalled-cycles-frontend   #   12.45% frontend cycles idle     ( +-  2.40% )  (83.10%)
         402249109      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.19% )  (34.05%)
        1847783963      instructions              #    1.15  insn per cycle
                                                  #    0.22  stalled cycles per insn  ( +-  0.18% )  (50.64%)
         407162722      branches                  # 1018925730.631 M/sec              ( +-  0.12% )  (67.02%)
          10932779      branch-misses             #    2.69% of all branches          ( +-  0.51% )  (83.28%)

           0.40077 +- 0.00267 seconds time elapsed  ( +-  0.67% )

lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):

           6947.79 msec task-clock                #    1.000 CPUs utilized            ( +-  0.90% )
               217      context-switches          #   31.236 M/sec                    ( +- 36.16% )
                 1      cpu-migrations            #    0.096 M/sec                    ( +- 50.00% )
             13258      page-faults               # 1908.389 M/sec                    ( +-  0.34% )
       27830796523      cycles                    # 4006032.286 GHz                   ( +-  0.89% )  (83.30%)
        1504554006      stalled-cycles-frontend   #    5.41% frontend cycles idle     ( +-  2.10% )  (83.32%)
       16716574843      stalled-cycles-backend    #   60.07% backend cycles idle      ( +-  0.65% )  (33.38%)
       17755545931      instructions              #    0.64  insn per cycle
                                                  #    0.94  stalled cycles per insn  ( +-  0.09% )  (50.04%)
        3897255686      branches                  # 560980426.597 M/sec               ( +-  0.06% )  (66.70%)
         117045395      branch-misses             #    3.00% of all branches          ( +-  0.47% )  (83.34%)

            6.9507 +- 0.0627 seconds time elapsed  ( +-  0.90% )
```

I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps.
Within noise i'd say.

Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58476

llvm-svn: 354767
2019-02-25 09:36:12 +00:00
Roman Lebedev 69716394f3 [llvm-exegesis] Opcode stabilization / reclusterization (PR40715)
Summary:
Given an instruction `Opcode`, we can make benchmarks (measurements) of the
instruction characteristics/performance. Then, to facilitate further analysis
we group the benchmarks with *similar* characteristics into clusters.
Now, this is all not entirely deterministic. Some instructions have variable
characteristics, depending on their arguments. And thus, if we do several
benchmarks of the same instruction `Opcode`, we may end up with *different*
performance characteristics measurements. And when we then do clustering,
these several benchmarks of the same instruction `Opcode` may end up being
clustered into *different* clusters. This is not great for further analysis.

We shall find every `Opcode` with benchmarks not in just one cluster, and move
*all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`.

I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit,
and introducing `-analysis-display-unstable-clusters` switch to toggle between
displaying stable-only clusters and unstable-only clusters.

The reclusterization is deterministically stable, produces identical reports
between runs. (Or at least that is what i'm seeing, maybe it isn't)

Timings/comparisons:
old (current trunk/head) {F8303582}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

           6624.73 msec task-clock                #    0.999 CPUs utilized            ( +-  0.53% )
               172      context-switches          #   25.965 M/sec                    ( +- 29.89% )
                 0      cpu-migrations            #    0.042 M/sec                    ( +- 56.54% )
             31073      page-faults               # 4690.754 M/sec                    ( +-  0.08% )
       26538711696      cycles                    # 4006230.292 GHz                   ( +-  0.53% )  (83.31%)
        2017496807      stalled-cycles-frontend   #    7.60% frontend cycles idle     ( +-  0.93% )  (83.32%)
       13403650062      stalled-cycles-backend    #   50.51% backend cycles idle      ( +-  0.33% )  (33.37%)
       19770706799      instructions              #    0.74  insn per cycle
                                                  #    0.68  stalled cycles per insn  ( +-  0.04% )  (50.04%)
        4419821812      branches                  # 667207369.714 M/sec               ( +-  0.03% )  (66.69%)
         121741669      branch-misses             #    2.75% of all branches          ( +-  0.28% )  (83.34%)

            6.6283 +- 0.0358 seconds time elapsed  ( +-  0.54% )
```

patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs):

           6475.29 msec task-clock                #    0.999 CPUs utilized            ( +-  0.31% )
               213      context-switches          #   32.952 M/sec                    ( +- 23.81% )
                 1      cpu-migrations            #    0.130 M/sec                    ( +- 43.84% )
             31287      page-faults               # 4832.057 M/sec                    ( +-  0.08% )
       25939086577      cycles                    # 4006160.279 GHz                   ( +-  0.31% )  (83.31%)
        1958812858      stalled-cycles-frontend   #    7.55% frontend cycles idle     ( +-  0.68% )  (83.32%)
       13218961512      stalled-cycles-backend    #   50.96% backend cycles idle      ( +-  0.29% )  (33.37%)
       19752995402      instructions              #    0.76  insn per cycle
                                                  #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.04%)
        4417079244      branches                  # 682195472.305 M/sec               ( +-  0.03% )  (66.70%)
         121510065      branch-misses             #    2.75% of all branches          ( +-  0.19% )  (83.34%)

            6.4832 +- 0.0229 seconds time elapsed  ( +-  0.35% )
```
Funnily, *this* measurement shows that said reclustering actually improved performance.

patch, with reclustering, only the stable clusters {F8303594}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs):

           6387.71 msec task-clock                #    0.999 CPUs utilized            ( +-  0.13% )
               133      context-switches          #   20.792 M/sec                    ( +- 23.39% )
                 0      cpu-migrations            #    0.063 M/sec                    ( +- 61.24% )
             31318      page-faults               # 4903.256 M/sec                    ( +-  0.08% )
       25591984967      cycles                    # 4006786.266 GHz                   ( +-  0.13% )  (83.31%)
        1881234904      stalled-cycles-frontend   #    7.35% frontend cycles idle     ( +-  0.25% )  (83.33%)
       13209749965      stalled-cycles-backend    #   51.62% backend cycles idle      ( +-  0.16% )  (33.36%)
       19767554347      instructions              #    0.77  insn per cycle
                                                  #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.03%)
        4417480305      branches                  # 691618858.046 M/sec               ( +-  0.03% )  (66.68%)
         118676358      branch-misses             #    2.69% of all branches          ( +-  0.07% )  (83.33%)

            6.3954 +- 0.0118 seconds time elapsed  ( +-  0.18% )
```
Performance improved even further?! Makes sense i guess, less clusters to print.

patch, with reclustering, only the unstable clusters {F8303601}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):

           6124.96 msec task-clock                #    1.000 CPUs utilized            ( +-  0.20% )
               194      context-switches          #   31.709 M/sec                    ( +- 20.46% )
                 0      cpu-migrations            #    0.039 M/sec                    ( +- 49.77% )
             31413      page-faults               # 5129.261 M/sec                    ( +-  0.06% )
       24536794267      cycles                    # 4006425.858 GHz                   ( +-  0.19% )  (83.31%)
        1676085087      stalled-cycles-frontend   #    6.83% frontend cycles idle     ( +-  0.46% )  (83.32%)
       13035595603      stalled-cycles-backend    #   53.13% backend cycles idle      ( +-  0.16% )  (33.36%)
       18260877653      instructions              #    0.74  insn per cycle
                                                  #    0.71  stalled cycles per insn  ( +-  0.05% )  (50.03%)
        4112411983      branches                  # 671484364.603 M/sec               ( +-  0.03% )  (66.68%)
         114066929      branch-misses             #    2.77% of all branches          ( +-  0.11% )  (83.32%)

            6.1278 +- 0.0121 seconds time elapsed  ( +-  0.20% )
```
This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps)
(Also, wow this is fast, it used to take several minutes originally)

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58355

llvm-svn: 354441
2019-02-20 09:14:04 +00:00
Roman Lebedev 1a0d595f15 [llvm-exegesis] Throughput support in analysis mode
Summary:
D57000 / [[ https://bugs.llvm.org/show_bug.cgi?id=37698 | PR37698 ]] added support for measuring of the inverse throughput.
But the support for the analysis was not added.
This attempts to fix that. (analysis done o bdver2 / piledriver)

First, small-scale experiment:
```
$ ./bin/llvm-exegesis -num-repetitions=10000 -mode=inverse_throughput -opcode-name=BSF64rr
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-d0acdd.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'BSF64rr RAX RDX'
  config:          ''
  register_initial_values:
    - 'RDX=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 3.0278, per_snippet_value: 3.0278 }
error:           ''
info:            instruction has no tied variables picking Uses different from defs
assembled_snippet: 48BA0000000000000000480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2C3
...
```
If we plug `bsfq	%r12, %r10` into llvm-mca:
https://godbolt.org/z/ZtOyhJ
```
Dispatch Width:    4
uOps Per Cycle:    3.00
IPC:               0.50
Block RThroughput: 2.0
```
So RThroughput mismatch exists.

Now, let's upscale and analyse:
{F8207148}
`$ ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html`:
{F8207172}
{F8207197}
And if we now look at https://www.agner.org/optimize/instruction_tables.pdf,
`Reciprocal throughput` for `BSF r,r` is listed as `3`.
Yay?

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57647

llvm-svn: 353023
2019-02-04 09:12:17 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Roman Lebedev b311c1d6b8 [llvm-exegesis] Analysis: writeMeasurementValue(): don't alloc string for double each time.
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time: (D54382)
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):

       9024.354355      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.18% )
...
            9.0262 +- 0.0161 seconds time elapsed  ( +-  0.18% )
```
New time:
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):

       8996.541057      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.19% )
...
            9.0045 +- 0.0172 seconds time elapsed  ( +-  0.19% )
```
-~0.3%, not that much. But this isn't the important part.

Old:
* calls to allocation functions: 2109712
* temporary allocations: 33112
* bytes allocated in total (ignoring deallocations): 4.43 GB

New:
* calls to allocation functions: 2095345 (-0.68%)
* temporary allocations: 18745 (-43.39% !!!)
* bytes allocated in total (ignoring deallocations): 4.31 GB (-2.71%)

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54383

llvm-svn: 347199
2018-11-19 13:28:17 +00:00
Roman Lebedev f8b28e9bf4 [llvm-exegesis] Analysis::writeSnippet(): be smarter about memory allocations.
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time: (D54381)
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null

real    0m10.487s
user    0m9.745s
sys     0m0.740s
```
New time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null

real    0m9.599s
user    0m8.824s
sys     0m0.772s

```
Not that much, around -9%. But that is not the good part yet, again.

Old:
* calls to allocation functions: 3347676
* temporary allocations: 277818
* bytes allocated in total (ignoring deallocations): 10.52 GB

New:
* calls to allocation functions: 2109712 (-36%)
* temporary allocations: 33112 (-88%)
* bytes allocated in total (ignoring deallocations): 4.43 GB (-58% *sic*)

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet, MaskRay

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54382

llvm-svn: 347198
2018-11-19 13:28:14 +00:00
Fangrui Song 32401afd8c [llvm-exegesis] Move namespace exegesis inside llvm::
Summary:
This allows simplifying references of llvm::foo with foo when the needs
come in the future.

Reviewers: courbet, gchatelet

Reviewed By: gchatelet

Subscribers: javed.absar, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53455

llvm-svn: 344922
2018-10-22 17:10:47 +00:00
Fangrui Song 2e83b2e9ee Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC
llvm-svn: 344774
2018-10-19 06:12:02 +00:00
Clement Courbet 5a768ddd44 [llvm-exegesis][NFC] Revert rL343682 "Fix unused variable warning".
That was not the proper fix: the variable is used in debug mode.

llvm-svn: 343685
2018-10-03 12:48:50 +00:00
Clement Courbet 8a5a6be47a [llvm-exegesis] Fix rL343680 in release mode.
llvm-svn: 343684
2018-10-03 12:35:35 +00:00
Clement Courbet af50a5b85f [llvm-exegesis][NFC] Fix unused variable warning.
llvm-svn: 343682
2018-10-03 12:27:43 +00:00
Clement Courbet d5a39553ff [llvm-exegesis] Resolve variant classes in analysis.
Summary: See PR38884.

Reviewers: gchatelet

Subscribers: tschuett, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D52825

llvm-svn: 343680
2018-10-03 11:50:25 +00:00
Clement Courbet 30183093ab [llvm-exegesis] Fix PR39096.
Summary: The key is now the resource name, not the resource id.

Reviewers: gchatelet

Subscribers: tschuett, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D52607

llvm-svn: 343208
2018-09-27 13:26:37 +00:00
Fangrui Song 0cac726a00 llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)
Summary: The convenience wrapper in STLExtras is available since rL342102.

Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb

Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D52573

llvm-svn: 343163
2018-09-27 02:13:45 +00:00
Clement Courbet 28d4f85824 [llvm-exegesis] Get rid of debug_string.
Summary:
THis is a backwards-compatible change (existing files will work as
expected).

See PR39082.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D52546

llvm-svn: 343108
2018-09-26 13:35:10 +00:00
Clement Courbet 684a5f6753 [llvm-exegesis] Output the unscaled value as well as the scaled one.
Summary: See PR38936 for context.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D52500

llvm-svn: 343081
2018-09-26 08:37:21 +00:00
Clement Courbet 4273e1e828 [llvm-exegesis] Print the whole snippet in analysis.
Summary:
On hover, the whole asm snippet is displayed, including operands.

This requires the actual assembly output instead of just the MCInsts:
This is because some pseudo-instructions get lowered to actual target
instructions during codegen (e.g. ABS_Fp32 -> SSE or X87).

Reviewers: gchatelet

Subscribers: mgorny, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D48164

llvm-svn: 334805
2018-06-15 07:30:45 +00:00
Clement Courbet 49fad1cbf2 [llvm-exegesis] Use BenchmarkResult::Instructions instead of OpcodeName
Summary:
Get rid of OpcodeName.

To remove the opcode name from an old file:
```
cat old_file | sed '/opcode_name.*/d'
```

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D48121

llvm-svn: 334691
2018-06-14 06:57:52 +00:00
Clement Courbet 62b34fa89a [llvm-exegesis] move Mode from Key to BenchmarResult.
Moves the Mode field out of the Key. The existing yaml benchmark results can be fixed with the following script:

```
readonly FILE=$1
readonly MODE=latency # Change to uops to fix a uops benchmark.
cat $FILE | \
  sed "/^\ \+mode:\ \+$MODE$/d" | \
  sed "/^cpu_name.*$/i mode:            $MODE"
```

Differential Revision: https://reviews.llvm.org/D47813

Authored by: Guillaume Chatelet

llvm-svn: 334079
2018-06-06 09:42:36 +00:00
Clement Courbet 2cb97b95a2 [llvm-exegesis][NFC] Use an enum instead of a string for benchmark mode.
Summary: YAML encoding is backwards-compatible.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D47705

llvm-svn: 333886
2018-06-04 11:43:40 +00:00
Clement Courbet 7228721b30 [llvm-exegesis] Analysis: Show inconsistencies between checked-in and measured data.
Summary:
We now highlight any sched classes whose measurements do not match the
LLVM SchedModel. "bad" clusters are marked in red.

Screenshot in phabricator diff.

Reviewers: gchatelet

Subscribers: tschuett, mgrang, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D47639

llvm-svn: 333884
2018-06-04 11:11:55 +00:00
Clement Courbet df79e79e22 [llvm-exegesis] Analysis: Display idealized sched class port pressure.
Summary: Screenshot in phabricator diff.

Reviewers: gchatelet

Subscribers: mgorny, tschuett, mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D47329

llvm-svn: 333753
2018-06-01 14:18:02 +00:00
Clement Courbet ae8ae5dc78 [llvm-exegesis] Analysis: Show value extents.
Summary: Screenshot attached in phabricator.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D47318

llvm-svn: 333181
2018-05-24 12:41:02 +00:00
Clement Courbet b1f1b50da9 [llvm-exegesis] Analysis: show debug string instead of raw key if provided.
Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D47315

llvm-svn: 333175
2018-05-24 11:26:00 +00:00
Clement Courbet 2637e5f828 [llvm-exegesis] Show sched class details in analysis.
Summary: And update docs.

Reviewers: gchatelet

Subscribers: tschuett, craig.topper, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D47254

llvm-svn: 333169
2018-05-24 10:47:05 +00:00
Clement Courbet 17d3c257b9 [llvm-exegesis] Analysis output uses HTML.
Summary: This makes the report much more readable.

Reviewers: gchatelet

Subscribers: tschuett, mgrang, craig.topper, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D47189

llvm-svn: 332979
2018-05-22 13:31:29 +00:00
Clement Courbet a1bee62308 [llvm-exegesis] Remove redudant explicit template instantiations.
llvm-svn: 332611
2018-05-17 14:37:31 +00:00
Clement Courbet cf21074603 [llvm-exegesis] Write out inconsistencies to a file.
Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D47013

llvm-svn: 332608
2018-05-17 13:41:28 +00:00
Clement Courbet 448550d947 [llvm-exegesis] Analysis: detect clustering inconsistencies.
Summary:
Warn on instructions that should have the same performance
characteristics according to the sched model but actually
differ in their benchmarks.

Next step: Make the display nicer to browse, I was thinking maybe html.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D46945

llvm-svn: 332601
2018-05-17 12:25:18 +00:00
Clement Courbet c922e07796 [llvm-exegesis] Fix unused variable warning in release mode.
llvm-svn: 332455
2018-05-16 11:49:15 +00:00
Clement Courbet 4022b6c42f Fix unused variable warning in r332437.
llvm-svn: 332441
2018-05-16 09:10:04 +00:00
Clement Courbet 6d6c1a99d7 [llvm-exegesis] Analysis: Display sched class for instructions.
Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D46883

llvm-svn: 332437
2018-05-16 08:47:21 +00:00
Clement Courbet a66bfaa4c0 [llvm-exegesis] Split AsmTemplate.Name into components.
Summary:
AsmTemplate becomes IntructionBenchmarkKey, which has three components.
This allows retreiving the opcode for analysis.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D46873

llvm-svn: 332348
2018-05-15 13:07:05 +00:00
Clement Courbet 37f0ca0328 [llvm-exegesis] Add an analysis mode.
Summary:
The analysis mode gives the user a clustered view of the measurement results.
Next steps are (requires the split ok AsmTemplate.Name into {mnemonic, mode}):
 - Show the sched class.
 - Highlight any inconsistencies with the checked-in data.

Reviewers: gchatelet

Subscribers: mgorny, llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D46865

llvm-svn: 332344
2018-05-15 12:08:00 +00:00
Clement Courbet dffc4ca4f2 [llvm-exegesis] Revert accidentally commited code.
llvm-svn: 332231
2018-05-14 11:35:37 +00:00
Clement Courbet 4623da8991 [llvm-exegesis] Fix a warning in r332221
comparison of integers of different signs: 'const unsigned long' and 'const int' [-Werror,-Wsign-compare]

unittests/tools/llvm-exegesis/BenchmarkResultTest.cpp:60:5: note: in instantiation of function template specialization 'testing::internal::EqHelper<false>::Compare<unsigned long, int>' requested here
    ASSERT_EQ(FromDiskVector.size(), 1);

llvm-svn: 332230
2018-05-14 11:31:02 +00:00
Clement Courbet 3d479fe81c [llvm-exegesis] Add an analysis mode.
The analysis mode gives the user a clustered view of the measurement results and
highlights any inconsistencies with the checked-in data.

llvm-svn: 332229
2018-05-14 11:30:56 +00:00