Summary:
A *lot* of instructions have this special register.
It seems this never really worked, but I only noticed it now
because it happened to break for the `CMOV16rm` instruction.
We serialized that register as "" (empty string), which is naturally
'ignored' during deserialization, so we re-create a `MCInst` with
too few operands.
Then, when we tried to resolve the variant sched class
for this mis-serialized instruction, the variant predicate
tried to read an operand that was out of bounds (since we had fewer operands),
and we crashed.
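A minimal sketch of the failure mode, assuming a space-separated operand list that is split on whitespace when read back. The function names `serializeOperands`, `deserializeOperands` and `getOperand` are hypothetical stand-ins, not the actual llvm-exegesis code:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the serializer: joins operand names with spaces.
std::string serializeOperands(const std::vector<std::string> &Operands) {
  std::string Out;
  for (std::size_t I = 0; I < Operands.size(); ++I) {
    if (I != 0)
      Out += ' ';
    Out += Operands[I];
  }
  return Out;
}

// Hypothetical stand-in for the deserializer: splits on whitespace,
// which silently drops empty tokens.
std::vector<std::string> deserializeOperands(const std::string &Text) {
  std::istringstream In(Text);
  std::vector<std::string> Operands;
  std::string Token;
  while (In >> Token)
    Operands.push_back(Token);
  return Operands;
}

// Reads one operand, asserting that the index is in bounds.
const std::string &getOperand(const std::vector<std::string> &Operands,
                              std::size_t Index) {
  assert(Index < Operands.size() && "operand index out of bounds");
  return Operands[Index];
}
```

With operands `{"AX", "", "i_0x1"}`, the round trip yields only two operands, so any later read of operand 2 is out of bounds.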
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=41448 | PR41448 ]].
Reviewers: craig.topper, courbet
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60517
llvm-svn: 358153
Summary:
D60041 / D60138 refactoring changed how CMOV/SETcc opcodes
are handled. condcode is now an immediate, with its own operand type.
This at least lets us avoid crashing on these opcodes.
However, this still won't generate all the snippets
with all the condcode enumerators. D60066 does that.
Reviewers: courbet, gchatelet
Reviewed By: gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60057
llvm-svn: 357841
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes.
Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.
Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon
Reviewed By: RKSimon
Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60228
llvm-svn: 357802
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes.
Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.
Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri
Reviewed By: andreadb
Subscribers: hiraditya, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60138
llvm-svn: 357801
Summary:
Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models.
This avoids needing an isel pattern for each condition code. And it removes
translation switches for converting between CMOV instructions and condition
codes.
Now the printer, encoder and disassembler take care of converting the immediate.
We use InstAliases to handle the assembly matching. But we print using the
asm string in the instruction definition. The instruction itself is marked
IsCodeGenOnly=1 to hide it from the assembly parser.
This does complicate the scheduler models a little since we can't assign the
A and BE instructions to a separate class now.
I plan to make similar changes for SETcc and Jcc.
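A rough sketch of why an enum ordered by encoding removes the translation switches: the condition code value can be used directly as a table index and as the low bits of the opcode byte. The enumerator and function names below are illustrative assumptions, not the names used in LLVM; the numeric values follow the x86 condition code encoding (CMOVcc is `0F 40+cc`):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Condition codes listed in x86 encoding order (illustrative names).
enum class CondCode : std::uint8_t {
  O = 0, NO, B, AE, E, NE, BE, A, S, NS, P, NP, L, GE, LE, G
};

// Maps a condition code to its assembly suffix with a plain table lookup.
std::string condCodeSuffix(CondCode CC) {
  static const std::array<const char *, 16> Suffixes = {
      "o", "no", "b",  "ae", "e", "ne", "be", "a",
      "s", "ns", "p",  "np", "l", "ge", "le", "g"};
  const auto Index = static_cast<std::size_t>(CC);
  assert(Index < Suffixes.size() && "invalid condition code");
  return Suffixes[Index];
}

// Second opcode byte of CMOVcc (the byte following the 0F escape).
std::uint8_t cmovOpcodeByte(CondCode CC) {
  return static_cast<std::uint8_t>(0x40 + static_cast<unsigned>(CC));
}

// Printed mnemonic for a CMOV with the given condition code immediate.
std::string cmovMnemonic(CondCode CC) { return "cmov" + condCodeSuffix(CC); }
```

For example, `cmovMnemonic(CondCode::BE)` gives `cmovbe` and `cmovOpcodeByte(CondCode::A)` gives `0x47`, with no per-condition switch involved.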
Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60041
llvm-svn: 357800
Summary:
It doesn't need anything from Analysis::SchedClassCluster class,
and takes ResolvedSchedClass as param, so this seems rather fitting.
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59994
llvm-svn: 357263
Summary:
`ResolvedSchedClass` will need to be used outside of `Analysis`
(before `InstructionBenchmarkClustering` even), therefore promote
it into a non-private top-level class, and while there also
move all of the functions that are only called by `ResolvedSchedClass`
into that same new file.
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: mgorny, tschuett, mgrang, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59993
llvm-svn: 357259
Summary:
The diff looks scary but it really isn't:
1. I moved the check for the number of measurements into `SchedClassClusterCentroid::validate()`
2. While there, added a check that we can only have a single inverse throughput measurement. I missed that when adding it initially.
3. `Analysis::SchedClassCluster::measurementsMatch()` is called with the current LLVM values from the schedule class and the values from the centroid.
3.1. The values from the centroid we can already get from `SchedClassClusterCentroid::getAsPoint()`.
This isn't 100% NFC, because previously for inverse throughput we used `min()`. I have asked whether I had done that correctly in
https://reviews.llvm.org/D57647?id=184939#inline-510384 but did not hear back. I think `avg()` should be used there too, thus it is a fix.
3.2. Finally, refactor the computation of the LLVM-specified values into `Analysis::SchedClassCluster::getSchedClassPoint()`
I will need that function for [[ https://bugs.llvm.org/show_bug.cgi?id=41275 | PR41275 ]]
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59951
llvm-svn: 357245
Summary:
This is an alternative to D59539.
Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`.
Let's suppose we are using `-analysis-clustering-epsilon=0.5`.
By default now we will start processing the `0.5` point, find that `1.0` is its neighbor, add them to a new cluster.
Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster.
Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster.
So all these points ended up in the same cluster.
This may or may not be a correct implementation of the dbscan clustering algorithm.
But this is rather horribly broken for the purpose of comparing the clusters with the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If I specify `-analysis-inconsistency-epsilon=0.5`, then no matter
the LLVM values this cluster will **never** match the LLVM values,
and thus this cluster will **always** be displayed as inconsistent.
The solution is obviously to split off some of these opcodes into a different sched cluster.
But how do I do that? Out of 4 opcodes displayed in the inconsistency report,
which ones are the "bad ones"? Which ones are the most different from the checked-in data?
I'd need to go into the `.yaml` and look it up manually.
The trivial solution is, when creating clusters, to not use the full dbscan algorithm,
but instead "pick some unclustered point, pick all unclustered points that are its neighbors,
put them all into a new cluster, repeat". And as it happens, we can arrive
at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.
But that won't work well once we teach analyze mode to operate in non-1D mode
(i.e. on more than a single measurement type at a time), because the clustering would
depend on the order of the measurements.
Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster,
and if they are not, the cluster (==opcode) is unstable.
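A minimal 1-D sketch of the difference, under simplifying assumptions: measurements are plain doubles, "neighbor" means an absolute difference of at most epsilon, and the chained variant ignores dbscan's minimum-points rule. The function names are hypothetical and this is not the llvm-exegesis implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Two measurements are neighbors if they differ by at most Epsilon.
bool isNeighbor(double A, double B, double Epsilon) {
  assert(Epsilon >= 0.0 && "epsilon must be non-negative");
  return std::abs(A - B) <= Epsilon;
}

// Chained clustering in 1-D: neighbors of neighbors join the same cluster,
// so a new cluster only starts where the gap between sorted values
// exceeds Epsilon.
std::size_t countChainedClusters(std::vector<double> Points, double Epsilon) {
  if (Points.empty())
    return 0;
  std::sort(Points.begin(), Points.end());
  std::size_t NumClusters = 1;
  for (std::size_t I = 1; I < Points.size(); ++I)
    if (!isNeighbor(Points[I - 1], Points[I], Epsilon))
      ++NumClusters;
  return NumClusters;
}

// Stability check for a cluster: every point must be a neighbor of
// every other point in that cluster.
bool isStableCluster(const std::vector<double> &Points, double Epsilon) {
  for (std::size_t I = 0; I < Points.size(); ++I)
    for (std::size_t J = I + 1; J < Points.size(); ++J)
      if (!isNeighbor(Points[I], Points[J], Epsilon))
        return false;
  return true;
}
```

For `0.5`, `1.0`, `1.5`, `2.0` with epsilon `0.5`, chaining produces one cluster, while the pairwise check rejects that same set because `0.5` and `2.0` are not neighbors.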
This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model..
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59820
llvm-svn: 357152
Summary:
This prevents "Cannot encode high byte register in REX-prefixed instruction"
from happening on instructions that require REX encoding when AH & co
get selected.
On the down side, these 4 registers can no longer be selected
automatically, but this avoids having to expose all the X86 encoding
complexity.
Reviewers: gchatelet
Subscribers: tschuett, jdoerfert, llvm-commits, bdb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59821
llvm-svn: 357003
Summary:
... from 8.
`VALIGNDZ128rmbik XMM0 XMM0 K1 XMM3 RDI i_0x1 i_0x0 i_0x1` instruction already has 9 components.
It does not matter much in terms of performance, but avoiding allocation seems to come with low cost here..
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57654
llvm-svn: 353022
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
SetVector uses both DenseSet and vector, which is time/memory inefficient. The points are represented as natural numbers so we can replace the DenseSet part by indexing into a vector<char> instead.
Don't cargo cult the pseudocode on the wikipedia DBSCAN page. This is a standard BFS style algorithm (the similar loops have been used several times in other LLVM components): every point is processed at most once, thus the queue has at most NumPoints elements. We represent it with a vector and allocate it outside of the loop to avoid allocation in the loop body.
We check `Processed[P]` to avoid enqueueing a point more than once, which also nicely saves us a `ClusterIdForPoint_[Q].isUndef()` check.
Many people hate the oneshot abstraction but some favor it, therefore we make a compromise, use a lambda to abstract away the neighbor adding process.
Delete the comment `assert(Neighbors.capacity() == (Points_.size() - 1));` as it is wrong.
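A simplified sketch of that BFS shape, assuming 1-D points and an absolute-difference neighbor test. It omits dbscan's minimum-points and noise handling, and `collectReachable` is a hypothetical name, so treat it as an illustration of the `Processed` flags plus vector-as-queue idea rather than the committed code:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns, in visiting order, the indices of all points reachable from
// Start through chains of neighbors.
std::vector<std::size_t> collectReachable(const std::vector<double> &Points,
                                          std::size_t Start, double Epsilon) {
  assert(Start < Points.size() && "start index out of range");
  // One flag per point replaces the DenseSet half of a SetVector.
  std::vector<char> Processed(Points.size(), 0);
  // Every point is enqueued at most once, so NumPoints slots are enough.
  std::vector<std::size_t> ToProcess;
  ToProcess.reserve(Points.size());

  // Enqueues every not-yet-processed neighbor of P.
  auto EnqueueNeighbors = [&](std::size_t P) {
    for (std::size_t Q = 0; Q < Points.size(); ++Q) {
      if (!Processed[Q] && std::abs(Points[P] - Points[Q]) <= Epsilon) {
        Processed[Q] = 1;
        ToProcess.push_back(Q);
      }
    }
  };

  Processed[Start] = 1;
  ToProcess.push_back(Start);
  // The vector doubles as the BFS queue; Head is the read position.
  for (std::size_t Head = 0; Head < ToProcess.size(); ++Head)
    EnqueueNeighbors(ToProcess[Head]);
  return ToProcess;
}
```

Iterating by index rather than by iterator keeps the loop valid while the lambda appends to the same vector.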
llvm-svn: 350035
Summary:
Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess`
We also check `Added[P]` to avoid enqueueing a point more than once, which
also saves us a `ClusterIdForPoint_[Q].isUndef()` check.
Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54442
........
Patch wasn't approved and breaks buildbots
llvm-svn: 349139
Summary:
Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess`
We also check `Added[P]` to avoid enqueueing a point more than once, which
also saves us a `ClusterIdForPoint_[Q].isUndef()` check.
Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54442
llvm-svn: 349136
Apply review comments of https://reviews.llvm.org/D54185 to other targets as well, specifically:
1. make anonymous namespaces as small as possible, avoid using static inside anonymous namespaces
2. Add missing header to some files
3. GetLoadImmediateOpcode -> getLoadImmediateOpcode
4. Fix typo
Differential Revision: https://reviews.llvm.org/D54343
llvm-svn: 347309
Summary:
As was pointed out in D54388+D54390, the maximal size of `Neighbors` is known:
it will contain at most `Points_.size()` minus one (the center of the cluster).
While that is only the upper bound, and in most cases the actual count
will be much smaller, D54390 made the allocation persistent,
so we no longer have to worry about overly-optimistically `reserve()`ing.
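To illustrate the idea, here is a small sketch of a persistent, pre-reserved neighbor buffer, assuming 1-D points. The class `NeighborFinder` and its members are hypothetical names chosen for this example and do not mirror the real code:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Keeps one neighbor buffer alive across queries instead of allocating
// a fresh vector for every point.
class NeighborFinder {
public:
  NeighborFinder(std::vector<double> Points, double Epsilon)
      : Points_(std::move(Points)), Epsilon_(Epsilon) {
    // A point has at most size() - 1 neighbors (everything but itself).
    if (!Points_.empty())
      Neighbors_.reserve(Points_.size() - 1);
  }

  // Returns the indices of all points within Epsilon of point P.
  // The returned reference is invalidated by the next call.
  const std::vector<std::size_t> &rangeQuery(std::size_t P) {
    assert(P < Points_.size() && "point index out of range");
    Neighbors_.clear(); // Drops the elements but keeps the capacity.
    for (std::size_t Q = 0; Q < Points_.size(); ++Q)
      if (Q != P && std::abs(Points_[P] - Points_[Q]) <= Epsilon_)
        Neighbors_.push_back(Q);
    return Neighbors_;
  }

  std::size_t capacity() const { return Neighbors_.capacity(); }

private:
  std::vector<double> Points_;
  double Epsilon_;
  std::vector<std::size_t> Neighbors_;
};
```

Because the buffer is reserved once to the upper bound, repeated `rangeQuery()` calls never need to grow it, at the price of holding on to that memory for the lifetime of the object.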
Old: (D54393)
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
6553.167456 task-clock (msec) # 1.000 CPUs utilized ( +- 0.21% )
...
6.5547 +- 0.0134 seconds time elapsed ( +- 0.20% )
```
New:
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
6315.057872 task-clock (msec) # 0.999 CPUs utilized ( +- 0.24% )
...
6.3187 +- 0.0160 seconds time elapsed ( +- 0.25% )
```
And that is another -~4%.
Since this is the last (as of this moment) patch in this patch series,
it is a good time to summarize:
Old: (svn trunk, as stated in D54381)
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m24.884s
user 0m24.099s
sys 0m0.785s
```
So these patches, on a given benchmark,
have decreased llvm-exegesis analysis time by 74.62%.
There surely is more room for further improvements.
D54514 may improve things by -11.5% more (relative to this patch).
Parallelization may improve things further significantly, too.
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, MaskRay
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54415
llvm-svn: 347204
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark I was trying to analyze!)
Old time: (D54381)
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m10.487s
user 0m9.745s
sys 0m0.740s
```
New time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m9.599s
user 0m8.824s
sys 0m0.772s
```
Not that much, around -9%. But that is not the good part yet, again.
Old:
* calls to allocation functions: 3347676
* temporary allocations: 277818
* bytes allocated in total (ignoring deallocations): 10.52 GB
New:
* calls to allocation functions: 2109712 (-36%)
* temporary allocations: 33112 (-88%)
* bytes allocated in total (ignoring deallocations): 4.43 GB (-58% *sic*)
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, MaskRay
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54382
llvm-svn: 347198
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark I was trying to analyze!)
Old time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m24.884s
user 0m24.099s
sys 0m0.785s
```
New time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m10.469s
user 0m9.797s
sys 0m0.672s
```
So -60%. And that isn't the good bit yet.
Old:
* calls to allocation functions: 106560180 (yes, 107 *million* allocations.)
* bytes allocated in total (ignoring deallocations): 12.17 GB
New:
* calls to allocation functions: 3347676 (-96.86%) (just 3 mil)
* bytes allocated in total (ignoring deallocations): 10.52 GB (~2GB less)
---
A few points I want to raise:
* `std::unordered_set<>` should not have been used there in the first place.
It is banned by https://llvm.org/docs/ProgrammersManual.html#other-set-like-container-options
* There are no tests, so I'm not fully sure this is correct.
Since it was an unordered set, I guess there are zero restrictions on the order, and anything will be ok?
* I tried the other containers suggested in https://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc,
and `llvm::SetVector<>` seems to be the best here.
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet
Subscribers: kristina, bobsayshilol, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54381
llvm-svn: 347197
Summary:
This simplifies the code and moves everything to tablegen for consistency. This
also prepares the ground for adding issue counters.
Reviewers: gchatelet, john.brawn, jsji
Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54297
llvm-svn: 346489
This is a patch to add the PowerPC target to llvm-exegesis.
The target does just enough to be able to run llvm-exegesis in latency mode for at least some opcodes.
Differential Revision: https://reviews.llvm.org/D54185
llvm-svn: 346411
Summary:
Add unit tests to check the support for each supported format to avoid
regressions such as the one in PR36906.
Reviewers: gchatelet
Subscribers: tschuett, lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D54144
llvm-svn: 346330