The target does just enough to be able to run llvm-exegesis in latency
mode for at least some opcodes.
Patch by Miloš Stojanović.
Differential Revision: https://reviews.llvm.org/D68649
llvm-svn: 374590
Summary: Experiments show that this is the alignment we get (for ELF+Linux), but let's ensure that we have it.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68703
llvm-svn: 374170
This was breaking some bots:
/home/buildbots/ppc64le-clang-lnt-test/clang-ppc64le-lnt/llvm/include/llvm/Support/Error.h:483:5: required from ‘llvm::Expected<T>::Expected(OtherT&&, typename std::enable_if<std::is_convertible<_Rep2, _Rep>::value>::type*) [with OtherT = std::vector<llvm::exegesis::CodeTemplate>&; T = std::vector<llvm::exegesis::CodeTemplate>; typename std::enable_if<std::is_convertible<_Rep2, _Rep>::value>::type = void]’
/home/buildbots/ppc64le-clang-lnt-test/clang-ppc64le-lnt/llvm/tools/llvm-exegesis/lib/X86/Target.cpp:238:20: required from here
/usr/include/c++/6/bits/stl_construct.h:75:7: error: use of deleted function ‘llvm::exegesis::CodeTemplate::CodeTemplate(const llvm::exegesis::CodeTemplate&)’
{ ::new(static_cast<void*>(__p)) _T1(std::forward<_Args>(__args)...); }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llvm-svn: 374149
Summary:
This will help for PR32326.
This shows the well-known issue with `RBP` and `R13` as base registers.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits, RKSimon, andreadb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68646
llvm-svn: 374146
Summary:
This adds a `-max-configs-per-opcode` option to limit the number of
configs per opcode.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68642
llvm-svn: 374054
Summary:
Right now there are no snippet generators that emit the `Config` Field,
but I plan to add it to investigate LEA operands for PR32326.
What was broken was:
- `Config` Was not propagated up until the BenchmarkResult::Key.
- Clustering should really consider different configs as measuring
different things, so we should stabilize on (Opcode, Config) instead of
just Opcode.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits, lebedev.ri
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68629
llvm-svn: 374031
Existing clients are converted to use MachineModuleInfoWrapperPass. The
new interface is for defining a new pass manager API in CodeGen.
Reviewers: fedor.sergeev, philip.pfaffe, chandlerc, arsenm
Reviewed By: arsenm, fedor.sergeev
Differential Revision: https://reviews.llvm.org/D64183
llvm-svn: 373240
Summary:
Before this change the Executable function was made by duplicating the
snippet. This change adds a --repetion-mode={loop|duplicate} flag that
allows choosing between this behaviour and wrapping the snippet instructions
in a loop.
The new mode can help measurements when the snippet fits in the DSB by
short-cirtcuiting decoding. The loop adds a dec + jmp to the measurements, but
since these are not part of the critical path, they execute in parallel
with the measured code and do not impact measurements in practice.
Overview of the change:
- New SnippetRepetitor abstraction that handles repeating the snippet.
The assembler delegates repeating the instructions to this class.
- ExegesisTarget learns how to decrement loop counter and jump.
- Some refactoring of the assembler into FunctionFiller/BasicBlockFiller.
Reviewers: gchatelet
Subscribers: mgorny, tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68125
llvm-svn: 373083
Summary:
Right now latency generation can incorrectly select the scratch register
as a dependency-carrying register.
- Move the logic for preventing register selection from Uops
implementation to common SnippetGenerator class.
- Aliasing detection now takes a set of forbidden registers just like
random register assignment does.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68084
llvm-svn: 373048
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
Summary: This helps building internal tools on top of the library.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits, bdb, ondrasej
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62239
llvm-svn: 361385
It was previously writing this temporary snippet to file,
then reading it back, but leaving the tmp file in place.
This is both unefficient, and results in huge garbage pileup
in /tmp.
One would have thought it would have been caught during D60317..
llvm-svn: 360138
This *APPEARS* to fix a *very* infuriating issue of Yaml's being corrupted,
partially written, truncated. Or at least i'm not seeing the issue
on a new benchmark sweep.
The issue is somewhat rare, happens maybe once in 1000 benchmarks.
Which means there are up to hundreds of broken benchmarks
for a full x86 sweep in a single mode.
llvm-svn: 360124
Summary:
A *lot* of instructions have this special register.
It seems this never really worked, but i finally noticed it only
because it happened to break for `CMOV16rm` instruction.
We serialized that register as "" (empty string), which is naturally
'ignored' during deserialization, so we re-create a `MCInst` with
too few operands.
And when we then happened to try to resolve variant sched class
for this mis-serialized instruction, and the variant predicate
tried to read an operand that was out of bounds since we got less operands,
we crashed.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=41448 | PR41448 ]].
Reviewers: craig.topper, courbet
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60517
llvm-svn: 358153
Wanted to check if inablility to measure latency of CMOV32rm
is a regression from D60041 / D60138, but unable to do that
because the llvm-exegesis-{8,9} from debian sid fails
with that cryptic, unhelpful error.
I suspect this will be a better error.
llvm-svn: 357900
Summary:
D60041 / D60138 refactoring changed how CMOV/SETcc opcodes
are handled. concode is now an immediate, with it's own operand type.
This at least allows to not crash on the opcode.
However, this still won't generate all the snippets
with all the condcode enumerators. D60066 does that.
Reviewers: courbet, gchatelet
Reviewed By: gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60057
llvm-svn: 357841
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes.
Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.
Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon
Reviewed By: RKSimon
Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60228
llvm-svn: 357802
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes.
Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.
Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri
Reviewed By: andreadb
Subscribers: hiraditya, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60138
llvm-svn: 357801
Summary:
Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models.
This avoids needing an isel pattern for each condition code. And it removes
translation switches for converting between CMOV instructions and condition
codes.
Now the printer, encoder and disassembler take care of converting the immediate.
We use InstAliases to handle the assembly matching. But we print using the
asm string in the instruction definition. The instruction itself is marked
IsCodeGenOnly=1 to hide it from the assembly parser.
This does complicate the scheduler models a little since we can't assign the
A and BE instructions to a separate class now.
I plan to make similar changes for SETcc and Jcc.
Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60041
llvm-svn: 357800
Summary:
It doesn't need anything from Analysis::SchedClassCluster class,
and takes ResolvedSchedClass as param, so this seems rather fitting.
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59994
llvm-svn: 357263
Summary:
`ResolvedSchedClass` will need to be used outside of `Analysis`
(before `InstructionBenchmarkClustering` even), therefore promote
it into a non-private top-level class, and while there also
move all of the functions that are only called by `ResolvedSchedClass`
into that same new file.
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: mgorny, tschuett, mgrang, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59993
llvm-svn: 357259
Summary:
The diff looks scary but it really isn't:
1. I moved the check for the number of measurements into `SchedClassClusterCentroid::validate()`
2. While there, added a check that we can only have a single inverse throughput measurement. I missed that when adding it initially.
3. In `Analysis::SchedClassCluster::measurementsMatch()` is called with the current LLVM values from schedule class and the values from Centroid.
3.1. The values from centroid we can already get from `SchedClassClusterCentroid::getAsPoint()`.
This isn't 100% a NFC, because previously for inverse throughput we used `min()`. I have asked whether i have done that correctly in
https://reviews.llvm.org/D57647?id=184939#inline-510384 but did not hear back. I think `avg()` should be used too, thus it is a fix.
3.2. Finally, refactor the computation of the LLVM-specified values into `Analysis::SchedClassCluster::getSchedClassPoint()`
I will need that function for [[ https://bugs.llvm.org/show_bug.cgi?id=41275 | PR41275 ]]
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59951
llvm-svn: 357245
Summary:
This is an alternative to D59539.
Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`.
Let's suppose we are using `-analysis-clustering-epsilon=0.5`.
By default now we will start processing the `0.5` point, find that `1.0` is it's neighbor, add them to a new cluster.
Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster.
Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster.
So all these points ended up in the same cluster.
This may or may not be a correct implementation of dbscan clustering algorithm.
But this is rather horribly broken for the reasons of comparing the clusters with the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter
the LLVM values this cluster will **never** match the LLVM values,
and thus this cluster will **always** be displayed as inconsistent.
The solution is obviously to split off some of these opcodes into different sched cluster.
But how do i do that? Out of 4 opcodes displayed in the inconsistency report,
which ones are the "bad ones"? Which ones are the most different from the checked-in data?
I'd need to go in to the `.yaml` and look it up manually.
The trivial solution is to, when creating clusters, don't use the full dbscan algorithm,
but instead "pick some unclustered point, pick all unclustered points that are it's neighbor,
put them all into a new cluster, repeat". And just so as it happens, we can arrive
at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.
But that won't work well once we teach analyze mode to operate in on-1D mode
(i.e. on more than a single measurement type at a time), because the clustering would
depend on the order of the measurements.
Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster,
and if they are not, the cluster (==opcode) is unstable.
This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model..
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59820
llvm-svn: 357152
Summary:
This prevents "Cannot encode high byte register in REX-prefixed instruction"
from happening on instructions that require REX encoding when AH & co
get selected.
On the down side, these 4 registers can no longer be selected
automatically, but this avoids having to expose all the X86 encoding
complexity.
Reviewers: gchatelet
Subscribers: tschuett, jdoerfert, llvm-commits, bdb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59821
llvm-svn: 357003
Results in much nicer -help output:
```
$ ./bin/llvm-exegesis -help
USAGE: llvm-exegesis [options]
OPTIONS:
Color Options:
-color - Use colors in output (default=autodetect)
General options:
-enable-cse-in-irtranslator - Should enable CSE in irtranslator
-enable-cse-in-legalizer - Should enable CSE in Legalizer
Generic Options:
-help - Display available options (-help-hidden for more)
-help-list - Display list of available options (-help-list-hidden for more)
-version - Display the version of this program
llvm-exegesis analysis options:
-analysis-clustering-epsilon=<number> - dbscan epsilon for benchmark point clustering
-analysis-clusters-output-file=<string> -
-analysis-display-unstable-clusters - if there is more than one benchmark for an opcode, said benchmarks may end up not being clustered into the same cluster if the measured performance characteristics are different. by default all such opcodes are filtered out. this flag will instead show only such unstable opcodes
-analysis-inconsistencies-output-file=<string> -
-analysis-inconsistency-epsilon=<number> - epsilon for detection of when the cluster is different from the LLVM schedule profile values
-analysis-numpoints=<uint> - minimum number of points in an analysis cluster
llvm-exegesis benchmark options:
-ignore-invalid-sched-class - ignore instructions that do not define a sched class
-mode=<value> - the mode to run
=latency - Instruction Latency
=inverse_throughput - Instruction Inverse Throughput
=uops - Uop Decomposition
=analysis - Analysis
-num-repetitions=<uint> - number of time to repeat the asm snippet
-opcode-index=<int> - opcode to measure, by index
-opcode-name=<string> - comma-separated list of opcodes to measure, by name
-snippets-file=<string> - code snippets to measure
llvm-exegesis options:
-benchmarks-file=<string> - File to read (analysis mode) or write (latency/uops/inverse_throughput modes) benchmark results. “-” uses stdin/stdout.
-mcpu=<string> - cpu name to use for pfm counters, leave empty to autodetect
```
llvm-svn: 356364
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).
Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".
These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.
When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.
Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implmentation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.
Differential Revision: https://reviews.llvm.org/D57244
llvm-svn: 353043