function_ref is non-owning, so if we get it as a parameter in constructor,
our reference goes out-of-scope as soon as constructor returns.
Instead, let's just take it as a parameter to the actual `generate()` call
Summary:
Currently, we only have nice exploration for LEA instruction,
while for the rest, we rely on `randomizeUnsetVariables()`
to sometimes generate something interesting.
While that works, it isn't very reliable in coverage :)
Here, i'm making an assumption that while we may want to explore
multi-instruction configs, we are most interested in the
characteristics of the main instruction we were asked about.
Which we can do, by taking the existing `randomizeMCOperand()`,
and turning it on it's head - instead of relying on it to randomly fill
one of the interesting values, let's pregenerate all the possible interesting
values for the variable, and then generate as much `InstructionTemplate`
combinations of these possible values for variables as needed/possible.
Of course, that requires invasive changes to no longer pass just the
naked `Instruction`, but sometimes partially filled `InstructionTemplate`.
As it can be seen from the test, this allows us to explore
`X86::OperandType::OPERAND_COND_CODE` for instructions
that take such an operand.
I'm hoping this will greatly simplify exploration.
Reviewers: courbet, gchatelet
Reviewed By: gchatelet
Subscribers: orodley, mgorny, sdardis, tschuett, jrtc27, atanasyan, mstojanovic, andreadb, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74156
Summary:
Simplifies the C++11-style "-> decltype(...)" return-type deduction.
Note that you have to be careful about whether the function return type
is `auto` or `decltype(auto)`. The difference is that bare `auto`
strips const and reference, just like lambda return type deduction. In
some cases that's what we want (or more likely, we know that the return
type is a value type), but whenever we're wrapping a templated function
which might return a reference, we need to be sure that the return type
is decltype(auto).
No functional change.
Subscribers: dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74383
Followup to D74085.
Replace the use of `report_fatal_error()` with returning the error to
`llvm-exegesis.cpp` and handling it there.
To facilitate this, a new `Error` type has been added which is only used
to log errors to the yaml output.
Differential Revision: https://reviews.llvm.org/D74215
Summary: Commit 141915963b was reverted in
abe01e17f6 because it broke builds testing
without libpfm. A preparatory commit <commit_sha1> was added to enable
this recommit.
Original commit message:
Followup to D74085.
Replace the use of `report_fatal_error()` with returning the error to
`llvm-exegesis.cpp` and handling it there.
Differential Revision: https://reviews.llvm.org/D74113
Summary: Commit b3576f60eb was reverted in
abe01e17f6 because it broke builds testing
without libpfm. A preparatory commit <commit_sha1> was added to enable
this recommit.
Original commit message:
Fix inconsistencies in error reporting created by mixing
`report_fatal_error()` and `ExitOnErr()`, and add additional information
to the error message to make it more user friendly. Minimize the use
`report_fatal_error()` because it's meant for use in very rare cases and
it results in low information density of the error messages.
Summary of the new design:
* For command line argument errors output `llvm-exegesis: <error_message>`,
which is consistent with the error output format emitted by the backend
which checks correctness of the command line arguments.
* For other errors the format `llvm-exegesis error: <error_message>` is used.
** If the error occurred during file access `<error_message>` will have
of two parts: `'<file_name>': <rest_of_the_error_message>`
Differential Revision: https://reviews.llvm.org/D74085
All errors of type `Failure` are `StringError`s. In order for exit code
mapping to detect that specifically a clustering error has occurred it
needs to have a different type.
This patch also prepares D74085 where termination `report_fatal_error()`
will be replaced with emitting `StringError`s.
Differential Revision: https://reviews.llvm.org/D74124
It broke e.g. all tests under tools/llvm-exegesis/X86/ when libpfm is
not available, see comment on D74085.
This reverts commit b3576f60eb and
141915963b.
Followup to D74085.
Replace the use of `report_fatal_error()` with returning the error to
`llvm-exegesis.cpp` and handling it there.
Differential Revision: https://reviews.llvm.org/D74113
Fix inconsistencies in error reporting created by mixing
`report_fatal_error()` and `ExitOnErr()`, and add additional information
to the error message to make it more user friendly. Minimize the use
`report_fatal_error()` because it's meant for use in very rare cases and
it results in low information density of the error messages.
Summary of the new design:
* For command line argument errors output `llvm-exegesis: <error_message>`,
which is consistent with the error output format emitted by the backend
which checks correctness of the command line arguments.
* For other errors the format `llvm-exegesis error: <error_message>` is used.
** If the error occurred during file access `<error_message>` will have
of two parts: `'<file_name>': <rest_of_the_error_message>`
Differential Revision: https://reviews.llvm.org/D74085
Summary:
It turns out that CUR_DIRECTION is just an internal placeholder, not an actual
valid encoded value.
Reviewers: gchatelet
Subscribers: tschuett, mstojanovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73343
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
Summary:
What we're redoing already exists in the X86 backend, it's called
`X86II::getOperandBias`.
Reviewers: gchatelet
Subscribers: tschuett, mstojanovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73340
Since some instruction types aren't allowed as the main instruction also
don't allow them for aliasing instructions.
Differential Revision: https://reviews.llvm.org/D73220
Summary:
... instead of crashing.
On typical exmaple is when there are no available registers.
Reviewers: gchatelet
Subscribers: tschuett, mstojanovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73196
Summary:
Right now when picking a back-to-back instruction at random, we might select
instructions that we do not know how to handle.
Add a ExegesisTarget hook to possibly filter instructions.
Reviewers: gchatelet
Subscribers: tschuett, mstojanovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73161
Summary:
Add unit test to show the issue: We must select an *aliasing* output
register, not the exact register.
Reviewers: gchatelet
Subscribers: tschuett, mstojanovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73095
The addition of `inverse_throughput` mode highlighted the disjointedness
of snippet generators and benchmark runners because it used the
`UopsSnippetGenerator` with the `LatencyBenchmarkRunner`.
To keep the code consistent tie the snippet generators to
parallelization/serialization rather than their benchmark runners.
Renaming `LatencySnippetGenerator` -> `SerialSnippetGenerator`.
Renaming `UopsSnippetGenerator` -> `ParallelSnippetGenerator`.
Differential Revision: https://reviews.llvm.org/D72928
This causes an error with older versions of clang: constructor for
'llvm::exegesis::InstructionsCache' must explicitly initialize the const
member 'BVC'
The argument is llvm::null() everywhere except llvm::errs() in
llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no
target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds.
If we ever have the needs to add verbose log to disassemblers, we can
record log with a member function, instead of passing it around as an
argument.
printInst prints a branch/call instruction as `b offset` (there are many
variants on various targets) instead of `b address`.
It is a convention to use address instead of offset in most external
symbolizers/disassemblers. This difference makes `llvm-objdump -d`
output unsatisfactory.
Add `uint64_t Address` to printInst(), so that it can pass the argument to
printInstruction(). `raw_ostream &OS` is moved to the last to be
consistent with other print* methods.
The next step is to pass `Address` to printInstruction() (generated by
tablegen from the instruction set description). We can gradually migrate
targets to print addresses instead of offsets.
In any case, downstream projects which don't know `Address` can pass 0 as
the argument.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D72172
Check if the appropriate counters for the specified mode are defined on
the target. This is checked before any other work is done.
Differential Revision: https://reviews.llvm.org/D71927
Summary: This is a follow up on D71137 properly setting up the AsmTargetStreamer prior to AsmParser::Run call.
Reviewers: courbet, mstojanovic
Subscribers: tschuett, mikhail.ramalho, llvm-commits, petarj, atanasyan
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71468
Summary: This is a following up to D70874. It adds the initialization of FPCW in llvm-exegesis.
Reviewers: craig.topper, RKSimon, courbet, gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70891
Summary: This patch is used to initialize the new added register MXCSR.
Reviewers: craig.topper, RKSimon
Subscribers: tschuett, courbet, llvm-commits, LiuChen3
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70874
Avoids the need to include TargetMachine.h from various places just for
an enum. Various other enums live here, such as the optimization level,
TLS model, etc. Data suggests that this change probably doesn't matter,
but it seems nice to have anyway.
MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64
regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo
we can find out Mips ABI and pick appropriate prefix.
Tags: #llvm, #clang, #lldb
Differential Revision: https://reviews.llvm.org/D66795
Works on this dependency chain:
ArrayRef.h ->
Hashing.h -> --CUT--
Host.h ->
StringMap.h / StringRef.h
ArrayRef is very popular, but Host.h is rarely needed. Move the
IsBigEndianHost constant to SwapByteOrder.h. Clients of that header are
more likely to need it.
llvm-svn: 375316
MachineInstr.h included AliasAnalysis.h, which includes a world of IR
constructs mostly unneeded in CodeGen. Prune it. Same for
DebugInfoMetadata.h.
Noticed with -ftime-trace.
llvm-svn: 375311
The target does just enough to be able to run llvm-exegesis in latency
mode for at least some opcodes.
Patch by Miloš Stojanović.
Differential Revision: https://reviews.llvm.org/D68649
llvm-svn: 374590
Summary: Experiments show that this is the alignment we get (for ELF+Linux), but let's ensure that we have it.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68703
llvm-svn: 374170
This was breaking some bots:
/home/buildbots/ppc64le-clang-lnt-test/clang-ppc64le-lnt/llvm/include/llvm/Support/Error.h:483:5: required from ‘llvm::Expected<T>::Expected(OtherT&&, typename std::enable_if<std::is_convertible<_Rep2, _Rep>::value>::type*) [with OtherT = std::vector<llvm::exegesis::CodeTemplate>&; T = std::vector<llvm::exegesis::CodeTemplate>; typename std::enable_if<std::is_convertible<_Rep2, _Rep>::value>::type = void]’
/home/buildbots/ppc64le-clang-lnt-test/clang-ppc64le-lnt/llvm/tools/llvm-exegesis/lib/X86/Target.cpp:238:20: required from here
/usr/include/c++/6/bits/stl_construct.h:75:7: error: use of deleted function ‘llvm::exegesis::CodeTemplate::CodeTemplate(const llvm::exegesis::CodeTemplate&)’
{ ::new(static_cast<void*>(__p)) _T1(std::forward<_Args>(__args)...); }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llvm-svn: 374149
Summary:
This will help for PR32326.
This shows the well-known issue with `RBP` and `R13` as base registers.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits, RKSimon, andreadb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68646
llvm-svn: 374146
Summary:
This adds a `-max-configs-per-opcode` option to limit the number of
configs per opcode.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68642
llvm-svn: 374054
Summary:
Right now there are no snippet generators that emit the `Config` Field,
but I plan to add it to investigate LEA operands for PR32326.
What was broken was:
- `Config` Was not propagated up until the BenchmarkResult::Key.
- Clustering should really consider different configs as measuring
different things, so we should stabilize on (Opcode, Config) instead of
just Opcode.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits, lebedev.ri
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68629
llvm-svn: 374031
Existing clients are converted to use MachineModuleInfoWrapperPass. The
new interface is for defining a new pass manager API in CodeGen.
Reviewers: fedor.sergeev, philip.pfaffe, chandlerc, arsenm
Reviewed By: arsenm, fedor.sergeev
Differential Revision: https://reviews.llvm.org/D64183
llvm-svn: 373240
Summary:
Before this change the Executable function was made by duplicating the
snippet. This change adds a --repetion-mode={loop|duplicate} flag that
allows choosing between this behaviour and wrapping the snippet instructions
in a loop.
The new mode can help measurements when the snippet fits in the DSB by
short-cirtcuiting decoding. The loop adds a dec + jmp to the measurements, but
since these are not part of the critical path, they execute in parallel
with the measured code and do not impact measurements in practice.
Overview of the change:
- New SnippetRepetitor abstraction that handles repeating the snippet.
The assembler delegates repeating the instructions to this class.
- ExegesisTarget learns how to decrement loop counter and jump.
- Some refactoring of the assembler into FunctionFiller/BasicBlockFiller.
Reviewers: gchatelet
Subscribers: mgorny, tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68125
llvm-svn: 373083
Summary:
Right now latency generation can incorrectly select the scratch register
as a dependency-carrying register.
- Move the logic for preventing register selection from Uops
implementation to common SnippetGenerator class.
- Aliasing detection now takes a set of forbidden registers just like
random register assignment does.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68084
llvm-svn: 373048
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
Summary: This helps building internal tools on top of the library.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits, bdb, ondrasej
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62239
llvm-svn: 361385
It was previously writing this temporary snippet to file,
then reading it back, but leaving the tmp file in place.
This is both unefficient, and results in huge garbage pileup
in /tmp.
One would have thought it would have been caught during D60317..
llvm-svn: 360138
This *APPEARS* to fix a *very* infuriating issue of Yaml's being corrupted,
partially written, truncated. Or at least i'm not seeing the issue
on a new benchmark sweep.
The issue is somewhat rare, happens maybe once in 1000 benchmarks.
Which means there are up to hundreds of broken benchmarks
for a full x86 sweep in a single mode.
llvm-svn: 360124
Summary:
A *lot* of instructions have this special register.
It seems this never really worked, but i finally noticed it only
because it happened to break for `CMOV16rm` instruction.
We serialized that register as "" (empty string), which is naturally
'ignored' during deserialization, so we re-create a `MCInst` with
too few operands.
And when we then happened to try to resolve variant sched class
for this mis-serialized instruction, and the variant predicate
tried to read an operand that was out of bounds since we got less operands,
we crashed.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=41448 | PR41448 ]].
Reviewers: craig.topper, courbet
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60517
llvm-svn: 358153
Wanted to check if inablility to measure latency of CMOV32rm
is a regression from D60041 / D60138, but unable to do that
because the llvm-exegesis-{8,9} from debian sid fails
with that cryptic, unhelpful error.
I suspect this will be a better error.
llvm-svn: 357900
Summary:
D60041 / D60138 refactoring changed how CMOV/SETcc opcodes
are handled. concode is now an immediate, with it's own operand type.
This at least allows to not crash on the opcode.
However, this still won't generate all the snippets
with all the condcode enumerators. D60066 does that.
Reviewers: courbet, gchatelet
Reviewed By: gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60057
llvm-svn: 357841
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes.
Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.
Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon
Reviewed By: RKSimon
Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60228
llvm-svn: 357802
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes.
Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.
Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri
Reviewed By: andreadb
Subscribers: hiraditya, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60138
llvm-svn: 357801
Summary:
Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models.
This avoids needing an isel pattern for each condition code. And it removes
translation switches for converting between CMOV instructions and condition
codes.
Now the printer, encoder and disassembler take care of converting the immediate.
We use InstAliases to handle the assembly matching. But we print using the
asm string in the instruction definition. The instruction itself is marked
IsCodeGenOnly=1 to hide it from the assembly parser.
This does complicate the scheduler models a little since we can't assign the
A and BE instructions to a separate class now.
I plan to make similar changes for SETcc and Jcc.
Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60041
llvm-svn: 357800
Summary:
It doesn't need anything from Analysis::SchedClassCluster class,
and takes ResolvedSchedClass as param, so this seems rather fitting.
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59994
llvm-svn: 357263
Summary:
`ResolvedSchedClass` will need to be used outside of `Analysis`
(before `InstructionBenchmarkClustering` even), therefore promote
it into a non-private top-level class, and while there also
move all of the functions that are only called by `ResolvedSchedClass`
into that same new file.
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: mgorny, tschuett, mgrang, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59993
llvm-svn: 357259
Summary:
The diff looks scary but it really isn't:
1. I moved the check for the number of measurements into `SchedClassClusterCentroid::validate()`
2. While there, added a check that we can only have a single inverse throughput measurement. I missed that when adding it initially.
3. In `Analysis::SchedClassCluster::measurementsMatch()` is called with the current LLVM values from schedule class and the values from Centroid.
3.1. The values from centroid we can already get from `SchedClassClusterCentroid::getAsPoint()`.
This isn't 100% a NFC, because previously for inverse throughput we used `min()`. I have asked whether i have done that correctly in
https://reviews.llvm.org/D57647?id=184939#inline-510384 but did not hear back. I think `avg()` should be used too, thus it is a fix.
3.2. Finally, refactor the computation of the LLVM-specified values into `Analysis::SchedClassCluster::getSchedClassPoint()`
I will need that function for [[ https://bugs.llvm.org/show_bug.cgi?id=41275 | PR41275 ]]
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59951
llvm-svn: 357245
Summary:
This is an alternative to D59539.
Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`.
Let's suppose we are using `-analysis-clustering-epsilon=0.5`.
By default now we will start processing the `0.5` point, find that `1.0` is it's neighbor, add them to a new cluster.
Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster.
Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster.
So all these points ended up in the same cluster.
This may or may not be a correct implementation of dbscan clustering algorithm.
But this is rather horribly broken for the reasons of comparing the clusters with the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter
the LLVM values this cluster will **never** match the LLVM values,
and thus this cluster will **always** be displayed as inconsistent.
The solution is obviously to split off some of these opcodes into different sched cluster.
But how do i do that? Out of 4 opcodes displayed in the inconsistency report,
which ones are the "bad ones"? Which ones are the most different from the checked-in data?
I'd need to go in to the `.yaml` and look it up manually.
The trivial solution is to, when creating clusters, don't use the full dbscan algorithm,
but instead "pick some unclustered point, pick all unclustered points that are it's neighbor,
put them all into a new cluster, repeat". And just so as it happens, we can arrive
at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.
But that won't work well once we teach analyze mode to operate in on-1D mode
(i.e. on more than a single measurement type at a time), because the clustering would
depend on the order of the measurements.
Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster,
and if they are not, the cluster (==opcode) is unstable.
This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model..
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59820
llvm-svn: 357152
Summary:
This prevents "Cannot encode high byte register in REX-prefixed instruction"
from happening on instructions that require REX encoding when AH & co
get selected.
On the down side, these 4 registers can no longer be selected
automatically, but this avoids having to expose all the X86 encoding
complexity.
Reviewers: gchatelet
Subscribers: tschuett, jdoerfert, llvm-commits, bdb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59821
llvm-svn: 357003
Results in much nicer -help output:
```
$ ./bin/llvm-exegesis -help
USAGE: llvm-exegesis [options]
OPTIONS:
Color Options:
-color - Use colors in output (default=autodetect)
General options:
-enable-cse-in-irtranslator - Should enable CSE in irtranslator
-enable-cse-in-legalizer - Should enable CSE in Legalizer
Generic Options:
-help - Display available options (-help-hidden for more)
-help-list - Display list of available options (-help-list-hidden for more)
-version - Display the version of this program
llvm-exegesis analysis options:
-analysis-clustering-epsilon=<number> - dbscan epsilon for benchmark point clustering
-analysis-clusters-output-file=<string> -
-analysis-display-unstable-clusters - if there is more than one benchmark for an opcode, said benchmarks may end up not being clustered into the same cluster if the measured performance characteristics are different. by default all such opcodes are filtered out. this flag will instead show only such unstable opcodes
-analysis-inconsistencies-output-file=<string> -
-analysis-inconsistency-epsilon=<number> - epsilon for detection of when the cluster is different from the LLVM schedule profile values
-analysis-numpoints=<uint> - minimum number of points in an analysis cluster
llvm-exegesis benchmark options:
-ignore-invalid-sched-class - ignore instructions that do not define a sched class
-mode=<value> - the mode to run
=latency - Instruction Latency
=inverse_throughput - Instruction Inverse Throughput
=uops - Uop Decomposition
=analysis - Analysis
-num-repetitions=<uint> - number of time to repeat the asm snippet
-opcode-index=<int> - opcode to measure, by index
-opcode-name=<string> - comma-separated list of opcodes to measure, by name
-snippets-file=<string> - code snippets to measure
llvm-exegesis options:
-benchmarks-file=<string> - File to read (analysis mode) or write (latency/uops/inverse_throughput modes) benchmark results. “-” uses stdin/stdout.
-mcpu=<string> - cpu name to use for pfm counters, leave empty to autodetect
```
llvm-svn: 356364
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).
Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".
These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.
When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.
Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implmentation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.
Differential Revision: https://reviews.llvm.org/D57244
llvm-svn: 353043
Summary:
... from 8.
`VALIGNDZ128rmbik XMM0 XMM0 K1 XMM3 RDI i_0x1 i_0x0 i_0x1` instruction already has 9 components.
It does not matter much in terms of performance, but avoiding allocation seems to come with low cost here..
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57654
llvm-svn: 353022
Summary:
Up until the point i have looked in the source, i didn't even understood that
i can disable 'cluster' output. I have always silenced it via ` &> /dev/null`.
(And hoped it wasn't contributing much of the run time.)
While i expect that it has it's use-cases i never once needed it so far.
If i forget to silence it, console is completely flooded with that output.
How about not expecting users to opt-out of analyses,
but to explicitly specify the analyses that should be performed?
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57648
llvm-svn: 353021
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
SetVector uses both DenseSet and vector, which is time/memory inefficient. The points are represented as natural numbers so we can replace the DenseSet part by indexing into a vector<char> instead.
Don't cargo cult the pseudocode on the wikipedia DBSCAN page. This is a standard BFS style algorithm (the similar loops have been used several times in other LLVM components): every point is processed at most once, thus the queue has at most NumPoints elements. We represent it with a vector and allocate it outside of the loop to avoid allocation in the loop body.
We check `Processed[P]` to avoid enqueueing a point more than once, which also nicely saves us a `ClusterIdForPoint_[Q].isUndef()` check.
Many people hate the oneshot abstraction but some favor it, therefore we make a compromise, use a lambda to abstract away the neighbor adding process.
Delete the comment `assert(Neighbors.capacity() == (Points_.size() - 1));` as it is wrong.
llvm-svn: 350035
Summary:
Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess`
We also check `Added[P]` to enqueueing a point more than once, which
also saves us a `ClusterIdForPoint_[Q].isUndef()` check.
Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54442
........
Patch wasn't approved and breaks buildbots
llvm-svn: 349139
Summary:
Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess`
We also check `Added[P]` to enqueueing a point more than once, which
also saves us a `ClusterIdForPoint_[Q].isUndef()` check.
Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54442
llvm-svn: 349136
Apply review comments of https://reviews.llvm.org/D54185 to other target as well, specifically:
1. make anonymous namespaces as small as possible, avoid using static inside anonymous namespaces
2. Add missing header to some files
3. GetLoadImmediateOpcodem-> getLoadImmediateOpcode
4. Fix typo
Differential Revision: https://reviews.llvm.org/D54343
llvm-svn: 347309
Summary:
As it was pointed out in D54388+D54390, the maximal size of `Neighbors` is known,
it will contain at most Points_.size() minus one (the center of the cluster)
While that is the upper bound, meaning in the most cases, the actual count
will be much smaller, since D54390 made the allocation persistent,
we no longer have to worry about overly-optimistically `reserve()`ing.
Old: (D54393)
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
6553.167456 task-clock (msec) # 1.000 CPUs utilized ( +- 0.21% )
...
6.5547 +- 0.0134 seconds time elapsed ( +- 0.20% )
```
New:
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
6315.057872 task-clock (msec) # 0.999 CPUs utilized ( +- 0.24% )
...
6.3187 +- 0.0160 seconds time elapsed ( +- 0.25% )
```
And that is another -~4%.
Since this is the last (as of this moment) patch in this patch series,
it is a good time to summarize:
Old: (svn trunk, as stated in D54381)
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m24.884s
user 0m24.099s
sys 0m0.785s
```
So these patches, on a given benchmark,
has decreased llvm-exegesis analysis time by 74.62%.
There surely is more room for further improvements.
D54514 may improve thins by -11.5% more (relative to this patch).
Parallelization may improve things further significantly, too.
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, MaskRay
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54415
llvm-svn: 347204
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time: (D54381)
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m10.487s
user 0m9.745s
sys 0m0.740s
```
New time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m9.599s
user 0m8.824s
sys 0m0.772s
```
Not that much, around -9%. But that is not the good part yet, again.
Old:
* calls to allocation functions: 3347676
* temporary allocations: 277818
* bytes allocated in total (ignoring deallocations): 10.52 GB
New:
* calls to allocation functions: 2109712 (-36%)
* temporary allocations: 33112 (-88%)
* bytes allocated in total (ignoring deallocations): 4.43 GB (-58% *sic*)
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, MaskRay
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54382
llvm-svn: 347198
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m24.884s
user 0m24.099s
sys 0m0.785s
```
New time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m10.469s
user 0m9.797s
sys 0m0.672s
```
So -60%. And that isn't the good bit yet.
Old:
* calls to allocation functions: 106560180 (yes, 107 *million* allocations.)
* bytes allocated in total (ignoring deallocations): 12.17 GB
New:
* calls to allocation functions: 3347676 (-96.86%) (just 3 mil)
* bytes allocated in total (ignoring deallocations): 10.52 GB (~2GB less)
---
Two points i want to raise:
* `std::unordered_set<>` should not have been used there in the first place.
It is banned by the https://llvm.org/docs/ProgrammersManual.html#other-set-like-container-options
* There is no tests, so i'm not fully sure this is correct.
Since it was unordered set, i guess there are zero restrictions on the order, and anything will be ok?
* I tried other containers suggested in https://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc,
this `llvm::SetVector<>` seems to be best here.
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet
Subscribers: kristina, bobsayshilol, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54381
llvm-svn: 347197
Summary:
This simplifies the code and moves everything to tablegen for consistency. This
also prepares the ground for adding issue counters.
Reviewers: gchatelet, john.brawn, jsji
Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54297
llvm-svn: 346489
This is patch to add PowerPC target to llvm-exegesis.
The target does just enough to be able to run llvm-exegesis in latency mode for at least some opcodes.
Differential Revision: https://reviews.llvm.org/D54185
llvm-svn: 346411
Summary:
Add unit tests to check the support for each supported format to avoid
regressions such as the one in PR36906.
Reviewers: gchatelet
Subscribers: tschuett, lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D54144
llvm-svn: 346330
Summary: They do not lower to actual MCInsts and have no scheduling info.
Reviewers: gchatelet
Subscribers: llvm-commits, tschuett
Differential Revision: https://reviews.llvm.org/D54147
llvm-svn: 346227
MachineModuleInfo can only be used in code using lib/CodeGen, hence we
can keep a more specific reference to LLVMTargetMachine rather than just
TargetMachine around.
llvm-svn: 346182
Summary: SNB is the only one that has P23 as a single proc res.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D53766
llvm-svn: 345480
Summary:
The pfm counters are now in the ExegesisTarget rather than the
MCSchedModel (PR39165).
This also compresses the pfm counter tables (PR37068).
Reviewers: RKSimon, gchatelet
Subscribers: mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D52932
llvm-svn: 345243
Summary:
This allows simplifying references of llvm::foo with foo when the needs
come in the future.
Reviewers: courbet, gchatelet
Reviewed By: gchatelet
Subscribers: javed.absar, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D53455
llvm-svn: 344922
Summary:
We only support the first form because we rely on information that is
only available there.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D53430
llvm-svn: 344782
Summary:
This was lost during refactoring in rL342644.
Fix and simplify simplify value size handling: always go through a 80 bit value,
because the value can be 1 byte). Add unit tests.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D53423
llvm-svn: 344779
Summary:
We try to recover gracefully on instructions that would crash the
program.
This includes some refactoring of runMeasurement() implementations.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D53371
llvm-svn: 344695
Summary: LatencyGenerator now computes all possible mode of serial execution for an Instruction upfront and generates CodeTemplate for the ones that give the best results (e.g. no need to generate a two instructions snippet when repeating a single one would do). The next step is to generate even more configurations for cases (e.g. for XOR we should generate "XOR EAX, EAX, EAX" and "XOR EAX, EAX, EBX")
Reviewers: courbet
Reviewed By: courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53320
llvm-svn: 344689
Summary: This is part one of the change where I simply changed the signature of the functions. More work need to be done to actually produce more than one CodeTemplate per instruction.
Reviewers: courbet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D53209
llvm-svn: 344493