Commit Graph

400 Commits

Author SHA1 Message Date
Roman Lebedev 687bbf85de
[llvm-exegesis] CombinationGenerator: don't store function_ref
function_ref is non-owning, so if we get it as a parameter in constructor,
our reference goes out-of-scope as soon as constructor returns.
Instead, let's just take it as a parameter to the actual `generate()` call
2020-02-12 23:33:23 +03:00
Roman Lebedev 6030fe01f4
[llvm-exegesis] Exploring X86::OperandType::OPERAND_COND_CODE
Summary:
Currently, we only have nice exploration for LEA instruction,
while for the rest, we rely on `randomizeUnsetVariables()`
to sometimes generate something interesting.
While that works, it isn't very reliable in coverage :)

Here, i'm making an assumption that while we may want to explore
multi-instruction configs, we are most interested in the
characteristics of the main instruction we were asked about.

Which we can do, by taking the existing `randomizeMCOperand()`,
and turning it on it's head - instead of relying on it to randomly fill
one of the interesting values, let's pregenerate all the possible interesting
values for the variable, and then generate as much `InstructionTemplate`
combinations of these possible values for variables as needed/possible.

Of course, that requires invasive changes to no longer pass just the
naked `Instruction`, but sometimes partially filled `InstructionTemplate`.

As it can be seen from the test, this allows us to explore
`X86::OperandType::OPERAND_COND_CODE` for instructions
that take such an operand.
I'm hoping this will greatly simplify exploration.

Reviewers: courbet, gchatelet

Reviewed By: gchatelet

Subscribers: orodley, mgorny, sdardis, tschuett, jrtc27, atanasyan, mstojanovic, andreadb, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74156
2020-02-12 21:33:52 +03:00
Justin Lebar fb45968e62 Use C++14-style return type deduction in LLVM.
Summary:
Simplifies the C++11-style "-> decltype(...)" return-type deduction.

Note that you have to be careful about whether the function return type
is `auto` or `decltype(auto)`.  The difference is that bare `auto`
strips const and reference, just like lambda return type deduction.  In
some cases that's what we want (or more likely, we know that the return
type is a value type), but whenever we're wrapping a templated function
which might return a reference, we need to be sure that the return type
is decltype(auto).

No functional change.

Subscribers: dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74383
2020-02-11 07:38:42 -08:00
Bill Wendling c55cf4afa9 Revert "Remove redundant "std::move"s in return statements"
The build failed with

  error: call to deleted constructor of 'llvm::Error'

errors.

This reverts commit 1c2241a793.
2020-02-10 07:07:40 -08:00
Bill Wendling 1c2241a793 Remove redundant "std::move"s in return statements 2020-02-10 06:39:44 -08:00
Miloš Stojanović 205292740d [llvm-exegesis] Improve error reporting in BenchmarkRunner.cpp
Followup to D74085.
Replace the use of `report_fatal_error()` with returning the error to
`llvm-exegesis.cpp` and handling it there.
To facilitate this, a new `Error` type has been added which is only used
to log errors to the yaml output.

Differential Revision: https://reviews.llvm.org/D74215
2020-02-07 16:29:52 +01:00
Miloš Stojanović 4bd40f71a7 Recommit: "[llvm-exegesis] Improve error reporting in Target.cpp"
Summary: Commit 141915963b was reverted in
abe01e17f6 because it broke builds testing
without libpfm. A preparatory commit <commit_sha1> was added to enable
this recommit.

Original commit message:

Followup to D74085.
Replace the use of `report_fatal_error()` with returning the error to
`llvm-exegesis.cpp` and handling it there.

Differential Revision: https://reviews.llvm.org/D74113
2020-02-07 14:34:58 +01:00
Miloš Stojanović 830af528a5 Recommit: "[llvm-exegesis] Improve error reporting"
Summary: Commit b3576f60eb was reverted in
abe01e17f6 because it broke builds testing
without libpfm. A preparatory commit <commit_sha1> was added to enable
this recommit.

Original commit message:

Fix inconsistencies in error reporting created by mixing
`report_fatal_error()` and `ExitOnErr()`, and add additional information
to the error message to make it more user friendly. Minimize the use
`report_fatal_error()` because it's meant for use in very rare cases and
it results in low information density of the error messages.

Summary of the new design:

 * For command line argument errors output `llvm-exegesis: <error_message>`,
   which is consistent with the error output format emitted by the backend
   which checks correctness of the command line arguments.
 * For other errors the format `llvm-exegesis error: <error_message>` is used.
 ** If the error occurred during file access `<error_message>` will have
    of two parts: `'<file_name>': <rest_of_the_error_message>`

Differential Revision: https://reviews.llvm.org/D74085
2020-02-07 14:34:58 +01:00
Miloš Stojanović 446268a223 [llvm-exegesis] Add a custom error for clustering
All errors of type `Failure` are `StringError`s. In order for exit code
mapping to detect that specifically a clustering error has occurred it
needs to have a different type.

This patch also prepares D74085 where termination `report_fatal_error()`
will be replaced with emitting `StringError`s.

Differential Revision: https://reviews.llvm.org/D74124
2020-02-07 14:34:57 +01:00
Hans Wennborg abe01e17f6 Revert "[llvm-exegesis] Improve error reporting" and follow-up.
It broke e.g. all tests under tools/llvm-exegesis/X86/ when libpfm is
not available, see comment on D74085.

This reverts commit b3576f60eb and
141915963b.
2020-02-06 12:53:16 +01:00
Miloš Stojanović 141915963b [llvm-exegesis] Improve error reporting in Target.cpp
Followup to D74085.
Replace the use of `report_fatal_error()` with returning the error to
`llvm-exegesis.cpp` and handling it there.

Differential Revision: https://reviews.llvm.org/D74113
2020-02-06 12:26:08 +01:00
Miloš Stojanović b3576f60eb [llvm-exegesis] Improve error reporting
Fix inconsistencies in error reporting created by mixing
`report_fatal_error()` and `ExitOnErr()`, and add additional information
to the error message to make it more user friendly. Minimize the use
`report_fatal_error()` because it's meant for use in very rare cases and
it results in low information density of the error messages.

Summary of the new design:

 * For command line argument errors output `llvm-exegesis: <error_message>`,
   which is consistent with the error output format emitted by the backend
   which checks correctness of the command line arguments.
 * For other errors the format `llvm-exegesis error: <error_message>` is used.
 ** If the error occurred during file access `<error_message>` will have
    of two parts: `'<file_name>': <rest_of_the_error_message>`

Differential Revision: https://reviews.llvm.org/D74085
2020-02-06 12:26:08 +01:00
Simon Pilgrim 83d0db59d6 Fix "expression is redundant [misc-redundant-expression]" warning (PR44768)
Be more specific that getOperandConstraint should return -1 or a uint8_t value
2020-02-04 21:24:21 +00:00
Clement Courbet 082dccac90 [llvm-exegesis] Restrict the range of allowable rounding countrols.
Summary:
It turns out that CUR_DIRECTION is just an internal placeholder, not an actual
valid encoded value.

Reviewers: gchatelet

Subscribers: tschuett, mstojanovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73343
2020-02-03 11:53:27 +01:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Clement Courbet 2ee218f365 [llvm-exegesis][NFC] Simplify code.
Summary:
What we're redoing already exists in the X86 backend, it's called
`X86II::getOperandBias`.

Reviewers: gchatelet

Subscribers: tschuett, mstojanovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73340
2020-01-24 12:45:20 +01:00
Miloš Stojanović e8fc8507da [llvm-exegesis] Don't use unsupported aliasing instructions
Since some instruction types aren't allowed as the main instruction also
don't allow them for aliasing instructions.

Differential Revision: https://reviews.llvm.org/D73220
2020-01-23 12:42:42 +01:00
Clement Courbet 04fd204156 [llvm-exegesis] Allow the randomizer to fail nicely...
Summary:
... instead of crashing.
On typical exmaple is when there are no available registers.

Reviewers: gchatelet

Subscribers: tschuett, mstojanovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73196
2020-01-23 11:08:44 +01:00
Clement Courbet 6d2510d30a [llvm-exegesis] Restrict to allowed back-to-back instructions in SerialSnippetGenerator.
Summary: Followup to D73161.

Reviewers: gchatelet, mstojanovic

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73256
2020-01-23 10:00:05 +01:00
Clement Courbet 5be8b2ec4a [llvm-exegesis] Serial snippet: Restrict the set of back-to-back instructions
Summary:
Right now when picking a back-to-back instruction at random, we might select
instructions that we do not know how to handle.
Add a ExegesisTarget hook to possibly filter instructions.

Reviewers: gchatelet

Subscribers: tschuett, mstojanovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73161
2020-01-22 11:00:43 +01:00
Clement Courbet 87632b9e06 [llvm-exegesis] Fix support for LEA64_32r.
Summary:
Add unit test to show the issue: We must select an *aliasing* output
register, not the exact register.

Reviewers: gchatelet

Subscribers: tschuett, mstojanovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73095
2020-01-21 13:58:23 +01:00
Clement Courbet d6f4cfdbd7 [llvm-exegesis] Add support for AVX512 explicit rounding operands.
Reviewers: gchatelet

Subscribers: tschuett, mstojanovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73083
2020-01-21 11:50:17 +01:00
Miloš Stojanović b37f6d3af1 [llvm-exegesis] Remove unused variable after D72928 2020-01-20 18:23:41 +01:00
Miloš Stojanović 24b7b99b7d [llvm-exegesis][NFC] Disassociate snippet generators from benchmark runners
The addition of `inverse_throughput` mode highlighted the disjointedness
of snippet generators and benchmark runners because it used the
`UopsSnippetGenerator` with the  `LatencyBenchmarkRunner`.
To keep the code consistent tie the snippet generators to
parallelization/serialization rather than their benchmark runners.

Renaming `LatencySnippetGenerator` -> `SerialSnippetGenerator`.
Renaming `UopsSnippetGenerator` -> `ParallelSnippetGenerator`.

Differential Revision: https://reviews.llvm.org/D72928
2020-01-20 16:19:13 +01:00
Fangrui Song ed9cc6404e [llvm-exegesis][mips] Fix -Wunused-function after D72858 2020-01-18 13:57:19 -08:00
Miloš Stojanović ea91758a3c [llvm-exegesis][mips] Add support for memory instructions
Implementing functions used to enable testing of memory instructions.

Differential Revision: https://reviews.llvm.org/D72858
2020-01-17 13:26:09 +01:00
Jonas Devlieghere 09db6e3209 [llvm-exegesis] Initialize const bitvector member
This causes an error with older versions of clang: constructor for
'llvm::exegesis::InstructionsCache' must explicitly initialize the const
member 'BVC'
2020-01-13 17:32:30 -08:00
Miloš Stojanović a70b993239 [llvm-exegesis] Remove unneeded std::move()
Caught by buildbot breakage:

/home/docker/worker_env/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm/llvm/tools/llvm-exegesis/lib/Mips/Target.cpp:89:12: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
    return std::move(Instructions);
           ^
/home/docker/worker_env/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm/llvm/tools/llvm-exegesis/lib/Mips/Target.cpp:89:12: note: remove std::move call here
    return std::move(Instructions);
           ^~~~~~~~~~            ~
2020-01-13 14:19:17 +01:00
Miloš Stojanović 804dd67227 [llvm-exegesis][mips] Expand loadImmediate()
Add support for loading 32-bit immediates and enable the use of GPR64
registers.

Differential Revision: https://reviews.llvm.org/D71873
2020-01-13 12:32:13 +01:00
Fangrui Song 6fdd6a7b3f [Disassembler] Delete the VStream parameter of MCDisassembler::getInstruction()
The argument is llvm::null() everywhere except llvm::errs() in
llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no
target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds.

If we ever have the needs to add verbose log to disassemblers, we can
record log with a member function, instead of passing it around as an
argument.
2020-01-11 13:34:52 -08:00
Fangrui Song aa708763d3 [MC] Add parameter `Address` to MCInstPrinter::printInst
printInst prints a branch/call instruction as `b offset` (there are many
variants on various targets) instead of `b address`.

It is a convention to use address instead of offset in most external
symbolizers/disassemblers. This difference makes `llvm-objdump -d`
output unsatisfactory.

Add `uint64_t Address` to printInst(), so that it can pass the argument to
printInstruction(). `raw_ostream &OS` is moved to the last to be
consistent with other print* methods.

The next step is to pass `Address` to printInstruction() (generated by
tablegen from the instruction set description). We can gradually migrate
targets to print addresses instead of offsets.

In any case, downstream projects which don't know `Address` can pass 0 as
the argument.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72172
2020-01-06 20:42:22 -08:00
Miloš Stojanović c7dc4734d2 [llvm-exegesis] Check counters before running
Check if the appropriate counters for the specified mode are defined on
the target. This is checked before any other work is done.

Differential Revision: https://reviews.llvm.org/D71927
2019-12-31 14:17:24 +01:00
Mark de Wever 536c9a604e [Tools] Fixes -Wrange-loop-analysis warnings
This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall.

Differential Revision: https://reviews.llvm.org/D71808
2019-12-22 19:11:17 +01:00
Guillaume Chatelet 32d384c020 [llvm-exegesis][NFC] internal changes
Summary:
BitVectors are now cached to lower memory utilization.
Instructions have reference semantics.

Reviewers: courbet

Subscribers: sdardis, tschuett, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71653
2019-12-18 17:24:07 +01:00
Guillaume Chatelet c72bff6821 [llvm-exegesis] Set up AsmTargetStreamer in readSnippets
Summary: This is a follow up on D71137 properly setting up the AsmTargetStreamer prior to AsmParser::Run call.

Reviewers: courbet, mstojanovic

Subscribers: tschuett, mikhail.ramalho, llvm-commits, petarj, atanasyan

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71468
2019-12-16 13:37:59 +01:00
Clement Courbet 3540b80fe4 [llvm-exegesis] Fix 44b9942898.
Summary:
Add missing stack release instructions in
loadImplicitRegAndFinalize.

Reviewers: pengfei, gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70903
2019-12-02 16:13:27 +01:00
Wang, Pengfei 76b70f6f75 [X86] Add initialization of FPCW in llvm-exegesis
Summary: This is a following up to D70874. It adds the initialization of FPCW in llvm-exegesis.

Reviewers: craig.topper, RKSimon, courbet, gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70891
2019-12-02 20:18:35 +08:00
Wang, Pengfei 44b9942898 [X86] Add initialization of MXCSR in llvm-exegesis
Summary: This patch is used to initialize the new added register MXCSR.

Reviewers: craig.topper, RKSimon

Subscribers: tschuett, courbet, llvm-commits, LiuChen3

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70874
2019-12-02 18:19:32 +08:00
Reid Kleckner 1dfede3122 Move CodeGenFileType enum to Support/CodeGen.h
Avoids the need to include TargetMachine.h from various places just for
an enum. Various other enums live here, such as the optimization level,
TLS model, etc. Data suggests that this change probably doesn't matter,
but it seems nice to have anyway.
2019-11-13 16:39:34 -08:00
Simon Pilgrim aedb528d43 llvm-exegesis - fix shadow variable warnings. NFCI. 2019-11-09 13:43:09 +00:00
Mirko Brkusanin 4b63ca1379 [Mips] Use appropriate private label prefix based on Mips ABI
MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64
regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo
we can find out Mips ABI and pick appropriate prefix.

Tags: #llvm, #clang, #lldb

Differential Revision: https://reviews.llvm.org/D66795
2019-10-23 12:24:35 +02:00
Reid Kleckner 90c64a3456 Move endian constant from Host.h to SwapByteOrder.h, prune include
Works on this dependency chain:
  ArrayRef.h ->
  Hashing.h -> --CUT--
  Host.h ->
  StringMap.h / StringRef.h

ArrayRef is very popular, but Host.h is rarely needed. Move the
IsBigEndianHost constant to SwapByteOrder.h. Clients of that header are
more likely to need it.

llvm-svn: 375316
2019-10-19 00:48:11 +00:00
Reid Kleckner 1d7b41361f Prune two MachineInstr.h includes, fix up deps
MachineInstr.h included AliasAnalysis.h, which includes a world of IR
constructs mostly unneeded in CodeGen. Prune it. Same for
DebugInfoMetadata.h.

Noticed with -ftime-trace.

llvm-svn: 375311
2019-10-19 00:22:07 +00:00
Simon Atanasyan cf1ba238d4 [Mips][llvm-exegesis] Add a Mips target
The target does just enough to be able to run llvm-exegesis in latency
mode for at least some opcodes.

Patch by Miloš Stojanović.

Differential Revision: https://reviews.llvm.org/D68649

llvm-svn: 374590
2019-10-11 20:26:08 +00:00
Clement Courbet c8eb0547ef [llvm-exegesis] Show noise cluster in analysis output.
Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68780

llvm-svn: 374533
2019-10-11 11:33:18 +00:00
Clement Courbet 04a9a0eb0d [llvm-exegesis] Ensure that ExecutableFunction are aligned.
Summary: Experiments show that this is the alignment we get (for ELF+Linux), but let's ensure that we have it.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68703

llvm-svn: 374170
2019-10-09 14:25:08 +00:00
Clement Courbet 64a83bb253 [llvm-exegesis] Fix r374158
Some bots complain about missing 'class':

LlvmState.h:70:40: error: declaration of ‘std::unique_ptr<const llvm::TargetMachine> llvm::exegesis::LLVMState::TargetMachine’ [-fpermissive]
   std::unique_ptr<const TargetMachine> TargetMachine;

llvm-svn: 374162
2019-10-09 12:37:56 +00:00
Clement Courbet 50cdd56beb [llvm-exegesis][NFC] Remove extra `llvm::` qualifications.
Summary: Second patch: in the lib.

Reviewers: gchatelet

Subscribers: nemanjai, tschuett, MaskRay, mgrang, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68692

llvm-svn: 374158
2019-10-09 11:58:42 +00:00
Clement Courbet 66f05d7389 [llvm-exegesis] Add missing std::move in rL374146.
This was breaking some bots:

/home/buildbots/ppc64le-clang-lnt-test/clang-ppc64le-lnt/llvm/include/llvm/Support/Error.h:483:5:   required from ‘llvm::Expected<T>::Expected(OtherT&&, typename std::enable_if<std::is_convertible<_Rep2, _Rep>::value>::type*) [with OtherT = std::vector<llvm::exegesis::CodeTemplate>&; T = std::vector<llvm::exegesis::CodeTemplate>; typename std::enable_if<std::is_convertible<_Rep2, _Rep>::value>::type = void]’
/home/buildbots/ppc64le-clang-lnt-test/clang-ppc64le-lnt/llvm/tools/llvm-exegesis/lib/X86/Target.cpp:238:20:   required from here
/usr/include/c++/6/bits/stl_construct.h:75:7: error: use of deleted function ‘llvm::exegesis::CodeTemplate::CodeTemplate(const llvm::exegesis::CodeTemplate&)’
     { ::new(static_cast<void*>(__p)) _T1(std::forward<_Args>(__args)...); }
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

llvm-svn: 374149
2019-10-09 09:07:21 +00:00
Clement Courbet f8d482c07b [llvm-exegesis][NFC] Fix rL374146.
Remove extra semicolon: Target.cpp:187:2: warning: extra ‘;’ [-Wpedantic]

llvm-svn: 374147
2019-10-09 09:03:42 +00:00
Clement Courbet c3a7fb7599 [llvm-exegesis] Explore LEA addressing modes.
Summary:
This will help for PR32326.

This shows the well-known issue with `RBP` and `R13` as base registers.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits, RKSimon, andreadb

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68646

llvm-svn: 374146
2019-10-09 08:49:13 +00:00
Clement Courbet 2cd0f28959 [llvm-exegesis] Add options to SnippetGenerator.
Summary:
This adds a `-max-configs-per-opcode` option to limit the number of
configs per opcode.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68642

llvm-svn: 374054
2019-10-08 14:30:24 +00:00
Clement Courbet 4919534ae4 [llvm-exegesis] Finish plumbing the `Config` field.
Summary:
Right now there are no snippet generators that emit the `Config` Field,
but I plan to add it to investigate LEA operands for PR32326.

What was broken was:
 - `Config` Was not propagated up until the BenchmarkResult::Key.
 - Clustering should really consider different configs as measuring
 different things, so we should stabilize on (Opcode, Config) instead of
 just Opcode.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits, lebedev.ri

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68629

llvm-svn: 374031
2019-10-08 09:06:48 +00:00
Clement Courbet c0292744da [llvm-exegesis][NFC] Rename ExegesisTarget::decrementLoopCounterAndLoop()
Summary: To decrementLoopCounterAndJump, and explicitely take the jump target.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68375

llvm-svn: 373571
2019-10-03 07:56:56 +00:00
Michal Gorny f488cbdcd8 [llvm-exegesis/lib] Fix missing linkage to MCParser
Otherwise, shared-lib build fails with:

lib64/libLLVMExegesis.a(SnippetFile.cpp.o): In function `llvm::exegesis::readSnippets(llvm::exegesis::LLVMState const&, llvm::StringRef)':
SnippetFile.cpp:(.text._ZN4llvm8exegesis12readSnippetsERKNS0_9LLVMStateENS_9StringRefE+0x31f): undefined reference to `llvm::createMCAsmParser(llvm::SourceMgr&, llvm::MCContext&, llvm::MCStreamer&, llvm::MCAsmInfo const&, unsigned int)'
SnippetFile.cpp:(.text._ZN4llvm8exegesis12readSnippetsERKNS0_9LLVMStateENS_9StringRefE+0x41c): undefined reference to `llvm::MCAsmParser::setTargetParser(llvm::MCTargetAsmParser&)'
collect2: error: ld returned 1 exit status

llvm-svn: 373332
2019-10-01 13:02:48 +00:00
Yuanfang Chen cc382cf727 [NewPM] Port MachineModuleInfo to the new pass manager.
Existing clients are converted to use MachineModuleInfoWrapperPass. The
new interface is for defining a new pass manager API in CodeGen.

Reviewers: fedor.sergeev, philip.pfaffe, chandlerc, arsenm

Reviewed By: arsenm, fedor.sergeev

Differential Revision: https://reviews.llvm.org/D64183

llvm-svn: 373240
2019-09-30 17:54:50 +00:00
Clement Courbet 03a3d29541 [llvm-exegesis][NFC] Move BenchmarkFailure to own file.
Summary: And rename to exegesis::Failure, as it's used everytwhere.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68217

llvm-svn: 373209
2019-09-30 13:53:50 +00:00
Clement Courbet 3e13816be2 [llvm-exegesis][NFC] Refactor snippet file reading out of tool main.
Summary: Add unit tests.

Reviewers: gchatelet

Subscribers: mgorny, tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68212

llvm-svn: 373202
2019-09-30 12:50:25 +00:00
Simon Pilgrim 1a55431a03 Fix MSVC "not all control paths return a value" warning. NFCI.
llvm-svn: 373100
2019-09-27 16:56:07 +00:00
Clement Courbet 9431b72ce9 [llvm-exegesis] Add loop mode for repeating the snippet.
Summary:
Before this change the Executable function was made by duplicating the
snippet. This change adds a --repetion-mode={loop|duplicate} flag that
allows choosing between this behaviour and wrapping the snippet instructions
in a loop.

The new mode can help measurements when the snippet fits in the DSB by
short-cirtcuiting decoding. The loop adds a dec + jmp to the measurements, but
since these are not part of the critical path, they execute in parallel
with the measured code and do not impact measurements in practice.

Overview of the change:
 - New SnippetRepetitor abstraction that handles repeating the snippet.
   The assembler delegates repeating the instructions to this class.
 - ExegesisTarget learns how to decrement loop counter and jump.
 - Some refactoring of the assembler into FunctionFiller/BasicBlockFiller.

Reviewers: gchatelet

Subscribers: mgorny, tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68125

llvm-svn: 373083
2019-09-27 12:56:24 +00:00
Clement Courbet 8ef97e1aad [llvm-exegesis] Refactor how forbidden registers are computed.
Summary:
Right now latency generation can incorrectly select the scratch register
as a dependency-carrying register.
 - Move the logic for preventing register selection from Uops
   implementation to common SnippetGenerator class.
 - Aliasing detection now takes a set of forbidden registers just like
   random register assignment does.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68084

llvm-svn: 373048
2019-09-27 08:04:10 +00:00
Clement Courbet 06f9ce84fe [llvm-exegesis][NFC] Remove dead code.
Summary: `hasAliasingImplicitRegistersThrough()` is no longer used.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68078

llvm-svn: 372968
2019-09-26 11:32:44 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Fangrui Song d9b948b6eb Rename F_{None,Text,Append} to OF_{None,Text,Append}. NFC
F_{None,Text,Append} are kept for compatibility since r334221.

llvm-svn: 367800
2019-08-05 05:43:48 +00:00
Sylvestre Ledru 375297f38f fix a typo unavaliable=>unavailable
llvm-svn: 362878
2019-06-08 15:07:55 +00:00
Clement Courbet b9274f2694 [llvm-exegesis] Move native target initialization code to a separate file.
Summary: This helps building internal tools on top of the library.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits, bdb, ondrasej

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62239

llvm-svn: 361385
2019-05-22 13:50:16 +00:00
Roman Lebedev 9bac7d8165 [llvm-exegesis] BenchmarkRunner::runConfiguration(): write small snippet to memory
It was previously writing this temporary snippet to file,
then reading it back, but leaving the tmp file in place.
This is both unefficient, and results in huge garbage pileup
in /tmp.

One would have thought it would have been caught during D60317..

llvm-svn: 360138
2019-05-07 12:28:08 +00:00
Roman Lebedev 724a68f372 [llvm-exegesis] InstructionBenchmark::writeYamlTo(): don't forget to flush()
This *APPEARS* to fix a *very* infuriating issue of Yaml's being corrupted,
partially written, truncated. Or at least i'm not seeing the issue
on a new benchmark sweep.

The issue is somewhat rare, happens maybe once in 1000 benchmarks.
Which means there are up to hundreds of broken benchmarks
for a full x86 sweep in a single mode.

llvm-svn: 360124
2019-05-07 09:21:13 +00:00
Ali Tamur 7822b46188 Revert "Use llvm::lower_bound. NFC"
This reverts commit rL358161.

This patch have broken the test:
llvm/test/tools/llvm-exegesis/X86/uops-CMOV16rm-noreg.s

llvm-svn: 358199
2019-04-11 17:35:20 +00:00
Fangrui Song 71cce580b9 Use llvm::lower_bound. NFC
llvm-svn: 358161
2019-04-11 10:25:41 +00:00
Roman Lebedev fbb823891d [llvm-exegesis] Fix serialization/deserialization of special NoRegister register (PR41448)
Summary:
A *lot* of instructions have this special register.
It seems this never really worked, but i finally noticed it only
because it happened to break for `CMOV16rm` instruction.

We serialized that register as "" (empty string), which is naturally
'ignored' during deserialization, so we re-create a `MCInst` with
too few operands.

And when we then happened to try to resolve variant sched class
for this mis-serialized instruction, and the variant predicate
tried to read an operand that was out of bounds since we got less operands,
we crashed.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=41448 | PR41448 ]].

Reviewers: craig.topper, courbet

Reviewed By: courbet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60517

llvm-svn: 358153
2019-04-11 07:20:50 +00:00
Roman Lebedev 1992e8f38e [llvm-exegesis] Pacify bots - don't std::move() - prevents copy elision
llvm-svn: 358079
2019-04-10 12:47:47 +00:00
Roman Lebedev 41bdeb7b12 [llvm-exegesis] YamlContext: fix some missing spaces/quotes/newlines in error strings
llvm-svn: 358077
2019-04-10 12:20:14 +00:00
Roman Lebedev 628f1ae504 [llvm-exegesis] Fix error propagation from yaml writing (from serialization)
Investigating https://bugs.llvm.org/show_bug.cgi?id=41448

llvm-svn: 358076
2019-04-10 12:19:57 +00:00
Roman Lebedev eb1a156d7f [llvm-exegesis] benchmarkMain(): less cryptic error if built w/o libpfm
Wanted to check if inablility to measure latency of CMOV32rm
is a regression from D60041 / D60138, but unable to do that
because the llvm-exegesis-{8,9} from debian sid fails
with that cryptic, unhelpful error.

I suspect this will be a better error.

llvm-svn: 357900
2019-04-08 10:50:31 +00:00
Roman Lebedev a82235843b [llvm-exegesis][X86] Randomize CMOVcc/SETcc OPERAND_COND_CODE CondCodes
Reviewers: courbet, gchatelet

Reviewed By: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60066

llvm-svn: 357898
2019-04-08 10:11:00 +00:00
Roman Lebedev 404bdb1c9e [llvm-exegesis][X86] Handle CMOVcc/SETcc OPERAND_COND_CODE OperandType
Summary:
D60041 / D60138 refactoring changed how CMOV/SETcc opcodes
are handled. concode is now an immediate, with it's own operand type.

This at least allows to not crash on the opcode.
However, this still won't generate all the snippets
with all the condcode enumerators. D60066 does that.

Reviewers: courbet, gchatelet

Reviewed By: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60057

llvm-svn: 357841
2019-04-06 14:16:26 +00:00
Craig Topper 80aa2290fb [X86] Merge the different Jcc instructions for each condition code into single instructions that store the condition code as an operand.
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes.

Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon

Reviewed By: RKSimon

Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60228

llvm-svn: 357802
2019-04-05 19:28:09 +00:00
Craig Topper 7323c2bf85 [X86] Merge the different SETcc instructions for each condition code into single instructions that store the condition code as an operand.
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes.

Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri

Reviewed By: andreadb

Subscribers: hiraditya, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60138

llvm-svn: 357801
2019-04-05 19:27:49 +00:00
Craig Topper e0bfeb5f24 [X86] Merge the different CMOV instructions for each condition code into single instructions that store the condition code as an immediate.
Summary:
Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models.

This avoids needing an isel pattern for each condition code. And it removes
translation switches for converting between CMOV instructions and condition
codes.

Now the printer, encoder and disassembler take care of converting the immediate.
We use InstAliases to handle the assembly matching. But we print using the
asm string in the instruction definition. The instruction itself is marked
IsCodeGenOnly=1 to hide it from the assembly parser.

This does complicate the scheduler models a little since we can't assign the
A and BE instructions to a separate class now.

I plan to make similar changes for SETcc and Jcc.

Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60041

llvm-svn: 357800
2019-04-05 19:27:41 +00:00
Guillaume Chatelet 848df5b509 Add an option do not dump the generated object on disk
Reviewers: courbet

Subscribers: llvm-commits, bdb

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60317

llvm-svn: 357769
2019-04-05 15:18:59 +00:00
Roman Lebedev 4d81e87765 [NFC][llvm-exegesis] Also promote getSchedClassPoint() into ResolvedSchedClass.
Summary:
It doesn't need anything from Analysis::SchedClassCluster class,
and takes ResolvedSchedClass as param, so this seems rather fitting.

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59994

llvm-svn: 357263
2019-03-29 14:58:01 +00:00
Roman Lebedev 1d1330c546 [NFC][llvm-exegesis] Refactor ResolvedSchedClass & friends
Summary:
`ResolvedSchedClass` will need to be used outside of `Analysis`
(before `InstructionBenchmarkClustering` even), therefore promote
it into a non-private top-level class, and while there also
move all of the functions that are only called by `ResolvedSchedClass`
into that same new file.

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: mgorny, tschuett, mgrang, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59993

llvm-svn: 357259
2019-03-29 14:24:27 +00:00
Roman Lebedev b8fb15d412 [NFC][llvm-exegesis] Refactor Analysis::SchedClassCluster::measurementsMatch()
Summary:
The diff looks scary but it really isn't:
1. I moved the check for the number of measurements into `SchedClassClusterCentroid::validate()`
2. While there, added a check that we can only have a single inverse throughput measurement. I missed that when adding it initially.
3. In `Analysis::SchedClassCluster::measurementsMatch()` is called with the current LLVM values from schedule class and the values from Centroid.
3.1. The values from centroid we can already get from `SchedClassClusterCentroid::getAsPoint()`.
     This isn't 100% a NFC, because previously for inverse throughput we used `min()`. I have asked whether i have done that correctly in
     https://reviews.llvm.org/D57647?id=184939#inline-510384 but did not hear back. I think `avg()` should be used too, thus it is a fix.
3.2. Finally, refactor the computation of the LLVM-specified values into `Analysis::SchedClassCluster::getSchedClassPoint()`
     I will need that function for [[ https://bugs.llvm.org/show_bug.cgi?id=41275 | PR41275 ]]

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59951

llvm-svn: 357245
2019-03-29 11:36:08 +00:00
Roman Lebedev c2423fe689 [llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880)
Summary:
This is an alternative to D59539.

Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`.
Let's suppose we are using `-analysis-clustering-epsilon=0.5`.
By default now we will start processing the `0.5` point, find that `1.0` is it's neighbor, add them to a new cluster.
Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster.
Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster.
So all these points ended up in the same cluster.
This may or may not be a correct implementation of dbscan clustering algorithm.

But this is rather horribly broken for the reasons of comparing the clusters with the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter
the LLVM values this cluster will **never** match the LLVM values,
and thus this cluster will **always** be displayed as inconsistent.

The solution is obviously to split off some of these opcodes into different sched cluster.
But how do i do that? Out of 4 opcodes displayed in the inconsistency report,
which ones are the "bad ones"? Which ones are the most different from the checked-in data?
I'd need to go in to the `.yaml` and look it up manually.

The trivial solution is to, when creating clusters, don't use the full dbscan algorithm,
but instead "pick some unclustered point, pick all unclustered points that are it's neighbor,
put them all into a new cluster, repeat". And just so as it happens, we can arrive
at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.

But that won't work well once we teach analyze mode to operate in on-1D mode
(i.e. on more than a single measurement type at a time), because the clustering would
depend on the order of the measurements.

Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster,
and if they are not, the cluster (==opcode) is unstable.

This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model..

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59820

llvm-svn: 357152
2019-03-28 08:55:01 +00:00
Clement Courbet 52da938cd0 [llvm-exegesis] Allow the target to disable the selection of some registers.
Summary:
This prevents "Cannot encode high byte register in REX-prefixed instruction"
from happening on instructions that require REX encoding when AH & co
get selected.
On the down side, these 4 registers can no longer be selected
automatically, but this avoids having to expose all the X86 encoding
complexity.

Reviewers: gchatelet

Subscribers: tschuett, jdoerfert, llvm-commits, bdb

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59821

llvm-svn: 357003
2019-03-26 15:44:57 +00:00
Roman Lebedev 23629385f1 [llvm-exegesis] Separate tool options into three categories.
Results in much nicer -help output:
```
$ ./bin/llvm-exegesis -help
USAGE: llvm-exegesis [options]

OPTIONS:

Color Options:

  -color                                         - Use colors in output (default=autodetect)

General options:

  -enable-cse-in-irtranslator                    - Should enable CSE in irtranslator
  -enable-cse-in-legalizer                       - Should enable CSE in Legalizer

Generic Options:

  -help                                          - Display available options (-help-hidden for more)
  -help-list                                     - Display list of available options (-help-list-hidden for more)
  -version                                       - Display the version of this program

llvm-exegesis analysis options:

  -analysis-clustering-epsilon=<number>          - dbscan epsilon for benchmark point clustering
  -analysis-clusters-output-file=<string>        -
  -analysis-display-unstable-clusters            - if there is more than one benchmark for an opcode, said benchmarks may end up not being clustered into the same cluster if the measured performance characteristics are different. by default all such opcodes are filtered out. this flag will instead show only such unstable opcodes
  -analysis-inconsistencies-output-file=<string> -
  -analysis-inconsistency-epsilon=<number>       - epsilon for detection of when the cluster is different from the LLVM schedule profile values
  -analysis-numpoints=<uint>                     - minimum number of points in an analysis cluster

llvm-exegesis benchmark options:

  -ignore-invalid-sched-class                    - ignore instructions that do not define a sched class
  -mode=<value>                                  - the mode to run
    =latency                                     -   Instruction Latency
    =inverse_throughput                          -   Instruction Inverse Throughput
    =uops                                        -   Uop Decomposition
    =analysis                                    -   Analysis
  -num-repetitions=<uint>                        - number of time to repeat the asm snippet
  -opcode-index=<int>                            - opcode to measure, by index
  -opcode-name=<string>                          - comma-separated list of opcodes to measure, by name
  -snippets-file=<string>                        - code snippets to measure

llvm-exegesis options:

  -benchmarks-file=<string>                      - File to read (analysis mode) or write (latency/uops/inverse_throughput modes) benchmark results. “-” uses stdin/stdout.
  -mcpu=<string>                                 - cpu name to use for pfm counters, leave empty to autodetect
```

llvm-svn: 356364
2019-03-18 11:32:37 +00:00
Clement Courbet 0ddf81c43d [llvm-exegesis] Teach llvm-exegesis to handle instructions with multiple tied variables.
Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58285

llvm-svn: 354862
2019-02-26 10:54:45 +00:00
Roman Lebedev 542e5d7bb5 [llvm-exegesis] Split Epsilon param into two (PR40787)
Summary:
This eps param is used for two distinct things:
* initial point clusterization
* checking clusters against the llvm values

What if one wants to only look at highly different clusters, without changing
the clustering itself? In particular, this helps to weed out noisy measurements
(since the clusterization epsilon is still small, so there is a better chance
that noisy measurements from the same opcode will go into different clusters)

By splitting it into two params it is now possible.

This is nearly-free performance-wise:
Old:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

            390.01 msec task-clock                #    0.998 CPUs utilized            ( +-  0.25% )
                12      context-switches          #   31.735 M/sec                    ( +- 27.38% )
                 0      cpu-migrations            #    0.000 K/sec
              4745      page-faults               # 12183.732 M/sec                   ( +-  0.54% )
        1562711900      cycles                    # 4012303.327 GHz                   ( +-  0.24% )  (82.90%)
         185567822      stalled-cycles-frontend   #   11.87% frontend cycles idle     ( +-  0.52% )  (83.30%)
         392106234      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.31% )  (33.79%)
        1839236666      instructions              #    1.18  insn per cycle
                                                  #    0.21  stalled cycles per insn  ( +-  0.15% )  (50.37%)
         407035764      branches                  # 1045074878.710 M/sec              ( +-  0.12% )  (66.80%)
          10896459      branch-misses             #    2.68% of all branches          ( +-  0.17% )  (83.20%)

          0.390629 +- 0.000972 seconds time elapsed  ( +-  0.25% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):

           6803.36 msec task-clock                #    0.999 CPUs utilized            ( +-  0.96% )
               262      context-switches          #   38.546 M/sec                    ( +- 23.06% )
                 0      cpu-migrations            #    0.065 M/sec                    ( +- 76.03% )
             13287      page-faults               # 1953.206 M/sec                    ( +-  0.32% )
       27252537904      cycles                    # 4006024.257 GHz                   ( +-  0.95% )  (83.31%)
        1496314935      stalled-cycles-frontend   #    5.49% frontend cycles idle     ( +-  0.97% )  (83.32%)
       16128404524      stalled-cycles-backend    #   59.18% backend cycles idle      ( +-  0.30% )  (33.37%)
       17611143370      instructions              #    0.65  insn per cycle
                                                  #    0.92  stalled cycles per insn  ( +-  0.05% )  (50.04%)
        3894906599      branches                  # 572537147.437 M/sec               ( +-  0.03% )  (66.69%)
         116314514      branch-misses             #    2.99% of all branches          ( +-  0.20% )  (83.35%)

            6.8118 +- 0.0689 seconds time elapsed  ( +-  1.01%)
```
New:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):

            400.14 msec task-clock                #    0.998 CPUs utilized            ( +-  0.66% )
                12      context-switches          #   29.429 M/sec                    ( +- 25.95% )
                 0      cpu-migrations            #    0.100 M/sec                    ( +-100.00% )
              4714      page-faults               # 11796.496 M/sec                   ( +-  0.55% )
        1603131306      cycles                    # 4011840.105 GHz                   ( +-  0.66% )  (82.85%)
         199538509      stalled-cycles-frontend   #   12.45% frontend cycles idle     ( +-  2.40% )  (83.10%)
         402249109      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.19% )  (34.05%)
        1847783963      instructions              #    1.15  insn per cycle
                                                  #    0.22  stalled cycles per insn  ( +-  0.18% )  (50.64%)
         407162722      branches                  # 1018925730.631 M/sec              ( +-  0.12% )  (67.02%)
          10932779      branch-misses             #    2.69% of all branches          ( +-  0.51% )  (83.28%)

           0.40077 +- 0.00267 seconds time elapsed  ( +-  0.67% )

lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):

           6947.79 msec task-clock                #    1.000 CPUs utilized            ( +-  0.90% )
               217      context-switches          #   31.236 M/sec                    ( +- 36.16% )
                 1      cpu-migrations            #    0.096 M/sec                    ( +- 50.00% )
             13258      page-faults               # 1908.389 M/sec                    ( +-  0.34% )
       27830796523      cycles                    # 4006032.286 GHz                   ( +-  0.89% )  (83.30%)
        1504554006      stalled-cycles-frontend   #    5.41% frontend cycles idle     ( +-  2.10% )  (83.32%)
       16716574843      stalled-cycles-backend    #   60.07% backend cycles idle      ( +-  0.65% )  (33.38%)
       17755545931      instructions              #    0.64  insn per cycle
                                                  #    0.94  stalled cycles per insn  ( +-  0.09% )  (50.04%)
        3897255686      branches                  # 560980426.597 M/sec               ( +-  0.06% )  (66.70%)
         117045395      branch-misses             #    3.00% of all branches          ( +-  0.47% )  (83.34%)

            6.9507 +- 0.0627 seconds time elapsed  ( +-  0.90% )
```

I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps.
Within noise i'd say.

Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58476

llvm-svn: 354767
2019-02-25 09:36:12 +00:00
Fangrui Song 990061b6d6 Fix file header issues in fuzzers. NFC
llvm-svn: 354551
2019-02-21 07:57:14 +00:00
Hans Wennborg 14e15ec18d Fix the build with gcc/libstdc++ 4.8.2 after r354441
llvm-svn: 354469
2019-02-20 14:50:08 +00:00
Roman Lebedev 69716394f3 [llvm-exegesis] Opcode stabilization / reclusterization (PR40715)
Summary:
Given an instruction `Opcode`, we can make benchmarks (measurements) of the
instruction characteristics/performance. Then, to facilitate further analysis
we group the benchmarks with *similar* characteristics into clusters.
Now, this is all not entirely deterministic. Some instructions have variable
characteristics, depending on their arguments. And thus, if we do several
benchmarks of the same instruction `Opcode`, we may end up with *different*
performance characteristics measurements. And when we then do clustering,
these several benchmarks of the same instruction `Opcode` may end up being
clustered into *different* clusters. This is not great for further analysis.

We shall find every `Opcode` with benchmarks not in just one cluster, and move
*all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`.

I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit,
and introducing `-analysis-display-unstable-clusters` switch to toggle between
displaying stable-only clusters and unstable-only clusters.

The reclusterization is deterministically stable, produces identical reports
between runs. (Or at least that is what i'm seeing, maybe it isn't)

Timings/comparisons:
old (current trunk/head) {F8303582}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

           6624.73 msec task-clock                #    0.999 CPUs utilized            ( +-  0.53% )
               172      context-switches          #   25.965 M/sec                    ( +- 29.89% )
                 0      cpu-migrations            #    0.042 M/sec                    ( +- 56.54% )
             31073      page-faults               # 4690.754 M/sec                    ( +-  0.08% )
       26538711696      cycles                    # 4006230.292 GHz                   ( +-  0.53% )  (83.31%)
        2017496807      stalled-cycles-frontend   #    7.60% frontend cycles idle     ( +-  0.93% )  (83.32%)
       13403650062      stalled-cycles-backend    #   50.51% backend cycles idle      ( +-  0.33% )  (33.37%)
       19770706799      instructions              #    0.74  insn per cycle
                                                  #    0.68  stalled cycles per insn  ( +-  0.04% )  (50.04%)
        4419821812      branches                  # 667207369.714 M/sec               ( +-  0.03% )  (66.69%)
         121741669      branch-misses             #    2.75% of all branches          ( +-  0.28% )  (83.34%)

            6.6283 +- 0.0358 seconds time elapsed  ( +-  0.54% )
```

patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs):

           6475.29 msec task-clock                #    0.999 CPUs utilized            ( +-  0.31% )
               213      context-switches          #   32.952 M/sec                    ( +- 23.81% )
                 1      cpu-migrations            #    0.130 M/sec                    ( +- 43.84% )
             31287      page-faults               # 4832.057 M/sec                    ( +-  0.08% )
       25939086577      cycles                    # 4006160.279 GHz                   ( +-  0.31% )  (83.31%)
        1958812858      stalled-cycles-frontend   #    7.55% frontend cycles idle     ( +-  0.68% )  (83.32%)
       13218961512      stalled-cycles-backend    #   50.96% backend cycles idle      ( +-  0.29% )  (33.37%)
       19752995402      instructions              #    0.76  insn per cycle
                                                  #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.04%)
        4417079244      branches                  # 682195472.305 M/sec               ( +-  0.03% )  (66.70%)
         121510065      branch-misses             #    2.75% of all branches          ( +-  0.19% )  (83.34%)

            6.4832 +- 0.0229 seconds time elapsed  ( +-  0.35% )
```
Funnily, *this* measurement shows that said reclustering actually improved performance.

patch, with reclustering, only the stable clusters {F8303594}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs):

           6387.71 msec task-clock                #    0.999 CPUs utilized            ( +-  0.13% )
               133      context-switches          #   20.792 M/sec                    ( +- 23.39% )
                 0      cpu-migrations            #    0.063 M/sec                    ( +- 61.24% )
             31318      page-faults               # 4903.256 M/sec                    ( +-  0.08% )
       25591984967      cycles                    # 4006786.266 GHz                   ( +-  0.13% )  (83.31%)
        1881234904      stalled-cycles-frontend   #    7.35% frontend cycles idle     ( +-  0.25% )  (83.33%)
       13209749965      stalled-cycles-backend    #   51.62% backend cycles idle      ( +-  0.16% )  (33.36%)
       19767554347      instructions              #    0.77  insn per cycle
                                                  #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.03%)
        4417480305      branches                  # 691618858.046 M/sec               ( +-  0.03% )  (66.68%)
         118676358      branch-misses             #    2.69% of all branches          ( +-  0.07% )  (83.33%)

            6.3954 +- 0.0118 seconds time elapsed  ( +-  0.18% )
```
Performance improved even further?! Makes sense i guess, less clusters to print.

patch, with reclustering, only the unstable clusters {F8303601}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):

           6124.96 msec task-clock                #    1.000 CPUs utilized            ( +-  0.20% )
               194      context-switches          #   31.709 M/sec                    ( +- 20.46% )
                 0      cpu-migrations            #    0.039 M/sec                    ( +- 49.77% )
             31413      page-faults               # 5129.261 M/sec                    ( +-  0.06% )
       24536794267      cycles                    # 4006425.858 GHz                   ( +-  0.19% )  (83.31%)
        1676085087      stalled-cycles-frontend   #    6.83% frontend cycles idle     ( +-  0.46% )  (83.32%)
       13035595603      stalled-cycles-backend    #   53.13% backend cycles idle      ( +-  0.16% )  (33.36%)
       18260877653      instructions              #    0.74  insn per cycle
                                                  #    0.71  stalled cycles per insn  ( +-  0.05% )  (50.03%)
        4112411983      branches                  # 671484364.603 M/sec               ( +-  0.03% )  (66.68%)
         114066929      branch-misses             #    2.77% of all branches          ( +-  0.11% )  (83.32%)

            6.1278 +- 0.0121 seconds time elapsed  ( +-  0.20% )
```
This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps)
(Also, wow this is fast, it used to take several minutes originally)

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58355

llvm-svn: 354441
2019-02-20 09:14:04 +00:00
Andrea Di Biagio edbf06a767 [AsmPrinter] Remove hidden flag -print-schedule.
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).

Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".

These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.

When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.

Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implmentation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.

Differential Revision: https://reviews.llvm.org/D57244

llvm-svn: 353043
2019-02-04 12:51:26 +00:00
Roman Lebedev bd84b139b0 [llvm-exegesis] Cut run time of analysis mode by another -35% (*sic*) (YamlContext::getRegNo())
Summary:

Together with the previous patch, it's an -90% improvement,
or roughly -96% improvement if you look starting with rL347204

```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-bew.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-bew.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-bew.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-bew.html' (9 runs):

           1483.18 msec task-clock                #    0.999 CPUs utilized            ( +-  0.10% )
                68      context-switches          #   46.085 M/sec                    ( +- 22.62% )
                 0      cpu-migrations            #    0.000 K/sec
             11641      page-faults               # 7850.880 M/sec                    ( +-  0.62% )
        5943246799      cycles                    # 4008184.428 GHz                   ( +-  0.10% )  (83.28%)
         442869514      stalled-cycles-frontend   #    7.45% frontend cycles idle     ( +-  0.41% )  (83.29%)
        1443375663      stalled-cycles-backend    #   24.29% backend cycles idle      ( +-  0.47% )  (33.43%)
        7714006752      instructions              #    1.30  insn per cycle
                                                  #    0.19  stalled cycles per insn  ( +-  0.07% )  (50.17%)
        1977242936      branches                  # 1333472193.855 M/sec              ( +-  0.07% )  (66.79%)
          32624220      branch-misses             #    1.65% of all branches          ( +-  0.18% )  (83.34%)

           1.48438 +- 0.00143 seconds time elapsed  ( +-  0.10% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-newer.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-newer.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-newer.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-newer.html' (9 runs):

            963.28 msec task-clock                #    0.999 CPUs utilized            ( +-  0.37% )
                12      context-switches          #   12.695 M/sec                    ( +- 52.79% )
                 0      cpu-migrations            #    0.000 K/sec
             11599      page-faults               # 12046.971 M/sec                   ( +-  0.59% )
        3860122322      cycles                    # 4009359.596 GHz                   ( +-  0.37% )  (83.19%)
         380300669      stalled-cycles-frontend   #    9.85% frontend cycles idle     ( +-  0.34% )  (83.30%)
        1071910340      stalled-cycles-backend    #   27.77% backend cycles idle      ( +-  1.30% )  (33.51%)
        4773418224      instructions              #    1.24  insn per cycle
                                                  #    0.22  stalled cycles per insn  ( +-  0.15% )  (50.17%)
        1106990316      branches                  # 1149787979.919 M/sec              ( +-  0.11% )  (66.80%)
          23632231      branch-misses             #    2.13% of all branches          ( +-  0.18% )  (83.33%)

           0.96389 +- 0.00356 seconds time elapsed  ( +-  0.37% )
```
```
$ sha512sum /tmp/clusters-*
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf  /tmp/clusters-bew.html
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf  /tmp/clusters-newer.html
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf  /tmp/clusters-old.html
```

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57658

llvm-svn: 353025
2019-02-04 09:12:25 +00:00
Roman Lebedev 5b94fe9623 [llvm-exegesis] Cut run time of analysis mode by -84% (*sic*) (YamlContext::getInstrOpcode())
Summary:
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):

           9465.46 msec task-clock                #    1.000 CPUs utilized            ( +-  0.05% )
                60      context-switches          #    6.363 M/sec                    ( +- 79.45% )
                 0      cpu-migrations            #    0.000 K/sec
             11364      page-faults               # 1200.697 M/sec                    ( +-  0.60% )
       37935623543      cycles                    # 4008083.912 GHz                   ( +-  0.05% )  (83.32%)
        2371625356      stalled-cycles-frontend   #    6.25% frontend cycles idle     ( +-  0.37% )  (83.32%)
        8476077875      stalled-cycles-backend    #   22.34% backend cycles idle      ( +-  0.18% )  (33.36%)
       41822439158      instructions              #    1.10  insn per cycle
                                                  #    0.20  stalled cycles per insn  ( +-  0.02% )  (50.03%)
       11607658944      branches                  # 1226405861.486 M/sec              ( +-  0.01% )  (66.69%)
         210864633      branch-misses             #    1.82% of all branches          ( +-  0.06% )  (83.34%)

           9.46636 +- 0.00441 seconds time elapsed  ( +-  0.05% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-bew.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-bew.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-bew.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-bew.html' (9 runs):

           1480.66 msec task-clock                #    1.000 CPUs utilized            ( +-  0.19% )
                13      context-switches          #    8.483 M/sec                    ( +- 83.10% )
                 0      cpu-migrations            #    0.075 M/sec                    ( +-100.00% )
             11596      page-faults               # 7834.247 M/sec                    ( +-  0.59% )
        5933732194      cycles                    # 4008977.535 GHz                   ( +-  0.19% )  (83.22%)
         438111928      stalled-cycles-frontend   #    7.38% frontend cycles idle     ( +-  0.37% )  (83.25%)
        1454969705      stalled-cycles-backend    #   24.52% backend cycles idle      ( +-  0.94% )  (33.53%)
        7724218604      instructions              #    1.30  insn per cycle
                                                  #    0.19  stalled cycles per insn  ( +-  0.07% )  (50.14%)
        1979796413      branches                  # 1337599858.945 M/sec              ( +-  0.06% )  (66.74%)
          32641638      branch-misses             #    1.65% of all branches          ( +-  0.18% )  (83.31%)

           1.48128 +- 0.00284 seconds time elapsed  ( +-  0.19% )

$ sha512sum /tmp/clusters-*
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf  /tmp/clusters-bew.html
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf  /tmp/clusters-old.html
```

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, llvm-commits, RKSimon

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57657

llvm-svn: 353024
2019-02-04 09:12:21 +00:00
Roman Lebedev 1a0d595f15 [llvm-exegesis] Throughput support in analysis mode
Summary:
D57000 / [[ https://bugs.llvm.org/show_bug.cgi?id=37698 | PR37698 ]] added support for measuring of the inverse throughput.
But the support for the analysis was not added.
This attempts to fix that. (analysis done o bdver2 / piledriver)

First, small-scale experiment:
```
$ ./bin/llvm-exegesis -num-repetitions=10000 -mode=inverse_throughput -opcode-name=BSF64rr
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-d0acdd.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'BSF64rr RAX RDX'
  config:          ''
  register_initial_values:
    - 'RDX=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 3.0278, per_snippet_value: 3.0278 }
error:           ''
info:            instruction has no tied variables picking Uses different from defs
assembled_snippet: 48BA0000000000000000480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2C3
...
```
If we plug `bsfq	%r12, %r10` into llvm-mca:
https://godbolt.org/z/ZtOyhJ
```
Dispatch Width:    4
uOps Per Cycle:    3.00
IPC:               0.50
Block RThroughput: 2.0
```
So RThroughput mismatch exists.

Now, let's upscale and analyse:
{F8207148}
`$ ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html`:
{F8207172}
{F8207197}
And if we now look at https://www.agner.org/optimize/instruction_tables.pdf,
`Reciprocal throughput` for `BSF r,r` is listed as `3`.
Yay?

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57647

llvm-svn: 353023
2019-02-04 09:12:17 +00:00
Roman Lebedev dc78bc277d [llvm-exegesis] deserializeMCInst(): bump SmallVector small size up to 16
Summary:
... from 8.
`VALIGNDZ128rmbik XMM0 XMM0 K1 XMM3 RDI i_0x1  i_0x0  i_0x1` instruction already has 9 components.
It does not matter much in terms of performance, but avoiding allocation seems to come with low cost here..

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57654

llvm-svn: 353022
2019-02-04 09:12:13 +00:00
Roman Lebedev 21193f4b7e [llvm-exegesis] Don't default to running&dumping all analyses to '-'
Summary:
Up until the point i have looked in the source, i didn't even understood that
i can disable 'cluster' output. I have always silenced it via ` &> /dev/null`.
(And hoped it wasn't contributing much of the run time.)

While i expect that it has it's use-cases i never once needed it so far.
If i forget to silence it, console is completely flooded with that output.

How about not expecting users to opt-out of analyses,
but to explicitly specify the analyses that should be performed?

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57648

llvm-svn: 353021
2019-02-04 09:12:08 +00:00
Clement Courbet 362653f7af [llvm-exegesis] Add throughput mode.
Summary:
This just uses the latency benchmark runner on the parallel uops snippet
generator.

Fixes PR37698.

Reviewers: gchatelet

Subscribers: tschuett, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D57000

llvm-svn: 352632
2019-01-30 16:02:20 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Clement Courbet 176388c973 Revert rL350035 "[llvm-exegesis] Clustering: don't enqueue a point multiple times"
Let's discuss this on the review thread before submitting.

llvm-svn: 350207
2019-01-02 09:21:00 +00:00
Fangrui Song cd93d7ef43 [llvm-exegesis] Clustering: don't enqueue a point multiple times
Summary:
SetVector uses both DenseSet and vector, which is time/memory inefficient. The points are represented as natural numbers so we can replace the DenseSet part by indexing into a vector<char> instead.

Don't cargo cult the pseudocode on the wikipedia DBSCAN page. This is a standard BFS style algorithm (the similar loops have been used several times in other LLVM components): every point is processed at most once, thus the queue has at most NumPoints elements. We represent it with a vector and allocate it outside of the loop to avoid allocation in the loop body.

We check `Processed[P]` to avoid enqueueing a point more than once, which also nicely saves us a `ClusterIdForPoint_[Q].isUndef()` check.

Many people hate the oneshot abstraction but some favor it, therefore we make a compromise, use a lambda to abstract away the neighbor adding process.

Delete the comment `assert(Neighbors.capacity() == (Points_.size() - 1));` as it is wrong.

llvm-svn: 350035
2018-12-23 20:48:52 +00:00
Simon Pilgrim 96408bb04a Revert rL349136: [llvm-exegesis] Optimize ToProcess in dbScan
Summary:
Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess`

We also check `Added[P]` to enqueueing a point more than once, which
also saves us a `ClusterIdForPoint_[Q].isUndef()` check.

Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54442
........
Patch wasn't approved and breaks buildbots

llvm-svn: 349139
2018-12-14 09:25:08 +00:00
Fangrui Song 92537ccc7e [llvm-exegesis] Optimize ToProcess in dbScan
Summary:
Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess`

We also check `Added[P]` to enqueueing a point more than once, which
also saves us a `ClusterIdForPoint_[Q].isUndef()` check.

Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54442

llvm-svn: 349136
2018-12-14 08:27:35 +00:00
Jinsong Ji 56c74cff70 [llvm-exegesis][NFC] Some code style cleanup
Apply review comments of https://reviews.llvm.org/D54185 to other target as well, specifically:

1. make anonymous namespaces as small as possible, avoid using static inside anonymous namespaces
2. Add missing header to some files
3. GetLoadImmediateOpcodem-> getLoadImmediateOpcode
4. Fix typo

Differential Revision: https://reviews.llvm.org/D54343

llvm-svn: 347309
2018-11-20 14:41:59 +00:00
Clement Courbet bbab546a71 [llvm-exegesis][NFC] More tests for ExegesisTarget::fillMemoryOperands().
Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54304

llvm-svn: 347209
2018-11-19 14:31:43 +00:00
Roman Lebedev 71fdb57640 [llvm-exegesis] (+final perf overview) InstructionBenchmarkClustering::rangeQuery(): reserve for the upper bound of Neighbors
Summary:
As it was pointed out in D54388+D54390, the maximal size of `Neighbors` is known,
it will contain at most Points_.size() minus one (the center of the cluster)

While that is the upper bound, meaning in the most cases, the actual count
will be much smaller, since D54390 made the allocation persistent,
we no longer have to worry about overly-optimistically `reserve()`ing.

Old: (D54393)
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):

       6553.167456      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.21% )
...
            6.5547 +- 0.0134 seconds time elapsed  ( +-  0.20% )
```
New:
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):

       6315.057872      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.24% )
...
            6.3187 +- 0.0160 seconds time elapsed  ( +-  0.25% )
```
And that is another -~4%.


Since this is the last (as of this moment) patch in this patch series,
it is a good time to summarize:
Old: (svn trunk, as stated in D54381)
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null

real    0m24.884s
user    0m24.099s
sys     0m0.785s
```
So these patches, on a given benchmark,
has decreased llvm-exegesis analysis time by 74.62%.

There surely is more room for further improvements.
D54514 may improve thins by -11.5% more (relative to this patch).
Parallelization may improve things further significantly, too.


Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet, MaskRay

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54415

llvm-svn: 347204
2018-11-19 13:28:41 +00:00
Roman Lebedev 8e315b66c2 [llvm-exegesis] Move InstructionBenchmarkClustering::isNeighbour() into header
Summary:
Old: (D54390)
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       7432.421721      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.15% )
...
            7.4336 +- 0.0115 seconds time elapsed  ( +-  0.15% )
```
New:
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       6569.936144      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.22% )
...
            6.5711 +- 0.0143 seconds time elapsed  ( +-  0.22% )
```
And another -12%. You'd think it would be `inline`d anyway, but no! :)

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet, MaskRay

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54393

llvm-svn: 347203
2018-11-19 13:28:36 +00:00
Roman Lebedev 666d855fbb [llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): write into llvm::SmallVectorImpl& output parameter
Summary:
I do believe this is the correct fix.
We call `rangeQuery()` *very* often. And many times it's output vector is large (tens of thousands entries), so small-size-opt won't help.

Old: (D54389)
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       7934.528363      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.19% )
...
            7.9354 +- 0.0148 seconds time elapsed  ( +-  0.19% )
```
New:
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       7383.793440      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.47% )
...
            7.3868 +- 0.0340 seconds time elapsed  ( +-  0.46% )
```
And another -7%. And that isn't even the good bit yet.

Old:
* calls to allocation functions: 2081419
* temporary allocations: 219658 (10.55%)
* bytes allocated in total (ignoring deallocations): 4.31 GB

New:
* calls to allocation functions: 1880295 (-10%)
* temporary allocations: 18758 (1%) (-91% *sic*)
* bytes allocated in total (ignoring deallocations): 545.15 MB (-88% *sic*)

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet, MaskRay

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54390

llvm-svn: 347202
2018-11-19 13:28:31 +00:00
Roman Lebedev 5c5b1ea725 [llvm-exegesis] InstructionBenchmarkClustering::dbScan(): replace std::vector<> with std::deque<> in llvm::SetVector<>
Summary:
Old: (D54388)
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       8606.323981      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.11% )
...
           8.60773 +- 0.00978 seconds time elapsed  ( +-  0.11% )
```
New:
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       7971.403653      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.14% )
...
            7.9728 +- 0.0113 seconds time elapsed  ( +-  0.14% )
```
Another -~7%.

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet, RKSimon

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54389

llvm-svn: 347201
2018-11-19 13:28:26 +00:00
Roman Lebedev 8aecb0c489 [llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): use llvm::SmallVector<size_t, 0> for storage.
Summary:
Old: (D54383)
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       9098.781978      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.16% )
...
            9.1015 +- 0.0148 seconds time elapsed  ( +-  0.16% )
```
New:
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       8553.352480      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.12% )
...
            8.5539 +- 0.0105 seconds time elapsed  ( +-  0.12% )
```
So another -6%.
That is because the `SmallVector` **doubles** it size when reallocating, which is great here,
since we can't `reserve()` since we can't know how many `Neighbors` we will have.

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54388

llvm-svn: 347200
2018-11-19 13:28:22 +00:00
Roman Lebedev b311c1d6b8 [llvm-exegesis] Analysis: writeMeasurementValue(): don't alloc string for double each time.
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time: (D54382)
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):

       9024.354355      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.18% )
...
            9.0262 +- 0.0161 seconds time elapsed  ( +-  0.18% )
```
New time:
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):

       8996.541057      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.19% )
...
            9.0045 +- 0.0172 seconds time elapsed  ( +-  0.19% )
```
-~0.3%, not that much. But this isn't the important part.

Old:
* calls to allocation functions: 2109712
* temporary allocations: 33112
* bytes allocated in total (ignoring deallocations): 4.43 GB

New:
* calls to allocation functions: 2095345 (-0.68%)
* temporary allocations: 18745 (-43.39% !!!)
* bytes allocated in total (ignoring deallocations): 4.31 GB (-2.71%)

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54383

llvm-svn: 347199
2018-11-19 13:28:17 +00:00
Roman Lebedev f8b28e9bf4 [llvm-exegesis] Analysis::writeSnippet(): be smarter about memory allocations.
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time: (D54381)
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null

real    0m10.487s
user    0m9.745s
sys     0m0.740s
```
New time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null

real    0m9.599s
user    0m8.824s
sys     0m0.772s

```
Not that much, around -9%. But that is not the good part yet, again.

Old:
* calls to allocation functions: 3347676
* temporary allocations: 277818
* bytes allocated in total (ignoring deallocations): 10.52 GB

New:
* calls to allocation functions: 2109712 (-36%)
* temporary allocations: 33112 (-88%)
* bytes allocated in total (ignoring deallocations): 4.43 GB (-58% *sic*)

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet, MaskRay

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54382

llvm-svn: 347198
2018-11-19 13:28:14 +00:00
Roman Lebedev 0b4b512826 [llvm-exegesis] InstructionBenchmarkClustering::dbScan(): use llvm::SetVector<> instead of ILLEGAL std::unordered_set<>
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null

real    0m24.884s
user    0m24.099s
sys     0m0.785s
```
New time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null

real    0m10.469s
user    0m9.797s
sys     0m0.672s
```
So -60%. And that isn't the good bit yet.

Old:
* calls to allocation functions: 106560180  (yes, 107 *million* allocations.)
* bytes allocated in total (ignoring deallocations): 12.17 GB

New:
* calls to allocation functions: 3347676  (-96.86%)  (just 3 mil)
* bytes allocated in total (ignoring deallocations): 10.52 GB (~2GB less)

---

Two points i want to raise:
* `std::unordered_set<>` should not have been used there in the first place.
  It is banned by the https://llvm.org/docs/ProgrammersManual.html#other-set-like-container-options
* There is no tests, so i'm not fully sure this is correct.
  Since it was unordered set, i guess there are zero restrictions on the order, and anything will be ok?
* I tried other containers suggested in https://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc,
  this `llvm::SetVector<>` seems to be best here.

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet

Subscribers: kristina, bobsayshilol, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54381

llvm-svn: 347197
2018-11-19 13:28:09 +00:00
Clement Courbet eee2e06e2a [llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target.
Summary:
This simplifies the code and moves everything to tablegen for consistency. This
also prepares the ground for adding issue counters.

Reviewers: gchatelet, john.brawn, jsji

Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54297

llvm-svn: 346489
2018-11-09 13:15:32 +00:00
Jinsong Ji 5fd3e75478 [PowerPC][llvm-exegesis] Add a PowerPC target
This is patch to add PowerPC target to llvm-exegesis.
The target does just enough to be able to run llvm-exegesis in latency mode for at least some opcodes.

Differential Revision: https://reviews.llvm.org/D54185

llvm-svn: 346411
2018-11-08 16:51:42 +00:00
Clement Courbet 54c2fa1202 [llvm-exegesis][NFC] Add missing header guard + cosmetics.
Reviewers: gchatelet

Reviewed By: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54252

llvm-svn: 346400
2018-11-08 12:37:56 +00:00
Clement Courbet 0d79aaf1a7 Revert "[llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes."
This reverts accidental commit rL346394.

llvm-svn: 346398
2018-11-08 12:09:45 +00:00
Clement Courbet c0950ae990 [llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes.
llvm-svn: 346394
2018-11-08 11:45:14 +00:00
Clement Courbet 5b0d783078 [llvm-exegesis] Remove superfluous move.
/Users/buildslave/as-bldslv9_new/lld-x86_64-darwin13/llvm.src/tools/llvm-exegesis/lib/X86/Target.cpp:155:12: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
    return std::move(Error);
           ^
/Users/buildslave/as-bldslv9_new/lld-x86_64-darwin13/llvm.src/tools/llvm-exegesis/lib/X86/Target.cpp:155:12: note: remove std::move call here
    return std::move(Error);
           ^~~~~~~~~~     ~

llvm-svn: 346333
2018-11-07 16:52:50 +00:00
Clement Courbet c544838f87 [llvm-exegesis] Correclty handle all X86 memory encoding formats.
Summary:
Add unit tests to check the support for each supported format to avoid
regressions such as the one in PR36906.

Reviewers: gchatelet

Subscribers: tschuett, lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D54144

llvm-svn: 346330
2018-11-07 16:14:55 +00:00
Clement Courbet 7066769223 [llvm-exegesis] Increasing wrapping limit.
Summary: Fixes PR39097.

Reviewers: gchatelet

Subscribers: llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D54151

llvm-svn: 346328
2018-11-07 15:46:45 +00:00
Clement Courbet 003e08ff28 [llvm-exegesis] Ignore X86 pseudo instructions.
Summary: They do not lower to actual MCInsts and have no scheduling info.

Reviewers: gchatelet

Subscribers: llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D54147

llvm-svn: 346227
2018-11-06 14:11:58 +00:00
Matthias Braun 3d849f67cb MachineModuleInfo: Store more specific reference to LLVMTargetMachine; NFC
MachineModuleInfo can only be used in code using lib/CodeGen, hence we
can keep a more specific reference to LLVMTargetMachine rather than just
TargetMachine around.

llvm-svn: 346182
2018-11-05 23:49:13 +00:00
Clement Courbet 4d837fce88 [llvm-exegesis] Fix SNB counter definition and handling.
Summary: SNB is the only one that has P23 as a single proc res.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53766

llvm-svn: 345480
2018-10-28 19:09:14 +00:00
Simon Pilgrim 2a9c728088 Fix MSVC llvm-exegesis build. NFCI.
MSVC is a bit funny about is_pod.....

llvm-svn: 345252
2018-10-25 10:45:38 +00:00
Clement Courbet b4b6ec01c6 [llvm-exegesis] Add missing initializer.
This is a better fix than rL345245.

llvm-svn: 345246
2018-10-25 08:11:35 +00:00
Clement Courbet fa99b36e4d [llvm-exegesis] Fix VC build of r345243.
"const members cannot be default initialized unless their type has a user defined default constructor"

Make members non-const.

llvm-svn: 345245
2018-10-25 08:08:58 +00:00
Clement Courbet 8902c885d6 [llvm-exegesis] Fix warning in r345243.
warning C4099: 'llvm::exegesis::PfmCountersInfo': type name first seen using 'class' now seen using 'struct'

llvm-svn: 345244
2018-10-25 08:06:35 +00:00
Clement Courbet 41c8af3924 [MCSched] Bind PFM Counters to the CPUs instead of the SchedModel.
Summary:
The pfm counters are now in the ExegesisTarget rather than the
MCSchedModel (PR39165).

This also compresses the pfm counter tables (PR37068).

Reviewers: RKSimon, gchatelet

Subscribers: mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D52932

llvm-svn: 345243
2018-10-25 07:44:01 +00:00
Guillaume Chatelet da11b85606 [llvm-exegesis] Implements a cache of Instruction objects.
llvm-svn: 345130
2018-10-24 11:55:06 +00:00
Fangrui Song a342834b24 [llvm-exegesis] Fix name lookup ambiguity in MSVC after 344922
llvm-svn: 344927
2018-10-22 17:52:31 +00:00
Fangrui Song 32401afd8c [llvm-exegesis] Move namespace exegesis inside llvm::
Summary:
This allows simplifying references of llvm::foo with foo when the needs
come in the future.

Reviewers: courbet, gchatelet

Reviewed By: gchatelet

Subscribers: javed.absar, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53455

llvm-svn: 344922
2018-10-22 17:10:47 +00:00
Guillaume Chatelet 18ef4a4a0d [llvm-exegesis] Crash when assembling invalid Operand
llvm-svn: 344907
2018-10-22 15:06:10 +00:00
Guillaume Chatelet 02f70a3fde [llvm-exegesis] Mark x86 segment register instructions as unsupported.
Reviewers: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53499

llvm-svn: 344906
2018-10-22 14:55:43 +00:00
Guillaume Chatelet 3c639f33b4 [llvm-exegesis] Reject x86 instructions that use non uniform memory accesses
Reviewers: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53438

llvm-svn: 344905
2018-10-22 14:46:08 +00:00
Clement Courbet 8d0dd0ba0e [llvm-exegesis] Mark second-form X87 instructions as unsupported.
Summary:
We only support the first form because we rely on information that is
only available there.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53430

llvm-svn: 344782
2018-10-19 12:24:49 +00:00
Clement Courbet 22bad0497e [llvm-exegesis] Re-enable liveliness tracker.
Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53429

llvm-svn: 344780
2018-10-19 12:08:05 +00:00
Clement Courbet c51f45239d [llvm-exegesis] X87 RFP setup code.
Summary:
This was lost during refactoring in rL342644.

Fix and simplify simplify value size handling: always go through a 80 bit value,
because the value can be 1 byte). Add unit tests.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53423

llvm-svn: 344779
2018-10-19 09:56:54 +00:00
Fangrui Song 2e83b2e9ee Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC
llvm-svn: 344774
2018-10-19 06:12:02 +00:00
Guillaume Chatelet 6a208e8c5f [llvm-exegesis] Fix off by one error
llvm-svn: 344731
2018-10-18 08:20:50 +00:00
Krasimir Georgiev 11bc3a18e2 [llvm-exegesis] Mark destructor virtual after r344695
This was causing a -Wnon-virtual-dtor warning.

llvm-svn: 344721
2018-10-18 02:06:16 +00:00
Clement Courbet f973c2df9d [llvm-exegesis] Allow measuring several instructions in a single run.
Summary:
We try to recover gracefully on instructions that would crash the
program.

This includes some refactoring of runMeasurement() implementations.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53371

llvm-svn: 344695
2018-10-17 15:04:15 +00:00
Guillaume Chatelet 6f4bc17309 Fix uninitialized variable
llvm-svn: 344692
2018-10-17 12:27:46 +00:00
Guillaume Chatelet 952b121a9c BuildBot fix, compiler complains about array decay to pointer
llvm-svn: 344690
2018-10-17 12:09:21 +00:00
Guillaume Chatelet fcbb6f3c2b [llvm-exegeis] Computing Latency configuration upfront so we can generate many CodeTemplates at once.
Summary: LatencyGenerator now computes all possible mode of serial execution for an Instruction upfront and generates CodeTemplate for the ones that give the best results (e.g. no need to generate a two instructions snippet when repeating a single one would do). The next step is to generate even more configurations for cases (e.g. for XOR we should generate "XOR EAX, EAX, EAX" and "XOR EAX, EAX, EBX")

Reviewers: courbet

Reviewed By: courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53320

llvm-svn: 344689
2018-10-17 11:37:28 +00:00
Guillaume Chatelet a3849490b1 [llvm-exegesis] Fix missing std::move.
llvm-svn: 344496
2018-10-15 09:21:21 +00:00
Guillaume Chatelet 296a862cbe [llvm-exegesis][NFC] Return many CodeTemplates instead of one.
Summary: This is part one of the change where I simply changed the signature of the functions. More work need to be done to actually produce more than one CodeTemplate per instruction.

Reviewers: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53209

llvm-svn: 344493
2018-10-15 09:09:19 +00:00
Guillaume Chatelet 946fb0517a [llvm-exegesis][NFC] Simplify code at the cost of small code duplication
Reviewers: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53198

llvm-svn: 344351
2018-10-12 15:12:22 +00:00
Guillaume Chatelet f33b60258e [llvm-exegesis] Fix always true assert
llvm-svn: 344151
2018-10-10 16:16:43 +00:00