Summary:
As discussed in https://bugs.llvm.org/show_bug.cgi?id=43219,
I believe it may be useful to show //some// aggregates
amid the sea of statistics provided.
Example:
```
Average Wait times (based on the timeline view):
[0]: Executions
[1]: Average time spent waiting in a scheduler's queue
[2]: Average time spent waiting in a scheduler's queue while ready
[3]: Average time elapsed from WB until retire stage
      [0]    [1]    [2]    [3]
0.     3     1.0    1.0    4.7   vmulps %xmm0, %xmm1, %xmm2
1.     3     2.7    0.0    2.3   vhaddps %xmm2, %xmm2, %xmm3
2.     3     6.0    0.0    0.0   vhaddps %xmm3, %xmm3, %xmm4
       3     3.2    0.3    2.3   <total>
```
I.e. we average the averages.
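The <total> row is just the arithmetic mean of the per-instruction averages,
computed column by column. A minimal sketch of that computation (illustrative
only, not the actual view code):
```cpp
#include <array>
#include <cstdio>
#include <vector>

int main() {
  // Per-instruction averages from the example above:
  // {queue wait, wait while ready, WB-to-retire}.
  std::vector<std::array<double, 3>> Rows = {
      {1.0, 1.0, 4.7}, {2.7, 0.0, 2.3}, {6.0, 0.0, 0.0}};
  double Totals[3] = {0.0, 0.0, 0.0};
  for (const auto &Row : Rows)
    for (unsigned I = 0; I < 3; ++I)
      Totals[I] += Row[I];
  // Average of the averages, one value per column: 3.2  0.3  2.3.
  std::printf("%.1f  %.1f  %.1f  <total>\n", Totals[0] / Rows.size(),
              Totals[1] / Rows.size(), Totals[2] / Rows.size());
  return 0;
}
```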
Reviewers: andreadb, mattd, RKSimon
Reviewed By: andreadb
Subscribers: gbedwell, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68714
llvm-svn: 374361
Flag -show-encoding enables the printing of instruction encodings as part of the
instruction info view.
Example (with flags -mtriple=x86_64-- -mcpu=btver2):
```
Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)
[7]: Encoding Size

[1]    [2]    [3]    [4]    [5]    [6]    [7]    Encodings:     Instructions:
 1      2     1.00                        4      c5 f0 59 d0    vmulps %xmm0, %xmm1, %xmm2
 1      4     1.00                        4      c5 eb 7c da    vhaddps %xmm2, %xmm2, %xmm3
 1      4     1.00                        4      c5 e3 7c e3    vhaddps %xmm3, %xmm3, %xmm4
```
In this example, column Encoding Size is the size in bytes of the instruction
encoding. Column Encodings reports the actual instruction encodings as byte
sequences in hex (objdump style).
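For illustration only (this is not the mca::CodeEmitter implementation), the two
new columns boil down to the byte count of each encoding and its objdump-style
hex rendering:
```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
  // Encoding of "vmulps %xmm0, %xmm1, %xmm2" from the example above.
  std::vector<uint8_t> Encoding = {0xc5, 0xf0, 0x59, 0xd0};
  std::printf("Encoding Size: %zu\nEncoding:", Encoding.size());
  for (uint8_t Byte : Encoding)
    std::printf(" %02x", Byte); // objdump-style hex bytes: "c5 f0 59 d0"
  std::printf("\n");
  return 0;
}
```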
The computation of encodings is done by a utility class named mca::CodeEmitter.
In the future, I plan to expose the CodeEmitter to the instruction builder, so
that information about instruction encoding sizes can be used by the simulator.
That would be a first step towards simulating the throughput of the decoders in
the hardware frontend.
Differential Revision: https://reviews.llvm.org/D65948
llvm-svn: 368432
This patch adds a new llvm-mca flag named -print-imm-hex.
By default, the instruction printer prints immediate operands as decimals. Flag
-print-imm-hex enables the instruction printer to print those operands in hex.
This patch also adds support for MASM binary and hex literal numbers (for
example 0FFh and 101b).
Added tests to verify the behavior of the new flag. Tests also verify that MASM
numeric literal operands are now recognized.
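As a rough illustration (hypothetical snippet, not taken from the added tests),
the same immediate can now be written with MASM-style literals, and
-print-imm-hex controls how the printer renders it back:
```
# 0FFh == 255 == 0xff,  101b == 5 == 0x5
addl $0FFh, %eax    # printed as "addl $255, %eax" by default,
                    # and as "addl $0xff, %eax" with -print-imm-hex
addl $101b, %ebx
```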
Differential Revision: https://reviews.llvm.org/D65588
llvm-svn: 367671
Sphinx allows for definitions of command-line options using the
`.. option:: <name>` directive, and references to those options via the `:option:` role.
However, it looks like there is no scoping of these options by default,
meaning that links can end up pointing to incorrect documents. See for
example the llvm-mca document, which contains references to -o that,
prior to this patch, pointed to a different document. What's worse, which
document these links resolve to appears to be non-deterministic
(on my machine, some references end up pointing to opt, whereas on the
live docs, they point to llvm-dwarfdump, for example).
The fix is to add the `.. program:: <name>` directive. This essentially
namespaces the options (definitions and references) to the named program,
ensuring that the links are kept correct.
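A minimal sketch of the resulting pattern in the .rst files (generic option
shown for illustration):
```
.. program:: llvm-mca

.. option:: -o <filename>

   With the program directive above in effect, a :option:`-o` reference
   elsewhere in the llvm-mca document resolves to this definition.
```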
Reviewed by: andreadb
Differential Revision: https://reviews.llvm.org/D63873
llvm-svn: 364538
This patch fixes PR41523
https://bugs.llvm.org/show_bug.cgi?id=41523
Regions can now nest/overlap provided that they have different names.
Anonymous regions cannot overlap.
Region end markers must specify the region name. The only exception is when
there is only one user-defined region; in that particular case, the region end
marker doesn't need to specify a name.
Incorrect region end markers are no longer ignored. Instead, the tool reports an
error and we exit with an error code.
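For example, the following nesting is now accepted (illustrative snippet; note
that both end markers name their region):
```
# LLVM-MCA-BEGIN outer
  vmulps %xmm0, %xmm1, %xmm2
  # LLVM-MCA-BEGIN inner
  vhaddps %xmm2, %xmm2, %xmm3
  # LLVM-MCA-END inner
  vhaddps %xmm3, %xmm3, %xmm4
# LLVM-MCA-END outer
```
An unnamed LLVM-MCA-END would only be accepted if "outer" were the sole
user-defined region.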
Added test cases to verify the new diagnostic error messages.
Updated the llvm-mca docs to reflect this feature change.
Differential Revision: https://reviews.llvm.org/D61676
llvm-svn: 360351
It makes more sense to print out the number of micro opcodes that are issued
every cycle rather than the number of instructions issued per cycle.
This behavior is also consistent with the dispatch-stats: numbers from the two
views can now be easily compared.
llvm-svn: 357919
This patch adds a new flag named -bottleneck-analysis to print out information
about throughput bottlenecks.
MCA knows how to identify and classify dynamic dispatch stalls. However, it
doesn't know how to analyze and highlight kernel bottlenecks. The goal of this
patch is to teach MCA how to correlate increases in backend pressure to backend
stalls (and therefore, the loss of throughput).
From a Scheduler point of view, backend pressure is a function of the scheduler
buffer usage (i.e. how the number of uOps in the scheduler buffers changes over
time). Backend pressure increases (or decreases) when there is a mismatch
between the number of opcodes dispatched and the number of opcodes issued in
the same cycle. Since buffer resources are limited, continuous increases in
backend pressure would eventually lead to dispatch stalls. So, there is a
strong correlation between dispatch stalls and how backpressure changed over
time.
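A minimal sketch of the idea (illustrative only; the real bookkeeping lives in
the Scheduler and ExecuteStage): pressure grows in any cycle that dispatches
more uOps into the scheduler buffers than it issues out of them.
```cpp
#include <cstdio>

int main() {
  // Per-cycle {dispatched uOps, issued uOps} for a toy trace.
  const unsigned Trace[][2] = {{4, 4}, {4, 2}, {4, 1}, {0, 3}};
  int BufferedUOps = 0; // uOps currently sitting in the scheduler buffers
  for (const auto &Cycle : Trace) {
    int Delta = static_cast<int>(Cycle[0]) - static_cast<int>(Cycle[1]);
    BufferedUOps += Delta;
    if (Delta > 0) // dispatched > issued: backend pressure increased
      std::printf("pressure increase: +%d uOps (%d buffered)\n", Delta,
                  BufferedUOps);
  }
  return 0;
}
```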
This patch teaches MCA how to identify situations where backend pressure
increases due to:
- unavailable pipeline resources.
- data dependencies.
Data dependencies may delay execution of instructions and therefore increase the
time that uOps have to spend in the scheduler buffers. That often translates to
an increase in backend pressure which may eventually lead to a bottleneck.
Contention on pipeline resources may also delay execution of instructions, and
lead to a temporary increase in backend pressure.
Internally, the Scheduler classifies instructions based on whether register /
memory operands are available or not.
An instruction is marked as "ready to execute" only if data dependencies are
fully resolved.
Every cycle, the Scheduler attempts to execute all instructions that are ready
to execute. If an instruction cannot execute because of unavailable pipeline
resources, then the Scheduler internally updates a BusyResourceUnits mask with
the ID of each unavailable resource.
ExecuteStage is responsible for tracking changes in backend pressure. If backend
pressure increases during a cycle because of contention on pipeline resources,
then ExecuteStage sends a "backend pressure" event to the listeners.
That event would contain information about instructions delayed by resource
pressure, as well as the BusyResourceUnits mask.
Note that ExecuteStage also knows how to identify situations where backpressure
increased because of delays introduced by data dependencies.
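A rough sketch of that bookkeeping (the names and data layout here are
assumptions for illustration, not the actual Scheduler/ExecuteStage code):
```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

struct Instr {
  bool DataDepsResolved;  // "ready to execute" only when operands are available
  uint64_t RequiredUnits; // bitmask of the pipeline resource units it needs
};

int main() {
  std::vector<Instr> Buffer = {{true, 0x1}, {true, 0x1}, {false, 0x2}};
  uint64_t AvailableUnits = 0x1; // only unit 0x1 is free this cycle
  uint64_t BusyResourceUnits = 0;
  for (const Instr &I : Buffer) {
    if (!I.DataDepsResolved)
      continue; // delayed by a data dependency, not by resource pressure
    uint64_t Missing = I.RequiredUnits & ~AvailableUnits;
    if (Missing) {
      BusyResourceUnits |= Missing; // remember which units blocked execution
      continue;
    }
    AvailableUnits &= ~I.RequiredUnits; // issue and consume the unit
  }
  std::printf("BusyResourceUnits mask: 0x%llx\n",
              static_cast<unsigned long long>(BusyResourceUnits));
  return 0;
}
```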
The SummaryView observes "backend pressure" events and prints out a "bottleneck
report".
Example of bottleneck report:
```
Cycles with backend pressure increase [ 99.89% ]

Throughput Bottlenecks:
  Resource Pressure       [ 0.00% ]
  Data Dependencies:      [ 99.89% ]
  - Register Dependencies [ 0.00% ]
  - Memory Dependencies   [ 99.89% ]
```
A bottleneck report is printed out only if increases in backend pressure
eventually caused backend stalls.
About the time complexity:
Time complexity is linear in the number of instructions in the
Scheduler::PendingSet.
The average slowdown tends to be in the range of ~5-6%.
For memory intensive kernels, the slowdown can be significant if flag
-noalias=false is specified. In the worst case scenario I have observed a
slowdown of ~30% when flag -noalias=false was specified.
We can definitely recover part of that slowdown if we optimize class LSUnit (by
doing extra bookkeeping to speed up queries). For now, this new analysis is
disabled by default, and it can be enabled via flag -bottleneck-analysis. Users
of MCA as a library can enable the generation of pressure events through the
constructor of ExecuteStage.
This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494
Differential Revision: https://reviews.llvm.org/D58728
llvm-svn: 355308
RetireControlUnitStatistics now reports extra information about the ROB and the
avg/maximum number of entries consumed over the entire simulation.
Example:
```
Retire Control Unit - number of cycles where we saw N instructions retired:
[# retired], [# cycles]
 0,           109  (17.9%)
 1,           102  (16.7%)
 2,           399  (65.4%)

Total ROB Entries:                64
Max Used ROB Entries:             35  ( 54.7% )
Average Used ROB Entries per cy:  32  ( 50.0% )
```
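The two new fields are per-cycle statistics over the number of occupied ROB
entries. A minimal sketch of the arithmetic (made-up per-cycle samples chosen
to reproduce the numbers above):
```cpp
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
  const unsigned TotalROBEntries = 64;
  // Hypothetical samples: ROB entries in use at the end of each cycle.
  std::vector<unsigned> UsedPerCycle = {29, 30, 32, 34, 35};
  unsigned Max = *std::max_element(UsedPerCycle.begin(), UsedPerCycle.end());
  double Avg = std::accumulate(UsedPerCycle.begin(), UsedPerCycle.end(), 0.0) /
               UsedPerCycle.size();
  std::printf("Max Used ROB Entries:            %u  ( %.1f%% )\n", Max,
              100.0 * Max / TotalROBEntries);
  std::printf("Average Used ROB Entries per cy: %.0f  ( %.1f%% )\n", Avg,
              100.0 * Avg / TotalROBEntries);
  return 0;
}
```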
Documentation in llvm/docs/CommandGuide/llvm-mca.rst has been updated to
reflect this change.
llvm-svn: 347493
This patch introduces the following changes to the DispatchStatistics view:
* DispatchStatistics now reports the number of dispatched opcodes instead of
the number of dispatched instructions.
* The "Dynamic Dispatch Stall Cycles" table now also reports the percentage of
stall cycles against the total simulated cycles.
This change allows users to easily compare dispatch group sizes with the
processor DispatchWidth.
Before this change, it was difficult to correlate the two numbers, since the
DispatchStatistics view reported the number of instructions (instead of opcodes).
DispatchWidth defines the maximum size of a dispatch group in terms of number of
micro opcodes.
The other change introduced by this patch is related to how DispatchStage
generates "instruction dispatch" events.
In particular:
* There can be multiple dispatch events associated with the same instruction.
* Each dispatch event now encapsulates the number of dispatched micro opcodes.
The number of micro opcodes declared by an instruction may exceed the processor
DispatchWidth. Therefore, we cannot assume that instructions are always fully
dispatched in a single cycle.
DispatchStage already knew how to handle instructions declaring more micro
opcodes than the DispatchWidth. However, it used to emit a single instruction
dispatch event (during the first simulated dispatch cycle) for every dispatched
instruction.
With this patch, DispatchStage now correctly notifies multiple dispatch events
for instructions that cannot be dispatched in a single cycle.
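A rough sketch of the new behavior (illustrative only, not the DispatchStage
code): an instruction declaring more micro opcodes than the DispatchWidth
produces one dispatch event per cycle until all of its uOps have been dispatched.
```cpp
#include <algorithm>
#include <cstdio>

int main() {
  const unsigned DispatchWidth = 4; // max uOps dispatched per cycle
  unsigned RemainingUOps = 6;       // an instruction declaring 6 uOps
  unsigned Cycle = 0;
  while (RemainingUOps > 0) {
    unsigned Dispatched = std::min(DispatchWidth, RemainingUOps);
    RemainingUOps -= Dispatched;
    // One "instruction dispatch" event per cycle, carrying the uOp count.
    std::printf("cycle %u: dispatch event with %u uOps\n", Cycle++, Dispatched);
  }
  return 0;
}
```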
A few views had to be modified. Views can no longer assume that there can only
be one dispatch event per instruction.
Tests (and docs) have been updated.
Differential Revision: https://reviews.llvm.org/D51430
llvm-svn: 341055
This patch adds two new fields to the perf report generated by the SummaryView.
Fields are now logically organized into two small groups; only the second group
contains throughput indicators.
Example:
```
Iterations: 100
Instructions: 300
Total Cycles: 414
Total uOps: 700
Dispatch Width: 4
uOps Per Cycle: 1.69
IPC: 0.72
Block RThroughput: 4.0
```
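For reference, the two throughput indicators in the second group follow
directly from the counts in the first group:
```
uOps Per Cycle = Total uOps   / Total Cycles = 700 / 414 ≈ 1.69
IPC            = Instructions / Total Cycles = 300 / 414 ≈ 0.72
```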
This patch also updates the docs for llvm-mca.
Due to the nature of this change, several tests in the tools/llvm-mca directory
were affected, and had to be updated using script `update_mca_test_checks.py`.
llvm-svn: 340946
Before this patch, the SchedulerStatistics only printed the maximum number of
buffer entries consumed in each scheduler's queue at a given point of the
simulation.
This patch restructures the reported table, and adds an extra field named
"Average number of used buffer entries" to it.
This patch also uses different colors to help identify bottlenecks caused by
high scheduler buffer pressure.
llvm-svn: 340746
This patch is a follow-up to r338702.
We don't need to use a map to model the wait/ready/issued sets. It is much more
efficient to use a vector instead.
This patch gives us an average 7.5% speedup (on top of the ~12% speedup obtained
after r338702).
llvm-svn: 338883
This patch replaces all the remaining occurrences of string "MCA" with
":program:`llvm-mca`". Somehow I missed those strings when I committed r338394.
This patch also improves section "Instruction Dispatch".
llvm-svn: 338881
Summary:
This patch mostly copies the existing Instruction Flow and stage descriptions
from the mca README. I made a few text tweaks, but no semantic changes,
and made reference to the "default pipeline." I also removed the internals
references (e.g., reference to class names and header files). I did leave the
LSUnit name around, but only as an abbreviated word for the load-store unit.
Reviewers: andreadb, courbet, RKSimon, gbedwell, filcab
Reviewed By: andreadb
Subscribers: tschuett, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D49692
llvm-svn: 338319
Summary: The original text was lifted from the MCA README. I re-ran the dot-product example and updated the output seen in the docs. I also added a few paragraphs discussing the instructions issued and instructions retired histograms, as well as the register file stats.
Reviewers: andreadb, RKSimon, courbet, gbedwell, filcab
Reviewed By: andreadb
Subscribers: tschuett
Differential Revision: https://reviews.llvm.org/D49614
llvm-svn: 337648
For the most part, these changes were from the RFC. I made a few minor
word/structure changes, but nothing significant. I also regenerated the
example output, and adjusted the text accordingly.
Differential Revision: https://reviews.llvm.org/D49527
llvm-svn: 337496
We're going to work on this in a separate review focusing more on documenting
the View and probably removing some of the less-interesting/less-useful pieces.
This reverts r337219,337225
llvm-svn: 337295
This patch introduces a brief description of the components of MCA. The main
focus is on Views. This is a work in progress, and more descriptions will be
introduced later. I want to flesh out the Views section more and provide a
detailed description of eventing in MCA. Eventually a brief code example of a
View should accompany the description.
Also, we should consider moving the MCA internals guide elsewhere at some point.
llvm-svn: 337219
This is copied from Andrea's text in PR36875:
https://bugs.llvm.org/show_bug.cgi?id=36875
As noted there, this is a hack...but it's a good one!
It's important to show potential workflows up-front
with examples, so customers can copy and experiment
with them.
llvm-svn: 329726
This patch moves the logic that collects and analyzes dispatch events to the
DispatchStatistics view.
Added flag -dispatch-stats to print statistics related to the dispatch logic.
llvm-svn: 329708
This patch teaches llvm-mca how to parse code comments in search for special
"markers" used to select regions of code.
Example:
# LLVM-MCA-BEGIN My Code Region
....
# LLVM-MCA-END
The MCAsmLexer now delegates the parsing of code comments to an object of class
MCACommentParser (i.e. an AsmCommentConsumer), which searches for begin/end code
region markers.
A comment starting with substring "LLVM-MCA-BEGIN" marks the beginning of a new
region of code. A comment starting with substring "LLVM-MCA-END" marks the end
of the last region.
This implementation doesn't allow regions to overlap. Each region can have an
optional description; internally, each region is identified by a range of source
code locations (SMLoc).
MCInst objects are added to a region R only if the source location for the
MCInst is in the range of locations specified by R.
By default, the tool allocates an implicit "Default" code region which contains
every source location. See new tests llvm-mca-marker-*.s for a few examples.
A new Backend object is created for every region. So, the analysis is conducted
on every parsed code region. The final report is the union of the reports
generated for every code region. Note that empty regions are skipped.
Special "[#] Code Region - ..." strings are used in the report to mark the
portion which is specific to a code region only. For example, see
llvm-mca-markers-5.s.
Differential Revision: https://reviews.llvm.org/D45433
llvm-svn: 329590
Scheduling models can now describe processor register files and retire control
units. This updates the existing documentation and the README file.
llvm-svn: 329311
This is done in preparation for D45259.
With D45259, models can specify the size of the reorder buffer, and the retire
throughput directly via tablegen.
llvm-svn: 329274
Before this patch, the "BackendStatistics" view was responsible for printing the
register file usage (as well as many other statistics).
Now users can enable register file usage statistics using the command line flag
`-register-file-stats`. By default, the tool doesn't print register file
statistics.
llvm-svn: 329083
Document that flag -resource-pressure can be used to enable/disable the resource
pressure view. This change should have been part of r328305.
llvm-svn: 328492
The goal of this patch is to address most of PR36874. To fully fix PR36874 we
need to split the "InstructionInfo" view from the "SummaryView". That would make
it easy to check the latency and rthroughput as well.
The patch reuses all the logic from ResourcePressureView to print out the
"instruction tables".
We have an entry for every instruction in the input sequence. Each entry reports
the theoretical resource pressure distribution. Resource pressure is uniformly
distributed across all the processor resource units of a group.
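For instance (hypothetical resource names and numbers), an instruction with a
reciprocal throughput of 1.00 that may execute on either unit of a two-unit
group is reported as 0.50 cycles of pressure on each unit:
```
[P0]   [P1]
0.50   0.50   <instruction that can execute on either P0 or P1>
```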
At the moment, the backend pipeline is not configurable, so the only way to fix
this is by creating a different driver that simply sends instruction events to
the resource pressure view. That means we don't use the Backend interface.
Instead, it is simpler to just have a different code path for when flag
-instruction-tables is specified.
Once Clement addresses bug 36663, we can port the "instruction tables"
logic into a stage of our configurable pipeline.
Updated the BtVer2 test cases (thanks Simon for the help). Now we pass flag
-instruction-tables to each modified test.
Differential Revision: https://reviews.llvm.org/D44839
llvm-svn: 328487
llvm-mca is an LLVM based performance analysis tool that can be used to
statically measure the performance of code, and to help triage potential
problems with target scheduling models.
llvm-mca uses information which is already available in LLVM (e.g. scheduling
models) to statically measure the performance of machine code on a specific CPU.
Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.
The main goal of this tool is not just to predict the performance of the code
when run on the target, but also to help diagnose potential performance
issues.
Given an assembly code sequence, llvm-mca estimates the IPC (instructions per
cycle), as well as hardware resource pressure. The analysis and reporting style
were mostly inspired by the IACA tool from Intel.
This patch is related to the RFC on llvm-dev visible at this link:
http://lists.llvm.org/pipermail/llvm-dev/2018-March/121490.html
Differential Revision: https://reviews.llvm.org/D43951
llvm-svn: 326998