Commit Graph

19 Commits

Author SHA1 Message Date
Andrea Di Biagio 7aa0dbb664 [MCA] Slightly refactor the logic in ResourceManager. NFCI
This patch slightly changes the API in the attempt to simplify resource buffer
queries. It is done in preparation for a patch that will enable support for
macro fusion.

llvm-svn: 368994
2019-08-15 12:39:55 +00:00
Andrea Di Biagio 47db08dbb1 [MCA] Further refactor the bottleneck analysis view. NFCI.
llvm-svn: 362933
2019-06-10 12:50:08 +00:00
Andrea Di Biagio 6a989c358c [MCA][Scheduler] Change how memory instructions are dispatched to the pending set. NFCI
llvm-svn: 362302
2019-06-01 15:22:37 +00:00
Andrea Di Biagio 280ac1fd1d [MCA] Refactor class LSUnit. NFCI
This should be the last bit of refactoring in preparation for a patch that would
finally fix PR37494.

This patch introduces the concept of memory dependency groups (class
MemoryGroup) and "Load/Store Unit token" (LSUToken) to track the status of a
memory operation.

A MemoryGroup is a node of a memory dependency graph. It is used internally to
classify memory operations based on the memory operations they depend on.  Let I
and J be two memory operations, we say that I and J equivalent (for the purpose
of mapping instructions to memory dependency groups) if the set of memory
operations they depend depend on is identical.

MemoryGroups are identified by so-called LSUToken (a unique group identifier
assigned by the LSUnit to every group). When an instruction I is dispatched to
the LSUnit, the LSUnit maps I to a group, and then returns a LSUToken.
LSUTokens are used by class Scheduler to track memory dependencies.

This patch simplifies the LSUnit interface and moves most of the implementation
details to its base class (LSUnitBase). There is no user visible change to the
output.

llvm-svn: 361950
2019-05-29 11:38:27 +00:00
Andrea Di Biagio c2493ce4a4 [MCA][Scheduler] Improved critical memory dependency computation.
This fixes a problem where back-pressure increases caused by register
dependencies were not correctly notified if execution was also delayed by memory
dependencies.

llvm-svn: 361740
2019-05-26 19:50:31 +00:00
Andrea Di Biagio a549dd2560 [MCA] Refactor the logic that computes the critical memory dependency info. NFCI
CriticalRegDep has been renamed CriticalDependency, and it is now used by class
Instruction to store information about the critical register dependency and the
critical memory dependency. No functional change intendend.

llvm-svn: 361737
2019-05-26 18:41:35 +00:00
Andrea Di Biagio dd0d9e01ee [MCA] Introduce class LSUnitBase and let LSUnit derive from it.
Class LSUnitBase provides a abstract interface for all the concrete LS units in
llvm-mca.

Methods exposed by the public abstract LSUnitBase interface are:
 - Status isAvailable(const InstRef&);
 - void dispatch(const InstRef &);
 - const InstRef &isReady(const InstRef &);

LSUnitBase standardises the API, but not the data structures internally used by
LS units. This allows for more flexibility.
Previously, only method `isReady()` was declared virtual by class LSUnit.
Also, derived classes had to inherit all the internal data members of LSUnit.

No functional change intended.

llvm-svn: 361496
2019-05-23 13:42:47 +00:00
Andrea Di Biagio 0460a3629b [MCA] Notify event listeners when instructions transition to the Pending state. NFCI
llvm-svn: 359983
2019-05-05 16:07:27 +00:00
Andrea Di Biagio be3281a281 [MCA] Highlight kernel bottlenecks in the summary view.
This patch adds a new flag named -bottleneck-analysis to print out information
about throughput bottlenecks.

MCA knows how to identify and classify dynamic dispatch stalls. However, it
doesn't know how to analyze and highlight kernel bottlenecks.  The goal of this
patch is to teach MCA how to correlate increases in backend pressure to backend
stalls (and therefore, the loss of throughput).

From a Scheduler point of view, backend pressure is a function of the scheduler
buffer usage (i.e. how the number of uOps in the scheduler buffers changes over
time). Backend pressure increases (or decreases) when there is a mismatch
between the number of opcodes dispatched, and the number of opcodes issued in
the same cycle.  Since buffer resources are limited, continuous increases in
backend pressure would eventually leads to dispatch stalls. So, there is a
strong correlation between dispatch stalls, and how backpressure changed over
time.

This patch teaches how to identify situations where backend pressure increases
due to:
 - unavailable pipeline resources.
 - data dependencies.

Data dependencies may delay execution of instructions and therefore increase the
time that uOps have to spend in the scheduler buffers. That often translates to
an increase in backend pressure which may eventually lead to a bottleneck.
Contention on pipeline resources may also delay execution of instructions, and
lead to a temporary increase in backend pressure.

Internally, the Scheduler classifies instructions based on whether register /
memory operands are available or not.

An instruction is marked as "ready to execute" only if data dependencies are
fully resolved.
Every cycle, the Scheduler attempts to execute all instructions that are ready
to execute. If an instruction cannot execute because of unavailable pipeline
resources, then the Scheduler internally updates a BusyResourceUnits mask with
the ID of each unavailable resource.

ExecuteStage is responsible for tracking changes in backend pressure. If backend
pressure increases during a cycle because of contention on pipeline resources,
then ExecuteStage sends a "backend pressure" event to the listeners.
That event would contain information about instructions delayed by resource
pressure, as well as the BusyResourceUnits mask.

Note that ExecuteStage also knows how to identify situations where backpressure
increased because of delays introduced by data dependencies.

The SummaryView observes "backend pressure" events and prints out a "bottleneck
report".

Example of bottleneck report:

```
Cycles with backend pressure increase [ 99.89% ]
Throughput Bottlenecks:
  Resource Pressure       [ 0.00% ]
  Data Dependencies:      [ 99.89% ]
   - Register Dependencies [ 0.00% ]
   - Memory Dependencies   [ 99.89% ]
```

A bottleneck report is printed out only if increases in backend pressure
eventually caused backend stalls.

About the time complexity:

Time complexity is linear in the number of instructions in the
Scheduler::PendingSet.

The average slowdown tends to be in the range of ~5-6%.
For memory intensive kernels, the slowdown can be significant if flag
-noalias=false is specified. In the worst case scenario I have observed a
slowdown of ~30% when flag -noalias=false was specified.

We can definitely recover part of that slowdown if we optimize class LSUnit (by
doing extra bookkeeping to speedup queries). For now, this new analysis is
disabled by default, and it can be enabled via flag -bottleneck-analysis. Users
of MCA as a library can enable the generation of pressure events through the
constructor of ExecuteStage.

This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494

Differential Revision: https://reviews.llvm.org/D58728

llvm-svn: 355308
2019-03-04 11:52:34 +00:00
Andrea Di Biagio c032e2ab7c [MCA] Always check if scheduler resources are unavailable when reporting dispatch stalls.
Dispatch stall cycles may be associated to multiple dispatch stall events.
Before this patch, each stall cycle was associated with a single stall event.
This patch also improves a couple of code comments, and adds a helper method to
query the Scheduler for dispatch stalls.

llvm-svn: 354877
2019-02-26 14:19:00 +00:00
Andrea Di Biagio 3316eb5bb8 [MCA][Scheduler] Collect resource pressure and memory dependency bottlenecks.
Every cycle, the Scheduler checks if instructions in the ReadySet can be issued
to the underlying pipelines. If an instruction cannot be issued because one or
more pipeline resources are unavailable, then field
Instruction::CriticalResourceMask is updated with the resource identifier of the
unavailable resources.

If an instruction cannot be promoted from the PendingSet to the ReadySet because
of a memory dependency, then field Instruction::CriticalMemDep is updated with
the identifier of the dependending memory instruction.

Bottleneck information is collected after every cycle for instructions that are
waiting to execute. The idea is to help identify causes of bottlenecks; this
information can be used in future to implement a bottleneck analysis.

llvm-svn: 354490
2019-02-20 18:01:49 +00:00
Andrea Di Biagio 7a950ed587 [MCA] Slightly refactor method writeStartEvent in WriteState and ReadState. NFCI
This is another change in preparation for PR37494.
No functional change intended.

llvm-svn: 354261
2019-02-18 11:27:11 +00:00
Andrea Di Biagio 5ad52e35a8 [MCA][LSUnit] Return the ID of the dependent memory operation from method
isReady(). NFCI

This is yet another change in preparation for a fix for PR37494.

llvm-svn: 354150
2019-02-15 18:05:59 +00:00
Andrea Di Biagio 318f990aee [MCA][Scheduler] Use latency information to further classify busy instructions.
This patch introduces a new instruction stage named 'IS_PENDING'.
An instruction transitions from the IS_DISPATCHED to the IS_PENDING stage if
input registers are not available, but their latency is known.

This patch also adds a new set of instructions named 'PendingSet' to class
Scheduler. The idea is that the PendingSet will only contain instructions that
have reached the IS_PENDING stage.
By construction, an instruction in the PendingSet is only dependent on
instructions that have already reached the execution stage. The plan is to use
this knowledge to identify bottlenecks caused by data dependencies (see
PR37494).

Differential Revision: https://reviews.llvm.org/D58066

llvm-svn: 353937
2019-02-13 11:02:42 +00:00
Andrea Di Biagio 23ff2aa47c [MCA][Scheduler] Track resources that were found busy when issuing an instruction.
This is a follow up of r353706. When the scheduler fails to issue a ready
instruction to the underlying pipelines, it now updates a mask of 'busy resource
units'. That information will be used in future to obtain the set of
"problematic" resources in the case of bottlenecks caused by resource pressure.
No functional change intended.

llvm-svn: 353728
2019-02-11 17:55:47 +00:00
Andrea Di Biagio 83e68854d5 [MCA] Return a mask of busy resources from method ResourceManager::checkAvailability(). NFCI
In case of bottlenecks caused by pipeline pressure, we want to be able to
correctly report the set of problematic pipelines. This is a first step towards
adding support for bottleneck hints in llvm-mca (see PR37494). No functional
change intended.

llvm-svn: 353706
2019-02-11 14:53:04 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Andrea Di Biagio 3f4b54850f [MCA] Improved handling of in-order issue/dispatch resources.
Added field 'MustIssueImmediately' to the instruction descriptor of instructions
that only consume in-order issue/dispatch processor resources.
This speeds up queries from the hardware Scheduler, and gives an average ~5%
speedup on a release build.

No functional change intended.

llvm-svn: 350397
2019-01-04 15:08:38 +00:00
Clement Courbet cc5e6a72de [llvm-mca] Move llvm-mca library to llvm/lib/MCA.
Summary: See PR38731.

Reviewers: andreadb

Subscribers: mgorny, javed.absar, tschuett, gbedwell, andreadb, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D55557

llvm-svn: 349332
2018-12-17 08:08:31 +00:00