This fixes a problem introduced by r344334. A write from a non-zero move
eliminated at register renaming stage was not correctly handled by the PRF. This
would have led to an assertion failure if the processor model declares a PRF
that enables non-zero move elimination.
llvm-svn: 344392
This patch adds the ability to identify instructions that are "move elimination
candidates". It also allows scheduling models to describe processor register
files that allow move elimination.
A move elimination candidate is an instruction that can be eliminated at
register renaming stage.
Each subtarget can specify which instructions are move elimination candidates
with the help of tablegen class "IsOptimizableRegisterMove" (see
llvm/Target/TargetInstrPredicate.td).
For example, on X86, BtVer2 allows both GPR and MMX/SSE moves to be eliminated.
The definition of 'IsOptimizableRegisterMove' for BtVer2 looks like this:
```
def : IsOptimizableRegisterMove<[
InstructionEquivalenceClass<[
// GPR variants.
MOV32rr, MOV64rr,
// MMX variants.
MMX_MOVQ64rr,
// SSE variants.
MOVAPSrr, MOVUPSrr,
MOVAPDrr, MOVUPDrr,
MOVDQArr, MOVDQUrr,
// AVX variants.
VMOVAPSrr, VMOVUPSrr,
VMOVAPDrr, VMOVUPDrr,
VMOVDQArr, VMOVDQUrr
], CheckNot<CheckSameRegOperand<0, 1>> >
]>;
```
Definitions of IsOptimizableRegisterMove from processor models of a same
Target are processed by the SubtargetEmitter to auto-generate a target-specific
override for each of the following predicate methods:
```
bool TargetSubtargetInfo::isOptimizableRegisterMove(const MachineInstr *MI)
const;
bool MCInstrAnalysis::isOptimizableRegisterMove(const MCInst &MI, unsigned
CPUID) const;
```
By default, those methods return false (i.e. conservatively assume that there
are no move elimination candidates).
Tablegen class RegisterFile has been extended with the following information:
- The set of register classes that allow move elimination.
- Maxium number of moves that can be eliminated every cycle.
- Whether move elimination is restricted to moves from registers that are
known to be zero.
This patch is structured in three part:
A first part (which is mostly boilerplate) adds the new
'isOptimizableRegisterMove' target hooks, and extends existing register file
descriptors in MC by introducing new fields to describe properties related to
move elimination.
A second part, uses the new tablegen constructs to describe move elimination in
the BtVer2 scheduling model.
A third part, teaches llm-mca how to query the new 'isOptimizableRegisterMove'
hook to mark instructions that are candidates for move elimination. It also
teaches class RegisterFile how to describe constraints on move elimination at
PRF granularity.
llvm-mca tests for btver2 show differences before/after this patch.
Differential Revision: https://reviews.llvm.org/D53134
llvm-svn: 344334
Flag 'AllowZeroMoveEliminationOnly' should have been a property of the PRF, and
not set at register granularity.
This change also restricts move elimination to writes that update a full
physical register. We assume that there is a strong correlation between
logical registers that allow move elimination, and how those same registers are
allocated to physical registers by the register renamer.
This is still a no functional change, because this experimental code path is
disabled for now. This is done in preparation for another patch that will add
the ability to describe how move elimination works in scheduling models.
llvm-svn: 343787
This should help with catching inconsistent definitions of instructions with
zero opcodes, which also declare to consume scheduler/pipeline resources.
llvm-svn: 343766
This patch teaches class RegisterFile how to analyze register writes from
instructions that are move elimination candidates.
In particular, it teaches it how to check if a move can be effectively eliminated
by the underlying PRF, and (if necessary) how to perform move elimination.
The long term goal is to allow processor models to describe instructions that
are valid move elimination candidates.
The idea is to let register file definitions in tablegen declare if/when moves
can be eliminated.
This patch is a non functional change.
The logic that performs move elimination is currently disabled. A future patch
will add support for move elimination in the processor models, and enable this
new code path.
llvm-svn: 343691
Summary:
This is redundant, as FetchStage::getNextInstruction already checks this
and returns llvm::ErrorSuccess() as appropriate.
NFC.
Reviewers: andreadb
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D52642
llvm-svn: 343555
This change is in preparation for a future work on improving support for
optimizable register moves. We already know if a write is from a zero-idiom, so
we can propagate that bit of information to the PRF. We use an APInt mask to
identify registers that are set to zero.
llvm-svn: 343307
Summary:
There isn't any actual dependency - there's one #include from CodeGen
but nothing from the header is actually used.
With this change we can use the MCA library from CodeGen without
circular dependencies (e.g. for scheduling).
Reviewers: andreadb
Reviewed By: andreadb
Authored By: orodley
Subscribers: mgorny, gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D52288
llvm-svn: 342706
This patch adds the ability for processor models to describe dependency breaking
instructions.
Different processors may specify a different set of dependency-breaking
instructions.
That means, we cannot assume that all processors of the same target would use
the same rules to classify dependency breaking instructions.
The main goal of this patch is to provide the means to describe dependency
breaking instructions directly via tablegen, and have the following
TargetSubtargetInfo hooks redefined in overrides by tabegen'd
XXXGenSubtargetInfo classes (here, XXX is a Target name).
```
virtual bool isZeroIdiom(const MachineInstr *MI, APInt &Mask) const {
return false;
}
virtual bool isDependencyBreaking(const MachineInstr *MI, APInt &Mask) const {
return isZeroIdiom(MI);
}
```
An instruction MI is a dependency-breaking instruction if a call to method
isDependencyBreaking(MI) on the STI (TargetSubtargetInfo object) evaluates to
true. Similarly, an instruction MI is a special case of zero-idiom dependency
breaking instruction if a call to STI.isZeroIdiom(MI) returns true.
The extra APInt is used for those targets that may want to select which machine
operands have their dependency broken (see comments in code).
Note that by default, subtargets don't know about the existence of
dependency-breaking. In the absence of external information, those method calls
would always return false.
A new tablegen class named STIPredicate has been added by this patch to let
processor models classify instructions that have properties in common. The idea
is that, a MCInstrPredicate definition can be used to "generate" an instruction
equivalence class, with the idea that instructions of a same class all have a
property in common.
STIPredicate definitions are essentially a collection of instruction equivalence
classes.
Also, different processor models can specify a different variant of the same
STIPredicate with different rules (i.e. predicates) to classify instructions.
Tablegen backends (in this particular case, the SubtargetEmitter) will be able
to process STIPredicate definitions, and automatically generate functions in
XXXGenSubtargetInfo.
This patch introduces two special kind of STIPredicate classes named
IsZeroIdiomFunction and IsDepBreakingFunction in tablegen. It also adds a
definition for those in the BtVer2 scheduling model only.
This patch supersedes the one committed at r338372 (phabricator review: D49310).
The main advantages are:
- We can describe subtarget predicates via tablegen using STIPredicates.
- We can describe zero-idioms / dep-breaking instructions directly via
tablegen in the scheduling models.
In future, the STIPredicates framework can be used for solving other problems.
Examples of future developments are:
- Teach how to identify optimizable register-register moves
- Teach how to identify slow LEA instructions (each subtarget defining its own
concept of "slow" LEA).
- Teach how to identify instructions that have undocumented false dependencies
on the output registers on some processors only.
It is also (in my opinion) an elegant way to expose knowledge to both external
tools like llvm-mca, and codegen passes.
For example, machine schedulers in LLVM could reuse that information when
internally constructing the data dependency graph for a code region.
This new design feature is also an "opt-in" feature. Processor models don't have
to use the new STIPredicates. It has all been designed to be as unintrusive as
possible.
Differential Revision: https://reviews.llvm.org/D52174
llvm-svn: 342555
This patch adds two new boolean fields:
- Field `ReadState::IndependentFromDef`.
- Field `WriteState::WritesZero`.
Field `IndependentFromDef` is set for ReadState objects associated with
dependency-breaking instructions. It is used by the simulator when updating data
dependencies between registers.
Field `WritesZero` is set by WriteState objects associated with dependency
breaking zero-idiom instructions. It helps the PRF identify which writes don't
consume any physical registers.
llvm-svn: 342483
Summary:
This patch removes the storing of accumulated floating point data
within the llvm-mca library.
This patch splits-up the two quantities: cycles and number of resource units.
By splitting-up these two quantities, we delay the calculation of "cycles per resource unit"
until that value is read, reducing the chance of accumulating floating point error.
I considered using the APFloat, but after measuring performance, for a large (many iteration)
sample, I decided to go with this faster solution.
Reviewers: andreadb, courbet, RKSimon
Reviewed By: andreadb
Subscribers: llvm-commits, javed.absar, tschuett, gbedwell
Differential Revision: https://reviews.llvm.org/D51903
llvm-svn: 341980
This patch introduces the following changes to the DispatchStatistics view:
* DispatchStatistics now reports the number of dispatched opcodes instead of
the number of dispatched instructions.
* The "Dynamic Dispatch Stall Cycles" table now also reports the percentage of
stall cycles against the total simulated cycles.
This change allows users to easily compare dispatch group sizes with the
processor DispatchWidth.
Before this change, it was difficult to correlate the two numbers, since
DispatchStatistics view reported numbers of instructions (instead of opcodes).
DispatchWidth defines the maximum size of a dispatch group in terms of number of
micro opcodes.
The other change introduced by this patch is related to how DispatchStage
generates "instruction dispatch" events.
In particular:
* There can be multiple dispatch events associated with a same instruction
* Each dispatch event now encapsulates the number of dispatched micro opcodes.
The number of micro opcodes declared by an instruction may exceed the processor
DispatchWidth. Therefore, we cannot assume that instructions are always fully
dispatched in a single cycle.
DispatchStage knows already how to handle instructions declaring a number of
opcodes bigger that DispatchWidth. However, DispatchStage always emitted a
single instruction dispatch event (during the first simulated dispatch cycle)
for instructions dispatched.
With this patch, DispatchStage now correctly notifies multiple dispatch events
for instructions that cannot be dispatched in a single cycle.
A few views had to be modified. Views can no longer assume that there can only
be one dispatch event per instruction.
Tests (and docs) have been updated.
Differential Revision: https://reviews.llvm.org/D51430
llvm-svn: 341055
This patch adds two new fields to the perf report generated by the SummaryView.
Fields are now logically organized into two small groups; only the second group
contains throughput indicators.
Example:
```
Iterations: 100
Instructions: 300
Total Cycles: 414
Total uOps: 700
Dispatch Width: 4
uOps Per Cycle: 1.69
IPC: 0.72
Block RThroughput: 4.0
```
This patch also updates the docs for llvm-mca.
Due to the nature of this change, several tests in the tools/llvm-mca directory
were affected, and had to be updated using script `update_mca_test_checks.py`.
llvm-svn: 340946
This patch also uses colors to highlight problematic wait-time entries.
A problematic entry is an entry with an high wait time that tends to match (or
exceed) the size of the scheduler's buffer.
Color RED is used if an instruction had to wait an average number of cycles
which is bigger than (or equal to) the size of the underlying scheduler's
buffer.
Color YELLOW is used if the time (in cycles) spend waiting for the
operands or pipeline resources is bigger than half the size of the underlying
scheduler's buffer.
Color MAGENTA is used if an instruction does not consume buffer resources
according to the scheduling model.
llvm-svn: 340825
Summary:
This patch introduces llvm-mca as a library. The driver (llvm-mca.cpp), views, and stats, are not part of the library.
Those are separate components that are not required for the functioning of llvm-mca.
The directory has been organized as follows:
All library source files now reside in:
- `lib/HardwareUnits/` - All subclasses of HardwareUnit (these represent the simulated hardware components of a backend).
(LSUnit does not inherit from HardwareUnit, but Scheduler does which uses LSUnit).
- `lib/Stages/` - All subclasses of the pipeline stages.
- `lib/` - This is the root of the library and contains library code that does not fit into the Stages or HardwareUnit subdirs.
All library header files now reside in the `include` directory and mimic the same layout as the `lib` directory mentioned above.
In the (near) future we would like to move the library (include and lib) contents from tools and into the core of llvm somewhere.
That change would allow various analysis and optimization passes to make use of MCA functionality for things like cost modeling.
I left all of the non-library code just where it has always been, in the root of the llvm-mca directory.
The include directives for the non-library source file have been updated to refer to the llvm-mca library headers.
I updated the llvm-mca/CMakeLists.txt file to include the library headers, but I made the non-library code
explicitly reference the library's 'include' directory. Once we eventually (hopefully) migrate the MCA library
components into llvm the include directives used by the non-library source files will be updated to point to the
proper location in llvm.
Reviewers: andreadb, courbet, RKSimon
Reviewed By: andreadb
Subscribers: mgorny, javed.absar, tschuett, gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D50929
llvm-svn: 340755
Before this patch, the SchedulerStatistics only printed the maximum number of
buffer entries consumed in each scheduler's queue at a given point of the
simulation.
This patch restructures the reported table, and adds an extra field named
"Average number of used buffer entries" to it.
This patch also uses different colors to help identifying bottlenecks caused by
high scheduler's buffer pressure.
llvm-svn: 340746
Choosing to revert the change and do it again, hopefully preserving the history
of the changes by using svn copy instead of simply creating a new file from the
contents within Scheduler.
llvm-svn: 340661
Thanks to @waltl for reporting this issue.
I have also added an assert to check for invalid null strategy objects, and I
have reworded a couple of code comments in Scheduler.h
llvm-svn: 340545
With this patch, users can now customize the pipeline selection strategy for
scheduler resources. The resource selection strategy can be defined at processor
resource granularity. This enables the definition of different strategies for
different hardware schedulers.
To override the strategy associated with a processor resource, users can call
method ResourceManager::setCustomStrategy(), and pass a 'ResourceStrategy'
object in input.
Class ResourceStrategy is an abstract class which declares virtual method
`ResourceStrategy::select()`. Method select() is meant to implement the actual
strategy; it is responsible for picking the next best resource from a set of
available pipeline resources. Custom strategy must simply override that method.
By default, processor resources are associated with instances of
'DefaultResourceStrategy'. A 'DefaultResourceStrategy' internally implements a
simple round-robin selector. For more details, please refer to the code comments
in Scheduler.h.
llvm-svn: 340536