This patch removes a (potentially) slow while loop in
DefaultResourceStrategy::select(). A better (and faster) approach is to do some
bit manipulation in order to shrink the range of candidate resources.
On a release build, this change gives an average speedup of ~10%.
llvm-svn: 348007
This patch adds the ability to specify via tablegen which processor resources
are load/store queue resources.
A new tablegen class named MemoryQueue can be optionally used to mark resources
that model load/store queues. Information about the load/store queue is
collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to
initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and
`StoreQueueID`. Those two fields are identifiers for buffered resources used to
describe the load queue and the store queue.
Field `BufferSize` is interpreted as the number of entries in the queue, while
the number of units is a throughput indicator (i.e. number of available pickers
for loads/stores).
At construction time, LSUnit in llvm-mca checks for the presence of extra
processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If
that information is available, and fields LoadQueueID and StoreQueueID are set
to a value different than zero (i.e. the invalid processor resource index), then
LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value
declared by the two processor resources.
With this patch, we more accurately track dynamic dispatch stalls caused by the
lack of LS tokens (i.e. load/store queue full). This is also shown by the
differences in two BdVer2 tests. Stalls that were previously classified as
generic SCHEDULER FULL stalls, are not correctly classified either as "load
queue full" or "store queue full".
About the differences in the -scheduler-stats view: those differences are
expected, because entries in the load/store queue are not released at
instruction issue stage. Instead, those are released at instruction executed
stage. This is the main reason why for the modified tests, the load/store
queues gets full before PdEx is full.
Differential Revision: https://reviews.llvm.org/D54957
llvm-svn: 347857
This reapplies r347767 (originally reviewed at: https://reviews.llvm.org/D55000)
with a fix for the missing std::move of the Error returned by the call to
Pipeline::runCycle().
Below is the original commit message from r347767.
If a user only cares about the overall latency, then the best/quickest way is to
change method Pipeline::run() so that it returns the total number of cycles to
the caller.
When the simulation pipeline is run, the number of cycles (or an error) is
returned from method Pipeline::run().
The advantage is that no hardware event listener is needed for computing that
latency. So, the whole process should be faster (and simpler - at least for that
particular use case).
llvm-svn: 347795
If a user only cares about the overall latency, then the best/quickest way is to
change method Pipeline::run() so that it returns the total number of cycles to
the caller.
When the simulation pipeline is run, the number of cycles (or an error) is
returned from method Pipeline::run().
The advantage is that no hardware event listener is needed for computing that
latency. So, the whole process should be faster (and simpler - at least for that
particular use case).
llvm-svn: 347767
By default, llvm-mca conservatively assumes that a register operand from the
variadic sequence is both a register read and a register write. That is because
MCInstrDesc doesn't describe extra variadic operands; we don't have enough
dataflow information to tell which register operands from the variadic sequence
is a definition, and which is a use instead.
However, if a variadic instruction is flagged 'mayStore' (but not 'mayLoad'),
and it has no 'unmodeledSideEffects', then llvm-mca (very) optimistically
assumes that any register operand in the variadic sequence is a register read
only. Conversely, if a variadic instruction is marked as 'mayLoad' (but not
'mayStore'), and it has no 'unmodeledSideEffects', then llvm-mca optimistically
assumes that any extra register operand is a register definition only.
These assumptions work quite well for variadic load/store multiple instructions
defined by the ARM backend.
llvm-svn: 347522
With this change, InstrBuilder emits an error if the MCInst sequence contains an
instruction with a variadic opcode, and a non-zero number of variadic operands.
Currently we don't know how to correctly analyze variadic opcodes. The problem
with variadic operands is that there is no information for them in the opcode
descriptor (i.e. MCInstrDesc). That means, we don't know which variadic operands
are defs, and which are uses.
In future, we could try to conservatively assume that any extra register
operands is both a register use and a register definition.
This patch fixes a subtle bug in the evaluation of read/write operands for ARM
VLD1 with implicit index update. Added test vld1-index-update.s
llvm-svn: 347503
RetireControlUnitStatistics now reports extra information about the ROB and the
avg/maximum number of entries consumed over the entire simulation.
Example:
Retire Control Unit - number of cycles where we saw N instructions retired:
[# retired], [# cycles]
0, 109 (17.9%)
1, 102 (16.7%)
2, 399 (65.4%)
Total ROB Entries: 64
Max Used ROB Entries: 35 ( 54.7% )
Average Used ROB Entries per cy: 32 ( 50.0% )
Documentation in llvm/docs/CommandGuide/llvmn-mca.rst has been updated to
reflect this change.
llvm-svn: 347493
Also, try to minimize the number of queries to the memory queues to speedup the
analysis.
On average, this change gives a small 2% speedup. For memcpy-like kernels, the
speedup is up to 5.5%.
llvm-svn: 347469
This avoids a heap allocation most of the times.
This patch gives a small but consistent 3% speedup on a release build (up to ~5%
on a debug build).
llvm-svn: 347464
This patch fixes an invalid memory read introduced by r346487.
Before this patch, partial register write had to query the latency of the
dependent full register write by calling a method on the full write descriptor.
However, if the full write is from an already retired instruction, chances are
that the EntryStage already reclaimed its memory.
In some parial register write tests, valgrind was reporting an invalid
memory read.
This change fixes the invalid memory access problem. Writes are now responsible
for tracking dependent partial register writes, and notify them in the event of
instruction issued.
That means, partial register writes no longer need to query their associated
full write to check when they are ready to execute.
Added test X86/BtVer2/partial-reg-update-7.s
llvm-svn: 347459
When looking at the tests committed by Roman at r346587, I noticed that numbers
reported by the resource pressure for PdAGU01 were wrong.
In particular, according to the aut-generated CHECK lines in tests
memcpy-like-test.s and store-throughput.s, resource pressure for PdAGU01
was not uniformly distributed among the two AGEN pipes.
It turns out that the reason why pressure was not correctly distributed, was
because the "resource selection strategy" object associated with PdAGU01 was not
correctly updated on the event of AGEN pipe used.
As a result, llvm-mca was not simulating a round-robin pipeline allocation for
PdAGU01. Instead, PdAGU1 was always prioritized over PdAGU0.
This patch fixes the issue; now processor resource strategy objects for
resources declaring multiple units, are correctly notified in the event of
"resource used".
llvm-svn: 346650
This was noticed when working on PR3946.
By construction, a group cannot be used as a "Super" resource. That constraint
is enforced by method `SubtargetEmitter::ExpandProcResource()`.
A Super resource S can be part of a group G. However, method
`SubtargetEmitter::ExpandProcResource()` would not update the number of
consumed resource cycles in G based on S.
In practice, this is perfectly fine because the resource usage is correctly
computed for processor resource units. However, llvm-mca should still check if G
is a buffered resource.
Before this patch, llvm-mca didn't correctly check if S was part of a group that
defines a buffer. So, the instruction descriptor was not correctly set.
For now, the semantic change introduced by this patch doesn't affect any of the
upstream scheduling models. However, it will allow to make some progress on PR3946.
llvm-svn: 346545
Use a simple SmallVector to track the lifetime of simulated instructions.
An ordered map was not needed because instructions are already picked in program
order. It is also much faster if we avoid searching for already retired
instructions at the end of every cycle.
The new policy only triggers a "garbage collection" when the number of retired
instructions becomes significantly big when compared with the total size of the
vector.
While working on this, I noticed that instructions were correctly retired, but
their internal state was not updated (i.e. there was no transition from the
EXECUTED state, to the RETIRED state). While this was not a problem for the
views, it prevented the EntryStage from correctly garbage collecting already
retired instructions. That was a bad oversight, and this patch fixes it.
The observed speedup on a debug build of llvm-mca after this patch is ~6%.
On a release build of llvm-mca, the observed speedup is ~%15%.
llvm-svn: 346487
This fixes PR39261.
FetchStage is a misnomer. It causes confusion with the frontend fetch stage,
which we don't currently simulate. I decided to rename it into EntryStage
mainly because this is meant to be a "source" stage for all pipelines.
Differential Revision: https://reviews.llvm.org/D54268
llvm-svn: 346419
Summary:
This patch introduces a CodeRegionGenerator class which is responsible for parsing some type of input and creating a 'CodeRegions' instance for use by llvm-mca. In the future, we will also have a CodeRegionGenerator subclass for converting an input object file into CodeRegions. For now, we only have the subclass for converting input assembly into CodeRegions.
This is mostly a NFC patch, as the logic remains close to the original, but now encapsulated in its own class and moved outside of llvm-mca.cpp.
Reviewers: andreadb, courbet, RKSimon
Reviewed By: andreadb
Subscribers: mgorny, tschuett, gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D54179
llvm-svn: 346344
This patch teaches view RegisterFileStatistics how to report events for
optimizable register moves.
For each processor register file, view RegisterFileStatistics reports the
following extra information:
- Number of optimizable register moves
- Number of register moves eliminated
- Number of zero moves (i.e. register moves that propagate a zero)
- Max Number of moves eliminated per cycle.
Differential Revision: https://reviews.llvm.org/D53976
llvm-svn: 345865
Summary: This allows to remove `using namespace llvm;` in those *.cpp files
When we want to revisit the decision (everything resides in llvm::mca::*) in the future, we can move things to a nested namespace of llvm::mca::, to conceptually make them separate from the rest of llvm::mca::*
Reviewers: andreadb, mattd
Reviewed By: andreadb
Subscribers: javed.absar, tschuett, gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D53407
llvm-svn: 345612
Before this change, the lowering of instructions from llvm::MCInst to
mca::Instruction was done as part of the first stage of the pipeline (i.e. the
FetchStage). In particular, FetchStage was responsible for picking the next
instruction from the source sequence, and lower it to an mca::Instruction with
the help of an object of class InstrBuilder.
The dependency on InstrBuilder was problematic for a number of reasons. Class
InstrBuilder only knows how to lower from llvm::MCInst to mca::Instruction.
That means, it is hard to support a different scenario where instructions
in input are not instances of class llvm::MCInst. Even if we managed to
specialize InstrBuilder, and generalize most of its internal logic, the
dependency on InstrBuilder in FetchStage would have caused more troubles (other
than complicating the pipeline logic).
With this patch, the lowering step is done before the pipeline is run. The
pipeline is no longer responsible for lowering from MCInst to mca::Instruction.
As a consequence of this, the FetchStage no longer needs to interact with an
InstrBuilder. The mca::SourceMgr class now simply wraps a reference to a
sequence of mca::Instruction objects.
This simplifies the logic of FetchStage, and increases the usability of it. As
a result, on a debug build, we see a 7-9% speedup; on a release build, the
speedup is around 3-4%.
llvm-svn: 345500
This patch introduces a new base class for Instruction named InstructionBase.
Class InstructionBase is responsible for tracking data dependencies with the
help of ReadState and WriteState objects. Class Instruction now derives from
InstructionBase, and adds extra information related to the `InstrStage` as well
as the `RCUTokenID`.
ReadState and WriteState objects are no longer unique pointers. This avoids
extra heap allocation and pointer checks that weren't really needed. Now, those
objects are simply stored into SmallVectors. We use a SmallVector instead of a
std::vector because we expect most instructions to only have a very small number
of reads and writes. By using a simple SmallVector we also avoid extra heap
allocations most of the time.
In a debug build, this improves the performance of llvm-mca by roughly 10% (I
still have to verify the impact in performance on a release build).
llvm-svn: 345280
Also, removed the initialization of vectors used for processor resource masks.
Support function 'computeProcResourceMasks()' already calls method resize on
those vectors.
No functional change intended.
llvm-svn: 345161
Added begin()/end() methods to allow the usage of SourceMgr in foreach loops.
With this change, method getMCInstFromIndex() (as well as a couple of other
methods) are now redundant, and can be removed from the public interface.
llvm-svn: 345147
A new class named InstructionError has been added to Support.h in order to
improve the error reporting from class InstrBuilder.
The llvm-mca driver is responsible for handling InstructionError objects, and
printing them out to stderr.
The goal of this patch is to remove all the remaining error handling logic from
the library code.
In particular, this allows us to:
- Simplify the logic in InstrBuilder by removing a needless dependency from
MCInstrPrinter.
- Centralize all the error halding logic in a new function named 'runPipeline'
(see llvm-mca.cpp).
This is also a first step towards generalizing class InstrBuilder, so that in
future, we will be able to reuse its logic to also "lower" MachineInstr to
mca::Instruction objects.
Differential Revision: https://reviews.llvm.org/D53585
llvm-svn: 345129
This fixes a problem introduced by r344334. A write from a non-zero move
eliminated at register renaming stage was not correctly handled by the PRF. This
would have led to an assertion failure if the processor model declares a PRF
that enables non-zero move elimination.
llvm-svn: 344392
This patch adds the ability to identify instructions that are "move elimination
candidates". It also allows scheduling models to describe processor register
files that allow move elimination.
A move elimination candidate is an instruction that can be eliminated at
register renaming stage.
Each subtarget can specify which instructions are move elimination candidates
with the help of tablegen class "IsOptimizableRegisterMove" (see
llvm/Target/TargetInstrPredicate.td).
For example, on X86, BtVer2 allows both GPR and MMX/SSE moves to be eliminated.
The definition of 'IsOptimizableRegisterMove' for BtVer2 looks like this:
```
def : IsOptimizableRegisterMove<[
InstructionEquivalenceClass<[
// GPR variants.
MOV32rr, MOV64rr,
// MMX variants.
MMX_MOVQ64rr,
// SSE variants.
MOVAPSrr, MOVUPSrr,
MOVAPDrr, MOVUPDrr,
MOVDQArr, MOVDQUrr,
// AVX variants.
VMOVAPSrr, VMOVUPSrr,
VMOVAPDrr, VMOVUPDrr,
VMOVDQArr, VMOVDQUrr
], CheckNot<CheckSameRegOperand<0, 1>> >
]>;
```
Definitions of IsOptimizableRegisterMove from processor models of a same
Target are processed by the SubtargetEmitter to auto-generate a target-specific
override for each of the following predicate methods:
```
bool TargetSubtargetInfo::isOptimizableRegisterMove(const MachineInstr *MI)
const;
bool MCInstrAnalysis::isOptimizableRegisterMove(const MCInst &MI, unsigned
CPUID) const;
```
By default, those methods return false (i.e. conservatively assume that there
are no move elimination candidates).
Tablegen class RegisterFile has been extended with the following information:
- The set of register classes that allow move elimination.
- Maxium number of moves that can be eliminated every cycle.
- Whether move elimination is restricted to moves from registers that are
known to be zero.
This patch is structured in three part:
A first part (which is mostly boilerplate) adds the new
'isOptimizableRegisterMove' target hooks, and extends existing register file
descriptors in MC by introducing new fields to describe properties related to
move elimination.
A second part, uses the new tablegen constructs to describe move elimination in
the BtVer2 scheduling model.
A third part, teaches llm-mca how to query the new 'isOptimizableRegisterMove'
hook to mark instructions that are candidates for move elimination. It also
teaches class RegisterFile how to describe constraints on move elimination at
PRF granularity.
llvm-mca tests for btver2 show differences before/after this patch.
Differential Revision: https://reviews.llvm.org/D53134
llvm-svn: 344334
Flag 'AllowZeroMoveEliminationOnly' should have been a property of the PRF, and
not set at register granularity.
This change also restricts move elimination to writes that update a full
physical register. We assume that there is a strong correlation between
logical registers that allow move elimination, and how those same registers are
allocated to physical registers by the register renamer.
This is still a no functional change, because this experimental code path is
disabled for now. This is done in preparation for another patch that will add
the ability to describe how move elimination works in scheduling models.
llvm-svn: 343787
This should help with catching inconsistent definitions of instructions with
zero opcodes, which also declare to consume scheduler/pipeline resources.
llvm-svn: 343766
This patch teaches class RegisterFile how to analyze register writes from
instructions that are move elimination candidates.
In particular, it teaches it how to check if a move can be effectively eliminated
by the underlying PRF, and (if necessary) how to perform move elimination.
The long term goal is to allow processor models to describe instructions that
are valid move elimination candidates.
The idea is to let register file definitions in tablegen declare if/when moves
can be eliminated.
This patch is a non functional change.
The logic that performs move elimination is currently disabled. A future patch
will add support for move elimination in the processor models, and enable this
new code path.
llvm-svn: 343691
Summary:
This is redundant, as FetchStage::getNextInstruction already checks this
and returns llvm::ErrorSuccess() as appropriate.
NFC.
Reviewers: andreadb
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D52642
llvm-svn: 343555