memory-barrier instructions to providing targets and developers a convenient
way to explicitly declare which instructions are memory-barriers.
Differential Revision: https://reviews.llvm.org/D116779
This reverts commit fd4808887e.
This patch causes gcc to issue a lot of warnings like:
warning: base class ‘class llvm::MCParsedAsmOperand’ should be
explicitly initialized in the copy constructor [-Wextra]
This patch fixes the warning
InstructionTables.cpp:27:56: error: loop variable 'Resource' of type
'const std::pair<const uint64_t, ResourceUsage> &' (aka 'const
pair<const unsigned long, llvm::mca::ResourceUsage> &') binds to a
temporary constructed from type 'const std::pair<unsigned long,
llvm::mca::ResourceUsage> &' [-Werror,-Wrange-loop-construct]
Note that Resource is declared as:
SmallVector<std::pair<uint64_t, ResourceUsage>, 4> Resources;
without "const" for uint64_t.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
This fixes some redundant move in return statement [-Wredundant-move] gcc 9.3.0
warnings.
This also fixes a minor coverity issue reported agaist class MCAOperand about
the lack of proper initialization for field Index.
No functional change intended.
My last change to the RegisterFile (PR51495) has introduced a bug in the logic
that allocates physical registers in the PRF.
In some cases, this bug could have triggered a nasty unsigned wrap in the number
of allocated registers, thus resulting in mca being stuck forever in a loop of
PRF availability checks.
It turns out that SchedWrite WriteIMulH was always assigned to the low half of
the result of a MULX (rather than to the high half).
To avoid confusion, this patch swaps the two MULX writes in the tablegen
definition of MULX32/64. That way, write names better describe what they
actually refer to; this also avoids further complications if in future we decide
to reuse the same MulH writes to also model other scalar integer multiply
instructions. I also had to swap the latency values for the two MULX writes to
make sure that the change is effectively an NFC. In fact, none of the existing
x86 tests were affected by this small refactoring.
This patch also fixes a bug in MCA: a wrong latency value was propagated for
instructions that perform multiple writes to a same register. This last issue
was found by Roman while testing MULX on targets that define a different latency
for the Low/High part of the result.
Differential Revision: https://reviews.llvm.org/D108727
Moved View.h and View.cpp from /tools/llvm-mca/Views/ to /lib/MCA/ and
/include/llvm/MCA/. This is so that targets can define their own Views within
the /lib/Target/ directory (so that the View can use backend functionality).
To enable these Views within mca, targets will need to add them to the vector of
Views returned by their target's CustomBehaviour::getViews() methods.
Differential Revision: https://reviews.llvm.org/D108520
Load/Store unit is used to enforce order of loads and stores if they
alias (controlled by --noalias=false option).
Fixes PR50483 - [MCA] In-order pipeline doesn't track memory
load/store dependencies.
Differential Revision: https://reviews.llvm.org/D103955
0 latency instructions now get processed and retired properly within the in-order pipeline. Had to fix a bug within TimelineView.cpp as well that would show up when a 0 latency instruction was the first instruction in the source.
Differential Revision: https://reviews.llvm.org/D104675
This patch will allow developers to remove unwanted instruction Defs (most likely from within a target specific InstrPostProcess) by setting that Def's RegisterID to 0.
Differential Revision: https://reviews.llvm.org/D104433
Put the dtor of mca::CustomBehaviour into the cpp file to avoid
undefined vtable when linking libLLVMMCACustomBehaviourAMDGPU as shared
library.
Differential Revision: https://reviews.llvm.org/D104401
The original change was pushed in main as commit f7a23ecece.
It was then reverted by commit a04f01bab2 because it caused linker failures
on buildbots that don't build the AMDGPU target.
--
Some instructions are not defined well enough within the target’s scheduling
model for llvm-mca to be able to properly simulate its behaviour. The ideal
solution to this situation is to modify the scheduling model, but that’s not
always a viable strategy. Maybe other parts of the backend depend on that
instruction being modelled the way that it is. Or maybe the instruction is quite
complex and it’s difficult to fully capture its behaviour with tablegen. The
CustomBehaviour class (which I will refer to as CB frequently) is designed to
provide intuitive scaffolding for developers to implement the correct modelling
for these instructions.
More details are available in the original commit log message (f7a23ecece).
Differential Revision: https://reviews.llvm.org/D104149
When instructions are issued to the underlying pipeline resources, the
mca::ResourceManager should also check for the presence of extra uses induced by
the explicit consumption of multiple partially overlapping group resources.
Fixes PR50725
Some instructions are not defined well enough within the target’s scheduling
model for llvm-mca to be able to properly simulate its behaviour. The ideal
solution to this situation is to modify the scheduling model, but that’s not
always a viable strategy. Maybe other parts of the backend depend on that
instruction being modelled the way that it is. Or maybe the instruction is quite
complex and it’s difficult to fully capture its behaviour with tablegen. The
CustomBehaviour class (which I will refer to as CB frequently) is designed to
provide intuitive scaffolding for developers to implement the correct modelling
for these instructions.
Implementation details:
llvm-mca does its best to extract relevant register, resource, and memory
information from every MCInst when lowering them to an mca::Instruction. It then
uses this information to detect dependencies and simulate stalls within the
pipeline. For some instructions, the information that gets captured within the
mca::Instruction is not enough for mca to simulate them properly. In these
cases, there are two main possibilities:
1. The instruction has a dependency that isn’t detected by mca.
2. mca is incorrectly enforcing a dependency that shouldn’t exist.
For the rest of this discussion, I will be focusing on (1), but I have put some
thought into (2) and I may revisit it in the future.
So we have an instruction that has dependencies that aren’t picked up by mca.
The basic idea for both pipelines in mca is that when an instruction wants to be
dispatched, we first check for register hazards and then we check for resource
hazards. This is where CB is injected. If no register or resource hazards have
been detected, we make a call to CustomBehaviour::checkCustomHazard() to give
the target specific CB the chance to detect and enforce any custom dependencies.
The return value for checkCustomHazaard() is an unsigned int representing the
(minimum) number of cycles that the instruction needs to stall for. It’s fine to
underestimate this value because when StallCycles gets down to 0, we’ll end up
checking for all the hazards again before the instruction is actually
dispatched. However, it’s important not to overestimate the value and the more
accurate your estimate is, the more efficient mca’s execution can be.
In general, for checkCustomHazard() to be able to detect these custom
dependencies, it needs information about the current instruction and also all of
the instructions that are still executing within the pipeline. The mca pipeline
uses mca::Instruction rather than MCInst and the current information encoded
within each mca::Instruction isn’t sufficient for my use cases. I had to add a
few extra attributes to the mca::Instruction class and have them get set by the
MCInst during instruction building. For example, the current mca::Instruction
doesn’t know its opcode, and it also doesn’t know anything about its immediate
operands (both of which I had to add to the class).
With information about the current instruction, a list of all currently
executing instructions, and some target specific objects (MCSubtargetInfo and
MCInstrInfo which the base CB class has references to), developers should be
able to detect and enforce most custom dependencies within checkCustomHazard. If
you need more information than is present in the mca::Instruction, feel free to
add attributes to that class and have them set during the lowering sequence from
MCInst.
Fortunately, in the in-order pipeline, it’s very convenient for us to pass these
arguments to checkCustomHazard. The hazard checking is taken care of within
InOrderIssueStage::canExecute(). This function takes a const InstRef as a
parameter (representing the instruction that currently wants to be dispatched)
and the InOrderIssueStage class maintains a SmallVector<InstRef, 4> which holds
all of the currently executing instructions. For the out-of-order pipeline, it’s
a bit trickier to get the list of executing instructions and this is why I have
held off on implementing it myself. This is the main topic I will bring up when
I eventually make a post to discuss and ask for feedback.
CB is a base class where targets implement their own derived classes. If a
target specific CB does not exist (or we pass in the -disable-cb flag), the base
class is used. This base class trivially returns 0 from its checkCustomHazard()
implementation (meaning that the current instruction needs to stall for 0 cycles
aka no hazard is detected). For this reason, targets or users who choose not to
use CB shouldn’t see any negative impacts to accuracy or performance (in
comparison to pre-patch llvm-mca).
Differential Revision: https://reviews.llvm.org/D104149
This patch fixes the logic that checks for variadic register definitions,
Before llvm-svn 348114 (commit 4cf35b4ab0), it was not possible to explicitly
mark variadic operands as definitions. By default, variadic operands of an
MCInst were always assumed to be uses. A number of had-hoc checks were
introduced in the InstrBuilder to fix the processing of variadic register
operands of ARM ldm/stm variants.
This patch simply replaces those old (and buggy) checks with a much simpler (and
correct) check for MCID::Flag::VariadicOpsAreDefs.
This is based on the assumption that most simulated instructions don't define
more than one or two registers. This is true for example on x86, where
most instruction definitions don't declare more than one register write.
The default code region size has been increased from 8 to 16. This is based on
the assumption that, for small microbenchmarks, the typical code snippet size is
often less than 16 instructions.
mca::Instruction now uses bitfields to pack flags.
No functional change intended.
The constructor of InOrderIssueStage no longer takes as input a reference to the
target scheduling model. The stage can always query the subtarget to obtain a
reference to the scheduling model.
The ResourceManager is no longer stored internally as a unique_ptr.
Moved a couple of method definitions to the .cpp file.
Moved the logic that checks for RAW hazards from the InOrderIssueStage to the
RegisterFile.
Changed how the InOrderIssueStage keeps track of backend stalls. Stall events
are now generated from method notifyStallEvent().
No functional change intended.
Conservatively use the instruction latency to compute the last write-back cycle.
Before this patch, the last write cycle computation was incorrect for store
instructions that didn't declare any register writes.
This patch lifts the restriction on the number of read/write registers for a
move elimination candidate. With this patch, move elimination candidates with
exactly two reads and two writes are treated like register swap operations for
the purpose of move elimination.
This patch currently doesn't affect any upstream model. However, it should help
unblock the progress on PR50258.
The register file should always check if the destination register is from a
register class that allows move elimination.
Before this change, the check on the register class was only performed in a few
very specific cases. However, it should have always been performed.
This patch fixes the issue.
Note that none of the upstream scheduling models is currently affected by this
bug, so there is no test for it. The issue was found by Roman while working on
the znver3 model. I was able to reproduce the issue locally by tweaking the
btver2 model. I then verified that this patch fixes the issue.
Early exit from method DispatchStage::isAvailable() if the dispatch group is
already full. Not all instructions declare at least one uOP.
Fixes PR50174.
Make sure that the `CriticalMemoryInstruction` of a memory group is invalidated
if it references an already executed instruction. This avoids a potential
use-after-free if the critical memory info becomes stale, and the value is
read after the instruction has executed.
Instructions that have more uops than the processor's IssueWidth are
issued in multiple cycles.
The patch fixes PR49712.
Differential Revision: https://reviews.llvm.org/D99339
This is a follow-up for:
D98604 [MCA] Ensure that writes occur in-order
When instructions are aligned by the order of writes, they retire
in-order naturally. There is no need for an RCU, so it is disabled.
Differential Revision: https://reviews.llvm.org/D98628
Before this patch, register writes were always invalidated by the
RegisterFile at instruction commit stage. So,
the RegisterFile was often losing the knowledge about the `execute
cycle` of writes already committed. While this was not problematic
for non-delayed reads, this was sometimes leading to inaccurate read
latency computations in the presence of negative read-advance cycles.
This patch fixes the issue by changing how the RegisterFile component
internally keeps track of the `execute cycle` information of each
write. On every instruction executed, the RegisterFile gets notified
by the RetireStage, so that it can internally record the execute
cycle of each executed write.
The `execute cycle` information is stored within WriteRef itself, and
it is not invalidated when the write is committed.
This patch adds a pipeline to support in-order CPUs such as ARM
Cortex-A55.
In-order pipeline implements a simplified version of Dispatch,
Scheduler and Execute stages as a single stage. Entry and Retire
stages are common for both in-order and out-of-order pipelines.
Differential Revision: https://reviews.llvm.org/D94928
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
This is likely to be a regressigion introduced by my last refactoring of the
LSUnit (commit 5578ec32f9). Before this patch, the
"CurrentStoreBarrierGroupID" index was not correctly reset on store barrier
executions. This was leading to unexpected crashes like the one reported as
PR48024.
This fixes a bug reported by Alex Renda on LLVMDev where mca did not correctly
mark a resource group as "reserved".
(See http://lists.llvm.org/pipermail/llvm-dev/2020-May/141485.html).
The issue was caused by a wrong check in function `initializeUsedResources`.
As a consequence of this, a resource group was left unreserved, and its field
`NumUnits` incorrectly reported an unrealistic number of consumed resource
units.
This patch fixes the issue with the handling of reserved resources in the
InstrBuilder class, and adds a simple test for it. Ideally, as suggested by
Andy Trick, most of these problems will disappear if in the future we will
introduce a (optional) DelayCycles vector for SchedWriteRes.
This fixes a regression introduced by a very old commit 280ac1fd1d (was
llvm-svn 361950).
Commit 280ac1fd1d redesigned the logic in the LSUnit with the goal of
speeding up isReady() queries, and stabilising the LSUnit API (while also making
the load store unit more customisable).
The concept of MemoryGroup (effectively an alias set) was added by that commit
to better describe and track dependencies between memory operations. However,
that concept was not just used for alias dependencies, but it was also used for
describing memory "order" dependencies (enforced by the memory consistency
model).
Instructions of a same memory group were considered "equivalent" as in:
independent operations that can potentially execute in parallel. The problem
was that the cost of a dependency (in terms of number of cycles) should have
been different for "order" dependency. Instructions in an order dependency
simply have to have to wait until their predecessors are "issued" to an
underlying pipeline (rather than having to wait until predecessors have beeng
fully executed). For simple "order" dependencies, this was effectively
introducing an artificial delay on the "issue" of independent loads and stores.
This patch fixes the issue and adds a new test named 'independent-load-stores.s'
to a bunch of x86 targets. That test contains the reproducible posted by Fabian
Ritter on PR45793.
I had to rerun the update-mca-tests script on several files. To avoid expected
regressions on some Exynos tests, I have added a -noalias=false flag (to match
the old strict behavior on latencies).
Some tests for processor Barcelona are improved/fixed by this change and they
now show better results. In a few tests we were incorrectly counting the time
spent by instructions in a scheduler queue. In one case in particular we now
correctly see a store executed out of order. That test was affected by the same
underlying issue reported as PR45793.
Reviewers: mattd
Differential Revision: https://reviews.llvm.org/D79351