An instruction is a meta-instruction if it doesn't produce any output
in the form of executable instructions. So in the concept, a
meta-instruction does not have to be target independent.
Before this patch, `isMetaInstruction` is implemented by checking the
opcode of the instruction, add we have no way to add target dependent
opcode to the list, which does not make sense.
After this patch, a bit `isMeta` is added for class `Instruction` in
tablegen, which is used to indicate whether it's a meta instruction.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D121600
Most notably, Pass.h is no longer included by TargetMachine.h
before: 1063570306
after: 1063332844
Differential Revision: https://reviews.llvm.org/D121168
InstrRefBasedLDV allocates some big tables of ValueIDNum, to store live-in
and live-out block values in, that then get passed around as pointers
everywhere. This patch wraps the allocation in a std::unique_ptr, names
some types based on unique_ptr, and passes references to those around
instead. There's no functional change, but it makes it clearer to the
reader that references to these tables are borrowed rather than owned, and
we get some extra validity assertions too.
Differential Revision: https://reviews.llvm.org/D118774
There's a few relevant forward declarations in there that may require downstream
adding explicit includes:
llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h
llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h
llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h
Counting preprocessed lines required to rebuild llvm-project on my setup:
before: 1052436830
after: 1049293745
Which is significant and backs up the change in addition to the usual benefits of
decreasing coupling between headers and compilation units.
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119244
Delay reading global metadata until the first function or the end of
the file is emitted. That way, earlier module passes can set metadata
that is emitted in the ELF.
`emitStartOfAsmFile` gets called when the passes are initialized,
which prevented earlier passes from changing the metadata.
This fixes issues encountered after converting
AMDGPUResourceUsageAnalysis to a Module pass in D117504.
Differential Revision: https://reviews.llvm.org/D118492
Was reverted in 1c1b670a73 as it broke all non-x86 bots. Original commit
message:
[DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out
In certain circumstances with things like autogenerated code and asan, you
can end up with thousands of Values live at the same time, causing a large
working set and a lot of information spilled to the stack. Unfortunately
InstrRefBasedLDV doesn't cope well with this and consumes a lot of memory
when there are many many stack slots. See the reproducer in D116821.
It seems very unlikely that a developer would be able to reason about
hundreds of live named local variables at the same time, so a huge working
set and many stack slots is an indicator that we're likely analysing
autogenerated or instrumented code. In those cases: gracefully degrade by
setting an upper bound on the amount of stack slots to track. This limits
peak memory consumption, at the cost of dropping some variable locations,
but in a rare scenario where it's unlikely someone is actually going to
use them.
In terms of the patch, this adds a cl::opt for max number of stack slots to
track, and has the stack-slot-numbering code optionally return None. That
then filters through a number of code paths, which can then chose to not
track a spill / restore if it touches an untracked spill slot. The added
test checks that we drop variable locations that are on the stack, if we
set the limit to zero.
Differential Revision: https://reviews.llvm.org/D118601
In certain circumstances with things like autogenerated code and asan, you
can end up with thousands of Values live at the same time, causing a large
working set and a lot of information spilled to the stack. Unfortunately
InstrRefBasedLDV doesn't cope well with this and consumes a lot of memory
when there are many many stack slots. See the reproducer in D116821.
It seems very unlikely that a developer would be able to reason about
hundreds of live named local variables at the same time, so a huge working
set and many stack slots is an indicator that we're likely analysing
autogenerated or instrumented code. In those cases: gracefully degrade by
setting an upper bound on the amount of stack slots to track. This limits
peak memory consumption, at the cost of dropping some variable locations,
but in a rare scenario where it's unlikely someone is actually going to
use them.
In terms of the patch, this adds a cl::opt for max number of stack slots to
track, and has the stack-slot-numbering code optionally return None. That
then filters through a number of code paths, which can then chose to not
track a spill / restore if it touches an untracked spill slot. The added
test checks that we drop variable locations that are on the stack, if we
set the limit to zero.
Differential Revision: https://reviews.llvm.org/D118601
This commit sometimes causes a crash when compiling a vtable thunk. E.g.:
clang '--target=aarch64-grtev4-linux-gnu' -xc++ - -c -o /dev/null <<EOF
struct a {
virtual int f();
};
struct c {
virtual int &g() const;
};
struct d : a, c {
int &g() const;
};
int &d::g() const {}
EOF
Some follow-up commits have been reverted as well:
Revert "IR: Make getRetAlign check callee function attributes"
Revert "Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC."
Revert "Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC."
This reverts commit 4f414af6a7.
This reverts commit a5507d2e25.
This reverts commit 3d2d208f6a.
This reverts commit 07ddfa95e3.
SizeOf() method of DIE values(unsigned SizeOf(const AsmPrinter *AP, dwarf::Form Form) const)
depends on AsmPrinter. AsmPrinter is too specific class here. This patch removes dependency
on AsmPrinter and use dwarf::FormParams structure instead. It allows calculate DIE values
size without using AsmPrinter. That refactoring is useful for D96035([dsymutil][DWARFlinker]
implement separate multi-thread processing for compile units.)
Differential Revision: https://reviews.llvm.org/D116997
Artifact combiner is not able to access individual elements after using
LCMTy style merge/unmerge, extract and insert to change vector number of
elements (pad with undef or split to sub-vector instructions).
Use unmerge to individual elements instead and then merge elements into
requested types.
Change argument lowering for vectors and moreElementsVector to use
buildPadVectorWithUndefElements and buildDeleteTrailingVectorElements.
FewerElementsVector had a few helpers that had different behavior,
introduce new helper for most of the opcodes.
FewerElementsVector helper is more flexible since it can create leftover
instruction smaller then requested type (useful in case target wants to
avoid pad with undef and use fewer registers). If target does not want
leftover of different type it should call more elements first.
Some helpers were performing more elements first to have split without
leftover. Opcodes that used this helper use clampMaxNumElementsStrict
(does more elements first) in LegalizerInfo to avoid test changes.
Fixes failures caused by failing to combine artifacts created during
more/fewer elements vector.
Differential Revision: https://reviews.llvm.org/D114198
Add the calculation of a score, which will be used during ML training. The
score qualifies the quality of a regalloc policy, and is independent of
what we train (currently, just eviction), or the regalloc algo itself.
We can then use scores to guide training (which happens offline), by
formulating a reward based on score variation - the goal being lowering
scores (currently, that reward is percentage reduction relative to
Greedy's heuristic)
Currently, we compute the score by factoring different instruction
counts (loads, stores, etc) with the machine basic block frequency,
regardless of the instructions' provenance - i.e. they could be due to
the regalloc policy or be introduced previously. This is different from
RAGreedy::reportStats, which accummulates the effects of the allocator
alone. We explored this alternative but found (at least currently) that
the more naive alternative introduced here produces better policies. We
do intend to consolidate the two, however, as we are actively
investigating improvements to our reward function, and will likely want
to re-explore scoring just the effects of the allocator.
In either case, we want to decouple score calculation from allocation
algorighm, as we currently evaluate it after a few more passes after
allocation (also, because score calculation should be reusable
regardless of allocation algorithm).
We intentionally accummulate counts independently because it facilitates
per-block reporting, which we found useful for debugging - for instance,
we can easily report the counts indepdently, and then cross-reference
with perf counter measurements.
Differential Revision: https://reviews.llvm.org/D115195
InstrRefBasedLDV used to crash on the added test -- the exit block is not
in scope for the variable being propagated, but is still considered because
it contains an assignment. The failure-mode was vlocJoin ignoring
assign-only blocks and not updating DIExpressions, but pickVPHILoc would
still find a variable location for it. That led to DBG_VALUEs created with
the wrong fragment information.
Fix this by removing a filter inherited from VarLocBasedLDV: vlocJoin will
now consider assign-only blocks and will update their expressions.
Differential Revision: https://reviews.llvm.org/D114727
This change exposes isBuildVectorConstantSplat() to the llvm namespace
and uses it to implement the constant splat versions of
m_SpecificICst().
CombinerHelper::matchOrShiftToFunnelShift() can now work with vector
types and CombinerHelper::matchMulOBy2()'s match for a constant splat is
simplified.
Differential Revision: https://reviews.llvm.org/D114625
If we have a variable where its fragments are split into overlapping
segments:
DBG_VALUE $ax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment_0, 16)
...
DBG_VALUE $eax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment_0, 32)
we should only propagate the most recently assigned fragment out of a
block. LiveDebugValues only deals with live-in variable locations, as
overlaps within blocks is DbgEntityHistoryCalculators domain.
InstrRefBasedLDV has kept the accumulateFragmentMap method from
VarLocBasedLDV, we just need it to recognise DBG_INSTR_REFs. Once it's
produced a mapping of variable / fragments to the overlapped variable /
fragments, VLocTracker uses it to identify when a debug instruction needs
to terminate the other parts it overlaps with. The test is updated for
some standard "InstrRef picks different registers" variation, and the
order of some unrelated DBG_VALUEs changes.
Differential Revision: https://reviews.llvm.org/D114603
This is a first attempt at a constant value consecutive store merging pass,
a counterpart to the DAGCombiner's store merging optimization.
The high level goals of this pass:
* Have a simple and efficient algorithm. As close to linear time as we can get.
Thus, prioritizing scalability of the algorithm over merging every corner case
we can find. The DAGCombiner's store merging code has been the source of
compile time and complexity issues in the past and I wanted to avoid that.
* Don't introduce any new data structures for ordering memory operations. In MIR,
we don't have the concept of chains like we do in the DAG, and the instruction
order is stricter than enforcing ordering with graph edges. Although I
considered adding something similar, I couldn't justify the overhead.
The pass is current split into 3 main parts. The main store merging code focuses
on identifying candidate stores and managing the candidate group that's under
consideration for merging. Analyzing addressing of stores is a potentially
complex part and for now there's just a basic implementation to identify easy
cases. Finally, the other main bit of complexity is the alias analysis, which
tries to follow the same logic as the DAG's AA.
Currently this implementation only supports merging of constant stores. Stores
of arbitrary variables are technically possible with a very small change, but
the DAG chooses not to do this. Doing so here makes most code worse since
there's extra overhead in merging values into wider registers.
On AArch64 -Os, this optimization results in very minor savings on CTMark.
Differential Revision: https://reviews.llvm.org/D109131
Be more consistent in the naming convention for the various RET instructions to specify in terms of bitwidth.
Helps prevent future scheduler model mismatches like those that were only addressed in D44687.
Differential Revision: https://reviews.llvm.org/D113302
Over in e7084ceab3 the InstrRefBasedLDV class grew a MachineRegisterInfo
pointer to lookup register sizes -- however, that field wasn't initialized
in the corresponding unit tests. This patch initializes it!
Fixes a buildbot failure reported on D112006
There are a few STL containers hanging around that can become DenseMaps,
SmallVectors and similar. This recovers a modest amount of compile time
performance.
While I'm here, adjust the bit layout of ValueIDNum: this was always
supposed to act like a value type, however it seems that clang doesn't
compile the comparison functions to act that way. Add a uint64_t to a
union that explicitly aliases the bitfields, so that we can compare the
whole value as a single integer.
Differential Revision: https://reviews.llvm.org/D112333
This patch is like D111627 -- instead of calculating IDF for every location
on the stack, only do it for the smallest units of interference, and copy
the PHIs for those units to any aliases.
The test added runs placeMLocPHIs directly, and tests that:
* A def of the lower 8 bits of a stack slot causes all aliasing regs to
have PHIs placed,
* It doesn't cause the equivalent location to x86's $ah, which isn't
aliased, to have a PHI placed.
Differential Revision: https://reviews.llvm.org/D112324
Sometimes we generate code that writes to a subregister, then spills /
restores a super-register to the stack, for example:
$eax = MOV32ri 0
MOV64mr $rsp, 1, $noreg, 16, $noreg, $rax
$rcx = MOV64rm $rsp, 1, $noreg, 8, $noreg
This patch takes a different approach: it adds another index to
MLocTracker that identifies a size/offset within a stack slot. A location
on the stack is then a pari of {FrameIndex, SlotNum}. Spilling and
restoring now involves pairing up the src/dest register numbers, and the
dest/src stack position to be transferred to/from. Location coverage
improves as a result, compile-time performance decreases, alas.
One limitation is that if a PHI occurs inside a stack slot:
DBG_PHI %stack.0, 1
We don't know how large the resulting value is, and so might have
difficulty picking which value to use. DBG_PHI might need to be augmented
in the future with such a size.
Unit tests added ensure that spills and restores correctly transfer to
positions in the Location => Value map, and that different register classes
written to the stack will correctly clobber all other positions in the
stack slot.
Differential Revision: https://reviews.llvm.org/D112133
This patch adds some unit tests for the machine-location transfer-function
building parts of InstrRefBasedLDV: i.e., test that if we feed some MIR
into the transfer-function building code, does it create the correct
transfer function.
There are a number of minor defects that get corrected in the process:
* The unit test was selecting the x86 (i.e. 32 bit) backend rather than
x86_64's 64 bit backend,
* COPY instructions weren't actually having their subregister values
correctly represented in the transfer function. Subregisters were being
defined by the COPY, rather than taking the value in the source register.
* SP aliases were at risk of being clobbered, if an SP subregister was
clobbered.
Differential Revision: https://reviews.llvm.org/D112006
Here's another performance patch for InstrRefBasedLDV: rather than
processing all variable values in a scope at a time, instead, process one
variable at a time. The benefits are twofold:
* It's easier to reason about one variable at a time in your mind,
* It improves performance, apparently from increased locality.
The downside is that the value-propagation code gets indented one level
further, plus there's some churn in the unit tests.
Differential Revision: https://reviews.llvm.org/D111799
This is purely a performance patch: InstrRefBasedLDV used to use three
DenseMaps to store variable values, two for long term storage and one as a
working set. This patch eliminates the working set, and updates the long
term storage in place, thus avoiding two DenseMap comparisons and two
DenseMap assignments, which can be expensive.
Differential Revision: https://reviews.llvm.org/D111716
Fix a dangling else that gcc-11 warned about. The EXPECT_EQ macro
expands to an if-else, so the whole construction contains a hidden
dangling else.
Differential Revision: https://reviews.llvm.org/D112044
This patch is very similar to D110173 / a3936a6c19, but for variable
values rather than machine values. This is for the second instr-ref
problem, calculating the correct variable value on entry to each block.
The previous lattice based implementation was broken; we now use LLVMs
existing PHI placement utilities to work out where values need to merge,
then eliminate un-necessary ones through value propagation.
Most of the deletions here happen in vlocJoin: it was trying to pick a
location for PHIs to happen in, badly, leading to an infinite loop in the
MIR test added, where it would repeatedly switch between register
locations. The new approach is simpler: either PHIs can be eliminated, or
they can't, and the location of the value is a different problem.
Various bits and pieces move to the header so that they can be tested in
the unit tests. The DbgValue class grows a "VPHI" kind to represent
variable value PHIS that haven't been eliminated yet.
Differential Revision: https://reviews.llvm.org/D110630
InstrRefBasedLDV used to try and determine which values are in which
registers using a lattice approach; however this is hard to understand, and
broken in various ways. This patch replaces that approach with a standard
SSA approach using existing LLVM utilities. PHIs are placed at dominance
frontiers; value propagation then eliminates un-necessary PHIs.
This patch also adds a bunch of unit tests that should cover many of the
weirder forms of control flow.
Differential Revision: https://reviews.llvm.org/D110173
This patch shifts the InstrRefBasedLDV class declaration to a header.
Partially because it's already massive, but mostly so that I can start
writing some unit tests for it. This patch also adds the boilerplate for
said unit tests.
Differential Revision: https://reviews.llvm.org/D110165
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
In order to not generate an unnecessary G_CTLZ, I extended the constant folder
in the CSEMIRBuilder to handle G_CTLZ. I also added some extra handing of
vector constants too. It seems we don't have any support for doing constant
folding of vector constants, so the tests show some other useless G_SUB
instructions too.
Differential Revision: https://reviews.llvm.org/D111036
Based on the reasoning of D53903, register operands of DBG_VALUE are
invariably treated as RegState::Debug operands. This change enforces
this invariant as part of MachineInstr::addOperand so that all passes
emit this flag consistently.
RegState::Debug is inconsistently set on DBG_VALUE registers throughout
LLVM. This runs the risk of a filtering iterator like
MachineRegisterInfo::reg_nodbg_iterator to process these operands
erroneously when not parsed from MIR sources.
This issue was observed in the development of the llvm-mos fork which
adds a backend that relies on physical register operands much more than
existing targets. Physical RegUnit 0 has the same numeric encoding as
$noreg (indicating an undef for DBG_VALUE). Allowing debug operands into
the machine scheduler correlates $noreg with RegUnit 0 (i.e. a collision
of register numbers with different zero semantics). Eventually, this
causes an assert where DBG_VALUE instructions are prohibited from
participating in live register ranges.
Reviewed By: MatzeB, StephenTozer
Differential Revision: https://reviews.llvm.org/D110105
Deriving NoAlias based on having the same index in two BaseIndexOffset
expressions seemed weird (and as shown in the added unittest the
correctness of doing so depended on undocumented pre-conditions that
the user of BaseIndexOffset::computeAliasing would need to take care
of.
This patch removes the code that dereived NoAlias based on indices
being the same. As a compensation, to avoid regressions/diffs in
various lit test, we also add a new check. The new check derives
NoAlias in case the two base pointers are based on two different
GlobalValue:s (neither of them being a GlobalAlias).
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D110256
This fixes a bug detected in DAGCombiner when using global alias
variables. Here is an example:
@foo = global i16 0, align 1
@aliasFoo = alias i16, i16 * @foo
define i16 @bar() {
...
store i16 7, i16 * @foo, align 1
store i16 8, i16 * @aliasFoo, align 1
...
}
BaseIndexOffset::computeAliasing would incorrectly derive NoAlias
for the two accesses in the example above, resulting in DAGCombiner
miscompiles.
This patch fixes the problem by a defensive approach letting
BaseIndexOffset::computeAliasing return false, i.e. that the aliasing
couldn't be determined, when comparing two global values and at least
one is a GlobalAlias. In the future we might improve this with a
deeper analysis to look at the aliasee for the GlobalAlias etc. But
that is a bit more complicated considering that we could have
'local_unnamed_addr' and situations with several 'alias' variables.
Fixes PR51878.
Differential Revision: https://reviews.llvm.org/D110064
The delayed stack protector feature which is currently used for SDAG (and thus
allows for more commonly generating tail calls) depends on being able to extract
the tail call into a separate return block. To do this it also has to extract
the vreg->physreg copies that set up the call's arguments, since if it doesn't
then the call inst ends up using undefined physregs in it's new spliced block.
SelectionDAG implementations can do this because they delay emitting register
copies until *after* the stack arguments are set up. GISel however just
processes and emits the arguments in IR order, so stack arguments always end up
last, and thus this breaks the code that looks for any register arg copies that
precede the call instruction.
This patch adds a thunk argument to the assignValueToReg() and custom assignment
hooks. For outgoing arguments, register assignments use this return param to
return a thunk that does the actual generating of the copies. We collect these
until all the outgoing stack assignments have been done and then execute them,
so that the copies (and perhaps some artifacts like G_SEXTs) are placed after
any stores.
Differential Revision: https://reviews.llvm.org/D110610
Add generic helper function that matches constant splat. It has option to
match constant splat with undef (some elements can be undef but not all).
Add util function and matcher for G_FCONSTANT splat.
Differential Revision: https://reviews.llvm.org/D104410
Rework getConstantstVRegValWithLookThrough in order to make it clear if we
are matching integer/float constant only or any constant(default).
Add helper functions that get DefVReg and APInt/APFloat from constant instr
getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT
getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT
getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT
Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal
and getIConstantVRegSExtVal. These now only match G_CONSTANT as described
in comment.
Relevant matchers now return both DefVReg and APInt/APFloat.
Replace existing uses of getConstantstVRegValWithLookThrough and
getConstantVRegVal with new helper functions. Any constant match is
only required in:
ConstantFoldBinOp: for constant argument that was bit-cast of float to int
getAArch64VectorSplat: AArch64::G_DUP operands can be any constant
amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant
In other places use integer only constant match.
Differential Revision: https://reviews.llvm.org/D104409