Prior to this patch, tail duplication handled debug info poorly -
specifically, debug instructions would be dropped instead of being set
undef, potentially extending the lifetimes of prior debug values that
should be killed. The pass was also very aggressive with dropping debug
info, dropping debug info even when the SSA value it referred to was
still present. This patch attempts to handle debug info more carefully,
checking to see whether each affected debug value can still be live,
setting it undef if not.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D106875
MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the
same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the
two halves. This was never treated as a legal type in llvm in the past
as there are not many 64bit instructions and no 64bit compares. There
are a few instructions that could use it though, notably a VSELECT (as
it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for
similar reasons, some gathers/scatter and long multiplies and VCTP64
instructions.
This patch goes through and makes v2i1 a legal type, handling all the
cases that fall out of that. It also makes VSELECT legal for v2i64 as a
side benefit. A lot of the codegen changes as a result - usually in way
that is a little better or a little worse, but still expensive. Costs
can change a little too in the process, again in a way that expensive
things remain expensive. A lot of the tests that changed are mainly to
ensure correctness - the code can hopefully be improved in the future
where it comes up in practice.
The intrinsics currently remain using the v4i1 they previously did to
emulate a v2i1. This will be changed in a followup patch but this one
was already large enough.
Differential Revision: https://reviews.llvm.org/D114449
This patch begins extending handling for peeking through bitcast nodes to big-endian targets as well as the existing little-endian case.
Differential Revision: https://reviews.llvm.org/D114676
combinePMULH currently only truncates vXi32/vXi64 multiplies to PMULHW/PMULUW if the source operands are SEXT/ZEXT instructions for a 'free' truncation.
But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKS/PACKUS to be used as a 'cheap' truncation.
This helps us avoid the wider multiplies, in exchange for truncation on both source operands instead of the result.
Differential Revision: https://reviews.llvm.org/D113371
This patch implements a new MachineFunction in the ARM backend for
placing BTI instructions. It is similar to the existing AArch64
aarch64-branch-targets pass.
BTI instructions are inserted into basic blocks that:
- Have their address taken
- Are the entry block of a function, if the function has external
linkage or has its address taken
- Are mentioned in jump tables
- Are exception/cleanup landing pads
Each BTI instructions is placed in the beginning of a BB after the
so-called meta instructions (e.g. exception handler labels).
Each outlining candidate and the outlined function need to be in agreement about
whether BTI placement is enabled or not. If branch target enforcement is
disabled for a function, the outliner should not covertly enable it by emitting
a call to an outlined function, which begins with BTI.
The cost mode of the outliner is adjusted to account for the extra BTI
instructions in the outlined function.
The ARM Constant Islands pass will maintain the count of the jump tables, which
reference a block. A `BTI` instruction is removed from a block only if the
reference count reaches zero.
PAC instructions in entry blocks are replaced with PACBTI instructions (tests
for this case will be added in a later patch because the compiler currently does
not generate PAC instructions).
The ARM Constant Island pass is adjusted to handle BTI
instructions correctly.
Functions with static linkage that don't have their address taken can
still be called indirectly by linker-generated veneers and thus their
entry points need be marked with BTI or PACBTI.
The changes are tested using "LLVM IR -> assembly" tests, jump tables
also have a MIR test. Unfortunately it is not possible add MIR tests
for exception handling and computed gotos because of MIR parser
limitations.
This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension
The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:
https://developer.arm.com/documentation/ddi0553/latest
The following people contributed to this patch:
- Mikhail Maltsev
- Momchil Velikov
- Ties Stuij
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D112426
The REM DAG combine uses the visitDivLike functions to try and get an
optimized DIV node to provide better codegen, however in some cases this
visitDivLike call ends up in the BuildSDIVPow2 target hook, which in
turn sometimes will return the same node passed in to indicate not to
change it. The REM DAG combine does not anticipate this and creates a
cycle in the DAG because of it.
Fix this by ensuring any such optimized div node returned is distinct
from the node being combined.
Differential Revision: https://reviews.llvm.org/D114716
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.
A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.
This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.
Differential Revision: https://reviews.llvm.org/D111976
It causes builds to fail with this assert:
llvm/include/llvm/ADT/APInt.h:990:
bool llvm::APInt::operator==(const llvm::APInt &) const:
Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.
See comment on the code review.
> This adds a fold in DAGCombine to create fptosi_sat from sequences for
> smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
> the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
> it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
> ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
> to be handled similarly.
>
> A shouldConvertFpToSat method was added to control when converting may
> be profitable. The original fptosi will have a less strict semantics
> than the fptosisat, with less values that need to produce defined
> behaviour.
>
> This especially helps on ARM/AArch64 where the vcvt instructions
> naturally saturate the result.
>
> Differential Revision: https://reviews.llvm.org/D111976
This reverts commit 52ff3b0093.
Over in D114631 and [0] there's a plan for turning instruction referencing
on by default for x86. This patch adds / removes all the relevant bits of
code, with the aim that the final patch is extremely small, for an easy
revert. It should just be a condition in CommandFlags.cpp and removing the
XFail on instr-ref-flag.ll.
[0] https://lists.llvm.org/pipermail/llvm-dev/2021-November/153653.html
InstrRefBasedLDV used to crash on the added test -- the exit block is not
in scope for the variable being propagated, but is still considered because
it contains an assignment. The failure-mode was vlocJoin ignoring
assign-only blocks and not updating DIExpressions, but pickVPHILoc would
still find a variable location for it. That led to DBG_VALUEs created with
the wrong fragment information.
Fix this by removing a filter inherited from VarLocBasedLDV: vlocJoin will
now consider assign-only blocks and will update their expressions.
Differential Revision: https://reviews.llvm.org/D114727
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.
A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.
This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.
Differential Revision: https://reviews.llvm.org/D111976
This change exposes isBuildVectorConstantSplat() to the llvm namespace
and uses it to implement the constant splat versions of
m_SpecificICst().
CombinerHelper::matchOrShiftToFunnelShift() can now work with vector
types and CombinerHelper::matchMulOBy2()'s match for a constant splat is
simplified.
Differential Revision: https://reviews.llvm.org/D114625
Currently we create register mappings for registers used only once in current
MBB. For registers with multiple uses, when all the uses are in the current MBB,
we can also create mappings for them similarly according to the last use.
For example
%reg101 = ...
= ... reg101
%reg103 = ADD %reg101, %reg102
We can create mapping between %reg101 and %reg103.
Differential Revision: https://reviews.llvm.org/D113193
There are 2 eviction queries. One is made by tryAssign, when it attempts to
free an interference occupying the hint of the candidate. The other is
during 'regular' interference resolution, where we scan over all
physical registers and try to see if we can evict live ranges in favor
of the candidate. We currently use the same logic in both cases, just
that the former never passes the cost to any subsequent query.
Technically, the 2 decisions could be implemented with different
policies.
This patch splits the 2.
RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html
Differential Revision: https://reviews.llvm.org/D114019
If we have a variable where its fragments are split into overlapping
segments:
DBG_VALUE $ax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment_0, 16)
...
DBG_VALUE $eax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment_0, 32)
we should only propagate the most recently assigned fragment out of a
block. LiveDebugValues only deals with live-in variable locations, as
overlaps within blocks is DbgEntityHistoryCalculators domain.
InstrRefBasedLDV has kept the accumulateFragmentMap method from
VarLocBasedLDV, we just need it to recognise DBG_INSTR_REFs. Once it's
produced a mapping of variable / fragments to the overlapped variable /
fragments, VLocTracker uses it to identify when a debug instruction needs
to terminate the other parts it overlaps with. The test is updated for
some standard "InstrRef picks different registers" variation, and the
order of some unrelated DBG_VALUEs changes.
Differential Revision: https://reviews.llvm.org/D114603
Usually dbg.declares get translated into either entries in an MF
side-table, or a DBG_VALUE on entry to the function with IsIndirect set
(including in instruction referencing mode). Much rarer is a dbg.declare
attached to a non-argument value, such as in the test added in this patch
where there's a variable-length-array. Such dbg.declares become SDDbgValue
nodes with InIndirect=true.
As it happens, we weren't correctly emitting DBG_INSTR_REFs with the
additional indirection. This patch adds the extra indirection, encoded as
adding an additional DW_OP_deref to the expression.
Differential Revision: https://reviews.llvm.org/D114440
InstrRefBasedLDV observes when variable locations are clobbered, scans what
values are available in the machine, and re-issues a DBG_VALUE for the
variable if it can find another location. Unfortunately, I hadn't joined up
the Indirectness flag, so if it did this to an Indirect Value, the
indirectness would be dropped.
Fix this, and add a test that if we clobber a variable value (on the stack
in this case), then the recovered variable location keeps the Indirect
flag.
Differential Revision: https://reviews.llvm.org/D114378
This solves a problem with non-deterministic output from opt due
to not performing dominator tree updates in a deterministic order.
The problem that was analysed indicated that JumpThreading was using
the DomTreeUpdater via llvm::MergeBasicBlockIntoOnlyPred. When
preparing the list of updates to send to DomTreeUpdater::applyUpdates
we iterated over a SmallPtrSet, which didn't give a well-defined
order of updates to perform.
The added domtree-updates.ll test case is an example that would
result in non-deterministic printouts of the domtree. Semantically
those domtree:s are equivalent, but it show the fact that when we
use the domtree iterator the order in which nodes are visited depend
on the order in which dominator tree updates are performed.
Since some passes (at least EarlyCSE) are iterating over nodes in the
dominator tree in a similar fashion as the domtree printer, then the
order in which transforms are applied by such passes, transitively,
also depend on the order in which dominator tree updates are
performed. And taking EarlyCSE as an example the end result could be
different depending on in which order the transforms are applied.
Reviewed By: nikic, kuhar
Differential Revision: https://reviews.llvm.org/D110292
This allows the generic DAG combine to fold fp_extend/fp_trunc into
loads/stores which we can then lower into a integer extending
load/truncating store plus an FP_EXTEND/FP_ROUND.
The nuance here is that fixed-type FP_EXTEND/FP_ROUND require unpacked
types hence lowering them introduces an unpack/zip. By allowing these
nodes to be combined with loads/store we make it much easier to have
this unpack/zip combined into the load/store by our custom lowering.
Differential Revision: https://reviews.llvm.org/D114580
In most common cases the @llvm.get.active.lane.mask intrinsic maps directly
to the SVE whilelo instruction, which already takes overflow into account.
However, currently in SelectionDAGBuilder::visitIntrinsicCall we always lower
this immediately to a generic sequence of instructions that explicitly
take overflow into account. This makes it very difficult to then later
transform back into a single whilelo instruction. Therefore, this patch
introduces a new TLI function called shouldExpandGetActiveLaneMask that asks if
we should lower/expand this to a sequence of generic ISD nodes, or instead
just leave it as an intrinsic for the target to lower.
You can see the significant improvement in code quality for some of the
tests in this file:
CodeGen/AArch64/active_lane_mask.ll
Differential Revision: https://reviews.llvm.org/D114542
No functional changes intended.
Before this patch DwarfCompileUnit::createScopeChildrenDIE() and
DwarfCompileUnit::createAndAddScopeChildrenDIE() used to emit child subtrees
and then when all the children get created, attach them to a parent scope DIE.
However, when a DIE doesn't have a parent, all the requests for its unit DIE
fail.
Currently, this is not a big issue since it isn't usually needed to know unit DIE
for a local (function-scoped) entity. But once we introduce lexical blocks as
a valid scope for global variables (static locals) and type DIEs, any requests
for a unit DIE need to be guarded against local scope due to the potential
absence of the DIE's parent.
To avoid the aforementioned issue, this patch refactors a few DwarfCompileUnit
methods to support the idea of attaching a DIE to its parent as close to the
creation of this DIE as possible.
Reviewed By: ellis
Differential Revision: https://reviews.llvm.org/D114350
This change folds a basic funnel shift idiom:
- (or (shl x, amt), (lshr y, sub(bw, amt))) -> fshl(x, y, amt)
- (or (shl x, sub(bw, amt)), (lshr y, amt)) -> fshr(x, y, amt)
This also helps in folding to rotate shift if x and y are equal since we
already have a funnel shift to rotate combine.
Differential Revision: https://reviews.llvm.org/D114499
Currently the generic lowering of llvm.get.active.lane.mask is done
in SelectionDAGBuilder::visitIntrinsicCall and currently assumes
only fixed-width vectors are used. This patch changes the code to be
more generic and support scalable vectors too. I have added tests
for SVE here:
CodeGen/AArch64/active_lane_mask.ll
although the code quality leaves a lot to be desired. The code will
be improved significantly in a later patch that makes use of the
SVE whilelo instruction.
Differential Revision: https://reviews.llvm.org/D114541
In some scenarios, usually involving NRVO, we can issue indirect DBG_VALUEs
after SelectionDAG, even in instruction referencing mode (if the variable
is an argument). If the corresponding argument value is spilt to the stack,
then we have:
* Indirection from it being on the stack,
* Indirection from it being a dbg.declare or a dbg.addr.
However InstrRefBasedLDV only emits one level of indirection. This patch
adds the second, by adding an extra DW_OP_deref if necessary. The two
tests modified fail otherwise -- they feature some NRVO, and require two
levels of indirection to be correct.
Differential Revision: https://reviews.llvm.org/D114364
This is a performance patch -- LiveDebugVariables can behave quadratically
if a lot of debug instructions are inserted back into the same place, and
we have to repeatedly step-over hte ones we've already inserted.
To get around it, whenever we insert a debug instruction at a slot index,
check whether there are more debug instructions to insert at this point,
and insert them too. That avoids the repeated lookup and stepping through.
It relies on the container for unlinked debug instructions being recorded
in-order, which is how LiveDebugVariables currently does it.
Differential Revision: https://reviews.llvm.org/D114587
DBG_INSTR_REF's and DBG_VALUE's can end up in blocks that aren't in the
lexical scope of their variable. It's arguable as to what we should do
about this, however VarLocBasedLDV permits such variable locations to be
propagated, so let's allow it in InstrRefBasedLDV.
It's necessary for the modified test to work.
Differential Revision: https://reviews.llvm.org/D114578