InstrRefBasedLDV observes when variable locations are clobbered, scans what
values are available in the machine, and re-issues a DBG_VALUE for the
variable if it can find another location. Unfortunately, I hadn't joined up
the Indirectness flag, so if it did this to an Indirect Value, the
indirectness would be dropped.
Fix this, and add a test that if we clobber a variable value (on the stack
in this case), then the recovered variable location keeps the Indirect
flag.
Differential Revision: https://reviews.llvm.org/D114378
This solves a problem with non-deterministic output from opt due
to not performing dominator tree updates in a deterministic order.
The problem that was analysed indicated that JumpThreading was using
the DomTreeUpdater via llvm::MergeBasicBlockIntoOnlyPred. When
preparing the list of updates to send to DomTreeUpdater::applyUpdates
we iterated over a SmallPtrSet, which didn't give a well-defined
order of updates to perform.
The added domtree-updates.ll test case is an example that would
result in non-deterministic printouts of the domtree. Semantically
those domtrees are equivalent, but it shows that when we
use the domtree iterator, the order in which nodes are visited depends
on the order in which dominator tree updates are performed.
Since some passes (at least EarlyCSE) iterate over nodes in the
dominator tree in a similar fashion to the domtree printer, the order
in which such passes apply their transforms also, transitively,
depends on the order in which dominator tree updates are performed.
Taking EarlyCSE as an example, the end result could differ depending
on the order in which the transforms are applied.
Reviewed By: nikic, kuhar
Differential Revision: https://reviews.llvm.org/D110292
This allows the generic DAG combine to fold fp_extend/fp_trunc into
loads/stores which we can then lower into an integer extending
load/truncating store plus an FP_EXTEND/FP_ROUND.
The nuance here is that fixed-type FP_EXTEND/FP_ROUND require unpacked
types hence lowering them introduces an unpack/zip. By allowing these
nodes to be combined with loads/store we make it much easier to have
this unpack/zip combined into the load/store by our custom lowering.
Differential Revision: https://reviews.llvm.org/D114580
In most common cases the @llvm.get.active.lane.mask intrinsic maps directly
to the SVE whilelo instruction, which already takes overflow into account.
However, currently in SelectionDAGBuilder::visitIntrinsicCall we always lower
this immediately to a generic sequence of instructions that explicitly
take overflow into account. This makes it very difficult to then later
transform back into a single whilelo instruction. Therefore, this patch
introduces a new TLI function called shouldExpandGetActiveLaneMask that asks if
we should lower/expand this to a sequence of generic ISD nodes, or instead
just leave it as an intrinsic for the target to lower.
You can see the significant improvement in code quality for some of the
tests in this file:
CodeGen/AArch64/active_lane_mask.ll
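For reference, a minimal IR use of the intrinsic (function and value names are
illustrative only) that can now survive untouched to the target, where it maps
onto whilelo:
define <vscale x 4 x i1> @lane_mask(i64 %index, i64 %tc) {
  %m = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 %index, i64 %tc)
  ret <vscale x 4 x i1> %m
}
declare <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64, i64)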
Differential Revision: https://reviews.llvm.org/D114542
No functional changes intended.
Before this patch DwarfCompileUnit::createScopeChildrenDIE() and
DwarfCompileUnit::createAndAddScopeChildrenDIE() used to emit child subtrees
and then, once all the children were created, attach them to a parent scope DIE.
However, when a DIE doesn't have a parent, all the requests for its unit DIE
fail.
Currently, this is not a big issue since we don't usually need to know the unit DIE
for a local (function-scoped) entity. But once we introduce lexical blocks as
a valid scope for global variables (static locals) and type DIEs, any requests
for a unit DIE need to be guarded against local scope due to the potential
absence of the DIE's parent.
To avoid the aforementioned issue, this patch refactors a few DwarfCompileUnit
methods to support the idea of attaching a DIE to its parent as close to the
creation of this DIE as possible.
Reviewed By: ellis
Differential Revision: https://reviews.llvm.org/D114350
This change folds a basic funnel shift idiom:
- (or (shl x, amt), (lshr y, sub(bw, amt))) -> fshl(x, y, amt)
- (or (shl x, sub(bw, amt)), (lshr y, amt)) -> fshr(x, y, amt)
This also helps in folding to rotate shift if x and y are equal since we
already have a funnel shift to rotate combine.
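As an illustration, IR along these lines (i32 and the names are arbitrary)
produces the first pattern and can now be matched as fshl:
define i32 @fshl_idiom(i32 %x, i32 %y, i32 %amt) {
  %shl = shl i32 %x, %amt
  %sub = sub i32 32, %amt
  %shr = lshr i32 %y, %sub
  %or = or i32 %shl, %shr
  ret i32 %or
}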
Differential Revision: https://reviews.llvm.org/D114499
Currently the generic lowering of llvm.get.active.lane.mask is done
in SelectionDAGBuilder::visitIntrinsicCall and assumes
only fixed-width vectors are used. This patch changes the code to be
more generic and support scalable vectors too. I have added tests
for SVE here:
CodeGen/AArch64/active_lane_mask.ll
although the code quality leaves a lot to be desired. The code will
be improved significantly in a later patch that makes use of the
SVE whilelo instruction.
Differential Revision: https://reviews.llvm.org/D114541
In some scenarios, usually involving NRVO, we can issue indirect DBG_VALUEs
after SelectionDAG, even in instruction referencing mode (if the variable
is an argument). If the corresponding argument value is spilt to the stack,
then we have:
* Indirection from it being on the stack,
* Indirection from it being a dbg.declare or a dbg.addr.
However InstrRefBasedLDV only emits one level of indirection. This patch
adds the second, by adding an extra DW_OP_deref if necessary. The two
tests modified fail otherwise -- they feature some NRVO, and require two
levels of indirection to be correct.
Differential Revision: https://reviews.llvm.org/D114364
This is a performance patch -- LiveDebugVariables can behave quadratically
if a lot of debug instructions are inserted back into the same place, and
we have to repeatedly step over the ones we've already inserted.
To get around it, whenever we insert a debug instruction at a slot index,
check whether there are more debug instructions to insert at this point,
and insert them too. That avoids the repeated lookup and stepping through.
It relies on the container for unlinked debug instructions being recorded
in-order, which is how LiveDebugVariables currently does it.
Differential Revision: https://reviews.llvm.org/D114587
DBG_INSTR_REFs and DBG_VALUEs can end up in blocks that aren't in the
lexical scope of their variable. It's arguable as to what we should do
about this, however VarLocBasedLDV permits such variable locations to be
propagated, so let's allow it in InstrRefBasedLDV.
It's necessary for the modified test to work.
Differential Revision: https://reviews.llvm.org/D114578
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.
For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.
Reapplied with a fix for the AArch64 rev patterns to match the ARM fix.
https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)
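A concrete (made-up) instance of the last rule: a rotate-left by 16 whose
result is only demanded in the low 16 bits can be replaced by a logical shift
right by the reverse amount:
define i32 @rot_low_half(i32 %x) {
  %rot = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 16)
  %low = and i32 %rot, 65535     ; only the low 16 bits are demanded
  ret i32 %low                   ; the whole thing is equivalent to (lshr %x, 16)
}
declare i32 @llvm.fshl.i32(i32, i32, i32)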
Differential Revision: https://reviews.llvm.org/D114354
The changes in D113888 / 32b6c17b29 altered the memory size of a
masked store, as it will store an unknown number of bytes, not the full
vector size. We can have situations where a masked store is legalized
and then turned into a normal store, as the mask is known to be all ones.
This creates a store with an unknown size MMO that was hitting this
assert.
The store created can be given a better size in a followup patch. This
currently adjusts the assert to handle unknown sizes.
Almost all of the time, call instructions don't actually lead to SP being
different after they return. An exception is win32's _chkstk, which
implements stack probes. We need to recognise that as modifying SP, so
that copies of the value are tracked as distinct vla pointers.
This patch adds a target frame-lowering hook to see whether stack probe
functions will modify the stack pointer, store that in an internal flag,
and if it's true then scan CALL instructions to see whether they're a
stack probe. If they are, recognise them as defining a new stack-pointer
value.
The added test exercises this behaviour: two calls to _chkstk should be
considered as producing two different values.
Differential Revision: https://reviews.llvm.org/D114443
Avoid unnecessarily recreating DBG_VALUEs on call instructions.
In LiveDebugValues we choose to ignore any clobbers of SP by call
instructions, as they're irrelevant to our model of the machine. We
currently do so for tracking register values (MTracker); do the same for
tracking variable locations (TTracker).
The test is modified to ensure that a duplicate DBG_VALUE is not created after
the call instruction.
Differential Revision: https://reviews.llvm.org/D114365
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.
For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.
https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)
Differential Revision: https://reviews.llvm.org/D114354
In quite a few places we were calling getCurSDLoc() to get the debug
location, but this is already a local variable `sdl`.
Differential Revision: https://reviews.llvm.org/D114447
It appears that we can emit all the instructions for a function, including
debug instructions, and then optimise some of the values out late.
Specifically, in the attached test case, an argument gets optimised out
after DBG_VALUE / DBG_INSTR_REFs are created. This confuses
MachineFunction::finalizeDebugInstrRefs, which expects to be able to find a
defining instruction, and crashes instead.
Fix this by identifying when there's no defining instruction, and
translating that instead into a DBG_VALUE $noreg.
Differential Revision: https://reviews.llvm.org/D114476
AMDGPU is unusual in that the stack is indexed in the same
direction as stack growth (up). We therefore always need the emergency
stack slots placed as low as possible to ensure they are in range of
load/store instruction immediate offsets. The existing logic is mostly
OK, but failed if we required stack realignment.
I don't understand what the existing control isFPCloseToIncomingSP is
supposed to mean, but it can only be used to stop placing the scavenge
slots earlier. Make this explicit so that targets can opt-in rather
than opt-out only.
The MIR sample loader changes the branch probability but not BFI.
Here we force a recompute of BFI if the branch probabilities are
changed.
Also register the MIR FSAFDO passes properly.
Differential Revision: https://reviews.llvm.org/D114400
[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with primary in `LiveRangeUtils.h`.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D114191
Usage and naming of macros in VPIntrinsics.def has been inconsistent. Rename all property macros to VP_PROPERTY_<name>. Use BEGIN/END scope macros to attach properties to vp intrinsics and SDNodes (instead of specifying either directly with the property macro).
A follow-up patch has documentation on how the macros are (intended) to be used.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D114144
A masked load or store will load a potentially unknown number of bytes
from a memory location - that is not generally known at compile time.
They do not necessarily load/store the entire vector width, and treating
them as such can lead to incorrect aliasing information (for example, if
the underlying object is smaller than the size of the vector).
This makes sure that the MMO is given an unknown size to represent this,
which is less accurate than "may load/store from up to 16 bytes", but
less incorrect than "will load/store from 16 bytes".
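For reference, a masked store of this shape (illustrative types and names);
depending on the mask it may write anywhere from 0 to 16 bytes:
define void @store_masked(<4 x i32> %v, <4 x i32>* %p, <4 x i1> %m) {
  call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %v, <4 x i32>* %p, i32 4, <4 x i1> %m)
  ret void
}
declare void @llvm.masked.store.v4i32.p0v4i32(<4 x i32>, <4 x i32>*, i32, <4 x i1>)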
Differential Revision: https://reviews.llvm.org/D113888
This basically reverts 1778831a3d, which split them.
Since they were split 9 years ago, EmitGCCInlineAsmStr() grew a bunch of
features that usually weren't added to EmitMSInlineAsmStr(), and
that was usually a mistake. D71677, D113932, D114167 are all examples
of where things were backported to EmitMSInlineAsmStr().
The names were also not great. EmitMSInlineAsmStr() used to be called for `asm
inteldialect`, which clang produces for Microsoft-style __asm { ... } blocks as
well as for GCC-style __asm__ / asm statements with -masm=intel. On the other
hand, EmitGCCInlineAsmStr() used to be called for `asm`, which clang produces for
GCC-style __asm__ / asm statements with -masm=att (the default).
It's also less code (23 insertions, 188 deletions).
No behavior change.
Differential Revision: https://reviews.llvm.org/D114330
This makes a line in llvm/test/CodeGen/X86/asm-block-labels.ll pass
with `asm inteldialect` too.
I don't know if this is something one can hit in practice with inline
asm. The test is from 2007 (4646aa3e33) but in 2009 blockaddr was
introduced and e.g. `__asm__ __volatile__("brl %0" :: "X"(&&foo) : "memory");`
compiles to
call void asm sideeffect "brl $0", "X,..."(i8* blockaddress(@func, %1))
nowadays (thanks to jrtc27 for that example!).
(6c4d255bf3 switched clang to blockaddress on an opt-in basis,
e4801f7844 added docs for it, 31b132c0b7 added IR support.)
I half-heartedly tried to build clang 2.8 locally, but it didn't
just build. And 2.8 didn't have a prebuilt clang binary yet.
The motivation is to make EmitGCCInlineAsmStr() and EmitMSInlineAsmStr()
more alike, and maybe we should delete this code from EmitGCCInlineAsmStr()
instead. But since it's just 3 lines and it's reachable from LLVM IR,
let's do the safer thing for now.
Differential Revision: https://reviews.llvm.org/D114329
This makes the following program build with -masm=intel:
int foo(int count) {
  asm goto ("dec %0; jb %l[stop]" : "+r" (count) : : : stop);
  return count;
stop:
  return 0;
}
It's also another step towards merging EmitGCCInlineAsmStr() and
EmitMSInlineAsmStr().
Differential Revision: https://reviews.llvm.org/D114167
No intended behavior change.
EmitGCCInlineAsmStr() used to explicitly check for modifier 'l'
after handling block address and machine basic block operands.
This prevented passing a MachineOperand with 'l' modifier to
PrintAsmMemoryOperand(). Conceptually that seems kind of nice,
but in practice the overrides of PrintAsmMemoryOperand() in all (*)
AsmPrinter subclasses already reject modifiers they don't know about,
and none of them know about 'l'. So removing this doesn't have
a behavior difference, is less code, and it makes EmitGCCInlineAsmStr()
and EmitMSInlineAsmStr() more similar, to prepare for merging them later.
(Why not _add_ the branch to EmitMSInlineAsmStr() instead? Because that
always works with X86AsmPrinter I think, and
X86AsmPrinter::PrintAsmMemoryOperand() very decisively rejects the 'l'
modifier, so it's hard to motivate adding that branch.)
*: The one exception was AVRAsmPrinter, which had an llvm_unreachable instead
of returning true. So this commit changes that, so that the AVR target keeps
emitting an error instead of crashing when passing a mem operand with a :l
modifier to it. All the other targets already don't crash on this.
Differential Revision: https://reviews.llvm.org/D114216
Patch to fix some of the regressions in D77804.
By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443.
Differential Revision: https://reviews.llvm.org/D113192
Fixed the vector type issue where getVectorNumElements() should be
replaced by getVectorElementCount() when lowering these
intrinsics.
This is similar to D94149
Signed-off-by: Eric Tang <tangxingxin1008@gmail.com>
Reviewed By: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D109809
Don't expand CTTZ if CTPOP or CTLZ is supported on the promoted type.
We have special handling for CTTZ expansion to use those ops with a
small conversion. The setup for that doesn't generate extra code or
large constants so we don't gain anything from expanding early and we
make CTTZ_ZERO_UNDEF codegen worse.
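The special handling referred to computes cttz via ctpop of the bits below the
lowest set bit; roughly, as a scalar IR sketch (the real expansion happens on
DAG nodes):
define i32 @cttz_via_ctpop(i32 %x) {
  %m1 = sub i32 %x, 1
  %not = xor i32 %x, -1
  %mask = and i32 %not, %m1      ; ones in exactly the trailing-zero positions
  %r = call i32 @llvm.ctpop.i32(i32 %mask)
  ret i32 %r
}
declare i32 @llvm.ctpop.i32(i32)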
Follow up from post commit feedback on D112268. We don't seem to have
any in tree tests that care about this.
This is preparation for D113707, where I want to make `-masm=intel`
emit `asm inteldialect` instructions.
`{movq %rbx, %rax|mov rax, rbx}` is supposed to evaluate to the bit
between { and | for att and to the bit between | and } for intel.
Since intel will become `asm inteldialect`, which calls EmitMSInlineAsmStr(),
EmitMSInlineAsmStr() has to support variants as well.
(clang translates `{...|...}` to `$(...$|...$)`. I'm not sure why
it doesn't just send along only the first `...` or the second `...`
to LLVM, but given the notes in PR23933 let's not do a big
reorganization in this codepath.)
Differential Revision: https://reviews.llvm.org/D113932
If we have a large enough floating point type that can exactly
represent the integer value, we can convert the value to FP and
use the exponent to calculate the leading/trailing zeros.
The exponent will contain log2 of the value plus the exponent bias.
We can then remove the bias and convert from log2 to leading/trailing
zeros.
This doesn't work for zero since the exponent of zero is zero so we
can only do this for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF. If we need
a value for zero we can use a vmseq and a vmerge to handle it.
We need to be careful to make sure the floating point type is legal.
If it isn't we'll continue using the integer expansion. We could split the vector
and concatenate the results but that needs some additional work and evaluation.
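A scalar IR sketch of the cttz half of the idea (the patch applies it to RVV
vectors; using float here works because an isolated set bit is a power of two
and therefore exactly representable):
define i32 @cttz_via_fp(i32 %x) {
  %neg = sub i32 0, %x
  %lsb = and i32 %x, %neg         ; isolate the lowest set bit
  %f = uitofp i32 %lsb to float   ; exact, %lsb is a power of two
  %bits = bitcast float %f to i32
  %exp = lshr i32 %bits, 23       ; biased exponent
  %tz = sub i32 %exp, 127         ; remove the bias to get log2(%lsb)
  ret i32 %tz                     ; valid only for %x != 0, i.e. CTTZ_ZERO_UNDEF
}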
Differential Revision: https://reviews.llvm.org/D111904
`asm` always has AT&T-style input (`asm inteldialect` has Intel-style asm
input), so EmitGCCInlineAsmStr() always has to pick the same variant since it
cares about the input asm string, not the output asm string.
For PowerPC, that default variant is 1. For other targets, it's 0.
Without this, the included test case errors out with
error: unknown use of instruction mnemonic without a size suffix
mov rax, rbx
since it picks the intel branch and then tries to interpret it as AT&T
when selecting intel-style output with `-x86-asm-syntax=intel`.
Differential Revision: https://reviews.llvm.org/D113894
If possible, fold fneg into the instruction above it if users cannot fold mods
and we know it will decrease the instruction count.
Follows same logic as SDAG combiner in choosing opportunities to combine.
Differential Revision: https://reviews.llvm.org/D112827
When getTypeConversion returns TypeScalarizeScalableVector we were
sometimes returning a non-simple type from getTypeLegalizationCost.
However, many callers depend upon this being a simple type and will
crash if not. This patch changes getTypeLegalizationCost to ensure
that we always a return sensible simple VT. If the vector type
contains unusual integer types, e.g. <vscale x 2 x i3>, then we just
set the type to MVT::i64 as a reasonable default.
A test has been added here that demonstrates the vectoriser can
correctly calculate the cost of vectorising a "zext i3 to i64"
instruction with a VF=vscale x 1:
Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll
Differential Revision: https://reviews.llvm.org/D113777
If we've only demanded the 0'th element, and it comes from a (one-use) AND, try to convert the zero_extend_vector_inreg into a mask and constant fold it with the AND.
Delegate updating of LiveIntervals to each target's
convertToThreeAddress implementation, instead of repairing LiveIntervals
after the fact in TwoAddressInstruction::convertInstTo3Addr.
Differential Revision: https://reviews.llvm.org/D113493
This change makes WidenVecRes_SELECT work for scalable vectors.
This patch is split from [D110319](https://reviews.llvm.org/D110319)
Signed-off-by: Eric Tang <tangxingxin1008@gmail.com>
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D110388
Instead of popping them and then immediately throwing them away, we can
just filter out globals and items in different scopes before adding them
to WorkList. Shouldn't change anything but keep the queue smaller.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D113864
It was being used occasionally already, and using it on the constructor
and getDbgEntityID has obvious type safety benefits.
Also use llvm_unreachable in the switch as usual, but since only these
two values are used in constructor calls I think it's still NFC.
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D113862
This is a first attempt at a constant value consecutive store merging pass,
a counterpart to the DAGCombiner's store merging optimization.
The high level goals of this pass:
* Have a simple and efficient algorithm. As close to linear time as we can get.
Thus, prioritizing scalability of the algorithm over merging every corner case
we can find. The DAGCombiner's store merging code has been the source of
compile time and complexity issues in the past and I wanted to avoid that.
* Don't introduce any new data structures for ordering memory operations. In MIR,
we don't have the concept of chains like we do in the DAG, and the instruction
order is stricter than enforcing ordering with graph edges. Although I
considered adding something similar, I couldn't justify the overhead.
The pass is currently split into 3 main parts. The main store merging code focuses
on identifying candidate stores and managing the candidate group that's under
consideration for merging. Analyzing addressing of stores is a potentially
complex part and for now there's just a basic implementation to identify easy
cases. Finally, the other main bit of complexity is the alias analysis, which
tries to follow the same logic as the DAG's AA.
Currently this implementation only supports merging of constant stores. Stores
of arbitrary variables are technically possible with a very small change, but
the DAG chooses not to do this. Doing so here makes most code worse since
there's extra overhead in merging values into wider registers.
On AArch64 -Os, this optimization results in very minor savings on CTMark.
Differential Revision: https://reviews.llvm.org/D109131
Fixes PR#48678. `X86TargetLowering::getRegForInlineAsmConstraint()` can adjust the register class to match the type, e.g. change `VR128X` to `VR256X` if the type needs 256 bits. However, the function currently returns the unadjusted register and the adjusted register class, e.g. `xmm15` and `VR256X`, which then causes an assertion failure later because the register class does not contain that register. This patch fixes this behavior.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D113834
It's possible that the mask is already a NOT. At least if InstCombine
hasn't canonicalized the input. In that case we will form an ANDN with
X instead of with Y. So we don't need to worry about Y being a constant.
We might need to check that X isn't a constant instead, but we don't
have a test case for that yet.
This fixes a size regression found when trying to enable this combine
for RISCV in D113937.
Differential Revision: https://reviews.llvm.org/D113948
https://reviews.llvm.org/D71677 copied a bunch of code from
EmitGCCInlineAsmStr() to EmitMSInlineAsmStr() but made a few small
(likely unintentional) changes. This makes these pieces look the same.
No behavior change.
(Why are these functions two copies? No great reason as far as I can tell.
https://reviews.llvm.org/rG1778831a3d1d24ab6545635f63da4d9c5f8f0ac7 did the
split; we might want to undo them at some point. But PR23933 suggests
that a bigger change is planned for this file in the future, so keeping
this incremental for now.)
Differential Revision: https://reviews.llvm.org/D113924
There's really no reason why anyone should use these special names in a variant.
I noticed this while reading the code: all other writes to OS are guarded by
this conditional, and the behavior with the check seems more correct, so
let's add the check.
Differential Revision: https://reviews.llvm.org/D113909
MachineVerifier verified the subranges of a live interval if
they existed, but did not complain if they did not exist.
This patch changes the verifier to complain if there are no
subranges in the live interval for a subreg operand (so long
as MachineRegisterInfo says we should be tracking subreg
liveness for that register). This matches the conditions for
LiveIntervalCalc to create subranges in the first place.
Differential Revision: https://reviews.llvm.org/D112556
In an LTO build, the `end_sequence` in the debug_line table for each compile unit (CU) points to the end of the text section into which all CUs were merged. The `end_sequence` needs to point to the end of each CU's range instead. This bug often causes an invalid `debug_line` table in the final `.dSYM` binary for MachO after running `dsymutil`, which tries to compensate for an out-of-range address of `end_sequence`.
The fix is to sync the line table termination with the range operations that are already maintained in DwarfDebug. When the CU or section changes, when nodebug functions appear, or when the module is finished, the prior pending line table is terminated using the last range label. In the MC path, where no range is tracked, the old logic is conservatively used to end the line table using the section end symbol.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D108261
ProfileCount could model invalid values, but a user had no indication
that the getCount method could return bogus data. Optional<ProfileCount>
addresses that, because the user must dereference the optional. In
addition, the patch removes concept duplication.
Differential Revision: https://reviews.llvm.org/D113839
This was noted as a follow-up to D113212 / D113426:
4fc1fc40057e30404c3b11522cfcad
https://alive2.llvm.org/ce/z/e4o96b
The canonicalization rules for these IR patterns are complicated,
and we were not matching the expected forms in 2 out of the 3
cases. We can make codegen more robust by matching the swapped
forms (and that will also work if these patterns are created late).
This modifies the preconditions of TypePromotion's isSafeWrap
method, to allow it to work with all constants from the ICmp.
Using the code:
%a = add %x, C1
%c = icmp ult %a, C2
According to Alive, we can prove that this is equivalent to
icmp ult (add zext(%x), sext(C1)), zext(C2) given
C1 <=s 0 and C1 >s C2.
https://alive2.llvm.org/ce/z/CECYZB
Which is similar to what is already present. We can also
prove icmp ult (add zext(%x), sext(C1)), sext(C2) given
C1 <=s 0 and C1 <=s C2.
https://alive2.llvm.org/ce/z/KKgyeL
The PrepareWrappingAdds method was removed, and the
constants are now altered to sext or zext directly as
required by the above methods.
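A concrete instance of the second equivalence, with constants chosen purely
for illustration (C1 = -5, C2 = 10 satisfies C1 <=s 0 and C1 <=s C2):
%a = add i8 %x, -5
%c = icmp ult i8 %a, 10
becomes
%w = zext i8 %x to i32
%a32 = add i32 %w, -5           ; sext(C1)
%c32 = icmp ult i32 %a32, 10    ; sext(C2)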
Differential Revision: https://reviews.llvm.org/D113678
For global variables and common blocks there is no way to create entities
through getOrCreateContextDIE(), so no need to obtain the context first.
Differential Revision: https://reviews.llvm.org/D113651
Register uses that are MRI->isConstantPhysReg() should not inhibit
sinking transformation.
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D111531
(Cond0 s> -1) ? N1 : 0 --> ~(Cond0 s>> BW-1) & N1
https://alive2.llvm.org/ce/z/mGCBrd
This was suggested as a potential enhancement in D113212 (also 7e30404c3b ).
There's an improvement for AArch that could be generalized ( X > -1 --> X >= 0 ).
For x86, we have a counter-acting fold for most cases that turns the shift+not
back into a setcc, so that needs a work-around to get more cases to use "pandn":
D113603
Note that this pattern (and a previous one) are not currently canonical forms
in IR:
https://alive2.llvm.org/ce/z/e4o96b
Adding swapped variants is left as a TODO item here, but is planned as
a near-term follow-up patch.
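An IR-level illustration of the pattern being handled (the type is arbitrary,
and as noted above this is not necessarily the canonical IR form):
define <4 x i32> @sel_nonneg(<4 x i32> %x, <4 x i32> %y) {
  %c = icmp sgt <4 x i32> %x, <i32 -1, i32 -1, i32 -1, i32 -1>
  %r = select <4 x i1> %c, <4 x i32> %y, <4 x i32> zeroinitializer
  ret <4 x i32> %r
}
which can now be lowered as ~(%x s>> 31) & %y instead of a compare plus select.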
Differential Revision: https://reviews.llvm.org/D113426
When printing a LiveInterval, tweak the use of single and double spaces
to try to make it clearer that the valnos are associated with the
preceding range or subrange, not the following subrange.
Compare the output before and then after this patch:
%1 [32r,144r:0) 0@32r L000000000000000C [32r,144r:0) 0@32r L00000000000000F3 [32r,32d:0) 0@32r weight:0.000000e+00
%1 [32r,144r:0) 0@32r  L000000000000000C [32r,144r:0) 0@32r  L00000000000000F3 [32r,32d:0) 0@32r  weight:0.000000e+00
Differential Revision: https://reviews.llvm.org/D113671
In TwoAddressInstructionPass::processTiedPairs when updating live
intervals after moving the last use of RegB back to the newly inserted
copy, update any affected subranges as well as the main range.
Differential Revision: https://reviews.llvm.org/D110411
Enable FoldConstantArithmetic to constant fold bitcasted constant build vectors. These have typically been bitcasted for type legalization purposes.
By extracting the raw constant bit data, performing the constant fold, and then casting the constant bit data back to the (legalized) type, we can perform constant folding on integer types after legalization.
This in particular helps 32-bit targets which need to handle vXi64 build vectors - during legalization the (unsupported) i64 elements are split to create a bitcasted v2Xi32 build vector.
Addresses some regressions in D113192.
Differential Revision: https://reviews.llvm.org/D113564
At least I think that's what the 32 here is. Use RegisterBitWidth
instead.
While there replace zext with zextOrSelf to simplify the code.
Reviewed By: samparker, dmgreen
Differential Revision: https://reviews.llvm.org/D113495
The introduction of this legalization, D111248, forgot to replace the
old chain with the new. This could manifest itself in the old
(illegally-typed) value remaining in the DAG, though the simple test
cases didn't catch this.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D113561
NFC refactor of D113351, pulling out the APInt split/merge code from the BuildVectorSDNode bits extraction into a BuildVectorSDNode::recastRawBits helper. This is to allow us to reuse the code when we're packing constant folded APInt data back together.
This patch fixes a compiler crash when widening scalable-vector loads
and stores which end up breaking down to element-wise store operations.
It does so by providing a way for targets with support for
vector-predicated loads and stores to use those instead. By widening the
operation but maintaining the original effective operation length via
the EVL, only the intended vector elements are loaded or stored.
This method should in theory be possible and even preferred for
fixed-length vector types, but all fixed-length types can be broken down
into their elements, and regardless I have observed regressions in the
generated code when doing so. I believe this is simply due to
VP_LOAD/VP_STORE not being up to par with LOAD/STORE in terms of
optimization. It does improve performance on smaller self-contained
examples, however, so the potential is there.
While the only target that benefits from this is RISCV, the legalization
is generic and so was placed centrally.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D111248
The existing code didn't add all necessary successors, which resulted in
disjoint basic blocks. These would end up not being legalized which, in the
best case, caused a fallback only in assert builds.
Here's an example:
https://godbolt.org/z/ndx15Enfj
We also end up getting weird codegen here as well.
Refactoring the code here allows us to correctly attach all successors. With
this patch, the above example gives correct codegen at -O0 with and without
asserts.
Also autogen the testcase to show that we add all the successors now.
Differential Revision: https://reviews.llvm.org/D113437
Changes from commit 1db137b185
added iteration over a hash map that can result in non-deterministic
order. Fix that by using a SmallMapVector to preserve the order.
Differential Revision: https://reviews.llvm.org/D113468
Add UNIQUED and DISTINCT properties in Metadata.def and use them to
implement restrictions on the `distinct` property of MDNodes:
* DIExpression can currently be parsed from IR or read from bitcode
as `distinct`, but this property is silently dropped when printing
to IR. This causes accepted IR to fail to round-trip. As DIExpression
appears inline at each use in the canonical form of IR, it cannot
actually be `distinct` anyway, as there is no syntax to describe it.
* Similarly, DIArgList is conceptually always uniqued. It is currently
restricted to only appearing in contexts where there is no syntax for
`distinct`, but for consistency it is treated equivalently to
DIExpression in this patch.
* DICompileUnit is already restricted to always being `distinct`, but
along with adding general support for the inverse restriction I went
ahead and described this in Metadata.def and updated the parser to be
general. Future nodes which have this restriction can share this
support.
The new UNIQUED property applies to DIExpression and DIArgList, and
forbids them to be `distinct`. It also implies they are canonically
printed inline at each use, rather than via MDNode ID.
The new DISTINCT property applies to DICompileUnit, and requires it to
be `distinct`.
A potential alternative change is to forbid the non-inline syntax for
DIExpression entirely, as is done with DIArgList implicitly by requiring
it appear in the context of a function. For example, we would forbid:
!named = !{!0}
!0 = !DIExpression()
Instead we would only accept the equivalent inlined version:
!named = !{!DIExpression()}
This essentially removes the ability to create a `distinct` DIExpression
by construction, as there is no syntax for `distinct` inline. If this
patch is accepted as-is, the result would be that the non-canonical
version is accepted, but the following would be an error and produce a diagnostic:
!named = !{!0}
; error: 'distinct' not allowed for !DIExpression()
!0 = distinct !DIExpression()
Also update some documentation to consistently use the inline syntax for
DIExpression, and to describe the restrictions on `distinct` for nodes
where applicable.
Reviewed By: StephenTozer, t-tye
Differential Revision: https://reviews.llvm.org/D104827
This patch merges FoldConstantVectorArithmetic back into FoldConstantArithmetic.
Like FoldConstantVectorArithmetic we now handle vector ops with any operand count, but we currently still only handle binops for scalar types - this can be improved in future patches - in particular some common unary/trinary ops still have poor constant folding.
There's one change in functionality causing test changes - FoldConstantVectorArithmetic bails early if the build/splat vector isn't all constant (with some undefs) elements, but FoldConstantArithmetic doesn't - it instead attempts to fold the scalar nodes and bails if they fail to regenerate a constant/undef result, allowing some additional identity/undef patterns to be handled.
Differential Revision: https://reviews.llvm.org/D113300
This fixes an assertion failure with -early-live-intervals when trying
to update the live intervals for a debug instruction, which don't even
have slot indexes.
Differential Revision: https://reviews.llvm.org/D113116
operand bundle "clang.arc.attachedcall" with ObjC runtime functions
The existing code only handles the case where the intrinsic being
rewritten is used as the called function pointer of a call/invoke.
When emitting a reloc for the Wasm global __stack_pointer, it was inadvertently added to the symbols used for generating aranges, which caused some aranges to use it as the end symbol in a symbol diff, which caused a reloc for it to be emitted, which then caused an assert in `wasm64` since we have no 64-bit relocs for Wasm globals.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=52376
Differential Revision: https://reviews.llvm.org/D113438
As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper.
I've included some usage; it's not exhaustive, just the more obvious cases where the intention is clear.
Differential Revision: https://reviews.llvm.org/D113396
We have several places where we need to extract the raw bits data from a BUILD_VECTOR node, so consolidate this to a single helper function that handles Undefs and Integer/FP constants, including implicit truncation.
This should make it easier to extend D113202 to handle more constant folding of bitcasted constant data.
Differential Revision: https://reviews.llvm.org/D113351
Currently FoldConstantArithmetic only handles binops, so replacing other uses of FoldConstantVectorArithmetic (in particular for SETCC nodes), still require more work.
In TwoAddressInstructionPass::processTiedPairs with
-early-live-intervals, update any preexisting physreg live intervals,
as well as virtreg live intervals. By default (without
-precompute-phys-liveness) physreg live intervals only exist for
registers that are live-in to some basic block.
Differential Revision: https://reviews.llvm.org/D113191
This is the 'or' sibling for the fold added with:
D113212
https://alive2.llvm.org/ce/z/tgnp7K
Note that neither of these transforms is poison-safe,
but it does not seem to matter at this level. We have
had the scalar version of D113212 for a long time, so
this is just making optimizer behavior consistent.
We do not have the scalar version of *this* fold,
however, so that is another follow-up.
Another minor step towards merging FoldConstantVectorArithmetic into FoldConstantArithmetic.
We don't use SDNodeFlags in any constant folding inside DAG, so passing the Flags argument is a waste of time - an alternative would be to wire up FoldConstantArithmetic to take SDNodeFlags just-in-case we someday start using it, but we don't have any way to test it and I'd prefer to avoid dead code.
Differential Revision: https://reviews.llvm.org/D113276
(X s< 0) ? Y : 0 --> (X s>> BW-1) & Y
We canonicalize to the icmp+select form in IR, and we already have this fold
for scalar select in SDAG, so I think it's an oversight that we don't have
the fold for vectors. It seems neutral for AArch64 and saves some instructions
on x86.
Whether we should also have the sibling folds for the inverse condition or
all-ones true value may depend on target-specific factors such as whether
there's an "and-not" instruction.
Differential Revision: https://reviews.llvm.org/D113212
NumOps represents the number of elements for vector constant folding; rename this to NumElts so that in future we can consistently use NumOps to represent the number of operands of the opcode.
Minor cleanup before trying to begin generalizing FoldConstantArithmetic to support opcodes other than binops.
This symbol is defined in libc.so so it is definitely not DSO-Local.
Marking it as such causes problems on some platforms (such as PowerPC).
Differential revision: https://reviews.llvm.org/D109090
To constant fold bitwise logic ops where we've legalized constant build vectors to a different type (e.g. v2i64 -> v4i32), this patch adds a basic ability to peek through the bitcasts and perform the constant fold on the inner operands.
The MVE predicate v2i64 regressions will be addressed by future support for basic v2i64 type support.
One of the yak shaving fixes for D113192....
Differential Revision: https://reviews.llvm.org/D113202
In TwoAddressInstructionPass::processTiedPairs with
-early-live-intervals, update any preexisting physreg live intervals,
as well as virtreg live intervals. By default (without
-precompute-phys-liveness) physreg live intervals only exist for
registers that are live-in to some basic block.
Differential Revision: https://reviews.llvm.org/D113191
To correctly use Query, one had to first call collectInterferingVRegs to
pre-cache the query result, then call interferingVRegs. Failing the
former, interferingVRegs could be stale. This did cause a bug which was
addressed in D98232, but the underlying usability issue of the Query API
wasn't.
This patch addresses the latter by making collectInterferingVRegs an
implementation detail, and having interferingVRegs play both roles. One
side-effect of this is that interferingVRegs is not const anymore.
Differential Revision: https://reviews.llvm.org/D112882
These were added to prevent functions from being removed by WPO.
But that doesn't make sense; correct WPO will not remove functions we actually use.
I noticed these because compiling cc1_main.cpp was pulling in random LLVM pass headers.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D112971
Fold (srl (mul (zext i32:$a to i64), i64:c), 32) -> (mulhu $a, $b),
if c can truncate to i32 without loss.
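An IR example of the kind of pattern that now becomes mulhu (the constant is
arbitrary, just small enough to truncate to i32 without loss):
define i32 @mulhu_pattern(i32 %a) {
  %za = zext i32 %a to i64
  %m = mul i64 %za, 123456
  %hi = lshr i64 %m, 32
  %r = trunc i64 %hi to i32
  ret i32 %r
}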
Reviewed By: frasercrmck, craig.topper, RKSimon
Differential Revision: https://reviews.llvm.org/D108129
If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering.
Also use the optimized funnel shift lowering for rotates.
Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept
(Branched from D108058 as getting this completed should help unlock some other WIP patches).
Original Patch: @efriedma (Eli Friedman)
Differential Revision: https://reviews.llvm.org/D112443
Rely on the hasOperation() instead - as commented on D77804, the mid-term intention is to recognise rotate/funnel-by-constant pre-legalization to help avoid SimplifyDemandedBits regressions.
When Taildup hits a loop with multiple latches like:
// 1 -> 2 <-> 3 |
// \ <-> 4 |
// \ <-> 5 |
// \---> rest |
it may transform this loop into multiple loops by duplicating the loop header.
However, this change may have little benefit while making the CFG much more
complex. In some uncommon cases it causes a large compile time regression
(reported by @alexfh in D106056).
This patch disables tail-duplication in such cases.
TestPlan: check-llvm
Differential Revision: https://reviews.llvm.org/D110613
As noted in https://reviews.llvm.org/D90924#inline-1076197
apparently this is a pretty common pattern,
let's not repeat it yet again, but have it in a common place.
There may be some more places where it could be used,
but these are the most obvious ones.
Register operands with superclasses can possibly have multiple regBanks
if they have different register types. The regBank ambiguity resolved
during regbankselect should be used to constrain the operand regclass
instead of obtaining one from the MCInstrDesc.
This is a prerequisite patch for D109300 that introduces allocatable AV_*
Superclasses for AMDGPU by combining both VGPRs and AGPRs and we want to
restrain the regclass to either A or V based on the incoming regbank.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D112323
This patch removes an internal failure found in FindMemType and "bubbles
it up" to the users of that method: GenWidenVectorLoads and
GenWidenVectorStores. FindMemType -- renamed findMemType -- now returns
an optional value, returning None if no such type is found.
Each of the aforementioned users now pre-calculates the list of types it
will use to widen the memory access. If the type breakdown is not
possible they will signal a failure, at which point the compiler will
crash as it does currently.
This patch is preparing the ground for alternative legalization
strategies for vector loads and stores, such as using vector-predication
versions of loads or stores.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D112000
The TailDuplicator merged two blocks, even if the first one ended with
a terminator, resulting in invalid MIR, where a terminator is in the
middle of a block.
Abort merging if the first block ends with a terminator.
Differential Revision: https://reviews.llvm.org/D112226
This is relanding commit da1d1a0869 .
This patch additionally addresses failures found in buildbots & post review comments.
ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.
Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions
Reviewed By: logan
Differential Revision: https://reviews.llvm.org/D111703
In function convertInstTo3Addr, after converting a two address instruction into
a three address instruction, only the last new instruction is inserted into
DistanceMap. This is wrong; DistanceMap should track all instructions from the
beginning of the current MBB to the working instruction. When a two address
instruction is converted to a three address instruction, multiple instructions
may be generated (usually an extra COPY is generated), and all of them should
be inserted into DistanceMap.
Similarly, when unfolding a memory operand in function tryInstructionTransform,
DistanceMap is not maintained correctly.
Differential Revision: https://reviews.llvm.org/D111857
This is a (very) small move towards making the machine dominators more
aligned with the IR dominators:
* DominatorTree / MachineDomTree is the class holding the dominator tree
* DominatorTreeWrapperPass / MachineDominatorTree is the corresponding
(machine) function pass
This alignment will be used by analyses that are designed as templates
that work with LLVM IR as well as Machine IR.
Reviewed By: critson
Differential Revision: https://reviews.llvm.org/D112690
This is relanding commit da1d1a0869 .
This patch additionally addresses failures found in buildbots & post review comments.
ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.
Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions
Reviewed By: logan
Differential Revision: https://reviews.llvm.org/D111703
Save the instruction list of a block before selecting banks.
This allows us to cope with moved instructions, even if they are reordered
or split into multiple basic blocks.
Differential Revision: https://reviews.llvm.org/D111223
- When an unconditional branch is expanded into an indirect branch, if
there is no scavenged register, an SGPR pair needs spilling to enable
the destination PC calculation. In addition, before jumping into the
destination, that clobbered SGPR pair needs restoring.
- As SGPR cannot be spilled to or restored from memory directly, the
spilling/restoring of that SGPR pair reuses the regular SGPR spilling
support but without spilling it into memory. As those spilling and
restoring points are fully controlled, we only need to spill that SGPR
into the temporary VGPR, which needs spilling into its emergency slot.
- The target-specific hook is revised to take additional restore block,
where the restoring code is filled. After that, the relaxation will
place that restore block directly before the destination block and
insert an unconditional branch in any fall-through block into the
destination block.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D106449
PromoteIntRes_MLOAD always sets the extension type to `EXTLOAD`, which
results in a sign-extended load. If the type returned by getExtensionType()
for the load being promoted is something other than `NON_EXTLOAD`, we
should instead pass this to getMaskedLoad() as the extension type.
Reviewed By: CarolineConcatto
Differential Revision: https://reviews.llvm.org/D112320
Widens the result and first input vector because they have the same size.
The subvector to be inserted is widened in the operand widen function.
Differential Revision: https://reviews.llvm.org/D112187
Add an optional bool RemoveDeadValNo argument to the
removeSegment(iterator) overload, for consistency with the other
overloads. This gives clients a way to remove dead valnos while also
getting an updated iterator returned (in the manner of vector::erase).
Use this to clean up some inefficient code in
LiveIntervals::repairOldRegInRange. NFC.
Differential Revision: https://reviews.llvm.org/D110560
ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.
Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions
Reviewed By: logan
Differential Revision: https://reviews.llvm.org/D111703
getShiftAmountTyForConstant is a special helper that changes the
shift amount to i32 if the type chosen by
TargetLowering::getShiftAmountTy can't represent all possible values.
This is needed to satisfy an assert in SelectionDAG::getNode.
It requires additional consideration to know when this helper should be used.
I'm not sure that we are always using it when we should.
This patch merges the getShiftAmountTyForConstant handling into
TargetLowering::getShiftAmountTy so we don't need to think about it
anymore.
Technically this may slightly increase compile times since the majority
of callers of getShiftAmountTy won't need this. Hopefully, this isn't
an issue in practice.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D112469
There are a few STL containers hanging around that can become DenseMaps,
SmallVectors and similar. This recovers a modest amount of compile time
performance.
While I'm here, adjust the bit layout of ValueIDNum: this was always
supposed to act like a value type, however it seems that clang doesn't
compile the comparison functions to act that way. Add a uint64_t to a
union that explicitly aliases the bitfields, so that we can compare the
whole value as a single integer.
Differential Revision: https://reviews.llvm.org/D112333
This patch is like D111627 -- instead of calculating IDF for every location
on the stack, only do it for the smallest units of interference, and copy
the PHIs for those units to any aliases.
The test added runs placeMLocPHIs directly, and tests that:
* A def of the lower 8 bits of a stack slot causes all aliasing regs to
have PHIs placed,
* It doesn't cause the equivalent location to x86's $ah, which isn't
aliased, to have a PHI placed.
Differential Revision: https://reviews.llvm.org/D112324
During register allocation, some instructions can have stack spills fused
into them. It means that when vregs are allocated on the stack we can
convert:
SETCCr %0
DBG_VALUE %0
to
SETCCm %stack.0
DBG_VALUE %stack.0
Unfortunately instruction referencing finds this harder: a store to the
stack doesn't have a specific operand number, therefore we don't substitute
the old operand for a new operand, and the location is dropped. This patch
implements a solution: just recognise the memory operand attached to an
instruction with a Special Number (TM), and record a substitution between
the old value and the new one.
This patch adds substitution code to InlineSpiller to record such fused
spills, and tracking in InstrRefBasedLDV to recognise such values, and
produce the value numbers for them. Everything to do with the movement of
stack-defined values is already handled in InstrRefBasedLDV.
Differential Revision: https://reviews.llvm.org/D111317
This patch swaps two lines -- the CurSucc reference can be invalidated
by the call to DFS.push_back, therefore that should happen last. The
usual hat-tip to asan for catching this.
This patch also swaps an earlier call to ToAdd.insert and DFS.push_back,
where a stable iterator (from successors()) is being used. This isn't
strictly necessary, but is good for consistency and avoiding readers
asking themselves why the two code portions have a different order.
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128
As suggested in D112085, we can substitute 'xor' with 'add'
in this pattern, and it is logically equivalent:
https://alive2.llvm.org/ce/z/eJtWWC
We canonicalize to 'xor' in IR, but SDAG does not do that
(and it probably should not - https://llvm.org/PR52267 ), so
it is possible to see either pattern in codegen. Note that
'sub' is a another potential pattern, but that is
canonicalized to 'add' in DAGCombiner, so we don't need to
worry about that variation.
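An i8 IR sketch of the 'add' form of the pattern (the xor form differs only in
the first instruction):
define i8 @usubsat_via_add(i8 %x) {
  %a = add i8 %x, -128    ; adding i8 128 flips only the sign bit
  %s = ashr i8 %x, 7      ; all-ones if %x is negative, else zero
  %r = and i8 %a, %s
  ret i8 %r               ; equals usub.sat(%x, 128)
}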
Differential Revision: https://reviews.llvm.org/D112377
This removes a condition and the corresponding FIXME comment, because
the Hexagon assertion it refers to has apparently been fixed, probably
by D76134.
NFCI. This just gives targets the opportunity to adjust latencies that
were set to 0 by the generic code because they involve "implicit pseudo"
operands.
Differential Revision: https://reviews.llvm.org/D112306
Sometimes we generate code that writes to a subregister, then spills /
restores a super-register to the stack, for example:
$eax = MOV32ri 0
MOV64mr $rsp, 1, $noreg, 16, $noreg, $rax
$rcx = MOV64rm $rsp, 1, $noreg, 8, $noreg
This patch takes a different approach: it adds another index to
MLocTracker that identifies a size/offset within a stack slot. A location
on the stack is then a pair of {FrameIndex, SlotNum}. Spilling and
restoring now involves pairing up the src/dest register numbers, and the
dest/src stack position to be transferred to/from. Location coverage
improves as a result, compile-time performance decreases, alas.
One limitation is that if a PHI occurs inside a stack slot:
DBG_PHI %stack.0, 1
We don't know how large the resulting value is, and so might have
difficulty picking which value to use. DBG_PHI might need to be augmented
in the future with such a size.
Unit tests added ensure that spills and restores correctly transfer to
positions in the Location => Value map, and that different register classes
written to the stack will correctly clobber all other positions in the
stack slot.
Differential Revision: https://reviews.llvm.org/D112133
We might be promoting a large non-power of 2 type and the new type
may need to be split. Once we split it we may have a ctlz/cttz/ctpop
instruction for the split type.
I'm also concerned that we may create large shifts with shift amounts
that are too small.
EXTRACT_SUBVECTOR indices are always constant, we don't need to check for ConstantSDNode, we should just use getConstantOperandVal which will assert for the constant.
This patch adds some unit tests for the machine-location transfer-function
building parts of InstrRefBasedLDV: i.e., test that if we feed some MIR
into the transfer-function building code, does it create the correct
transfer function.
There are a number of minor defects that get corrected in the process:
* The unit test was selecting the x86 (i.e. 32 bit) backend rather than
x86_64's 64 bit backend,
* COPY instructions weren't actually having their subregister values
correctly represented in the transfer function. Subregisters were being
defined by the COPY, rather than taking the value in the source register.
* SP aliases were at risk of being clobbered, if an SP subregister was
clobbered.
Differential Revision: https://reviews.llvm.org/D112006
Instead of returning a bool to indicate success and a separate
SDValue, return the SDValue and have the callers check if it is
null.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D112331
Expanding these requires multiple constants. If we promote during type
legalization when they'll end up getting expanded in LegalizeDAG, we'll
use larger constants. These constants may be harder to materialize.
For example, 64-bit constants on 64-bit RISCV are very expensive.
This is similar to what has already been done to BSWAP and BITREVERSE.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D112268
There is no need to return a bool and have an SDValue output
parameter. Just return the SDValue and let the caller check if it
is null.
I have another patch to add more callers of these so I thought
I'd clean up the interface first.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D112267
By expanding early it allows the shifts to be custom lowered in
LegalizeVectorOps. Then a DAG combine is able to run on them before
LegalizeDAG handles the BUILD_VECTORS for the masks used.
v16i8 shift lowering on X86 requires a mask to be applied to a v8i16
shift. The BITREVERSE expansion applied an AND mask before SHL ops and
after SRL ops. This was done to share the same mask constant for both shifts.
It looks like this patch allows DAG combine to remove the AND mask added
after v16i8 SHL by X86 lowering. This maintains the mask sharing that
BITREVERSE was trying to achieve. Prior to this patch it looks like
we kept the mask after the SHL instead which required an extra constant
pool or a PANDN to invert it.
This is dependent on D112248 because RISCV will end up scalarizing the BSWAP
portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in
LegalizeVectorOps first.
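For reference, the scalar shape of that expansion (a standalone sketch; the
shared constants are the masks discussed above):
#include <cstdint>
uint8_t reverseBits8(uint8_t X) {
  // Each step masks before the SHL and after the SRL, so both shifts can
  // share the same constant (0x55, 0x33, 0x0F).
  X = static_cast<uint8_t>(((X & 0x55) << 1) | ((X >> 1) & 0x55));
  X = static_cast<uint8_t>(((X & 0x33) << 2) | ((X >> 2) & 0x33));
  X = static_cast<uint8_t>(((X & 0x0F) << 4) | ((X >> 4) & 0x0F));
  return X;
}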
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D112254
It's better to do the ands, shifts, ors in the vector domain than
to scalarize it and do those operations on each element.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D112248
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128
I haven't found a generalization of this identity:
https://alive2.llvm.org/ce/z/_sriEQ
Note: I was actually looking at the first form of the pattern in that link,
but that's part of a long chain of potential missed transforms in codegen
and IR... that I hope ends here!
The predicates for when this is profitable are a bit tricky. This version of
the patch excludes multi-use but includes custom lowering (as opposed to
legal only).
On x86 for example, we have custom lowering for some vector types, and that
uses umax and sub. So to enable that fold, we need to add use checks to avoid
regressions. Even with legal-only lowering, we could see code with extra
reg move instructions for extra uses, so that constraint would have to be
eased very carefully to avoid penalties.
Differential Revision: https://reviews.llvm.org/D112085
When splitting a masked load, `GetDependentSplitDestVTs` is used to get the
MemVTs of the high and low parts. If the masked load is extended, this
may return VTs with different element types which are used to create the
high & low masked load instructions.
This patch changes `GetDependentSplitDestVTs` to ensure we return VTs with
the same element type.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D111996
With unoptimized code, we may see lots of stores and spend too much time in mergeTruncStores.
Fixes PR51827.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D111596
MachineLoop::isLoopInvariant() returns false for all VALU
because of the exec use. Check TII::isIgnorableUse() to
allow hoisting.
That unfortunately results in higher register consumption
since MachineLICM does not adequately estimate pressure.
Therefore I think it should only be enabled after D107677 even
though it does not depend on it.
Differential Revision: https://reviews.llvm.org/D107859
As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html
The GlobalIndirectSymbol class lost most of its meaning in
https://reviews.llvm.org/D109792, which disambiguated getBaseObject
(now getAliaseeObject) between GlobalIFunc and everything else.
In addition, as long as GlobalIFunc is not a GlobalObject and
getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee
is a GlobalIFunc cannot currently be modeled properly. Creating
aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition,
calling getAliaseeObject on a GlobalIFunc will currently return nullptr,
which is undesirable because it should return the object itself for
non-aliases.
This patch refactors the GlobalIFunc class to inherit directly from
GlobalObject, and removes GlobalIndirectSymbol (while inlining the
relevant parts into GlobalAlias and GlobalIFunc). This allows for
calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc
itself, making getAliaseeObject() more consistent and enabling
alias-to-ifunc to be properly modeled in the IR.
I exercised some judgement in the API clients of GlobalIndirectSymbol:
some were 'monomorphized' for GlobalAlias and GlobalIFunc, and
some remained shared (with the type adapted to become GlobalValue).
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D108872
This patch fixes a crash when despeculating ctlz/cttz intrinsics with
scalable-vector types. It is not safe to speculatively get the size of
the vector type in bits when the vector type is not fixed-length. As it
happens, this isn't required, since vector types are skipped anyway.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D112141
Our fallback expansion for CTLZ/CTTZ relies on CTPOP. If CTPOP
isn't legal or custom for a vector type we would scalarize the
CTLZ/CTTZ. This is different than CTPOP itself which would use a
vector expansion.
This patch teaches expandCTLZ/CTTZ to rely on the vector CTPOP
expansion instead of scalarizing. To do this I had to add additional
checks to make sure the operations used by CTPOP expansions are all
supported. Some of the operations were already needed for the CTLZ/CTTZ
expansion.
This is a huge improvement for RISCV, which doesn't have scalar
ctlz or cttz instructions in the base ISA.
For WebAssembly, I've added Custom lowering to keep the scalarizing
behavior. I've also extended the scalarizing to CTPOP.
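For reference, the scalar shape of the CTPOP-based expansions (a standalone
sketch for 32-bit elements; the legalizer builds the equivalent ISD nodes
per vector element, and both functions return 32 for a zero input):
#include <cstdint>
static uint32_t ctpop32(uint32_t X) {
  uint32_t Count = 0;
  for (; X; X &= X - 1) // clear the lowest set bit each round
    ++Count;
  return Count;
}
// ctlz: smear the leading one down to bit 0, then popcount the complement.
static uint32_t ctlz32(uint32_t X) {
  X |= X >> 1;
  X |= X >> 2;
  X |= X >> 4;
  X |= X >> 8;
  X |= X >> 16;
  return ctpop32(~X);
}
// cttz: the bits strictly below the lowest set bit are exactly ~X & (X - 1).
static uint32_t cttz32(uint32_t X) {
  return ctpop32(~X & (X - 1));
}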
Differential Revision: https://reviews.llvm.org/D111919
Here's another performance patch for InstrRefBasedLDV: rather than
processing all variable values in a scope at a time, instead, process one
variable at a time. The benefits are twofold:
* It's easier to reason about one variable at a time in your mind,
* It improves performance, apparently from increased locality.
The downside is that the value-propagation code gets indented one level
further, plus there's some churn in the unit tests.
Differential Revision: https://reviews.llvm.org/D111799
When inserting a scalable subvector into a scalable vector through
the stack, the index to store to needs to be scaled by vscale.
Before this patch, that didn't yet happen, so it would generate the
wrong offset, thus storing a subvector to the incorrect address
and overwriting the wrong lanes.
For some insert:
nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2)
The offset was not scaled by vscale:
orr x8, x8, #0x4
st1h { z0.h }, p0, [sp]
st1h { z1.d }, p1, [x8]
ld1h { z0.h }, p0/z, [sp]
And is changed to:
mov x8, sp
st1h { z0.h }, p0, [sp]
st1h { z1.d }, p1, [x8, #1, mul vl]
ld1h { z0.h }, p0/z, [sp]
Differential Revision: https://reviews.llvm.org/D111633
Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.....
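A minimal sketch of the wrapper's intent (the in-tree helper sits with the
other MathExtras-style predicates and may differ in detail):
#include <cstdint>
constexpr bool isPowerOf2_64(uint64_t V) { return V && !(V & (V - 1)); }
// A value is a negated power of two iff its two's-complement negation is a
// power of two; the wrapper saves call sites from spelling (-Value).isPowerOf2().
constexpr bool isNegatedPowerOf2(uint64_t V) { return isPowerOf2_64(0 - V); }
static_assert(isNegatedPowerOf2(uint64_t(-8)), "-8 == -(1 << 3)");
static_assert(!isNegatedPowerOf2(uint64_t(-6)), "-6 is not a negated power of two");
static_assert(!isNegatedPowerOf2(0), "zero is neither");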
Differential Revision: https://reviews.llvm.org/D111998
This is purely a performance patch: InstrRefBasedLDV used to use three
DenseMaps to store variable values, two for long term storage and one as a
working set. This patch eliminates the working set, and updates the long
term storage in place, thus avoiding two DenseMap comparisons and two
DenseMap assignments, which can be expensive.
Differential Revision: https://reviews.llvm.org/D111716
This field gets assigned when the relevant object starts being used; but it
remains uninitialized beforehand. This risks introducing hard-to-detect
bugs if something changes, so zero-initialize the field.
When compiling for the RWPI relocation model the debug information is wrong:
* the debug location is described as { DW_OP_addr Var }
instead of { DW_OP_constNu Var DW_OP_bregX 0 DW_OP_plus }
* the relocation type is R_ARM_ABS32 instead of R_ARM_SBREL32
Differential Revision: https://reviews.llvm.org/D111404
This is NFC-intended for the callers. Posting in case there are
other potential users that I missed.
I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 ( D111901 )
Differential Revision: https://reviews.llvm.org/D111891
gcc11 warns that this counter causes a signed/unsigned comparison when it's
later compared with a SmallVector::difference_type. gcc appears to be
correct; clang does not warn either way.
TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets
you skip machine verification after a particular pass. Unfortunately
this is used in generic code in TargetPassConfig itself to skip
verification after a generic pass, only because some previous target-
specific pass damaged the MIR on that specific target. This is bad
because problems in one target cause lack of verification for all
targets.
This patch replaces that mechanism with a new MachineFunction property
called "FailsVerification" which can be set by (usually target-specific)
passes that are known to introduce problems. Later passes can reset it
again if they are known to clean up the previous problems.
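A sketch of the intended usage (hypothetical helper names; the property enum
value is the one this patch adds):
#include "llvm/CodeGen/MachineFunction.h"
using namespace llvm;
// A target-specific pass that knowingly leaves the MIR unverifiable marks it:
static void markFailsVerification(MachineFunction &MF) {
  MF.getProperties().set(
      MachineFunctionProperties::Property::FailsVerification);
}
// A later pass that repairs the problems can clear the property again:
static void clearFailsVerification(MachineFunction &MF) {
  MF.getProperties().reset(
      MachineFunctionProperties::Property::FailsVerification);
}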
Differential Revision: https://reviews.llvm.org/D111397
The process of widening simple vector loads attempts to use a load of a
wider vector type if the original load is sufficiently aligned to avoid
memory faults.
However this optimization is only legal when performed on fixed-length
vector types. For scalable vector types this is invalid (unless vscale
happens to be 1).
This patch does increase the likelihood of compiler crashes (from
`FindMemType` failing to find a suitable type) but this now better
matches how widening non-simple loads, insufficiently-aligned loads, and
scalable-vector stores are handled.
Patches will be introduced later by which loads and stores can be
widened on targets with support for masked or predicated operations.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D111885
We ran an experiment and observed a dramatic decrease in the compilation time spent on clearing kill flags.
Before:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags(ms):1.607509e+05
After:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags(ms):3.987371e+03
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D111688
After D80369, the retainedTypes in CU's should not have any subprograms
so we should not handle that case when emitting debug info.
Differential Revision: https://reviews.llvm.org/D111593
This patch is very similar to D110173 / a3936a6c19, but for variable
values rather than machine values. This is for the second instr-ref
problem, calculating the correct variable value on entry to each block.
The previous lattice-based implementation was broken; we now use LLVM's
existing PHI placement utilities to work out where values need to merge,
then eliminate unnecessary ones through value propagation.
Most of the deletions here happen in vlocJoin: it was trying, badly, to
pick a location for PHIs to live in, leading to an infinite loop in the
MIR test added, where it would repeatedly switch between register
locations. The new approach is simpler: either PHIs can be eliminated, or
they can't, and the location of the value is a different problem.
Various bits and pieces move to the header so that they can be tested in
the unit tests. The DbgValue class grows a "VPHI" kind to represent
variable value PHIS that haven't been eliminated yet.
Differential Revision: https://reviews.llvm.org/D110630
Some functions get opted out of instruction referencing if they're being
compiled with no optimisations; however, the LiveDebugValues pass picks one
implementation and then sticks with it through the rest of compilation.
This leads to a segfault if we encounter a function that doesn't use
instr-ref (because it's optnone, for example), but we've already decided
to use InstrRefBasedLDV which expects to be passed a DomTree.
Solution: keep both implementations around in the pass, and pick whichever
one is appropriate to the current function.
In D110173 we start using the existing LLVM IDF calculator to place PHIs as
we reconstruct an SSA form of machine-code program. Sadly that's slower
than the old (but broken) way, this patch attempts to recover some of that
performance.
The key observation: every time we def a register, we also have to def its
register units. If we def'd $rax, in the current implementation we
independently calculate PHI locations for {al, ah, ax, eax, hax, rax}, and
they will all have the same PHI positions. Instead of doing that, we can
calculate the PHI positions for {al, ah} and place PHIs for any aliasing
registers in the same positions. Any def of a super-register has to def
the unit, and vice versa, so this is sound. It cuts down the SSA placement
we need to do significantly.
This doesn't work for stack slots, or registers we only ever read, so place
PHIs normally for those. LiveDebugValues chooses to ignore writes to SP at
calls, and now has to ignore writes to SP register units too.
Differential Revision: https://reviews.llvm.org/D111627
Old versions of gcc want template specialisations to happen within the
namespace where the template lives; this is still present in gcc 5.1, which
we officially support, so it has to be worked around.
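A self-contained illustration of the workaround (hypothetical key type; the
point is that the explicit specialization is declared inside the template's
own namespace rather than via a qualified name from outside):
namespace llvm {
template <typename T> struct DenseMapInfo; // primary template
} // namespace llvm
struct MyKey { int ID; };
namespace llvm {
// gcc 5.1 is happy with this form: the specialization lives in the same
// namespace as the primary template.
template <> struct DenseMapInfo<MyKey> {
  static MyKey getEmptyKey() { return {-1}; }
  static MyKey getTombstoneKey() { return {-2}; }
  static unsigned getHashValue(const MyKey &K) { return unsigned(K.ID); }
  static bool isEqual(const MyKey &A, const MyKey &B) { return A.ID == B.ID; }
};
} // namespace llvm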
InstrRefBasedLDV used to try to determine which values are in which
registers using a lattice approach; however, this is hard to understand and
broken in various ways. This patch replaces that approach with a standard
SSA approach using existing LLVM utilities. PHIs are placed at dominance
frontiers; value propagation then eliminates unnecessary PHIs.
This patch also adds a bunch of unit tests that should cover many of the
weirder forms of control flow.
Differential Revision: https://reviews.llvm.org/D110173
This patch adds patterns to match the following with INC/DEC:
- @llvm.aarch64.sve.cnt[b|h|w|d] intrinsics + ADD/SUB
- vscale + ADD/SUB
For some implementations of SVE, INC/DEC VL is not as cheap as ADD/SUB and
so this behaviour is guarded by the "use-scalar-inc-vl" feature flag, which for SVE
is off by default. There are no known issues with SVE2, so this feature is
enabled by default when targeting SVE2.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D111441
This makes Wasm EH work with dynamic linking. So far we were only able
to handle destructors, which do not use any tags or LSDA info.
1. This uses `TargetExternalSymbol` for `GCC_except_tableN` symbols,
which points to the address of per-function LSDA info. It is more
convenient to use than `MCSymbol` because it can take additional
target flags.
2. When lowering `wasm_lsda` intrinsic, if PIC is enabled, make the
symbol relative to `__memory_base` and generate the `add` node. If
PIC is disabled, continue to use the absolute address.
3. Make tag symbols (`__cpp_exception` and `__c_longjmp`) undefined in
the backend, because it is hard to make them work with dynamic
linking's loading order. Instead, all tag symbols are left undefined
in the LLVM backend and imported from JS.
4. Add support for undefined tags to the linker.
Companion patches:
- https://github.com/WebAssembly/binaryen/pull/4223
- https://github.com/emscripten-core/emscripten/pull/15266
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D111388
This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.
Unlike debug intrinsics, the PseudoProbe intrinsic has attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are:
- Flipped the default param of the getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by preventing pseudo probes from clobbering memory SSA
- Unblocked induction variable simplification
- Allowed empty loop deletion by treating the probe intrinsic as droppable (isDroppable)
- Some refactoring.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D110847
These "dump" methods call into MachineOperand::dump, which doesn't exist
with NDEBUG, thus we croak. Disable LiveDebugValues dump methods when
NDEBUG is turned on to avoid this.
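A sketch of the usual guard for this kind of fix (hypothetical class; the
real change wraps the LiveDebugValues dump helpers the same way):
class Tracker {
public:
#ifndef NDEBUG
  // Only available in +Asserts builds, where MachineOperand::dump() and
  // similar helpers actually exist.
  void dump() const;
#endif
};
#ifndef NDEBUG
void Tracker::dump() const {
  // ... printing that relies on asserts-only helpers ...
}
#endif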
With the -early-live-intervals command line flag,
TwoAddressInstructionPass::runOnMachineFunction would call
MachineFunction::verify before returning to check the live intervals.
But there was not much benefit to doing this since -verify-machineinstrs
and LLVM_ENABLE_EXPENSIVE_CHECKS provide a more general way of
scheduling machine verification after every pass.
Also it caused problems on targets like Lanai which are marked as "not
machine verifier clean", since verification would fail for known
target-specific problems which are nothing to do with LiveIntervals.
Differential Revision: https://reviews.llvm.org/D111618
This patch shifts the InstrRefBasedLDV class declaration to a header.
Partially because it's already massive, but mostly so that I can start
writing some unit tests for it. This patch also adds the boilerplate for
said unit tests.
Differential Revision: https://reviews.llvm.org/D110165
Per discussion in https://reviews.llvm.org/D111199,
the existing btf_tag attribute will be renamed to
btf_decl_tag. This patch mostly updated the Bitcode and
DebugInfo test cases with new attribute name.
Differential Revision: https://reviews.llvm.org/D111591
This patch contains the following enhancements to SrcRegMap and DstRegMap:
1. In findOnlyInterestingUse, check not only whether the Reg is a two-address
use, but also whether it can become a two-address use after commutation.
2. If a physical register is clobbered, remove the SrcRegMap entries that are
mapped to it.
3. In processTiedPairs, when creating a new COPY instruction, add a SrcRegMap
entry only when the COPY instruction is coalescable (i.e. the COPY source is
killed).
With these enhancements, isProfitableToCommute can make better commute
decisions, and ultimately more register copies are removed.
Differential Revision: https://reviews.llvm.org/D108731
PHI elimination updates LiveVariables info as described here:
// We only need to update the LiveVariables kill of SrcReg if this was the
// last PHI use of SrcReg to be lowered on this CFG edge and it is not live
// out of the predecessor. We can also ignore undef sources.
Unfortunately if the last use also happened to be an undef use then it
would fail to update the LiveVariables at all. Fix this by not counting
undef uses in the VRegPHIUse map.
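A sketch of the intent (hypothetical names modelled on the description, not
the exact diff): when counting PHI uses per (predecessor, vreg) pair, skip
undef operands so an undef use can no longer masquerade as the last use.
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"
#include <map>
#include <utility>
using namespace llvm;
using BBVRegPair = std::pair<int, unsigned>; // (pred BB number, virtual reg)
static void countPHIUses(const MachineInstr &PHI,
                         std::map<BBVRegPair, unsigned> &VRegPHIUse) {
  // PHI operands come in (value, predecessor-block) pairs after operand 0.
  for (unsigned I = 1, E = PHI.getNumOperands(); I != E; I += 2) {
    const MachineOperand &SrcMO = PHI.getOperand(I);
    if (SrcMO.isUndef())
      continue; // undef sources no longer count as uses
    int PredBB = PHI.getOperand(I + 1).getMBB()->getNumber();
    ++VRegPHIUse[{PredBB, unsigned(SrcMO.getReg())}];
  }
}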
Thanks to Mikael Holmén for the test case!
Differential Revision: https://reviews.llvm.org/D111552