Commit Graph

152882 Commits

Author SHA1 Message Date
Nikita Popov bfa91f38a9 [DAG] Restore dropped condition
This was dropped in fcee33bd5a,
presumably accidentally.
2021-11-26 21:18:54 +01:00
Nikita Popov bee8dcda1f [SCEV] Fix and validate ValueExprMap/ExprValueMap consistency
Relative to the previous landing attempt, this makes
insertValueToMap() resilient against the value already being
present in the map -- previously I only checked this for the
createSimpleAffineAddRec() case, but the same issue can also
occur for the general createNodeForPHI(). In both cases, the
addrec may be constructed and added to the map in a recursive
query trying to create said addrec. In this case, this happens
due to the invalidation when the BE count is computed, which
ends up clearing out the symbolic name as well.

-----

This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:

* Addrec construction directly wrote to ValueExprMap in a few places,
  without updating ExprValueMap. Add a helper to ensure they stay
  consistent. The adjustment in forgetSymbolicName() explicitly
  drops the old value from the map, so that we don't rely on it
  being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
  ExprValueMap, but not dropping the corresponding entries from
  ValueExprMap.

Differential Revision: https://reviews.llvm.org/D113349
2021-11-26 20:57:47 +01:00
Simon Pilgrim fcee33bd5a [DAG] Pull out repeated isLittleEndian() calls. NFC. 2021-11-26 18:41:56 +00:00
Kazu Hirata 562356d6e3 [Target] Use range-based for loops (NFC) 2021-11-26 08:23:01 -08:00
Alexey Bataev fc0aacf324 [SLP]Improve analysis/emission of vector operands for alternate nodes.
Compiler has an analysis for perfect diamond matching but it does not
support nodes with main/alternate opcodes. The problem is that the
scalars themselves are different and might not match directly with other
nodes, but operands and main/alternate opcodes might match and compiler
might reuse some previously emitted vector instructions. Need to include
this analysis in the cost model and actual vector instructions emission
process.
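
As an illustrative sketch (hypothetical IR, not taken from the patch), a pair of scalars with matching operands but alternating opcodes:

```
; The scalars differ (add vs. sub) and so don't match other nodes
; directly, but their operands do, letting SLP reuse a previously
; emitted vector add/sub plus a blending shuffle.
define <2 x i32> @alt(i32 %a, i32 %b, i32 %c, i32 %d) {
  %add = add i32 %a, %c
  %sub = sub i32 %b, %d
  %v0 = insertelement <2 x i32> undef, i32 %add, i32 0
  %v1 = insertelement <2 x i32> %v0, i32 %sub, i32 1
  ret <2 x i32> %v1
}
```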

Differential Revision: https://reviews.llvm.org/D114101
2021-11-26 06:38:02 -08:00
Florian Hahn b927aa69bf
[SCEV] Turn check in createSimpleAffineAddRec to assertion. (NFC)
Accum is guaranteed to be defined outside L (via Loop::isLoopInvariant
checks above). I think that should guarantee that the more powerful
ScalarEvolution::isLoopInvariant also determines that the value is loop
invariant.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114634
2021-11-26 13:23:48 +00:00
Igor Kirillov 08d45e6f4d [AArch64][SVEIntrinsicOpts] Fix: predicated SVE mul/fmul are not commutative
We cannot swap the multiplicand and multiplier because the SVE intrinsics
are predicated. Imagine lanes in vectors having the following values:
         pg = 0
         multiplicand = 1 (from dup)
         multiplier = 2
The resulting value should be 1, but if we swap multiplicand and multiplier it will become 2,
which is incorrect.
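
As a sketch (illustrative IR only), the merging semantics that forbid the swap:

```
; Inactive lanes of the result take the value of the first data operand
; (the multiplicand), so mul(pg, x, y) is not equivalent to mul(pg, y, x).
define <vscale x 4 x i32> @no_swap(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %multiplicand, <vscale x 4 x i32> %multiplier) {
  %r = call <vscale x 4 x i32> @llvm.aarch64.sve.mul.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %multiplicand, <vscale x 4 x i32> %multiplier)
  ret <vscale x 4 x i32> %r
}
declare <vscale x 4 x i32> @llvm.aarch64.sve.mul.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
```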

Differential Revision: https://reviews.llvm.org/D114577
2021-11-26 12:41:27 +00:00
Carl Ritson 8967d044fc [AMDGPU] Add SIMemoryLegalizer comments to clarify bit usage
Attempt to further document the intended cache policies requested
by different combinations of GLC, SLC and DLC bits.
GFX10 non-temporal stores are updated to set GLC.

Reviewed By: t-tye

Differential Revision: https://reviews.llvm.org/D114351
2021-11-26 21:05:58 +09:00
Abinav Puthan Purayil 4af45f10cc [GlobalISel] Fold or of shifts to funnel shift.
This change folds a basic funnel shift idiom:
- (or (shl x, amt), (lshr y, sub(bw, amt))) -> fshl(x, y, amt)
- (or (shl x, sub(bw, amt)), (lshr y, amt)) -> fshr(x, y, amt)

This also helps in folding to rotate shift if x and y are equal since we
already have a funnel shift to rotate combine.
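
For illustration (assuming a 32-bit type; not from the patch itself), the first pattern in IR form:

```
; or(shl(x, amt), lshr(y, bw - amt)) with bw = 32: the idiom now folded
; to fshl(x, y, amt).
define i32 @fsh(i32 %x, i32 %y, i32 %amt) {
  %inv = sub i32 32, %amt
  %hi = shl i32 %x, %amt
  %lo = lshr i32 %y, %inv
  %r = or i32 %hi, %lo
  ret i32 %r
}
```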

Differential Revision: https://reviews.llvm.org/D114499
2021-11-26 17:05:29 +05:30
David Sherwood e20391fc5d [LoopVectorize] When tail-folding, don't always predicate uniform loads
In VPRecipeBuilder::handleReplication, if we believe the instruction
is predicated, we then proceed to create new VP region blocks even
when the load is uniform and only predicated due to tail-folding.

I have updated isPredicatedInst to avoid treating a uniform load as
predicated when tail-folding, which means we can do a single scalar
load and a vector splat of the value.
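
A sketch of the case in question (hypothetical IR): the address is loop-invariant, so the load is uniform and only appears predicated because of tail-folding:

```
; %val loads from the same address on every iteration; it can be a
; single scalar load plus a splat rather than a predicated load per lane.
define void @f(i32* %src, i32* %dst, i64 %n) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %val = load i32, i32* %src, align 4
  %gep = getelementptr inbounds i32, i32* %dst, i64 %i
  store i32 %val, i32* %gep, align 4
  %i.next = add nuw i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  br i1 %done, label %exit, label %loop
exit:
  ret void
}
```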

Tests added here:

  Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll

Differential Revision: https://reviews.llvm.org/D112552
2021-11-26 11:30:54 +00:00
Bradley Smith eafbaca977 [AArch64][SVE] Generate ASRD instructions for power of 2 signed divides
Differential Revision: https://reviews.llvm.org/D113281
2021-11-26 11:08:27 +00:00
David Green c76d6dd192 [ARM] Generate VCTP from SETCC
This converts a vector SETCC([0,1,2,..], splat(n), ult) to vctp n, which
can take fewer instructions and avoid the need for constant pool loads.
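
As a small illustration (hypothetical, with a fixed VF of 4):

```
; A step vector compared ult against splat(n) is the pattern that now
; selects to vctp n.
define <4 x i1> @mask(i32 %n) {
  %ins = insertelement <4 x i32> undef, i32 %n, i32 0
  %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
  %m = icmp ult <4 x i32> <i32 0, i32 1, i32 2, i32 3>, %splat
  ret <4 x i1> %m
}
```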

Differential Revision: https://reviews.llvm.org/D114177
2021-11-26 10:57:14 +00:00
Simon Pilgrim 2778f9a9f6 [DAG] SimplifyDemandedVectorElts - attempt to handle ADD(x,x) as single use
If the ADD node is the only user of the repeated operand, then treat this as a single use - this allows us to peek through shl(x,1) patterns.
2021-11-26 10:32:10 +00:00
David Sherwood 86137fb722 [CodeGen] Add scalable vector support for lowering of llvm.get.active.lane.mask
Currently the generic lowering of llvm.get.active.lane.mask is done
in SelectionDAGBuilder::visitIntrinsicCall and currently assumes
only fixed-width vectors are used. This patch changes the code to be
more generic and support scalable vectors too. I have added tests
for SVE here:

  CodeGen/AArch64/active_lane_mask.ll

although the code quality leaves a lot to be desired. The code will
be improved significantly in a later patch that makes use of the
SVE whilelo instruction.
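
For reference, a scalable use of the intrinsic (illustrative only):

```
; The generic lowering now also handles scalable result types like this.
declare <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64, i64)

define <vscale x 4 x i1> @mask(i64 %index, i64 %tc) {
  %m = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 %index, i64 %tc)
  ret <vscale x 4 x i1> %m
}
```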

Differential Revision: https://reviews.llvm.org/D114541
2021-11-26 08:17:55 +00:00
Kazu Hirata 259cd6f893 [llvm] Use range-based for loops (NFC) 2021-11-25 22:17:10 -08:00
Christudasan Devadasan 654c89d85a [AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch makes them allocatable as a
prerequisite to enable copies between VGPR and
AGPR registers during regalloc.

Also, add the missing AV register classes from
192b to 1024b.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D109300
2021-11-26 00:42:12 -05:00
Jeremy Morse 536b9eb31e [DebugInfo][InstrRef] Add extra indirection for NRVO tests
In some scenarios, usually involving NRVO, we can issue indirect DBG_VALUEs
after SelectionDAG, even in instruction referencing mode (if the variable
is an argument). If the corresponding argument value is spilled to the stack,
then we have:
 * Indirection from it being on the stack,
 * Indirection from it being a dbg.declare or a dbg.addr.

However InstrRefBasedLDV only emits one level of indirection. This patch
adds the second, by adding an extra DW_OP_deref if necessary. The two
tests modified fail otherwise -- they feature some NRVO, and require two
levels of indirection to be correct.

Differential Revision: https://reviews.llvm.org/D114364
2021-11-25 21:43:38 +00:00
Quinn Pham b11c66accf [NFC] Inclusive language: rename master flag to main flag
[NFC] As part of using inclusive language within the llvm project, this patch
renames master flag to main flag in these comments.

Reviewed By: ZarkoCA

Differential Revision: https://reviews.llvm.org/D114090
2021-11-25 15:16:11 -06:00
Jeremy Morse 3107081e94 [DebugInfo][InstrRef] Avoid some quadratic behaviour in LiveDebugVariables
This is a performance patch -- LiveDebugVariables can behave quadratically
if a lot of debug instructions are inserted back into the same place, and
we have to repeatedly step over the ones we've already inserted.

To get around it, whenever we insert a debug instruction at a slot index,
check whether there are more debug instructions to insert at this point,
and insert them too. That avoids the repeated lookup and stepping through.
It relies on the container for unlinked debug instructions being recorded
in-order, which is how LiveDebugVariables currently does it.

Differential Revision: https://reviews.llvm.org/D114587
2021-11-25 20:31:00 +00:00
Florian Hahn 8cb1af73c6
Recommit [ThreadPool] Support returning futures with results.
This reverts commit 71a7c55f0f.

The revert broke building llvm-reduce, and it is not clear it fixes an
issue with LLVM_ENABLE_THREADS=OFF.

See discussion in https://reviews.llvm.org/D114183 for more details.
2021-11-25 20:07:53 +00:00
Amy Kwan 150681f2f3 [PowerPC] Prevent the optimizer from producing wide vector types in IR.
This patch prevents the optimizer from producing wide vectors in the IR,
specifically the MMA types (v256i1, v512i1). The idea is that on Power, we
want to produce these types only if the vector_pair and vector_quad types
are specifically used.

To prevent the optimizer from producing these types in the IR,
vectorCostAdjustmentFactor() is updated to return an instruction cost factor or
an invalid instruction cost if the current type is that of an MMA type. An
invalid instruction cost returned by this function signifies to other cost
computing functions to return the maximum instruction cost to inform the
optimizer that producing these types within the IR is expensive, and should not
be produced in the first place.

This issue was first seen in the test case included within this patch.

Differential Revision: https://reviews.llvm.org/D113900
2021-11-25 12:35:26 -06:00
Daniel McIntosh 71a7c55f0f Revert "[ThreadPool] Support returning futures with results."
This reverts commit 6149e57dc1.

The offending commit broke building with LLVM_ENABLE_THREADS=OFF.
2021-11-25 12:19:35 -05:00
Kazu Hirata bfd5dd1568 [llvm] Use range-based for loops (NFC) 2021-11-25 08:55:16 -08:00
David Green fbb61adb70 [ARM] Convert fptoi.sat to fixed point multiply
This is a very small addition to the existing MVE fixed point vcvt code
to also create them from FP_TO_SINT_SAT and FP_TO_UINT_SAT nodes, which
should be equally valid for native saturating converts under MVE.

Differential Revision: https://reviews.llvm.org/D114360
2021-11-25 15:43:45 +00:00
Jeremy Morse 102d2a8a99 [DebugInfo][InstrRef] Track variable assignments in out-of-scope blocks
DBG_INSTR_REFs and DBG_VALUEs can end up in blocks that aren't in the
lexical scope of their variable. It's arguable as to what we should do
about this, however VarLocBasedLDV permits such variable locations to be
propagated, so let's allow it in InstrRefBasedLDV.

It's necessary for the modified test to work.

Differential Revision: https://reviews.llvm.org/D114578
2021-11-25 14:52:11 +00:00
Alexey Bataev 4675a1654c Revert "[SLP]Improve analysis/emission of vector operands for alternate nodes."
This reverts commit 496254cf80 to fix
compiler crashes reported in D114101#3152982.
2021-11-25 05:19:49 -08:00
Simon Pilgrim 63b1e58f07 [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED)
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.

For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.

Reapplied with a fix for the AArch64 rev patterns, matching the ARM fix.

https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)
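
A concrete instance of the first rule (illustrative IR, i8, shift amount 2):

```
; The mask -4 (0b11111100) demands only bits with >= 2 trailing zeros,
; so the bits rotated in at the bottom are never observed and the
; rotate-left can be simplified to a shift-left.
define i8 @rol_demanded(i8 %x) {
  %rot = call i8 @llvm.fshl.i8(i8 %x, i8 %x, i8 2)
  %masked = and i8 %rot, -4
  ret i8 %masked
}
declare i8 @llvm.fshl.i8(i8, i8, i8)
```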

Differential Revision: https://reviews.llvm.org/D114354
2021-11-25 11:14:15 +00:00
David Green 3a700cabdc [SDAG] Allow Unknown sizes when refining MMO alignments. NFC
The changes in D113888 / 32b6c17b29 altered the memory size of a
masked store, as it will store an unknown number of bytes, not the full
vector size. We can have situations where the masked store is legalized
and then turned into a normal store, as the mask is known to be all ones.
This creates a store with an unknown size MMO that was hitting this
assert.

The store created can be given a better size in a followup patch. This
currently adjusts the assert to handle unknown sizes.
2021-11-25 10:19:29 +00:00
Jameson Nash 0332d105b9 GlobalISel: remove assert that memcpy Src and Dst addrspace must be identical
The LangRef does not require these arguments to have the same type.

Differential Revision: https://reviews.llvm.org/D93154
2021-11-24 20:23:05 -05:00
Zarko Todorovski 95875d246a [LLVM][NFC] Inclusive language: remove occurrences of sanity check/test from llvm
Part of work to use more inclusive language in clang/llvm. Rewording
some comments and changing function and variable names.
2021-11-24 17:29:55 -05:00
David Blaikie cd93ab8947 DWARFVerifier: Don't parse all units twice
Introduced/discussed in https://reviews.llvm.org/D38719

The header validation logic was also explicitly building the DWARFUnits
to validate. But then other calls, like "Units.getUnitForOffset" creates
the DWARFUnits again in the DWARFContext proper - so, let's avoid
creating the DWARFUnits twice by walking the DWARFContext's units rather
than building a new list explicitly.

This does reduce some verifier power - it means that any unit with a
header parsing failure won't get further validation, whereas the
verifier-created units were getting some further validation despite
invalid headers. I don't think this is a great loss/seems "right" in
some ways to me that if the header's invalid we should stop there.

Exposing the raw DWARFUnitVectors from DWARFContext feels a bit
sub-optimal, but gave simple access to the getUnitForOffset to keep the
rest of the code fairly similar.
2021-11-24 14:03:56 -08:00
Artur Pilipenko aa60d169ea [CVP] Add a cl::opt for canonicalization of signed relational comparisons
This canonicalization breaks the ability to discard checks in some cases.
Add a command line option to disable it. This option is on by default,
so the change is NFC.

See for details:
https://reviews.llvm.org/D112895#3149487
2021-11-24 13:52:38 -08:00
Alexey Bataev 496254cf80 [SLP]Improve analysis/emission of vector operands for alternate nodes.
Compiler has an analysis for perfect diamond matching but it does not
support nodes with main/alternate opcodes. The problem is that the
scalars themselves are different and might not match directly with other
nodes, but operands and main/alternate opcodes might match and compiler
might reuse some previously emitted vector instructions. Need to include
this analysis in the cost model and actual vector instructions emission
process.

Differential Revision: https://reviews.llvm.org/D114101
2021-11-24 12:55:24 -08:00
Florian Hahn 2897b67665
[LV] Use OrigLoop instead of induction to get function. (NFC)
Upcoming changes will result in Induction not being set/used in some
cases. Use OrigLoop to get the function instead.
2021-11-24 20:17:44 +00:00
Jeremy Morse bfadc5dcbf [DebugInfo][InstrRef] Cope with win32 calls changing SP in LiveDebugValues
Almost all of the time, call instructions don't actually lead to SP being
different after they return. An exception is win32's _chkstk, which
implements stack probes. We need to recognise that as modifying SP, so
that copies of the value are tracked as distinct vla pointers.

This patch adds a target frame-lowering hook to see whether stack probe
functions will modify the stack pointer, store that in an internal flag,
and if it's true then scan CALL instructions to see whether they're a
stack probe. If they are, recognise them as defining a new stack-pointer
value.

The added test exercises this behaviour: two calls to _chkstk should be
considered as producing two different values.

Differential Revision: https://reviews.llvm.org/D114443
2021-11-24 19:56:21 +00:00
Stanislav Mekhanoshin 9300b133c8 Revert "[InstCombine] (~(a | b) & c) | ~(c | (a ^ b)) -> ~((a | b) & (c | (b ^ a)))"
This reverts commit c407769f5e.
2021-11-24 11:14:52 -08:00
Jeremy Morse 133e25f946 [DebugInfo][InstrRef] Ignore SP clobbers on call instructions even more
Avoid unnecessarily recreating DBG_VALUEs on call instructions.

In LiveDebugvalues we choose to ignore any clobbers of SP by call
instructions, as they're irrelevant to our model of the machine. We
currently do so for tracking register values (MTracker); do the same for
tracking variable locations (TTracker).

Test modified to ensure that a duplicate DBG_VALUE is not created after the
call instruction in this test.

Differential Revision: https://reviews.llvm.org/D114365
2021-11-24 17:25:48 +00:00
Peter Waller 787b66eb5f [LoopAccessAnalysis][SVE] Bail out for scalable vectors
The supplied test case, reduced from real world code, crashes with an
'Invalid size request on a scalable vector.' error.

Since it's similar in spirit to an existing LAA test, rename the file to
generalize it to both.

Differential Revision: https://reviews.llvm.org/D114155
2021-11-24 15:52:20 +00:00
Roman Lebedev cd8d219536
[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 32 bit when have AVX512DQ
I believe this effectively completes `X86TTIImpl::getReplicationShuffleCost()`
for AVX512, other than the question of handling plain AVX512F,
where we end up with some really ugly "shuffles",
but then, are there any CPUs that support AVX512 but not AVX512DQ/AVX512BW?

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114315
2021-11-24 17:23:15 +03:00
Florian Hahn 8b86752c60
[VPlan] Remove unused VPInstruction constructor. (NFC)
VPInstruction inherits from VPValue, so the constructor taking
ArrayRef<VPValue*> covers all cases that would be covered by the removed
constructor.
2021-11-24 14:06:50 +00:00
Bradley Smith 080ef0b6a6 [AArch64][SVE] Recognize all ones mask during fixed mask generation
Differential Revision: https://reviews.llvm.org/D114431
2021-11-24 13:55:06 +00:00
Benjamin Kramer d32787230d Revert "[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl"
This reverts commit 3cf4a2c620.

It makes llc hang on the following test case.
```
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

define dso_local void @_PyUnicode_EncodeUTF16() local_unnamed_addr #0 {
entry:
  br label %while.body117.i

while.body117.i:                                  ; preds = %cleanup149.i, %entry
  %out.6269.i = phi i16* [ undef, %cleanup149.i ], [ undef, %entry ]
  %0 = load i16, i16* undef, align 2
  %1 = icmp eq i16 undef, -10240
  br i1 %1, label %fail.i, label %cleanup149.i

cleanup149.i:                                     ; preds = %while.body117.i
  %or130.i = call i16 @llvm.bswap.i16(i16 %0) #2
  store i16 %or130.i, i16* %out.6269.i, align 2
  br label %while.body117.i

fail.i:                                           ; preds = %while.body117.i
  ret void
}

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare i16 @llvm.bswap.i16(i16) #1

attributes #0 = { "target-features"="+neon,+v8a" }
attributes #1 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #2 = { mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" }
```
2021-11-24 14:42:54 +01:00
Sanjay Patel b326c05814 [InstSimplify] fold xor logic of 2 variables, part 2
(~a & b) ^ (a | b) --> a

This is the swapped and/or (Demorgan?) sibling fold for
the fold added with D114462 ( 892648b18a ).

This case is easier to specify because we are returning
a root value, not a 'not':
https://alive2.llvm.org/ce/z/SRzj4f
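
In IR, the folded pattern looks like this (illustrative sketch):

```
; (~a & b) ^ (a | b) simplifies to %a itself; instsimplify can return
; the existing root value without creating anything new.
define i8 @src(i8 %a, i8 %b) {
  %nota = xor i8 %a, -1
  %and = and i8 %nota, %b
  %or = or i8 %a, %b
  %r = xor i8 %and, %or   ; == %a
  ret i8 %r
}
```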
2021-11-24 08:15:47 -05:00
Nemanja Ivanovic b7bf937bbe [PowerPC] Provide XL-compatible vec_round implementation
The XL implementation of vec_round for vector double uses
"round-to-nearest, ties to even" just as the vector float
version does. However, clang and gcc use "round-to-nearest-away"
for vector double and "round-to-nearest, ties to even"
for vector float.

The XL behaviour is implemented under the __XL_COMPAT_ALTIVEC__
macro similarly to other instances of incompatibility.

Differential revision: https://reviews.llvm.org/D113642
2021-11-24 06:43:56 -06:00
Simon Pilgrim 3cf4a2c620 [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.

For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.

https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)

Differential Revision: https://reviews.llvm.org/D114354
2021-11-24 11:28:35 +00:00
Jay Foad d7e03df719 [AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32
Select SelectionDAG ops smul_lohi/umul_lohi to
v_mad_i64_i32/v_mad_u64_u32 respectively, with an addend of 0.
v_mul_lo, v_mul_hi and v_mad_i64/u64 are all quarter-rate instructions
so it is better to use one instruction than two.

Further improvements are possible to make better use of the addend
operand, but this is already a strict improvement over what we have
now.
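
As a hedged illustration, IR that typically legalizes to umul_lohi on targets with 32-bit registers:

```
; A full 32x32->64 multiply; the resulting umul_lohi node is now
; selected to v_mad_u64_u32 with a zero addend.
define i64 @widen(i32 %a, i32 %b) {
  %a64 = zext i32 %a to i64
  %b64 = zext i32 %b to i64
  %m = mul i64 %a64, %b64
  ret i64 %m
}
```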

Differential Revision: https://reviews.llvm.org/D113986
2021-11-24 11:25:02 +00:00
Jay Foad 8a52bd82e3 [AMDGPU] Only select VOP3 forms of VOP2 instructions
Change VOP_PAT_GEN to default to not generating an instruction selection
pattern for the VOP2 (e32) form of an instruction, only for the VOP3
(e64) form. This allows SIFoldOperands maximum freedom to fold copies
into the operands of an instruction, before SIShrinkInstructions tries
to shrink it back to the smaller encoding.

This affects the following VOP2 instructions:
v_min_i32
v_max_i32
v_min_u32
v_max_u32
v_and_b32
v_or_b32
v_xor_b32
v_lshr_b32
v_ashr_i32
v_lshl_b32

A further cleanup could simplify or remove VOP_PAT_GEN, since its
optional second argument is never used.

Differential Revision: https://reviews.llvm.org/D114252
2021-11-24 11:15:30 +00:00
Carl Ritson 976f3b3c9e [AMDGPU] Only allow implicit WQM in pixel shaders
Implicit derivatives are only valid in pixel shaders,
hence only implicitly enable WQM for pixel shaders.
This avoids unintended WQM in other shader types (e.g. compute)
when image sampling instructions are used.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D114414
2021-11-24 20:04:42 +09:00
David Green 581f837355 [ARM] Fold (fadd x, (vselect c, y, -0.0)) into (vselect c, (fadd x, y), x)
This is similar to D113574, but as a DAG combine, not tablegen patterns.
Doing the fold as a DAG combine allows the fadd to be folded with a
fmul, finally producing a predicated vfma. It performs the same fold of
fadd(x, vselect(p, y, -0.0)) to vselect(p, (fadd x, y), x), using -0.0 as
the identity value of a fadd.
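
A sketch of the fold in IR (hypothetical fixed-width types):

```
; fadd(x, select(p, y, -0.0)) == select(p, fadd(x, y), x): adding -0.0
; leaves x unchanged on the false lanes, so the select can be hoisted.
define <4 x float> @fold(<4 x i1> %p, <4 x float> %x, <4 x float> %y) {
  %s = select <4 x i1> %p, <4 x float> %y, <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>
  %r = fadd <4 x float> %x, %s
  ret <4 x float> %r
}
```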

Differential Revision: https://reviews.llvm.org/D113584
2021-11-24 10:41:00 +00:00
David Sherwood cf40ca026f [NFC] Tidy up SelectionDAGBuilder::visitIntrinsicCall to use existing sdl debug loc
In quite a few places we were calling getCurSDLoc() to get the debug
location, but it is already available in the local variable `sdl`.

Differential Revision: https://reviews.llvm.org/D114447
2021-11-24 10:35:49 +00:00
Jeremy Morse b8f68ad9cd [DebugInfo][InstrRef] Avoid crash when values optimised out late in sdag
It appears that we can emit all the instructions for a function, including
debug instructions, and then optimise some of the values out late.
Specifically, in the attached test case, an argument gets optimised out
after DBG_VALUE / DBG_INSTR_REFs are created. This confuses
MachineFunction::finalizeDebugInstrRefs, which expects to be able to find a
defining instruction, and crashes instead.

Fix this by identifying when there's no defining instruction, and
translating that instead into a DBG_VALUE $noreg.

Differential Revision: https://reviews.llvm.org/D114476
2021-11-24 10:34:48 +00:00
David Green d9af9c2c5a [ARM] Fold floating point select(binop) patterns
Similar to D84091 which added extra predicated folds for integer operations
using the identity element of the operation, this adds them for floating
point operations for the form `BinOp(x, select(p, y, Identity))`. They are
folded back to predicated versions of the operator, with fadd having the
identity -0.0, fsub using the identity 0.0 and fmul using 1.0.

Differential Revision: https://reviews.llvm.org/D113574
2021-11-24 10:22:20 +00:00
Rosie Sumpter df32a39dd0 [LoopVectorize][CostModel] Update cost model for fmuladd intrinsic
This patch updates the cost model for ordered reductions so that a call
to the llvm.fmuladd intrinsic is modelled as a normal fmul instruction
plus the cost of an ordered fadd reduction.

Differential Revision: https://reviews.llvm.org/D111630
2021-11-24 08:50:05 +00:00
Rosie Sumpter 2d33327f9d [LoopVectorize] Print fast-math flags for VPReductionRecipe 2021-11-24 08:50:05 +00:00
Rosie Sumpter 991074012a [LoopVectorize] Propagate fast-math flags for VPInstruction
In-loop vector reductions which use the llvm.fmuladd intrinsic involve
the creation of two recipes: a VPReductionRecipe for the fadd and a
VPInstruction for the fmul. If the call to llvm.fmuladd has fast-math flags
these should be propagated through to the fmul instruction, so an
interface setFastMathFlags has been added to the VPInstruction class to
enable this.

Differential Revision: https://reviews.llvm.org/D113125
2021-11-24 08:50:04 +00:00
Rosie Sumpter c2441b6b89 [LoopVectorize] Add vector reduction support for fmuladd intrinsic
Enables LoopVectorize to handle reduction patterns involving the
llvm.fmuladd intrinsic.

Differential Revision: https://reviews.llvm.org/D111555
2021-11-24 08:50:04 +00:00
Abinav Puthan Purayil 078da26b1c [AMDGPU] Check for unneeded shift mask in shift PatFrags.
The existing constrained shift PatFrags only dealt with masked shift
from OpenCL front-ends. This change copies the
X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in
the shift PatFrag predicates.
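
The idiom in question (illustrative, 32-bit):

```
; The 'and' with 31 is unneeded: the hardware already takes the shift
; amount modulo the bit width, so this selects to a plain shift.
define i32 @shl_masked(i32 %x, i32 %amt) {
  %m = and i32 %amt, 31
  %r = shl i32 %x, %m
  ret i32 %r
}
```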

Differential Revision: https://reviews.llvm.org/D113448
2021-11-24 10:53:12 +05:30
Jun Ma 17eb6b61de Revert "[Taildup] Don't tail-duplicate loop header with multiple successors as its latches"
This reverts commit 1f9fa54984.
2021-11-24 10:26:37 +08:00
Jun Ma 07333810ca Revert "Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."""""
This reverts commit c93f93b2e3.
2021-11-24 10:26:37 +08:00
Stanislav Mekhanoshin 661a232e34 [AMDGPU] Remove a no-op check in the gfx90a hazard recognizer
Also rename helper function accordingly.

Differential Revision: https://reviews.llvm.org/D114289
2021-11-23 15:35:19 -08:00
Florian Mayer 6c06d8e310 [stack-safety] Check SCEV constraints at memory instructions.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D113160
2021-11-23 15:29:23 -08:00
Nemanja Ivanovic c9cb8edc51 [PowerPC] Allow scalars for asm constraint "v" with VSX
Similarly to what GCC does, we should allow scalars with
the "v" constraint rather than introducing unnecessary
new constraints for scalars in Altivec registers.

Differential revision: https://reviews.llvm.org/D113635
2021-11-23 17:03:04 -06:00
Matt Arsenault 273a0c8bc9 PrologEpilogInserter: Use explicit control for scavenge slot placement
AMDGPU is unusual in that the stack is indexed in the same
direction as stack growth (up). We therefore always need the emergency
stack slots placed as low as possible to ensure they are in range of
load/store instruction immediate offsets. The existing logic is mostly
OK, but failed if we required stack realignment.

I don't understand what the existing control isFPCloseToIncomingSP is
supposed to mean, but it can only be used to stop placing the scavenge
slots earlier. Make this explicit so that targets can opt-in rather
than opt-out only.
2021-11-23 18:01:12 -05:00
Florian Hahn 73a05cc8df
[LAA] Move visitPointers up in file (NFC).
This allows easier re-use in earlier functions.
2021-11-23 22:47:26 +00:00
Sanjay Patel 892648b18a [InstSimplify] fold xor logic of 2 variables
(a & b) ^ (~a | b) --> ~a

I was looking for a shortcut to reduce some of the complex logic
folds that are currently up for review (D113216
and others in that stack), and I found this missing from
instcombine/instsimplify.

There is a trade-off in putting it into instsimplify: because
we can't create new values here, we need a strict 'not' op (no
undef elements). Otherwise, the fold is not valid:
https://alive2.llvm.org/ce/z/k_AGGj

If this was in instcombine instead, we could create the proper
'not'. But having the fold here benefits other passes like GVN
that use instsimplify as an analysis.

There is a related fold where 'and' and 'or' are swapped, and
that is planned as a follow-up commit.
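
A sketch of the fold in IR; note that the returned value is the pre-existing strict 'not':

```
; (a & b) ^ (~a | b) simplifies to ~a; %nota must already exist for
; instsimplify to return it, since it cannot create new values.
define i8 @src(i8 %a, i8 %b) {
  %nota = xor i8 %a, -1
  %and = and i8 %a, %b
  %or = or i8 %nota, %b
  %r = xor i8 %and, %or   ; == %nota
  ret i8 %r
}
```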

Differential Revision: https://reviews.llvm.org/D114462
2021-11-23 16:50:23 -05:00
Rong Xu bf1138491a [SampleFDO] Recompute BFI if the sample loader changes BPI
The MIR sample loader changes the branch probability but not BFI.
Here we force a recompute of BFI if the branch probabilities are
changed.

Also register the MIR FSAFDO passes properly.

Differential Revision: https://reviews.llvm.org/D114400
2021-11-23 13:24:31 -08:00
Mehdi Amini 1392b654ff Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)"
This reverts commit 884b6dd311.
The windows build is broken with a linker error.
2021-11-23 20:10:36 +00:00
Quinn Pham 1345bc5e16 [NFC][llvm] Inclusive language: remove instance of master in LiveRangeUtils.h
[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with primary in `LiveRangeUtils.h`.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D114191
2021-11-23 13:07:42 -06:00
spupyrev 884b6dd311 profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depend on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.

Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges that are being solved by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.

The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-case time complexity is quadratic in the number of blocks in a
function, O(|V|^2). However, careful engineering and extensive evaluation show
that the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 seconds.

The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D109860
2021-11-23 11:02:40 -08:00
Zarko Todorovski 0d3add216f [llvm][NFC] Inclusive language: Reword replace uses of sanity in llvm/lib/Transform comments and asserts
Reworded some comments and asserts to avoid usage of `sanity check/test`

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114372
2021-11-23 13:22:55 -05:00
Philip Reames 03d8bc184a [indvars] Fix lftr crash when preheader is terminated by switch
This was found by oss-fuzz. The switch will get canonicalized to a branch, but if it hasn't been by the time we run LFTR, we crash on an unneeded assert.
2021-11-23 09:58:46 -08:00
Nemanja Ivanovic c933c2eb33 [PowerPC] Add BCD add/sub/cmp builtins
Support for builtins that use bcdadd./bcdsub. to add/subtract
Binary Coded Decimal values as well as to determine validity
and compare BCD values.

Differential revision: https://reviews.llvm.org/D114088
2021-11-23 11:42:36 -06:00
Florian Hahn 0a00d64e32
[LAA] Turn aggregate type check into assertion (NFCI).
getPtrStride should not be called with aggregate access types. There's
also an old TODO.

Turn the check into an assertion.
2021-11-23 17:37:30 +00:00
Philip Reames 065f777d27 Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)"
This reverts commit b00fc19822. This change fails to build (link) on ubuntu x86.
2021-11-23 09:18:28 -08:00
Philip Reames 18086186ab [unroll] Remove two dead variable assignments [nfc]
These variables are not out-params, and we immediately return after assigning them.  Thus, the assignments are dead and just confusing.

I believe these used to be out-params, but they're not any more.
2021-11-23 09:12:20 -08:00
spupyrev b00fc19822 profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depend on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.

Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges that are being solved by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.

The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-case time complexity is quadratic in the number of blocks in a
function, O(|V|^2). However, careful engineering and extensive evaluation show
that the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 seconds.

The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D109860
2021-11-23 09:08:30 -08:00
Philip Reames 5c77aa2b91 [unroll] Use early return in shouldFullUnroll [nfc] 2021-11-23 09:01:36 -08:00
Kazu Hirata d45cb1d7ea [llvm] Use range-based for loops (NFC) 2021-11-23 08:54:48 -08:00
Paul Robinson c075566c8d [PS4][TLI] Remove redundant line 2021-11-23 08:42:32 -08:00
alex-t 9e03e8c99e [AMDGPU] Enable fneg and fabs divergence-driven instruction selection.
Detailed description: We currently have a set of patterns to select ISD::FNEG and ISD::FABS to the bitwise operations.  We need to make them predicated to select the VALU or SALU bitwise operation variant according to the SDNode divergence bit.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D114257
2021-11-23 19:37:07 +03:00
Simon Moll 1e65b93f3a [VP] Canonicalize macros of VPIntrinsics.def
Usage and naming of macros in VPIntrinsics.def has been inconsistent. Rename all property macros to VP_PROPERTY_<name>.  Use BEGIN/END scope macros to attach properties to vp intrinsics and SDNodes (instead of specifying either directly with the property macro).
A follow-up patch has documentation on how the macros are (intended) to be used.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D114144
2021-11-23 16:51:11 +01:00
Sanjay Patel 430ad9697d [InstCombine] enhance bitwise select matching
I noticed that adding a seemingly unrelated fold for xor caused
regressions on similar patterns, and this is one of the
underlying causes.

This could also be a variation for code as seen in:
https://llvm.org/PR34047
...although that exact example should be fixed after:
D113035 / c36b7e21bd

The vector test shows that we are actually missing a potential
canonicalization for bitcast-of-sext-of-not or the inverse.
The scalar test shows that even if we had that canonicalization,
it would still be possible to see this pattern due to extra uses.

https://alive2.llvm.org/ce/z/y2BAgi
2021-11-23 09:57:44 -05:00
Evgeniy Brevnov 47e2644c89 [DSE][NFC] Introduce "doesn't overwrite" return code for isOverwrite
Add OR_None code to indicate that there is no overwrite. This has no effect for current uses but will be used in one of the next patches building support for PHI translation.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D105098
2021-11-23 17:11:15 +07:00
David Green 32b6c17b29 [SDAG] Use UnknownSize for masked load/store MMO size
A masked load or store will load a potentially unknown number of bytes
from a memory location - that is not generally known at compile time.
They do not necessarily load/store the entire vector width, and treating
them as such can lead to incorrect aliasing information (for example, if
the underlying object is smaller than the size of the vector).

This makes sure that the MMO is given an unknown size to represent this,
which is less accurate than "may load/store up to 16 bytes", but less
incorrect than "will load/store 16 bytes".
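
For reference, an illustrative declaration (typed-pointer syntax of this era):

```
; Depending on the mask, this store writes anywhere from 0 to 16 bytes,
; so its MMO now carries an unknown size rather than the vector width.
declare void @llvm.masked.store.v4i32.p0v4i32(<4 x i32>, <4 x i32>*, i32 immarg, <4 x i1>)
```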

Differential Revision: https://reviews.llvm.org/D113888
2021-11-23 09:47:56 +00:00
Qiu Chaofan 59f4b3d308 [PowerPC] Implement more fusion types for Power10
This implements the rest of Power10 instruction fusion pairs, according
to user manual, including 'wide immediate', 'load compare', 'zero move'
and 'SHA3 assist'.

Only 'SHA3 assist' is enabled by default.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D112912
2021-11-23 17:21:17 +08:00
Martin Storsjö 4e5488afb2 [AArch64] [COFF] Move jump tables back to the readonly section
This essentially reverts f5884d255e
(D57277).

That commit was made as a workaround since LLVM back then didn't
support cross-section relative relocations (IMAGE_REL_ARM64_REL32)
in COFF for ARM64. Support for this was implemented later,
in d5c5cf5ce8 (D99572) and
382c505d9c (D102217).

The commit that moved jump tables to the function section noted
that it would be ideal to utilize IMAGE_REL_ARM64_REL32.

Differential Revision: https://reviews.llvm.org/D113576
2021-11-23 10:13:48 +02:00
Martin Storsjö 06d0d449d8 [COFF] [ARM64] Create symbols with regular intervals for relocations against temporary symbols
For relocations against temporary symbols (that don't persist in
the object file), we normally adjust them to reference the start of
the section.

For adrp relocations, the immediate offset from the referenced
symbol is stored in the opcode as the 21 bit signed immediate; this
means that the symbol referenced must be within +/- 1 MB from the
referenced symbol.

Create label symbols at regular intervals (1 MB apart). For
relocations against temporary symbols, pick the preceding added
offset symbol and make the relocation against that instead of
against the start of the section.

This should fix the root issue behind
https://bugs.llvm.org/show_bug.cgi?id=52378.

Differential Revision: https://reviews.llvm.org/D114340
2021-11-23 10:12:41 +02:00
Kazu Hirata d5b73a70a0 [llvm] Use range-based for loops (NFC) 2021-11-22 20:33:28 -08:00
Huihui Zhang 9cd7c534e2 [InstCombine] Enable fold select into operand for FAdd, FMul, FSub and FDiv.
For FAdd, FMul, FSub and FDiv, fold select into one of the operands to enable
further optimizations, i.e., floating-point reduction detection.

Turn code:
  %C = fadd %A, %B
  %D = select %cond, %C, %A

into:
  %C = select %cond, %B, -0.000000e+00
  %D = fadd %A, %C

Alive2 verification (with --disable-undef-input), timed out otherwise.
FAdd - https://alive2.llvm.org/ce/z/eUxN4Y
FMul - https://alive2.llvm.org/ce/z/5SWZz4
FSub - https://alive2.llvm.org/ce/z/Dhj8dU
FDiv - https://alive2.llvm.org/ce/z/Yj_NA2

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D113442
2021-11-22 15:10:10 -08:00
Florian Hahn 6149e57dc1
[ThreadPool] Support returning futures with results.
This patch adjusts ThreadPool::async to return futures that wrap
the result type of the passed in callable.

To do so, ThreadPool::asyncImpl first creates a shared promise. The
result of the promise is set in a new callable that first executes the
task. The callable is added to the task queue.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D114183
2021-11-22 21:20:55 +00:00
Jay Foad 44a3916f78 [AMDGPU] Allow VOP3 source modifiers in fpow expansion
Differential Revision: https://reviews.llvm.org/D114353
2021-11-22 20:39:46 +00:00
Stanislav Mekhanoshin c407769f5e [InstCombine] (~(a | b) & c) | ~(c | (a ^ b)) -> ~((a | b) & (c | (b ^ a)))
Transform
```
(~(a | b) & c) | ~(c | (a ^ b)) -> ~((a | b) & (c | (b ^ a)))
```
And swapped case:
```
(a | ~(b & c)) & ~(a & (b ^ c)) --> ~(a | b) | (a ^ b ^ c)
```

```
----------------------------------------
define i3 @src(i3 %a, i3 %b, i3 %c) {
%0:
  %or1 = or i3 %b, %c
  %not1 = xor i3 %or1, 7
  %and1 = and i3 %a, %not1
  %xor1 = xor i3 %b, %c
  %or2 = or i3 %xor1, %a
  %not2 = xor i3 %or2, 7
  %or3 = or i3 %and1, %not2
  ret i3 %or3
}
=>
define i3 @tgt(i3 %a, i3 %b, i3 %c) {
%0:
  %obc = or i3 %b, %c
  %xbc = xor i3 %b, %c
  %o = or i3 %a, %xbc
  %and = and i3 %obc, %o
  %r = xor i3 %and, 7
  ret i3 %r
}
Transformation seems to be correct!
```
```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %and1 = and i4 %b, %c
  %not1 = xor i4 %and1, 15
  %or1 = or i4 %not1, %a
  %xor1 = xor i4 %b, %c
  %and2 = and i4 %xor1, %a
  %not2 = xor i4 %and2, 15
  %and3 = and i4 %or1, %not2
  ret i4 %and3
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %xor1 = xor i4 %b, %c
  %xor2 = xor i4 %xor1, %a
  %or1 = or i4 %a, %b
  %not1 = xor i4 %or1, 15
  %or2 = or i4 %xor2, %not1
  ret i4 %or2
}
Transformation seems to be correct!
```

Differential Revision: https://reviews.llvm.org/D112955
2021-11-22 10:49:21 -08:00
Nico Weber 2fb3c05b34 [asm] Merge EmitMSInlineAsmStr() and EmitGCCInlineAsmStr()
This basically reverts 1778831a3d, which split them.
Since they were split 9 years ago, EmitGCCInlineAsmStr() grew a bunch of
features that usually weren't added to EmitMSInlineAsmStr(), and
that was usually a mistake.  D71677, D113932, D114167 are all examples
of where things were backported to EmitMSInlineAsmStr().

The names were also not great. EmitMSInlineAsmStr() used to be called for `asm
inteldialect`, which clang produces for Microsoft-style __asm { ... } blocks as
well as for GCC-style __asm__ / asm statements with -masm=intel. On the other hand,
EmitGCCInlineAsmStr() used to be called for `asm`, which clang produces for
GCC-style __asm__ / asm statements with -masm=att (the default).

It's also less code (23 insertions, 188 deletions).

No behavior change.

Differential Revision: https://reviews.llvm.org/D114330
2021-11-22 11:49:57 -05:00
Nico Weber 7c2d51474a [asm] Allow labels as operands in intel asm syntax
This makes a line in llvm/test/CodeGen/X86/asm-block-labels.ll pass
with `asm inteldialect` too.

I don't know if this is something one can hit in practice with inline
asm. The test is from 2007 (4646aa3e33) but in 2009 blockaddr was
introduced and e.g. `__asm__ __volatile__("brl %0" :: "X"(&&foo) : "memory");`
compiles to

    call void asm sideeffect "brl $0", "X,..."(i8* blockaddress(@func, %1))

nowadays (thanks to jrtc27 for that example!).

(6c4d255bf3 switched clang to blockaddress on an opt-in basis,
e4801f7844 added docs for it, 31b132c0b7 added IR support.)

I half-heartedly tried to build clang 2.8 locally, but it didn't
just build. And 2.8 didn't have a prebuilt clang binary yet.

The motivation is to make EmitGCCInlineAsmStr() and EmitMSInlineAsmStr()
more alike, and maybe we should delete this code from EmitGCCInlineAsmStr()
instead. But since it's just 3 lines and it's reachable from LLVM IR,
let's do the safer thing for now.

Differential Revision: https://reviews.llvm.org/D114329
2021-11-22 11:49:29 -05:00
Kazu Hirata 59a26448a6 [Target] Use range-based for loops (NFC) 2021-11-22 08:21:07 -08:00
Zarko Todorovski dc9b5550b2 [NFC][llvm][Hexagon] Inclusive Terms remove uses of sanity in Hexagon taget
Most changes are rewording comments but there are some assertions that I rephrased.

Reviewed By: kparzysz

Differential Revision: https://reviews.llvm.org/D114132
2021-11-22 10:08:01 -05:00
Hsiangkai Wang 137d3474ca [RISCV] Reverse the order of loading/storing callee-saved registers.
Currently, we restore the return address register as the last restoring
instruction in the epilog. The next instruction is `ret` usually. It is
a use of the return address register. In some microarchitectures, there is
a load-to-use data hazard. To avoid the load-to-use data hazard, we could
separate the load instruction from its use as far as possible. In this
patch, we reverse the order of restoring callee-saved registers to
increase the distance of `load ra` and `ret` in the epilog.

Differential Revision: https://reviews.llvm.org/D113967
2021-11-22 23:02:11 +08:00
Nikita Popov 62e9acad0a Revert "[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency"
This reverts commit d633db8f9d.

Causes bootstrap assertion failures:
https://lab.llvm.org/buildbot/#/builders/168/builds/3459/steps/9/logs/stdio
2021-11-22 15:47:33 +01:00
Nikita Popov d633db8f9d [SCEV] Fix and validate ValueExprMap/ExprValueMap consistency
This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:

* Addrec construction directly wrote to ValueExprMap in a few places,
  without updating ExprValueMap. Add a helper to ensure they stay
  consistent. The adjustment in forgetSymbolicName() explicitly
  drops the old value from the map, so that we don't rely on it
  being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
  ExprValueMap, but not dropping the corresponding entries from
  ValueExprMap.

Differential Revision: https://reviews.llvm.org/D113349
2021-11-22 15:27:25 +01:00
Simon Moll 56db1c072c [DA][NFC] Update publication - add remarks
Update the reference publication for the SyncDependenceAnalysis and Divergence Analysis.  Fix phrasing, formatting. Add comments on reducible loop limitation.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D114146
2021-11-22 12:58:19 +01:00
Roman Lebedev 704d92607d
[X86][TTI] Finish costmodel for AVX512BW's VPMOVM2[BW] / VPMOV[BW]2M instructions
Apparently my methodology was suboptimal: not only did I miss all the +VL tuples,
I also missed some plain tuples. I believe this adds everything missing.
Indeed, these manual cost models are just not okay long-term.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114334
2021-11-22 14:31:34 +03:00
Roman Lebedev 8d09dd61c3
[X86][TTI] Costmodel for AVX512DQ's VPMOVM2[DQ] / VPMOV[DQ]2M instructions
Much like the VPMOVM2[BW] / VPMOV[BW]2M from AVX512BW,
these either sign-extend the mask register into a vector,
or pack the mask from a vector register.

Apparently, we didn't even have MCA tests for these,
added in rG2f364f6f0d3a2420ca78cbd80abb186657180e05,
so I'm just guessing that their perf characteristics
are optimal.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114314
2021-11-22 14:31:34 +03:00
Diego Caballero 4348cd42c3 [LV] Drop integer poison-generating flags from instructions that need predication
This patch fixes PR52111. The problem is that LV propagates poison-generating flags (`nuw`/`nsw`, `exact`
and `inbounds`) in instructions that contribute to the address computation of widen loads/stores that are
guarded by a condition. It may happen that when the code is vectorized and the control flow within the loop
is linearized, these flags may lead to generating a poison value that is effectively used as the base address
of the widen load/store. The fix drops all the integer poison-generating flags from instructions that
contribute to the address computation of a widen load/store whose original instruction was in a basic block
that needed predication and is not predicated after vectorization.
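
A hypothetical scalar source illustrating the problem:

```
; %gep feeds a load guarded by %c. Once the loop body is linearized
; under predication, keeping 'inbounds' could make the widened address
; poison on lanes where the guard was false, so such flags are dropped.
define i32 @guarded(i32* %base, i64 %i, i1 %c) {
entry:
  br i1 %c, label %then, label %exit
then:
  %gep = getelementptr inbounds i32, i32* %base, i64 %i
  %v = load i32, i32* %gep, align 4
  br label %exit
exit:
  %r = phi i32 [ %v, %then ], [ 0, %entry ]
  ret i32 %r
}
```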

Reviewed By: fhahn, spatel, nlopes

Differential Revision: https://reviews.llvm.org/D111846
2021-11-22 10:57:29 +00:00
Sjoerd Meijer 4d21b64464 [BPI] Look-up tables for non-loop branches. NFC.
This adds and uses look-up tables for non-loop branch probabilities, which
have probabilities directly encoded into the tables for the different condition
codes. Compared to having this logic inlined in different functions, as used to
be the case, I think this is more compact and thus also easier to check and
cross-reference. This also adds a test for pointer heuristics that was missing.

Differential Revision: https://reviews.llvm.org/D114009
2021-11-22 10:30:42 +00:00
David Green 760d4d03d5 [AArch64] Sink splat shuffles to lane index intrinsics
This teaches AArch64TargetLowering::shouldSinkOperands to sink splat
shuffles to certain neon intrinsics, so that they can make use of the
lane variants of the instructions that are available.

Differential Revision: https://reviews.llvm.org/D112994
2021-11-22 08:11:35 +00:00
wangpc af0ecfccae [RISCV] Generate pseudo instruction li
Add an alias of `addi [x], zero, imm` to generate the pseudo
instruction li, which makes assembly much more readable.
For existing tests, users can update them by running the script
`llvm/utils/update_llc_test_checks.py`.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D112692
2021-11-22 14:01:37 +08:00
Kazu Hirata 49e3838145 [llvm] Use make_early_inc_range (NFC) 2021-11-21 19:24:17 -08:00
Kazu Hirata ea5421bd0d [llvm] Use range-based for loops (NFC) 2021-11-21 19:24:15 -08:00
Roland McGrath b72b56016a NFC: clang-format lib/Transforms/Instrumentation/InstrProfiling.cpp
Differential Revision: https://reviews.llvm.org/D114343
2021-11-21 18:16:02 -08:00
Kazu Hirata c133fb321f [CodeGen] Use llvm::is_contained (NFC) 2021-11-21 10:36:20 -08:00
Kazu Hirata fc981cedea [llvm] Use range-based for loops (NFC) 2021-11-21 10:36:18 -08:00
Kazu Hirata f6bce30cf9 [llvm] Use range-based for loops (NFC) 2021-11-20 18:42:10 -08:00
Phoebe Wang 6cc820a3e2 [X86][FP16] Relax the pattern condition for VZEXT_MOVL to match more cases
Fixes pr52560

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114313
2021-11-21 09:14:11 +08:00
Nikita Popov aeba28bc62 [DSE] Drop hasAnalyzableMemoryWrite() (NFCI)
The functionality of hasAnalyzableMemoryWrite() is effectively
subsumed by getLocForWriteEx(), which will return None if the
instruction is not analyzable. The implementations don't match
exactly (e.g. getLocForWriteEx() does not limit non-calls to
stores), but in conjunction with the isRemovable() check, it ends
up being the same.
2021-11-20 23:20:12 +01:00
Nikita Popov 0a2bde94a0 [LVI] Drop requirement that modulus is constant
If we're looking only at the lower bound, the actual modulus
doesn't matter. This is a leftover from when I wanted to consider
the upper bound as well, where the modulus does matter.
2021-11-20 21:06:08 +01:00
Nikita Popov cd84cab6b3 [LVI] Support urem in implied conditions
If (X urem M) >= C we know that X >= C. Make use of this fact
when computing the implied condition range.

In some cases we could also establish an upper bound, but that's
both tricker and not interesting in practice.

Alive: https://alive2.llvm.org/ce/z/R5ZGSW
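
For example (illustrative IR):

```
; In %taken we know (%x urem %m) >= 42, and since X urem M <= X
; (unsigned), %x >= 42 follows; LVI can now fold %c2 to true.
define i1 @implied(i32 %x, i32 %m) {
entry:
  %rem = urem i32 %x, %m
  %c1 = icmp uge i32 %rem, 42
  br i1 %c1, label %taken, label %exit
taken:
  %c2 = icmp uge i32 %x, 42
  ret i1 %c2
exit:
  ret i1 false
}
```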
2021-11-20 21:01:26 +01:00
Florian Hahn cf8efbd30e
[VPlan] Wrap vector loop blocks in region.
A first step towards modeling preheader and exit blocks in VPlan as well.
Keeping the vector loop in a region allows for changing the VF as we
traverse region boundaries.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D113182
2021-11-20 17:59:48 +00:00
Sanjay Patel 337948ac6e [InstCombine] add folds for binop with sexted bool and constant operands
This is a generalization/extension of the existing and/or
folds noted with TODO comments. Those have a one-use
constraint that is not necessary.

Potential follow-ups are noted by the TODO comments in
the new function. We can also call this function from
other binop visit* functions, but we need to add tests
first.

This solves:
https://llvm.org/PR52543

https://alive2.llvm.org/ce/z/NWuCR5
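
One instance of the generalized fold (illustrative):

```
; sext i1 %b is either -1 or 0, so the binop has only two possible
; results and can become: select i1 %b, i8 42, i8 0.
define i8 @sext_bool_and(i1 %b) {
  %s = sext i1 %b to i8
  %r = and i8 %s, 42
  ret i8 %r
}
```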
2021-11-20 12:33:00 -05:00
Craig Topper a4373f6753 [X86] Don't combine (x86cmp (trunc (movmsk (bitcast X))), 0) if the truncate discards unknown bits.
We have transform that tries turn a pmovmskb into movmskps/pd or
movmskps to movmskpd. This transform isn't valid if the truncate
discarded bits that might be set by the original movmsk.

We could fix this by inserting an AND after the new movmsk to discard
the equivalent of the truncated bits, but I've left that for later
patch.

Fixes PR52567.

Differential Revision: https://reviews.llvm.org/D114306
2021-11-19 21:50:35 -08:00
Kazu Hirata d1abf481da [llvm] Use range-based for loops (NFC) 2021-11-19 21:12:13 -08:00
RamNalamothu 18f9351223 [AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local
branch target labels. This bloats the code object, slows down the loader,
and is only used to simplify disassembly.

Use '--symbolize-operands' with llvm-objdump to improve readability of the
branch target operands in disassembly.

Fixes: SWDEV-312223

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D114273
2021-11-20 10:32:41 +05:30
Lang Hames 43f5f6916f [ORC][JITLink] Move JITDylib name into JITLinkDylib base class.
This will enable better error messages and debug logs in JITLink.
2021-11-19 20:54:28 -08:00
ksyx 97b9e8438e [GVN][NFC] Remove redundant check
The if-check above the deleted part guarantees that StoreOffset <= LoadOffset
and that StoreOffset + StoreSize >= LoadOffset + LoadSize. Given that
LoadOffset + LoadSize > LoadOffset when LoadSize > 0, this shows that
StoreOffset + StoreSize > LoadOffset is guaranteed when LoadSize > 0, and
since it would be meaningless to have a type with nonpositive size, the
check can be removed. The values are converted to signed types to avoid
unsigned operations with negative offsets.

Part of revision D100179
Reapply commit c35e8185d8, fixing the problem
reported by mstorsjo.
2021-11-19 20:24:36 -05:00
James Nagurne 241df03ce5 [NFC] Test commit, add whitespace to end-of-line 2021-11-19 18:22:00 -06:00
Ellis Hoag de11de308b [InstrProf] Use i32 for GEP index from lowering llvm.instrprof.increment
The `llvm.instrprof.increment` intrinsic uses `i32` for the index. We should use this same type for the index into the GEP instructions.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D114268
2021-11-19 15:45:14 -08:00
Quinn Pham 3f3bee42d2 [NFC][llvm] Inclusive language: remove instance of master from Thumb2SizeReduction.cpp
[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with main in `Thumb2SizeReduction.cpp`.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D114196
2021-11-19 16:07:58 -06:00
David Blaikie 3f3680dff3 DWARFVerifier: Simplify name lookups
No need to use the dynamic fallback query when the name type is known
statically at the call site.
2021-11-19 12:31:27 -08:00
Jay Foad ff7f2cfa95 [AMDGPU] Add an implicit use of M0 to all V_MOV_B32_indirect_read/write
NFCI. Previously the implicit use was added to V_MOV_B32_indirect_read
when building the instruction. V_MOV_B32_indirect_write didn't have an
implicit use of M0 at all, but apparently it did not cause any problems.

Differential Revision: https://reviews.llvm.org/D114239
2021-11-19 19:00:17 +00:00
Fabian Wolff 7eec832def [DSE] Improve handling of `strncpy` in Dead Store Elimination
Fixes PR#52062 and one of the remaining cases of PR#47644.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D114035
2021-11-19 17:46:29 +00:00
Florian Hahn 76effb001d
[LV] Remove obsolete comment about creating a dummy block (NFC)
No dummy pre-entry block has been created since a6c4969f5f. The comment is
now stale and can be removed.

Mentioned by @Ayal in D113182.
2021-11-19 17:17:04 +00:00
Philip Reames 28000587e1 [SCEV] Revert two speculative compile time optimizations which made no difference
Revert "[SCEV] Defer all work from ea12c2cb as late as possible"
Revert "[SCEV] Defer loop property checks from ea12c2cb as late as possible"

This reverts commit 734abbad79 and  1a5666acb2.

Both of these changes were speculative attempts to address a compile time regression.  Neither worked, and both complicated the code in undesirable ways.
2021-11-19 08:45:56 -08:00
Philipp Tomsich af57a71d18 [RISCV] Don't call setHasMultipleConditionRegisters(), so icmp is sunk
On RISC-V, icmp is not sunk (as the following snippet shows), which
generates the following suboptimal branch pattern:
```
  core_list_find:
	lh	a2, 2(a1)
	seqz	a3, a0         <<
	bltz	a2, .LBB0_5
	bnez	a3, .LBB0_9    << should sink the seqz
        [...]
	j	.LBB0_9
  .LBB0_5:
	bnez	a3, .LBB0_9    << should sink the seqz
	lh	a1, 0(a1)
        [...]
```

The blocks after `codegenprepare` look as follows:
```
  define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
  entry:
    %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
    %0 = load i16, i16* %idx, align 2, !tbaa !4
    %cmp = icmp sgt i16 %0, -1
    %tobool.not37 = icmp eq %struct.list_head_s* %list, null
    br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

  while.cond9.preheader:                            ; preds = %entry
    br i1 %tobool.not37, label %return, label %land.rhs11.lr.ph
```
where the `%tobool.not37` is the result of the icmp that is not sunk.
Note that it is computed in the basic-block up until what becomes the
`bltz` instruction and the `bnez` is a basic-block of its own.

Compare this to what happens on AArch64 (where the icmp is correctly sunk):
```
  define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
  entry:
    %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
    %0 = load i16, i16* %idx, align 2, !tbaa !6
    %cmp = icmp sgt i16 %0, -1
    br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

  while.cond9.preheader:                            ; preds = %entry
    %1 = icmp eq %struct.list_head_s* %list, null
    br i1 %1, label %return, label %land.rhs11.lr.ph
```

This is caused by sinkCmpExpression() being skipped, if multiple
condition registers are supported.

Given that the check for multiple condition registers affects only
sinkCmpExpression() and shouldNormalizeToSelectSequence(), this change
adjusts the RISC-V target as follows:
 * we no longer signal multiple condition registers (thus changing
   the behaviour of sinkCmpExpression() back to sinking the icmp)
 * we override shouldNormalizeToSelectSequence() so that we always select
   the preferred normalisation strategy for our backend

With both changes, the test results remain unchanged.  Note that without
the target-specific override to shouldNormalizeToSelectSequence(), there
is worse code (more branches) generated for select-and.ll and select-or.ll.

The original test case changes as expected:
```
  core_list_find:
	lh	a2, 2(a1)
	bltz	a2, .LBB0_5
	beqz	a0, .LBB0_9    <<
        [...]
	j	.LBB0_9
.LBB0_5:
	beqz	a0, .LBB0_9    <<
	lh	a1, 0(a1)
        [...]
```

Differential Revision: https://reviews.llvm.org/D98932
2021-11-19 08:32:59 -08:00
Victor Huang 86e77cdb08 [PowerPC] Add a flag for conditional trap optimization
This patch adds a flag to enable/disable the conditional trap optimization.
The optimization is disabled by default.

Peer reviewed by: nemanjai
2021-11-19 10:24:54 -06:00
Ben Langmuir 4c94760f36 [ORC] Fix materialization of weak local symbols
We were adding all defined weak symbols to the materialization
responsibility, but local symbols will not be in the symbol table, so it
failed to materialize due to the "missing" symbol.

Local weak symbols come up in practice when using `ld -r` with a hidden
weak symbol.

rdar://85574696
2021-11-19 07:25:56 -08:00
Matt Morehouse 671f0930fe [X86] Selective relocation relaxation for +tagged-globals
For tagged-globals, we only need to disable relaxation for globals that
we actually tag. With this patch, function pointer relocations, which
we do not instrument, can be relaxed.

This patch also makes tagged-globals work properly with LTO, as
-Wa,-mrelax-relocations=no doesn't work with LTO.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D113220
2021-11-19 07:18:27 -08:00
Alexey Bataev d1fdf867b1 [SLP][NFC]Introduce TreeEntry::getVectorFactor member function, NFC.
Added TreeEntry::getVectorFactor to get the final vectorization factor
to simplify the code.

Differential Revision: https://reviews.llvm.org/D114190
2021-11-19 06:32:19 -08:00
Nico Weber 8b76d33c59 [asm] Allow block address operands in `asm inteldialect`
This makes the following program build with -masm=intel:

    int foo(int count) {
      asm goto ("dec %0; jb %l[stop]" : "+r" (count) : : : stop);
      return count;
    stop:
      return 0;
    }

It's also another step towards merging EmitGCCInlineAsmStr() and
EmitMSInlineAsmStr().

Differential Revision: https://reviews.llvm.org/D114167
2021-11-19 09:27:30 -05:00
Nico Weber 4f9a5c2a14 [asm] Remove explicit branch for modifier 'l'
No intended behavior change.

EmitGCCInlineAsmStr() used to explicitly check for modifier 'l'
after handling block address and machine basic block operands.
This prevented passing a MachineOperand with 'l' modifier to
PrintAsmMemoryOperand(). Conceptually that seems kind of nice,
but in practice the overrides of PrintAsmMemoryOperand() in all (*)
AsmPrinter subclasses already reject modifiers they don't know about,
and none of them know about 'l'. So removing this doesn't have
a behavior difference, is less code, and it makes EmitGCCInlineAsmStr()
and EmitMSInlineAsmStr() more similar, to prepare for merging them later.

(Why not _add_ the branch to EmitMSInlineAsmStr() instead? Because that
always works with X86AsmPrinter I think, and
X86AsmPrinter::PrintAsmMemoryOperand() very decisively rejects the 'l'
modifier, so it's hard to motivate adding that branch.)

*: The one exception was AVRAsmPrinter, which had an llvm_unreachable instead
of returning true. So this commit changes that, so that the AVR target keeps
emitting an error instead of crashing when passing a mem operand with a :l
modifier to it. All the other targets already don't crash on this.

Differential Revision: https://reviews.llvm.org/D114216
2021-11-19 09:19:53 -05:00
Jay Foad 30b27ecfc2 [AMDGPU] Use new opcode for indexed vgpr reads
Introduce V_MOV_B32_indirect_read for indexed vgpr reads
(and rename the old V_MOV_B32_indirect to
V_MOV_B32_indirect_write) so they can be unambiguously
distinguished from regular V_MOV_B32_e32. Previously they
were distinguished by looking for extra implicit operands
but this is fragile because regular moves sometimes have
extra implicit operands too:
- either by accident, when instructions end up with
  duplicate implicit operands (see e.g. D100939)
- or by design, when SIInstrInfo::copyPhysReg breaks a
  multi-dword copy into individual subreg mov instructions
  and adds implicit operands for the super-register.

The effect of this is that SIInstrInfo::isFoldableCopy can
be simplified and identifies more foldable copies. The test
diffs show that more immediate 0 values have been folded as
inline operands.

SIInstrInfo::isReallyTriviallyReMaterializable could
probably be simplified too but that is not part of this
patch.

Differential Revision: https://reviews.llvm.org/D114230
2021-11-19 13:08:11 +00:00
Roman Lebedev 049799c311
[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 8 bit when have AVX512BW+AVX512VBMI
If in addition to AVX512BW (that provides `{k}<->{i8,i16}` casts and i16 shuffles),
we have AVX512VBMI, which provides i8 shuffles, we are in an optimal situation.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114071
2021-11-19 15:58:10 +03:00
Roman Lebedev a751084bb4
[X86][Costmodel] `trunc v16i8 to v8i1` can appear after legalization, cost is same as for `trunc v8i8 to v8i1`
Note that there are many other missing costs; I'm *only* adding the ones that are queried
from `getReplicationShuffleCost()` for the existing (quite exhaustive) test coverage.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114070
2021-11-19 15:57:32 +03:00
Roman Lebedev a50fdd3fc9
[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 16 bit when have AVX512BW
Here we get pretty lucky. AVX512F does not provide any instructions
to convert between a `k` vector mask and a vector,
but AVX512BW adds `{k}<->nX{i8,i16}` conversions,
and just as it happens, with AVX512BW we have an i16 shuffle.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113915
2021-11-19 15:55:41 +03:00
Simon Pilgrim 0f652d8f52 [X86] LowerRotate - recognise hidden ROTR patterns for better vXi8 codegen
Check for a hidden ISD::ROTR (rotl(sub(0,x))) - vXi8 lowering can handle both (it's always beneficial for splats, but otherwise only if we have VPTERNLOG).
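
A sketch of the hidden pattern at the IR level (hypothetical example; fshl with matching operands is a rotate):
```
declare <16 x i8> @llvm.fshl.v16i8(<16 x i8>, <16 x i8>, <16 x i8>)

; rotl(x, 0 - y) is really rotr(x, y), so vXi8 lowering can treat it as such
define <16 x i8> @hidden_rotr(<16 x i8> %x, <16 x i8> %y) {
  %neg = sub <16 x i8> zeroinitializer, %y
  %r = call <16 x i8> @llvm.fshl.v16i8(<16 x i8> %x, <16 x i8> %x, <16 x i8> %neg)
  ret <16 x i8> %r
}
```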

We currently hit infinite loops in TargetLowering::expandROT if we set ISD::ROTR to custom, which needs addressing before we extend this much further.
2021-11-19 11:49:15 +00:00
Simon Pilgrim 812e64ef0c [DAG] MatchRotate - support rotate-by-constant of illegal types
Patch to fix some of the regressions in D77804.

By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443.

Differential Revision: https://reviews.llvm.org/D113192
2021-11-19 11:12:04 +00:00
Yonghong Song 8fb3f84484 BPF: Workaround InstCombine trunc+icmp => mask+icmp Optimization
Patch [1] added a further InstCombine trunc+icmp => mask+icmp
optimization, and this caused a couple of bpf selftest failures.
A previous llvm BPF backend patch [2] introduced the llvm.bpf.compare
builtin to handle such situations.

This patch further added support for ">" and ">=" icmp opcodes.
Tested with bpf selftests; all tests pass, including the two that
previously failed.

Note that Patch [1] also added the optimization when the to-be-compared
constant is a negative power of 2 (-C) or not a power of 2 (~C).
This patch didn't implement these two cases, as a typical bpf
program compares a scalar to a positive length or boundary value,
and this scalar is later used as an index into an array buffer
or packet buffer.

  [1] https://reviews.llvm.org/D112634
  [2] https://reviews.llvm.org/D112938

Differential Revision: https://reviews.llvm.org/D114215
2021-11-18 20:25:28 -08:00
Serguei Katkov 3557f49353 [AARCH64] Teach AArch64FrameLowering::getFrameIndexReferencePreferSP really prefer SP.
Make more of an effort to use sp when it is possible to lower a frame index.

Reviewers: reames, loicottet, ostannard, t.p.northover
Reviewed By: reames
Subscribers: arphaman, danilaml, hiraditya, kristof.beyls, llvm-commits, Matt, yrouban
Differential Revision: https://reviews.llvm.org/D111133
2021-11-19 11:14:02 +07:00
Senran Zhang 0425ea4621 [NFC][OpaquePtr][Evaluator] Remove call to PointerType::getElementType
There are still two more uses of PointerType::getElementType in
Evaluator, when evaluating BitCasts on pointers. BitCasts on pointers
should be removed when opaque pointers are ready, so I have kept them as-is.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D114131
2021-11-19 10:32:55 +08:00
Philip Reames 734abbad79 [SCEV] Defer all work from ea12c2cb as late as possible
This is a second speculative compile time optimization to address a reported regression.  My actual suspicion is that availability of no-self-wrap is making some *other* bit of code trigger, but let's rule this out.
2021-11-18 17:19:52 -08:00
Stanislav Mekhanoshin f8e615462b [AMDGPU] Fix SIPostRABundler crash on null register used by dbg value
Recently we started generating DBG_VALUEs with $noreg operands.
This crashes SIPostRABundler, and it should not iterate over these
registers anyway.

Fixes: SWDEV-311733

Differential Revision: https://reviews.llvm.org/D114202
2021-11-18 17:01:19 -08:00
Victor Huang 40c65655af [PowerPC] Remove the redundant terminator instruction when optimizing conditional trap
This patch is a follow-up to ae27ca9a67 to remove the
redundant terminator when optimizing a conditional trap.

Peer reviewed by: nemanjai
2021-11-18 17:52:26 -06:00
Ahmed Bougacha e3a7f0e2f9 [AArch64][PAC] Select llvm.ptrauth.sign/sign.generic to PAC*.
The @llvm.ptrauth.sign/sign.generic intrinsics map cleanly to
the various AArch64 PAC[IDG][Z][AB] instructions.  Select them.
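
A minimal sketch, assuming the intrinsic's current non-overloaded form (the exact spelling at the time of this commit may differ) and key 0 being IA:
```
declare i64 @llvm.ptrauth.sign(i64, i32, i64)

; signing with the IA key and discriminator %disc should select to PACIA
define i64 @sign_ia(i64 %ptr, i64 %disc) {
  %signed = call i64 @llvm.ptrauth.sign(i64 %ptr, i32 0, i64 %disc)
  ret i64 %signed
}
```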

Differential Revision: https://reviews.llvm.org/D91087
2021-11-18 15:21:30 -08:00
David Blaikie 3cbc4c487a llvm-dwarfdump: Rebuild type names in dwo type units 2021-11-18 14:12:48 -08:00
Philip Reames 1a5666acb2 [SCEV] Defer loop property checks from ea12c2cb as late as possible
This is a speculative compile time optimization to address a reported regression.  It's the only thing which vaguely makes sense.
2021-11-18 13:47:45 -08:00
Nikita Popov 46c26991ae [DSE] Remove getLocForWrite() (NFCI)
This implements nearly the same logic as getLocForWriteEx(), and
is only used in one place. In that context, we should also know
that getLocForWriteEx() returns a non-None result. As such,
consolidate everything to use one function.
2021-11-18 21:19:18 +01:00
Nikita Popov f1295563f1 [DSE] Move removePartiallyOverlappedStores() into DSEState (NFC)
So it can use getLocForWriteEx().
2021-11-18 21:19:18 +01:00
Philip Reames ea12c2cb9c [SCEV] Move mustprogress based no-self-wrap logic so it applies to all exit conditions
This change moves logic which we'd added specifically for less than tests so that it applies to equalities and greater than tests as well. The basic idea is that if we can show an IV cycles infinitely through the same series on self-wrap, and that the exit condition must be taken to prevent UB, we can conclude that it must be taken before self-wrap and thus infer said flag.

The motivation here is simple loops with unsigned induction variables w/non-one steps and inequality tests. A toy example would be:
for (unsigned i = 0; i != N; i += 2) { body; }

If the body contains no side effects, and this is a mustprogress function, we can assume that this must be a finite loop and thus that the exit count is N/2.

Differential Revision: https://reviews.llvm.org/D103991
2021-11-18 10:07:44 -08:00
Kazu Hirata 7ca14f6044 [llvm] Use range-based for loops (NFC) 2021-11-18 09:09:52 -08:00
Simon Pilgrim 3821d2ab3b [X86] LowerRotate - pull out repeated is ISD::ROTL check. NFC. 2021-11-18 17:03:19 +00:00
Kerry McLaughlin ff64b2933a [LoopVectorize] Check the number of uses of an FAdd before classifying as ordered
checkOrderedReductions looks for Phi nodes which can be classified as in-order,
meaning they can be vectorised without unsafe math. In order to vectorise the
reduction it should also be classified as in-loop by getReductionOpChain, which
checks that the reduction has two uses.

In this patch, a similar check is added to checkOrderedReductions so that we
now return false if there are more than two uses of the FAdd instruction.
This fixes PR52515.
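
A minimal sketch of the shape that is now rejected (names hypothetical):
```
; %add has three uses (the phi, the store, and the return), so it is no
; longer classified as an in-order reduction
define float @not_ordered(float* %p, i64 %n) {
entry:
  br label %loop
loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %sum = phi float [ 0.000000e+00, %entry ], [ %add, %loop ]
  %gep = getelementptr inbounds float, float* %p, i64 %iv
  %v = load float, float* %gep
  %add = fadd float %sum, %v
  store float %add, float* %gep
  %iv.next = add i64 %iv, 1
  %ec = icmp eq i64 %iv.next, %n
  br i1 %ec, label %exit, label %loop
exit:
  ret float %add
}
```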

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D114002
2021-11-18 16:41:19 +00:00
Arnold Schwaighofer 7d11c5dac2 Coro: Remove coro_end and coro_suspend_retcon in private unprocessed functions
We might emit functions that are private and never called. The coro
split pass only processes functions that might be called. Remove
intrinsics that we can't generate code for.

rdar://84619859

Differential Revision: https://reviews.llvm.org/D114021
2021-11-18 07:48:24 -08:00
Mubashar Ahmad 8e47b83ec9 [AArch64][ARM] Enablement of Cortex-A710 Support
Phabricator review: https://reviews.llvm.org/D113256
2021-11-18 10:58:05 +00:00
Eric Tang 9fe6b9e802 [TargetLowering][RISCV] Fixed a scalable vector issue when lowering [s|u]mul.overflow intrinsics
Fixed a vector type issue where uses of getVectorNumElements()
should be replaced by getVectorElementCount() when lowering these
intrinsics.
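
A sketch of an intrinsic call that exercises this path (types chosen for illustration):
```
declare { <vscale x 2 x i64>, <vscale x 2 x i1> } @llvm.smul.with.overflow.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>)

; lowering now queries getVectorElementCount(), preserving the scalable
; element count instead of treating it as a fixed number of elements
define { <vscale x 2 x i64>, <vscale x 2 x i1> } @smulo(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
  %res = call { <vscale x 2 x i64>, <vscale x 2 x i1> } @llvm.smul.with.overflow.nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
  ret { <vscale x 2 x i64>, <vscale x 2 x i1> } %res
}
```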

This is similar to D94149.

Signed-off-by: Eric Tang <tangxingxin1008@gmail.com>

Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D109809
2021-11-18 10:16:08 +00:00
David Sherwood 9cef7c1ca9 [CodeGen][SVE] Add missing isel patterns for vector_reverse
We were missing patterns for vector_reverse of unpacked FP vector
types, as well as all the supported bfloat vectors.
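
For example (a sketch; <vscale x 2 x half> is one of the unpacked FP types in question):
```
declare <vscale x 2 x half> @llvm.experimental.vector.reverse.nxv2f16(<vscale x 2 x half>)

define <vscale x 2 x half> @reverse_nxv2f16(<vscale x 2 x half> %v) {
  %r = call <vscale x 2 x half> @llvm.experimental.vector.reverse.nxv2f16(<vscale x 2 x half> %v)
  ret <vscale x 2 x half> %r
}
```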

Tests added here:

  CodeGen/AArch64/named-vector-shuffle-reverse-sve.ll

Differential Revision: https://reviews.llvm.org/D114089
2021-11-18 09:59:26 +00:00
Florian Hahn da9f2ba3b1
[SCEV] Reorder operands checks in collectConditions.
The initial two cases require a SCEVConstant as RHS. Pull up the condition
to check and swap SCEVConstants from below. Also remove a redundant
check & swap if RHS is SCEVUnknown.
2021-11-18 09:36:16 +00:00
Chen Zheng 9bda9a3980 [PowerPC] fix typos in comments, NFC 2021-11-18 08:55:23 +00:00
Phoebe Wang a9fba2be35 [X86][ABI] Do not return float/double from x87 registers when x87 is disabled
This is aligned with GCC's behavior.
Also, alias `-mno-fp-ret-in-387` to `-mno-x87`, by which we can fix pr51498.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D112143
2021-11-18 12:31:08 +08:00
Phoebe Wang de34a940ae [X86] Add -mskip-rax-setup support to align with GCC
AMD64 ABI mandates caller to specify the number of used SSE registers
when passing variable arguments.
GCC also provides option -mskip-rax-setup to skip the setup of rax when
SSE is disabled. This helps to reduce the code size, see pr23258.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D112413
2021-11-18 11:20:32 +08:00
Zarko Todorovski 59c84774d2 [NFC][llvm] Inclusive language: remove uses of sanity in llvm/lib/ExecutionEngine/
Reworded and removed code comments to avoid using `sanity check` and `sanity
test`.
2021-11-17 22:17:54 -05:00
Zi Xuan Wu 24d1673c8b [llvm-tblgen][RISCV] Make llvm-tblgen RISCVCompressInstEmitter to be common infra across different targets
Not only RISCV but also other targets, such as CSKY, mix compressed
instructions with normal instructions. To reuse the basic infrastructure to
compress/uncompress and predict instructions, we need to restructure
RISCVCompressInstEmitter and make it more general and suitable for other
targets.

Differential Revision: https://reviews.llvm.org/D113475
2021-11-18 11:14:27 +08:00
Zarko Todorovski 5b8bbbecfa [NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target
Reworded and removed code comments that contain `sanity check` and `sanity
test`.
2021-11-17 21:59:00 -05:00
Luo, Yuanke c4dba47196 [X86][AMX] Don't emit tilerelease for old AMX instrisic.
We should avoid mixing the old AMX intrinsics with the new AMX intrinsics.
For the old AMX intrinsics, the user is responsible for invoking tile
release. This patch checks whether any tile config was generated by the
compiler; if so, it emits a tilerelease instruction, and otherwise it does
not.

Differential Revision: https://reviews.llvm.org/D114066
2021-11-18 09:28:32 +08:00
Craig Topper d78fdf111d [LegalizeTypes] Further limit expansion of CTTZ during type promotion.
Don't expand CTTZ if CTPOP or CTLZ is supported on the promoted type.
We have special handling for CTTZ expansion to use those ops with a
small conversion. The setup for that doesn't generate extra code or
large constants so we don't gain anything from expanding early and we
make CTTZ_ZERO_UNDEF codegen worse.

Follow up from post commit feedback on D112268. We don't seem to have
any in tree tests that care about this.
2021-11-17 15:27:29 -08:00
Philip Reames ad69402f3e [SCEVAA] Avoid forming malformed pointer diff expressions
This solves the same crash as in D104503, but with a different approach.

The test case test_non_dom demonstrates a case where scev-aa crashes today. (If exercised either by -eval-aa or -licm.) The basic problem is that SCEV-AA expects to be able to compute a pointer difference between two SCEVs for any pair of pointers we do an alias query on. For (valid, but out of scope) reasons, we can end up asking whether expressions in different sub-loops can alias each other. This results in a subtraction expression being formed where neither operand dominates the other.
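
A hand-written sketch in the spirit of that test (not the actual test_non_dom): the two pointer SCEVs are addrecs in sibling loops, so neither defining scope dominates the other.
```
define void @sibling_loops(i8* %base, i64 %n) {
entry:
  br label %loop1
loop1:
  %iv1 = phi i64 [ 0, %entry ], [ %iv1.next, %loop1 ]
  %p = getelementptr i8, i8* %base, i64 %iv1     ; addrec in loop1
  %iv1.next = add i64 %iv1, 1
  %c1 = icmp ult i64 %iv1.next, %n
  br i1 %c1, label %loop1, label %loop2
loop2:
  %iv2 = phi i64 [ 0, %loop1 ], [ %iv2.next, %loop2 ]
  %q = getelementptr i8, i8* %base, i64 %iv2     ; addrec in loop2
  %iv2.next = add i64 %iv2, 1
  %c2 = icmp ult i64 %iv2.next, %n
  br i1 %c2, label %loop2, label %exit
exit:
  ret void
}
; an alias query on %p and %q would previously form a subtraction whose
; operands do not dominate each other
```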

The approach this patch takes is to leverage the "defining scope" notion we introduced for flag semantics to detect and disallow the formation of the problematic SCEV. This ends up being relatively straightforward on that new infrastructure. This change does hint that we should probably be verifying a similar property for all SCEVs somewhere, but I'll leave that to a follow-on change.

Differential Revision: D114112
2021-11-17 12:38:04 -08:00
Simon Pilgrim 5f99f771ec [X86] splitVector - only extract lower half subvector from splats
If we're splitting a source vector that is a splat (with no undefs), just extract (for free) the lower half subvector and use it for both halves.
2021-11-17 19:38:43 +00:00
Simon Pilgrim 3020608b61 Fix MSVC signed/unsigned mismatch warning. NFC. 2021-11-17 18:59:23 +00:00
Simon Pilgrim e76032c173 [X86] LowerRotate - improve vXi8 rotate-by-scalar lowering with direct use of (extended) shift-by-scalar helpers.
If we're rotating vXi8 by a splatted amount, then unpack to vXi16, perform a SHL by the (extended) scalar, and then pack the results.

This is a vector equivalent to the "rotl(x,y) -> (((aext(x) << bw) | zext(x)) << (y & (bw-1))) >> bw" style expansion we do for scalars in LowerFunnelShift.
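
Spelled out for a single i8 lane (a sketch of the same trick):
```
; concatenate x with itself into 16 bits, shift left by the masked amount,
; and the rotated value ends up in bits [15:8]
define i8 @rotl8(i8 %x, i8 %y) {
  %xw = zext i8 %x to i16
  %hi = shl i16 %xw, 8
  %cat = or i16 %hi, %xw      ; x:x
  %yw = zext i8 %y to i16
  %amt = and i16 %yw, 7       ; y & (bw-1)
  %shl = shl i16 %cat, %amt
  %srl = lshr i16 %shl, 8     ; take the high byte
  %rot = trunc i16 %srl to i8
  ret i8 %rot
}
```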

I think we can usefully use this for other vector types and vector funnel-shifts in the future, depending on how we expand beyond D113192 for matching rotations/funnel-shifts for more types/ops.
2021-11-17 18:59:23 +00:00
Stanislav Mekhanoshin 6d3db28088 [InstCombine] Generalize complex OR patterns to AND
For every pattern with only NOT, OR, and AND operations there is
always a symmetrical pattern with AND and OR swapped.

This adds 2 transformations: https://reviews.llvm.org/D113526

```
(~(a & b) | c) & (~(a & c) | b) --> ~((b ^ c) & a)
(~(a & b) | c) & ~(a & c) --> ~((b | c) & a)
```

```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %and1 = and i4 %b, %a
  %not1 = xor i4 %and1, 15
  %and2 = and i4 %a, %c
  %not2 = xor i4 %and2, 15
  %or = or i4 %not2, %b
  %r = and i4 %or, %not1
  ret i4 %r
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %or = or i4 %b, %c
  %and = and i4 %or, %a
  %r = xor i4 %and, 15
  ret i4 %r
}
Transformation seems to be correct!

----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %and1 = and i4 %a, %b
  %not1 = xor i4 %and1, 15
  %or1 = or i4 %not1, %c
  %and2 = and i4 %a, %c
  %not2 = xor i4 %and2, 15
  %or2 = or i4 %not2, %b
  %and3 = and i4 %or1, %or2
  ret i4 %and3
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %xor = xor i4 %b, %c
  %and = and i4 %xor, %a
  %not = xor i4 %and, 15
  ret i4 %not
}
Transformation seems to be correct!
```

Differential Revision: https://reviews.llvm.org/D113526
2021-11-17 10:47:36 -08:00
Nico Weber bf834b2629 [x86/asm] Let EmitMSInlineAsmStr() handle variants too
This is preparation for D113707, where I want to make `-masm=intel`
emit `asm inteldialect` instructions.

`{movq %rbx, %rax|mov rax, rbx}` is supposed to evaluate to the bit
between { and | for att and to the bit between | and } for intel.
Since intel will become `asm inteldialect`, which calls EmitMSInlineAsmStr(),
EmitMSInlineAsmStr() has to support variants as well.

(clang translates `{...|...}` to `$(...$|...$)`. I'm not sure why
it doesn't just send along only the first `...` or the second `...`
to LLVM, but given the notes in PR23933 let's not do a big
reorganization in this codepath.)

Differential Revision: https://reviews.llvm.org/D113932
2021-11-17 13:31:59 -05:00
Craig Topper 0274be28d7 [RISCV] Lower vector CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF by converting to FP and extracting the exponent.
If we have a large enough floating point type that can exactly
represent the integer value, we can convert the value to FP and
use the exponent to calculate the leading/trailing zeros.

The exponent will contain log2 of the value plus the exponent bias.
We can then remove the bias and convert from log2 to leading/trailing
zeros.
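
As a scalar sketch of the trailing-zero variant (the patch does this on whole vectors):
```
define i32 @cttz_via_fp(i32 %x) {
  %neg = sub i32 0, %x
  %lsb = and i32 %x, %neg      ; isolate the lowest set bit: a power of two,
                               ; which converts to float exactly
  %f = uitofp i32 %lsb to float
  %bits = bitcast float %f to i32
  %exp = lshr i32 %bits, 23    ; biased exponent
  %tz = sub i32 %exp, 127      ; remove the bias: log2(lsb) = trailing zeros
  ret i32 %tz                  ; meaningless for %x == 0, as with cttz_zero_undef
}
```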

This doesn't work for zero since the exponent of zero is zero so we
can only do this for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF. If we need
a value for zero we can use a vmseq and a vmerge to handle it.

We need to be careful to make sure the floating point type is legal.
If it isn't we'll continue using the integer expansion. We could split the vector
and concatenate the results but that needs some additional work and evaluation.

Differential Revision: https://reviews.llvm.org/D111904
2021-11-17 10:29:41 -08:00
Nico Weber 103cc914d6 [x86/asm] Make variants work when converting at&t inline asm input to intel asm output
`asm` always has AT&T-style input (`asm inteldialect` has Intel-style asm
input), so EmitGCCInlineAsmStr() always has to pick the same variant since it
cares about the input asm string, not the output asm string.

For PowerPC, that default variant is 1. For other targets, it's 0.

Without this, the included test case errors out with

    error: unknown use of instruction mnemonic without a size suffix
             mov rax, rbx

since it picks the intel branch and then tries to interpret it as AT&T
when selecting intel-style output with `-x86-asm-syntax=intel`.

Differential Revision: https://reviews.llvm.org/D113894
2021-11-17 13:23:18 -05:00
DianQK 1e9fa0b12a Fix the side effect of outlined function when the register is implicit use and implicit-def in the same instruction.
This is the diff associated with {D95267}, and we need to mark $x0 as live whether or not $x0 is dead.

The compiler also needs to mark register $x0 as live in for the following case.

```
$x1 = ADDXri $sp, 16, 0
BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def $x0
```

This change fixes an issue where the wrong registers were used when -machine-outliner-reruns>0.
As an example:

```
typedef struct {
    double v1;
    double v2;
} D16;

typedef struct {
    D16 v1;
    D16 v2;
} D32;

typedef long long LL8;
typedef struct {
    long long v1;
    long long v2;
} LL16;
typedef struct {
    LL16 v1;
    LL16 v2;
} LL32;

typedef struct {
    LL32 v1;
    LL32 v2;
} LL64;

LL8 needx0(LL8 v0, LL8 v1);

void bar(LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8);

LL8 foo(LL8 v0, LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8)
{
  LL8 result = needx0(v0, 0);
  bar(v1, v2, v3, v4, v5, v6, v7, v8);
  return result + 1;
}
```

As you can see from the `foo` function, we should not modify the value of `x0` until we call `needx0`.
This code is compiled to give the following instruction MIR code.

```
$sp = frame-setup SUBXri $sp, 256, 0
frame-setup STPDi killed $d13, killed $d12, $sp, 16
frame-setup STPDi killed $d11, killed $d10, $sp, 18
frame-setup STPDi killed $d9, killed $d8, $sp, 20

frame-setup STPXi killed $x26, killed $x25, $sp, 22
frame-setup STPXi killed $x24, killed $x23, $sp, 24
frame-setup STPXi killed $x22, killed $x21, $sp, 26
frame-setup STPXi killed $x20, killed $x19, $sp, 28
...
$x1 = MOVZXi 0, 0
BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0
...
```

Since there are some other instruction sequences that duplicate `foo`, after the first run of the Machine Outliner you will get:
```
$sp = frame-setup SUBXri $sp, 256, 0
frame-setup STPDi killed $d13, killed $d12, $sp, 16
frame-setup STPDi killed $d11, killed $d10, $sp, 18
frame-setup STPDi killed $d9, killed $d8, $sp, 20

$x7 = ORRXrs $xzr, $lr, 0
BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26
$lr = ORRXrs $xzr, $x7, 0
...
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
...
```

For the first time we outlined the following sequence:
```
frame-setup STPXi killed $x26, killed $x25, $sp, 22
frame-setup STPXi killed $x24, killed $x23, $sp, 24
frame-setup STPXi killed $x22, killed $x21, $sp, 26
frame-setup STPXi killed $x20, killed $x19, $sp, 28
```
and
```
$x1 = MOVZXi 0, 0
BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0
```

When we run the outliner again, we will get:
```
$x0 = ORRXrs $xzr, $lr, 0 <---- here
BL @OUTLINED_FUNCTION_2_0, implicit-def $lr, implicit $sp, implicit-def $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $d8, implicit $d9, implicit $d10, implicit $d11, implicit $d12, implicit $d13, implicit $x0
$lr = ORRXrs $xzr, $x0, 0

$x7 = ORRXrs $xzr, $lr, 0
BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26
$lr = ORRXrs $xzr, $x7, 0
...
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
```

When calling `OUTLINED_FUNCTION_2_0`, we used `x0` to save the `lr` register.
The reason for the above error appears to be that:
```
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
```
should be:
```
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp, implicit $x0
```

When processing the same instruction with both `implicit-def $x0` and `implicit $x0` we should keep `implicit $x0`.
A reproducible demo is available at: [https://github.com/DianQK/reproduce_outlined_function_use_live_x0](https://github.com/DianQK/reproduce_outlined_function_use_live_x0).

Reviewed By: jinlin

Differential Revision: https://reviews.llvm.org/D112911
2021-11-17 09:44:10 -08:00
Arthur Eubanks e3e25b5112 [NewPM] Add option to prevent rerunning function pipeline on functions in CGSCC adaptor
In a CGSCC pass manager, we may visit the same function multiple times
due to SCC mutations. In the inliner pipeline, this results in running
the function simplification pipeline on a function multiple times even
if it hasn't been changed since the last function simplification
pipeline run.

We use a newly introduced analysis to keep track of whether or not a
function has changed since the last time the function simplification
pipeline has run on it. If we see this analysis available for a function
in a CGSCCToFunctionPassAdaptor, we skip running the function passes on
the function. The analysis is queried at the end of the function passes
so that it's available after the first time the function simplification
pipeline runs on a function. This is a per-adaptor option so it doesn't
apply to every adaptor.

The goal of this is to improve compile times. However, currently we
can't turn this on by default at least for the higher optimization
levels since the function simplification pipeline is not robust enough
to be idempotent in many cases, resulting in performance regressions if
we stop running the function simplification pipeline on a function
multiple times. We may be able to turn this on for -O1 in the near
future, but turning this on for higher optimization levels would require
more investment in the function simplification pipeline.

Heavily inspired by D98103.

Example compile time improvements with flag turned on:
https://llvm-compile-time-tracker.com/compare.php?from=998dc4a5d3491d2ae8cbe742d2e13bc1b0cacc5f&to=5c27c913687d3d5559ef3ab42b5a3d513531d61c&stat=instructions

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D113947
2021-11-17 09:06:46 -08:00
Dmitry Vyukov a7c57c4ec8 tsan: don't consider debug calls as calls
The tsan pass does two optimizations based on the presence of calls:
1. Don't emit function entry/exit callbacks if there are no calls
and no memory accesses.
2. Combine reads/writes of the same variable if there are no
intervening calls.
However, all debug info is represented as CallInst as well,
which effectively disables these optimizations.
Don't consider debug info calls as calls.

Reviewed By: glider, melver

Differential Revision: https://reviews.llvm.org/D114079
2021-11-17 14:42:16 +01:00
Mirko Brkusanin db6bc2ab51 [AMDGPU][GlobalISel] Fold G_FNEG above when users cannot fold mods
If possible, fold fneg into the instruction above it when its users cannot
fold modifiers and we know the fold will decrease the instruction count.
This follows the same logic as the SDAG combiner when choosing opportunities
to combine.

Differential Revision: https://reviews.llvm.org/D112827
2021-11-17 14:25:13 +01:00
David Sherwood 8d77555b12 [Analysis] Ensure getTypeLegalizationCost returns a simple VT for TypeScalarizeScalableVector
When getTypeConversion returns TypeScalarizeScalableVector we were
sometimes returning a non-simple type from getTypeLegalizationCost.
However, many callers depend upon this being a simple type and will
crash if not. This patch changes getTypeLegalizationCost to ensure
that we always a return sensible simple VT. If the vector type
contains unusual integer types, e.g. <vscale x 2 x i3>, then we just
set the type to MVT::i64 as a reasonable default.

A test has been added here that demonstrates the vectoriser can
correctly calculate the cost of vectorising a "zext i3 to i64"
instruction with a VF=vscale x 1:

  Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll

Differential Revision: https://reviews.llvm.org/D113777
2021-11-17 13:11:58 +00:00
Simon Pilgrim 5fedbd5b18 [DAG] SimplifyDemandedVectorElts - zero_extend_vector_inreg(and(x,c)) -> and(x,c')
If we've only demanded the 0'th element, and it comes from a (one-use) AND, try to convert the zero_extend_vector_inreg into a mask and constant fold it with the AND.
2021-11-17 12:41:48 +00:00
David Sherwood 670dd40244 [Analysis] Fix getNumberOfParts to return 0 when the answer is unknown
When asking how many parts are required for a scalable vector type
there are occasions when it cannot be computed. For example, <vscale x 1 x i3>
is one such vector for AArch64+SVE because at the moment no matter how we
promote the i3 type we never end up with a legal vector. This means
that getTypeConversion returns TypeScalarizeScalableVector as the
LegalizeKind, and then getTypeLegalizationCost returns an invalid cost.
This then causes BasicTTImpl::getNumberOfParts to dereference an invalid
cost, which triggers an assert. This patch changes getNumberOfParts to
return 0 for such cases, since the definition of getNumberOfParts in
TargetTransformInfo.h states that we can use a return value of 0 to represent
an unknown answer.

Currently, LoopVectorize.cpp is the only place where we need to check for
0 as a return value, because all other instances will not currently
ask for the number of parts for <vscale x 1 x iX> types.

In addition, I have changed the target-independent interface for
getNumberOfParts to return 1 and assume there is a single register
that can fit the type. The loop vectoriser has lots of tests that are
target-independent and they relied upon the 0 value to mean the
answer is known and that we are not scalarising the vector.

I have added tests here that show we correctly return an invalid cost
for VF=vscale x 1 when the loop contains unusual types such as i7:

  Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll

Differential Revision: https://reviews.llvm.org/D113772
2021-11-17 12:07:09 +00:00
Florian Hahn e8b55cf7b7
[SCEV] Apply loop guards when computing max BTC for arbitrary steps.
Similar to other cases in the current function (e.g. when the step is 1 or
-1), applying loop guards can lead to tighter upper bounds for the
backedge-taken counts.
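
A sketch of the kind of loop this helps (bounds hypothetical): the dominating guard on %n now also tightens the max backedge-taken count of the step-3 IV.
```
define void @guarded(i64 %n) {
entry:
  %guard = icmp ult i64 %n, 1000
  br i1 %guard, label %loop, label %exit
loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %iv.next = add i64 %iv, 3
  %ec = icmp ult i64 %iv.next, %n
  br i1 %ec, label %loop, label %exit
exit:
  ret void
}
```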

Fixes PR52464.

Reviewed By: reames, nikic

Differential Revision: https://reviews.llvm.org/D113578
2021-11-17 11:00:49 +00:00
David Sherwood ca18fcc2c0 [IR] Change CreateStepVector to work with element types smaller than i8
Currently the stepvector intrinsic only supports element types that
are integers of size 8 bits or more. This patch adds support for the
creation of stepvectors with smaller element types by creating
the intrinsic with i8 elements that we then truncate to the requested
size.
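
A sketch of what CreateStepVector now emits for an i3 element type:
```
declare <vscale x 2 x i8> @llvm.experimental.stepvector.nxv2i8()

define <vscale x 2 x i3> @step_i3() {
  ; build the step sequence in i8, then truncate to the requested width
  %step = call <vscale x 2 x i8> @llvm.experimental.stepvector.nxv2i8()
  %narrow = trunc <vscale x 2 x i8> %step to <vscale x 2 x i3>
  ret <vscale x 2 x i3> %narrow
}
```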

It's not currently possible to write a vectoriser test to exercise
this code path so I have added a unit test here:

  llvm/unittests/IR/IRBuilderTest.cpp

Differential Revision: https://reviews.llvm.org/D113767
2021-11-17 10:47:50 +00:00
Jay Foad 3264e95938 [CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress
Delegate updating of LiveIntervals to each target's
convertToThreeAddress implementation, instead of repairing LiveIntervals
after the fact in TwoAddressInstruction::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D113493
2021-11-17 10:16:47 +00:00
Stanislav Mekhanoshin c74f2e5b27 [InstCombine] Use SpecificBinaryOp_match in two more places
Differential Revision: https://reviews.llvm.org/D114038
2021-11-17 01:16:06 -08:00
Roman Lebedev 2037ec725f
[X86][Costmodel] `*ext v64i1 to v32i16` can appear after legalization, cost is same as for `*ext v32i1 to v32i16`
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113914
2021-11-17 12:02:50 +03:00
Roman Lebedev 23b194bf18
[X86][Costmodel] `trunc v32i16 to v64i1` can appear after legalization, cost is same as for `trunc v32i16 to v32i1`
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113913
2021-11-17 12:02:50 +03:00
Eric Tang f7eb061a5f [SelectionDAG] Make WidenVecRes_SELECT work for scalable vectors
This change makes WidenVecRes_SELECT work for scalable vectors.

This patch is split from [D110319](https://reviews.llvm.org/D110319).

Signed-off-by: Eric Tang <tangxingxin1008@gmail.com>

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D110388
2021-11-17 08:55:11 +00:00
Aaron Puchert b20da5117f Don't add irrelevant items to queue in DwarfCompileUnit::createScopeChildrenDIE (NFC)
Instead of popping them and then immediately throwing them away, we can
just filter out globals and items in different scopes before adding them
to WorkList. Shouldn't change anything but keep the queue smaller.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D113864
2021-11-17 00:01:20 +01:00
Aaron Puchert 86b3100cde [DebugInfo] Use DbgEntityKind in DbgEntity interface (NFC)
It was being used occasionally already, and using it on the constructor
and getDbgEntityID has obvious type safety benefits.

Also use llvm_unreachable in the switch as usual, but since only these
two values are used in constructor calls I think it's still NFC.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D113862
2021-11-17 00:01:20 +01:00
Benjamin Kramer 8b8e8704ce [PowerPC] Fix a nullptr dereference
LiMI1/LiMI2 can be null, so don't call a method on them before checking.
Found by ubsan.
2021-11-16 23:52:42 +01:00
Philip Reames 8d85e945b2 [SCEV] Canonicalize X - urem X, Y patterns
There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could get different results. The sub representation appears to produce strictly inferior results in practice, so I decided to canonicalize to the Y * X/Y version.

The motivation here is that runtime unroll produces the sub X - (and X, Y-1) pattern when Y is a power of two. SCEV is thus unable to recognize that an unrolled loop exits because we don't figure out that the new unrolled step evenly divides the trip count of the unrolled loop. After instcombine runs, we convert to the andn form which SCEV recognizes, so essentially, this is just fixing a nasty pass-ordering dependency.
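
The two equivalent forms, side by side (a sketch):
```
define i64 @round_down(i64 %x, i64 %y) {
  ; form produced by runtime unrolling (for a power-of-two %y it first
  ; appears as x - (x & (y-1)) until instcombine rewrites it)
  %rem = urem i64 %x, %y
  %sub = sub i64 %x, %rem      ; X - (X urem Y)
  ; canonical form SCEV now uses
  %div = udiv i64 %x, %y
  %mul = mul i64 %div, %y      ; (X /u Y) * Y
  %same = xor i64 %sub, %mul   ; always 0
  ret i64 %same
}
```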

The ARM loop hardware interaction in the test diff is opaque to me, but the comments in the review from others with knowledge of the infrastructure appear to indicate these are improvements in loop recognition, not regressions.

Differential Revision: https://reviews.llvm.org/D114018
2021-11-16 11:59:21 -08:00
Arthur Eubanks c95a9f46c9 [Loads] Handle addrspacecast constant expressions when determining dereferenceability
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114015
2021-11-16 11:17:57 -08:00
Victor Huang ae27ca9a67 [PowerPC] PPC backend optimization on conditional trap instructions
This patch adds a PPC back end optimization to analyze the arguments of a
conditional trap instruction and do one of the following:
1. Delete it if it never traps
2. Replace it if it always traps
3. Otherwise keep it

Reviewed By: nemanjai, amyk, PowerPC

Differential revision: https://reviews.llvm.org/D111434
2021-11-16 13:11:57 -06:00