Commit Graph

28968 Commits

Author SHA1 Message Date
Kazu Hirata 972d4133e9 Use {DenseSet,SmallPtrSet}::contains (NFC) 2021-10-29 20:26:07 -07:00
Florian Hahn 274a9b0f0b
[DSE] Support redundant stores eliminated by memset.
This patch adds support to remove stores that write the same value
as earlier memesets.

It uses isOverwrite to check that a memset completely overwrites a later
store. The candidate store must store the same bytewise value as the
byte stored by the memset.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112321
2021-10-29 22:19:53 +01:00
Sanjay Patel 8f786b4618 [InstCombine] fix comments to match code; NFC 2021-10-29 15:48:35 -04:00
modimo 5caad9b5d3 [InlineAdvisor] Add fallback/format switches and negative remark processing to Replay Inliner
Adds the following switches:

1. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback: controls what the replay advisor does for inline sites that are not present in the replay. Options are:

 1. Original: defers to original advisor
 2. AlwaysInline: inline all sites not in replay
 3. NeverInline: inline no sites not in replay

2. --sample-profile-inline-replay-format/--cgscc-inline-replay-format: controls what format should be generated to match against the replay remarks. Options are:

  1. Line
  2. LineColumn
  3. LineDiscriminator
  4. LineColumnDiscriminator

Adds support for negative inlining decisions. These are denoted by "will not be inlined into" as compared to the positive "inlined into" in the remarks.

All of these together with the previous `--sample-profile-inline-replay-scope/--cgscc-inline-replay-scope` allow tweaking in how to apply replay. In my testing, I'm using:
1. --sample-profile-inline-replay-scope/--cgscc-inline-replay-scope = Function to only replay on a function
2. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback = NeverInline since I'm feeding in only positive remarks to the replay system
3. --sample-profile-inline-replay-format/--cgscc-inline-replay-format = Line since I'm generating the remarks from DWARF information from GCC which can conflict quite heavily in column number compared to Clang

An alternative configuration could be to do Function, AlwaysInline, Line fallback with negative remarks which closer matches the final call-sites. Note that this can lead to unbounded inlining if a negative remark doesn't match/exist for one reason or another.

Updated various tests to cover the new switches and negative remarks

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D112040
2021-10-29 12:32:03 -07:00
modimo 51ce567b38 [SampleProfile] Add all callsites to AllCandidates if InlineReplay is in effect
Replay in sample profiling needs to be asked on candidates that may not have counts or below the threshold. If replay is in effect for a function make sure these are captured and also imported during thinLTO.

Testing:
ninja check-all

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D112033
2021-10-29 12:04:52 -07:00
Roman Lebedev 0ae7bf124a
[NFC][LoopDeletion] Count the number of broken backedges
Those don't contribute to the number of deleted loops.
2021-10-29 21:58:16 +03:00
Sanjay Patel d0e9879d96 [InstCombine] allow vector splat matching for bitwise logic folds
These transforms are also likely missing a one-use check,
but that's another patch.
2021-10-29 14:22:50 -04:00
Stanislav Mekhanoshin a905c54b76 [InstCombine] Fold `(~(a | b) & c) | ~(a | c)` into `~((b & c) | a)`
```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
  %or1 = or i4 %b, %a
  %not1 = xor i4 %or1, -1
  %or2 = or i4 %a, %c
  %not2 = xor i4 %or2, -1
  %and = and i4 %not2, %b
  %or3 = or i4 %and, %not1
  ret i4 %or3
}

define i4 @tgt(i4 %a, i4 %b, i4 %c) {
  %and = and i4 %c, %b
  %or = or i4 %and, %a
  %or3 = xor i4 %or, -1
  ret i4 %or3
}
Transformation seems to be correct!
```

Differential Revision: https://reviews.llvm.org/D112338
2021-10-29 10:58:09 -07:00
Jay Foad 1b758925ad [IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction
createReplacementInstr was a trivial wrapper around
ConstantExpr::getAsInstruction, which also inserted the new instruction
into a basic block. Implement this directly in getAsInstruction by
adding an InsertBefore parameter and change all callers to use it. NFC.

A follow-up patch will remove createReplacementInstr.

Differential Revision: https://reviews.llvm.org/D112791
2021-10-29 15:02:58 +01:00
David Green 11630dbbc3 [InstCombine] Fold BW/2+1 tops bits are same pattern
Match "icmp eq (trunc (lsr A, BW), (ashr (trunc A), BW-1))", which checks
the top BW/2 + 1 bits are all the same. Create "A >=s INT_MIN && A <=s
INT_MAX", which we generate as "icmp ult (add A, 2^BW-1), 2^BW" to skip
a few steps of instcombining.
https://alive2.llvm.org/ce/z/NjH6Ty
https://alive2.llvm.org/ce/z/_fEQ9P

Differential Revision: https://reviews.llvm.org/D109155
2021-10-29 12:30:20 +01:00
David Green 9020e22a87 [InstCombine] Convert xor (ashr X, BW-1), C -> select(X >=s 0, C, ~C)
The sequence of instructions `xor (ashr X, BW-1), C` (or with a truncation
`xor (trunc (ashr X, BW-1)), C)` takes a value, produces all zeros or all
ones and with it optionally inverts a constant depending on whether the
original input was positive or negative. This is the same as checking if
the value is positive, and selecting between the constant and ~constant.
https://alive2.llvm.org/ce/z/NJ85qY

This is a fairly general version of a fold that helps pull saturating
arithmetic into a canonical form.

Differential Revision: https://reviews.llvm.org/D109151
2021-10-29 11:19:20 +01:00
Chuanqi Xu bb16e83932 [NFC] [Coroutines] Use llvm::make_scope_exit to replace self-defined RTTIHelper 2021-10-29 12:14:20 +08:00
Stanislav Mekhanoshin f7f430c913 [InstCombine] Fixed non-determinisctic order of new instructions
Fixes non-determinisctic order of XOR instructions created after
5a7a458306. The order of call argument evaluation is not
defined, so create one Value before the call.
2021-10-28 12:14:02 -07:00
Stanislav Mekhanoshin 5a7a458306 [InstCombine] Fold `(c & ~(a | b)) | (b & ~(a | c))` to `~a & (b ^ c)`
```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %or1 = or i4 %a, %b
  %not1 = xor i4 %or1, 15
  %and1 = and i4 %not1, %c
  %or2 = or i4 %a, %c
  %not2 = xor i4 %or2, 15
  %and2 = and i4 %not2, %b
  %or3 = or i4 %and1, %and2
  ret i4 %or3
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %xor = xor i4 %b, %c
  %not = xor i4 %a, 15
  %or3 = and i4 %xor, %not
  ret i4 %or3
}
Transformation seems to be correct!
```

Differential Revision: https://reviews.llvm.org/D112276
2021-10-28 11:54:30 -07:00
Yuanfang Chen c18ed69873 [Internalize] Preserve __stack_chk_fail in Internalizer correctly
Move the section collecting `AlwaysPreserved` up before any
`maybeInternalize` is called. Otherwise, functions in `AlwaysPreserved` (in this case, `__stack_chk_fail`)
are not preserved.

Reviewed By: MaskRay, tejohnson

Differential Revision: https://reviews.llvm.org/D112684
2021-10-28 11:22:26 -07:00
Florian Hahn c45045bfd0
[VPlan] Keep induction recipes in header.
This patch updates recipe creation to ensure all
VPWidenIntOrFpInductionRecipes are in the header block. At the moment,
new induction recipes can be created in different blocks when trying to
optimize casts and induction variables.

Having all induction recipes in the header makes it easier to
analyze/transform them in VPlan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D111300
2021-10-28 18:22:05 +01:00
Leonard Grey 793b481f54 [CGProfile] Don't emit call graph profile edges with zero weight
With D112160 and D112164, on a Chrome Mac build this reduces the total
size of CGProfile sections by 78% (around 25% eliminated entirely) and
total size of object files by 0.14%.

Differential Revision: https://reviews.llvm.org/D112655
2021-10-28 11:32:49 -04:00
David Green 9358384fd6 [InstCombine] Extend canonicalizeClampLike to handle truncated inputs
This extends the canonicalizeClampLike function to allow cases where the
input is truncated, but still matching on the types of the ICmps. For
example
  %t = trunc i32 %X to i8
  %a = add i32 %X, 128
  %cmp = icmp ult i32 %a, 256
  %c = icmp sgt i32 %X, -1
  %f = select i1 %c, i8 High, i8 Low
  %r = select i1 %cmp, i8 %t, i8 %f
becomes
  %c1 = icmp slt i32 %X, -128
  %c2 = icmp sge i32 %X, 128
  %s1 = select i1 %c1, i32 sext(Low), i32 %X
  %s2 = select i1 %c2, i32 sext(High), i32 %s1
  %t = trunc i32 %s2 to i8
https://alive2.llvm.org/ce/z/vPzfxH

We limit the transform to constant High and Low values, where we know
the sext are free.

Differential Revision: https://reviews.llvm.org/D108049
2021-10-28 15:46:58 +01:00
Dawid Jurczak f87e0c68d7 [DSE] Eliminates redundant store of an exisiting value (PR16520)
That's https://reviews.llvm.org/D90328 follow-up.

This change eliminates writes to variables where the value that is being written is already stored in the variable.
This achieves the goal by looping through all memory definitions in the current state and getting defining access from each of them.
When there is defining access where the write instruction is identical to the original instruction it will remove this redundant write.

For example:

void f() {

x = 1;
if foo() {
   x = 1;
   g();
} else {
  h();
}

}
void g();
void h();

The second x=1 will be eliminated since it is rewriting 1 to x. This pass will produce this:

void f() {

x = 1;
if foo() {
   g();
} else {
  h();
}

}
void g();
void h();

Differential Revision: https://reviews.llvm.org/D111727
2021-10-28 16:20:09 +02:00
David Green 79011c705b [InstCombine] Fix rare condition violation in canonicalizeClampLike
With a "ult x, 0", the fold in canonicalizeClampLike does not validate
with undef inputs. This condition will usually have been simplified
away, but we should ensure the code is correct in case.
https://alive2.llvm.org/ce/z/S8HQ6H vs https://alive2.llvm.org/ce/z/h2XBJ_

See: https://reviews.llvm.org/D108049
2021-10-28 15:03:07 +01:00
Alexey Bataev 07ef9f513f [SLP]Improve/fix reordering of the gathered graph nodes.
Gathered loads/extractelements/extractvalue instructions should be
checked if they can represent a vector reordering node too and their
order should ve taken into account for better graph reordering analysis/
Also, if the gather node has reused scalars, they must be reordered
instead of the scalars themselves.

Differential Revision: https://reviews.llvm.org/D112454
2021-10-28 05:45:09 -07:00
Sanjay Patel e8535fa784 [InstCombine] allow Negator to fold multi-use select with constant arms
The motivating test is reduced from:
https://llvm.org/PR52261

Note that the more general problem of folding any binop into a multi-use
select of constants is still there. We need to ease the restriction in
InstCombinerImpl::FoldOpIntoSelect() to catch those. But these examples
never reach that code because Negator exclusively handles negation
patterns within visitSub().

Differential Revision: https://reviews.llvm.org/D112657
2021-10-28 08:35:58 -04:00
Johannes Doerfert acf3093117 [Attributor][FIX] Do not ignore memory writes in AAMemoryBehavior
Even if we look for `nocapture` we need to bail on escaping pointers.
The crucial thing is that we might not look at a big enough scope when
we derive the memory behavior. Thus, it might be `nocapture` in a larger
context while it is "captured" in a smaller context.
2021-10-27 21:04:32 -05:00
Johannes Doerfert 734f91441d [Attributor][NFC] Improve debug messages 2021-10-27 21:04:31 -05:00
Johannes Doerfert 8a4551b893 [Attributor][FIX] Use right address space to avoid assertion
When we strip and accumulate constant offsets we need to pick the right
address space such that the offset APInt has the right bit width.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D112544
2021-10-27 18:22:37 -05:00
Nick Desaulniers 3ccd041af9 [LowerTypeTests] Emit cfi_jt aliases regardless of function export
A constant complaint we get is that the __typeid__ symbols in the CFI
jump tables causes confusing stack traces in applications. Emit the more
readable cfi_jt aliases regardless of function export (LTO vs Thin LTO).

Reviewed By: pcc, tejohnson

Differential Revision: https://reviews.llvm.org/D107934
2021-10-27 11:36:26 -07:00
Alexey Bataev f06e332982 Revert "[SLP]Improve/fix reordering of the gathered graph nodes."
This reverts commit 64d1617d18 to fix test
non-stability.
2021-10-27 11:16:58 -07:00
Roman Lebedev 156f10c840
[IR] `SCEVExpander::generateOverflowCheck()`: short-circuit `umul_with_overflow`-by-one
It's a no-op, no overflow happens ever: https://alive2.llvm.org/ce/z/Zw89rZ

While generally i don't like such hacks,
we have a very good reason to do this: here we are expanding
a run-time correctness check for the vectorization,
and said `umul_with_overflow` will not be optimized out
before we query the cost of the checks we've generated.

Which means, the cost of run-time checks would be artificially inflated,
and after https://reviews.llvm.org/D109368 that will affect
the minimal trip count for which these checks are even evaluated.
And if they aren't even evaluated, then the vectorized code
certainly won't be run.

We could consider doing this in IRBuilder,  but then we'd need to
also teach `CreateExtractValue()` to look into chain of `insertvalue`'s,
and i'm not sure there's precedent for that.

Refs. https://reviews.llvm.org/D109368#3089809
2021-10-27 19:45:55 +03:00
Alexey Bataev 64d1617d18 [SLP]Improve/fix reordering of the gathered graph nodes.
Gathered loads/extractelements/extractvalue instructions should be
checked if they can represent a vector reordering node too and their
order should ve taken into account for better graph reordering analysis/
Also, if the gather node has reused scalars, they must be reordered
instead of the scalars themselves.

Differential Revision: https://reviews.llvm.org/D112454
2021-10-27 08:49:13 -07:00
David Sherwood 5d9318638e [NFC][LoopVectorize] Change getStepVector to take a Value* for the StartIdx
This patch changes the definition of getStepVector from:

  Value *getStepVector(Value *Val, int StartIdx, Value *Step, ...

to

  Value *getStepVector(Value *Val, Value *StartIdx, Value *Step, ...

because:

1. it seems inconsistent to pass some values as Value* and some as
   integer, and
2. future work will require the StartIdx to be an expression made up
   of runtime calculations of the VF.

In widenIntOrFpInduction I've changed the code to pass in the
value returned from getRuntimeVF, but the presence of the assert:

  assert(!VF.isScalable() && "scalable vectors not yet supported.");

means that currently this code path is only exercised for fixed-width
VFs and so the patch is still NFC.

Differential revision: https://reviews.llvm.org/D111882
2021-10-27 16:12:38 +01:00
Alexey Bataev 9b12975cbf Revert "[SLP]Improve/fix reordering of the gathered graph nodes."
This reverts commit f719b794bc to fix
instability in tests.
2021-10-27 07:31:36 -07:00
Alexey Bataev f719b794bc [SLP]Improve/fix reordering of the gathered graph nodes.
Gathered loads/extractelements/extractvalue instructions should be
checked if they can represent a vector reordering node too and their
order should ve taken into account for better graph reordering analysis/
Also, if the gather node has reused scalars, they must be reordered
instead of the scalars themselves.

Differential Revision: https://reviews.llvm.org/D112454
2021-10-27 06:08:40 -07:00
Alexey Bataev cb4feae7bd [SLP]Fix logical and/or reductions.
Need to emit select(cmp) instructions for poison-safe forms of select
ops. Currently alive reports that `Target is more poisonous than source`
for operations we generating for such instructions.

https://alive2.llvm.org/ce/z/FiNiAA

Differential Revision: https://reviews.llvm.org/D112562
2021-10-27 04:25:20 -07:00
David Sherwood 3d706c20f8 [NFC][LoopVectorize] Remove setBestPlan in favour of getBestPlanFor
I have removed LoopVectorizationPlanner::setBestPlan, since this
function is quite aggressive because it deletes all other plans
except the one containing the <VF,UF> pair required. The code is
currently written to assume that all <VF,UF> pairs will live in the
same vplan. This is overly restrictive, since scalable VFs live in
different plans to fixed-width VFS. When we add support for
vectorising epilogue loops when the main loop uses scalable vectors
then we will the vplan for the main loop will be different to the
epilogue.

Instead I have added a new function called

  LoopVectorizationPlanner::getBestPlanFor

that returns the best vplan for the <VF,UF> pair requested and leaves
all the vplans untouched. We then pass this best vplan to

  LoopVectorizationPlanner::executePlan

which now takes an additional VPlanPtr argument.

Differential revision: https://reviews.llvm.org/D111125
2021-10-27 09:38:27 +01:00
Arthur Eubanks ae27c57b18 [InferAddressSpaces] Make pass work with opaque pointers
Avoid getPointerElementType().
2021-10-26 23:53:20 -07:00
Sanjay Patel acabad9ff6 [InstCombine] try to canonicalize icmp with trunc op into mask and cmp
The motivating test is based on:
https://llvm.org/PR52260

We have better analysis for X == 0, so try harder to form that.
2021-10-26 17:43:28 -04:00
Usman Nadeem da1318ccca [NFC][Instcombine] Cleanup some obsolete matches in visitSelectInstr
These are now redundant after https://reviews.llvm.org/D106872

Change-Id: I82edfedf1d45cac4e3368d77ce3a48c78e342c19
2021-10-26 10:07:08 -07:00
Rosie Sumpter b716d0aa94 [LoopVectorize] Clean up VPReductionRecipe::execute. NFC
Use RdxDesc->getOpcode instead of getUnderlingInstr()->getOpcode.
Move the code which finds Kind and IsOrdered to be outside the for loop
since neither of these change with the vector part.

Differential Revision: https://reviews.llvm.org/D112547
2021-10-26 17:18:25 +01:00
Alexey Bataev ce14d1b690 [SLP]Do not reorder reduction nodes.
The final reduction nodes should not be reordered, the order does not
matter for reductions. Also, it might be profitable to vectorize smaller
reduction trees, reduction cost may compensate small tree cost.

Part of D111574

Differential Revision: https://reviews.llvm.org/D112467
2021-10-26 07:41:24 -07:00
Nikita Popov 11a8423dab [SCEV] Use reverse() (NFC) 2021-10-26 11:08:58 +02:00
Max Kazantsev 9bbfe0f72c [NFC] Remove obsolete simplifyOnceImpl function
The function simplifyOnce only calls simplifyOnceImpl and does nothing else.
Having this separate helper makes no sense. Removing it.

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D112517
Reviewed By: mkazantsev
2021-10-26 13:51:42 +07:00
Max Kazantsev d4c74cd4e8 [NFC] [LoopPeel] Update IDoms of non-loop blocks dominated by the loop
When peeling a loop, we assume that the latch has a `br` terminator and that
all loop exits are either terminated with an `unreachable` or have a terminating
deoptimize call. So when we peel off the 1st iteration, we change the IDom of
all loop exits to the peeled copy of `NCD(IDom(Exit), Latch)`. This works now,
but if we add logic to support loops with exits that are followed by a block
with an `unreachable` or a terminating deoptimize call, changing the exit's idom
wouldn't be enough and DT would be broken.

For example, let `Exit1` and `Exit2` are loop exits, and each of them
unconditionally branches to the same `unreachable` terminated block. So neither
of the exits dominates this unreachable block. If we change the IDoms of the
exits to some peeled loop block, we don't update the dominators of the unreachable
block. Currently we just don't get to the peeling logic, saying that we can't peel
such loops.

Previously we stored exits' IDoms in a map before peeling a loop and then, after
peeling off one iteration, we changed their IDoms.
Now we use the same logic not only for exits but for all non-loop blocks dominated
by the loop.
So when we add logic to support peeling loops with exits which branch, for example,
to an unreachable-terminated block, we would update the IDoms not only for exits,
but for their successors.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev, nikic
2021-10-26 13:09:07 +07:00
Nikita Popov 3a995c918e [SCEV] Move SCEVLostPoisonFlags() check into SCEVExpander
Always insert values into ExprValueMap, and instead skip using them
in SCEVExpander if poison-generating flags have been lost. This
ensures that all values that are in ValueExprMap are also in
ExprValueMap, so we can use the latter to invalidate the former.

This change is probably not entirely NFC for the case where
originally the SCEV had no nowrap flags but they were inferred
later, in which case that would now allow reusing the existing
value for expansion.

Differential Revision: https://reviews.llvm.org/D112389
2021-10-25 22:37:20 +02:00
Arthur Eubanks 4a9db7367d [AlwaysInliner] Invalidate analyses when we delete functions
Fixes PR52292.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D112473
2021-10-25 13:36:32 -07:00
Zarko Todorovski 9769e97c35 [LLVM] Inclusive terms: remove/replace references to sanity in RewriteStatepointsForGC.cpp and test
Part of work to have the LLVM backend to use more inclusive terms.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D112461
2021-10-25 16:17:41 -04:00
Philip Reames f82cf6187f [indvars] Fix pr52276 (missing one use check)
The recently added logic to canonicalize exit conditions to unsigned relies on facts which hold about the use (i.e. exit test).  Applying this blindly to the icmp is not legal, as there may be another use which never reaches the exit.  Restrict ourselves to case where we have a single use.
2021-10-25 09:26:55 -07:00
Alexey Bataev eb9b75dd4d [SLP]Change the order of the reduction/binops args pair vectorization attempts.
Need to change the order of the reduction/binops args pair vectorization
attempts. Need to try to find the reduction at first and postpone
vectorization of binops args. This may help to find more reduction
patterns and vectorize them.
Part of D111574.

Differential Revision: https://reviews.llvm.org/D112224
2021-10-25 06:27:14 -07:00
Max Kazantsev a9b0776a81 [SimplifyCFG] Sanity assert in iterativelySimplifyCFG
We observe a hang within iterativelySimplifyCFG due to infinite
loop execution. Currently, there is no limit to this loop, so
in case of bug it just works forever. This patch adds an assert
that will break it after 1000 iterations if it didn't converge.
2021-10-25 17:10:17 +07:00
Nikita Popov 75384ecdf8 [InstSimplify] Refactor invariant.group load folding
Currently strip.invariant/launder.invariant are handled by
constructing constant expressions with the intrinsics skipped.
This takes an alternative approach of accumulating the offset
using stripAndAccumulateConstantOffsets(), with a flag to look
through invariant.group intrinsics.

Differential Revision: https://reviews.llvm.org/D112382
2021-10-25 10:56:25 +02:00
Florian Hahn a6c4969f5f
[VPlan] Do not create dummy entry block (NFC).
At the moment a dummy entry block is created at the beginning of VPlan
construction. This dummy block is later removed again.

This means it is not easy to identify the VPlan header block in a
general fashion, because during recipe creation it is the single
successor of the entry block, while later it is the entry block.

To make getting the header easier, just skip creating the dummy block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D111299
2021-10-25 09:52:58 +01:00
Nikita Popov 477551fd09 [SCEVExpander] Minor cleanup in value reuse (NFC)
Use dyn_cast_or_null and convert one of the checks into an
assertion. SCEV is a per-function analysis.
2021-10-25 10:32:17 +02:00
Kazu Hirata 9800731367 [Target, Transforms] Use predecessors instead of pred_begin and pred_end (NFC) 2021-10-24 17:35:35 -07:00
Philip Reames 3c06ecaa1e [instcombine] Fix oss-fuzz 39934 (mul matcher can match non-instruction)
Fixes a crash observed by oss-fuzz in 39934.  Issue at hand is that code expects a pattern match on m_Mul to imply the operand is a mul instruction, however mul constexprs are also valid here.
2021-10-24 14:42:03 -07:00
Nikita Popov 710596a1e1 [ConstantFolding] Accept offset in ConstantFoldLoadFromConstPtr (NFCI)
As this API is now internally offset-based, we can accept a starting
offset and remove the need to create a temporary bitcast+gep
sequence to perform an offset load. The API now mirrors the
ConstantFoldLoadFromConst() API.
2021-10-23 17:59:39 +02:00
Philip Reames 412eb07edd [indvars] Use fact loop must exit to canonicalize to unsigned conditions
The logic in this patch is that if we find a comparison which would be unsigned except for when the loop is infinite, and we can prove that an infinite loop must be ill defined, we can still make the predicate unsigned.

The eventual goal (combined with a follow on patch) is to use the fact the loop exits to remove the zext (see tests) entirely.

A couple of points worth noting:
* We loose the ability to prove the loop unreachable by committing to the must exit interpretation. If instead, we later proved that rhs was definitely outside the range required for finiteness, we could have killed the loop entirely. (We don't currently implement this transform, but could in theory, do so.)
* simplifyAndExtend has a very limited list of users it walks. In particular, in the examples is stops at the zext and never visits the icmp. (Because we can't fold the zext to an addrec yet in SCEV.) Being willing to visit when we haven't simplified regresses multiple tests (seemingly because of less optimal results when computing trip counts).  D112170 explores fixing that, but - at least so far - appears to be too expensive compile time wise.

Differential Revision: https://reviews.llvm.org/D111836
2021-10-22 10:31:36 -07:00
Nikita Popov 5bb7562962 [Attributor] Generalize GEP construction
Make use of the getGEPIndicesForOffset() helper for creating GEPs.
This handles arrays as well, uses correct GEP index types and
reduces code duplication.

Differential Revision: https://reviews.llvm.org/D112263
2021-10-22 18:30:43 +02:00
Kazu Hirata 6fe949c4ed [Target, Transforms] Use StringRef::contains (NFC) 2021-10-22 08:52:33 -07:00
Chuanqi Xu ddbf196194 [Coroutines] Ignore partial lifetime markers refer of an alloca
When I playing with Coroutines, I found that it is possible to generate
following IR:
```
%struct = alloca ...
%sub.element = getelementptr %struct, i64 0, i64 index ; index is not
%zero
lifetime.marker.start(%sub.element)
% use of %sub.element
lifetime.marker.end(%sub.element)
store %struct to xxx ;  %struct is escaping!

<suspend points>
```

Then the AllocaUseVisitor would collect the lifetime marker for
sub.element and treat it as the lifetime markers of the alloca! So it
judges that the alloca could be put on the stack instead of the frame by
judging the lifetime markers only.
The root cause for the bug is that AllocaUseVisitor collects wrong
lifetime markers.

This patch fixes this.

Reviewed By: lxfind

Differential Revision: https://reviews.llvm.org/D112216
2021-10-22 09:49:50 +08:00
Vitaly Buka b7ea298dfd [msan] Don't use TLS slots of noundef args
Transformations may strip the attribute from the
argument, e.g. for unused, which will result in
shadow offsets mismatch between caller and
callee.

Stripping noundef for used arguments can be
a problem, as TLS is not going to be set
by caller. However this is not the goal of the
patch and I am not aware if that's even
possible.

Differential Revision: https://reviews.llvm.org/D112197
2021-10-21 18:35:12 -07:00
Nikita Popov 1848525842 [CodeMetrics] Don't require speculatability for ephemeral values
As discussed in D112016, our current requirement of speculatability
for ephemeral is overly strict: What we really care about is that
the instruction will be DCEd once the assume is dropped. For that
it is sufficient that the instruction is side-effect free and not
a terminator.

In particular, this allows non-dereferenceable loads to be ephemeral
values.

Differential Revision: https://reviews.llvm.org/D112179
2021-10-21 20:30:01 +02:00
Sanjay Patel 66d22b4da4 [VectorCombine] fold shuffle-of-binops with common operand
shuf (bo X, Y), (bo X, W) --> bo (shuf X), (shuf Y, W)

This is motivated by an example in D111800
(although that patch avoids the problem for that particular example).

The pattern is shown in reduced form with:
https://llvm.org/PR52178
https://alive2.llvm.org/ce/z/d8zB4D

There is no difference on the PhaseOrdering test from D111800
because the aarch64 cost model says that the shuffle cost is 3 while
the fadd cost is 2.

Differential Revision: https://reviews.llvm.org/D111901
2021-10-21 12:37:54 -04:00
Sanjay Patel 3888de9507 [InstCombine] generalize reassociated Demorgan folds
This updates the recent D112108 / b92412fb28
to handle the flipped logic ('or') sibling:
https://alive2.llvm.org/ce/z/Y2L6Ch
2021-10-21 10:39:29 -04:00
Alexey Bataev 3ea7877c8b [SLP]Unify vectorization of PHI and store nodes with improved tiny tree vectorization.
Vectorization of PHIs and stores very similar, it might be beneficial to
try to revectorize stores (like PHIs) if the total number of stores with
the same/alternate opcode is less than the vector size but number of
stores with the same type is larger than the vector size.

Differential Revision: https://reviews.llvm.org/D109831
2021-10-21 06:25:32 -07:00
Dawid Jurczak 9ba5bb4309 [NFC][LoopIdiom] Make for loops more readable
Patch simplifies for loops in LIR following LLVM guidelines: https://llvm.org/docs/CodingStandards.html#use-range-based-for-loops-wherever-possible.

Differential Revision: https://reviews.llvm.org/D112077
2021-10-21 12:17:44 +02:00
Evgeniy Brevnov 1a8ec24efb [NARY-REASSOCIATE][NFC] Simplify min/max handling
In order to explore different variants of reassociation current implementation uses "swap in a loop" approach. Unfortunately, the implementation is more complicated than it could be. This is an attempt to streamline the code. New approach is to extract core functionality into a helper function and call it explicitly as many times as required.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112128
2021-10-21 15:45:53 +07:00
Vitaly Buka 6742c8a2d8 [NFC][msan] Break the loop when done
We have nothing to do after the Argument
is found.
2021-10-20 21:08:12 -07:00
Stanislav Mekhanoshin b92412fb28 [InstCombine] Fold `(a & ~b) & ~c` to `a & ~(b | c)`
%not1 = xor i32 %b, -1
  %not2 = xor i32 %c, -1
  %and1 = and i32 %a, %not1
  %and2 = and i32 %and1, %not2
=>
  %i1 = or i32 %b, %c
  %i2 = xor i32 %1, -1
  %and2 = and i32 %i2, %a

Differential Revision: https://reviews.llvm.org/D112108
2021-10-20 13:05:46 -07:00
Florian Hahn 8977bd5806
[IndVars] Invalidate SCEV when IR is changed in rewriteLoopExitValue.
At the moment, rewriteLoopExitValue forgets the current phi node in the
loop that collects phis to rewrite. A few lines after the value is
forgotten, SCEV is used again to analyze incoming values and
potentially expand SCEV expression. This means that another SCEV is
created for PN, before the IR is actually updated in the next loop.

This leads to accessing invalid cached expression in combination with
D71539.

PN should only be changed once the actual incoming exit value is set in
the next loop. Moving invalidation there should ensure that PN is
invalidated in all relevant cases.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D111495
2021-10-20 20:48:33 +01:00
Sanjay Patel 80ab06c599 [InstCombine] fold fake vector insert to bit-logic
bitcast (inselt (bitcast X), Y, 0) --> or (and X, MaskC), (zext Y)

https://alive2.llvm.org/ce/z/Ux-662

Similar to D111082 / db231ebdb0 :
We want to avoid relatively opaque vector ops on types that are
likely supported by the backend as scalar integers. The bitwise
logic ops are more likely to allow further combining.

We probably want to generalize this to allow a shift too, but
that would oppose instcombine's general rule of not creating
extra instructions, so that's left as a potential follow-up.
Alternatively, we could do that transform in VectorCombine
with the help of the TTI cost model.

This is part of solving:
https://llvm.org/PR52057
2021-10-20 14:21:40 -04:00
Itay Bookstein 08ed216000 [IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol
As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html

The GlobalIndirectSymbol class lost most of its meaning in
https://reviews.llvm.org/D109792, which disambiguated getBaseObject
(now getAliaseeObject) between GlobalIFunc and everything else.
In addition, as long as GlobalIFunc is not a GlobalObject and
getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee
is a GlobalIFunc cannot currently be modeled properly. Creating
aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition,
calling getAliaseeObject on a GlobalIFunc will currently return nullptr,
which is undesirable because it should return the object itself for
non-aliases.

This patch refactors the GlobalIFunc class to inherit directly from
GlobalObject, and removes GlobalIndirectSymbol (while inlining the
relevant parts into GlobalAlias and GlobalIFunc). This allows for
calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc
itself, making getAliaseeObject() more consistent and enabling
alias-to-ifunc to be properly modeled in the IR.

I exercised some judgement in the API clients of GlobalIndirectSymbol:
some were 'monomorphized' for GlobalAlias and GlobalIFunc, and
some remained shared (with the type adapted to become GlobalValue).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108872
2021-10-20 10:29:47 -07:00
Evgeniy Brevnov 269f563a2b [NARY-REASSOCIATE] Fix infinite recursion optimizing min\max
To guarantee convergence of the algorithm each optimization step should decrease number of instructions when IR is modified. This property is not held in this test case. The problem is that SCEV Expander may do "unexpected" reassociation what results in creation of new min/max chains and introduction of extra instructions. As a result on each step we indefinitely optimize back and forth.

The solution is to restrict SCEV Expander to perform uncontrolled reassociations by means of "Unknown" expressions.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112060
2021-10-20 14:23:03 +07:00
Philip Reames 0836a1059d Extend transform introduced in D111896 to multiple exits
This is trivial.  It was left out of the original review only because we had multiple copies of the same code in review at the same time, and keeping them in sync was easiest if the structure was kept in sync.
2021-10-19 12:12:19 -07:00
Philip Reames fca0218875 [indvars] Canonicalize exit conditions to unsigned using range info
This patch duplicates a bit of logic we apply to comparisons encountered during the IV users walk to conditions which feed exit conditions. Why? simplifyAndExtend has a very limited list of users it walks. In particular, in the examples is stops at the zext and never visits the icmp. (Because we can't fold the zext to an addrec yet in SCEV.) Being willing to visit when we haven't simplified regresses multiple tests (seemingly because of less optimal results when computing trip counts).

Note that this can be trivially extended to multiple exiting blocks. I'm leaving that to a future patch (solely to cut down on the number of versions of the same code in review at once.)

Differential Revision: https://reviews.llvm.org/D111896
2021-10-19 11:49:12 -07:00
Anna Thomas 9403514e76 [LoopPredication] Calculate profitability without BPI
Using BPI within loop predication is non-trivial because BPI is only
preserved lossily in loop pass manager (one fix exposed by lossy
preservation is up for review at D111448). However, since loop
predication is only used in downstream pipelines, it is hard to keep BPI
from breaking for incomplete state with upstream changes in BPI.
Also, correctly preserving BPI for all loop passes is a non-trivial
undertaking (D110438 does this lossily), while the benefit of using it
in loop predication isn't clear.

In this patch, we rely on profile metadata to get almost similar benefit as
BPI, without actually using the complete heuristics provided by BPI.
This avoids the compile time explosion we tried to fix with D110438 and
also avoids fragile bugs because BPI can be lossy in loop passes
(D111448).

Reviewed-By: asbirlea, apilipenko
Differential Revision: https://reviews.llvm.org/D111668
2021-10-19 14:24:04 -04:00
Simon Pilgrim 71e39e3f18 [ADT] Add APInt::isNegatedPowerOf2() helper
Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.....

Differential Revision: https://reviews.llvm.org/D111998
2021-10-19 14:38:21 +01:00
Alexey Bataev b9cfa016da [SLP]Fix emission of the shrink shuffles.
Need to follow the order of the reused scalars from the
ReuseShuffleIndices mask rather than rely on the natural order.

Differential Revision: https://reviews.llvm.org/D111898
2021-10-18 13:13:12 -07:00
modimo 313c657fce [InlineAdvisor] Add -inline-replay-scope=<Function|Module> to control replay scope
The goal is to allow grafting an inline tree from Clang or GCC into a new compilation without affecting other functions. For GCC, we're doing this by extracting the inline tree from dwarf information and generating the equivalent remarks.

This allows easier side-by-side asm analysis and a trial way to see if a particular inlining setup provides benefits by itself.

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D110658
2021-10-18 13:08:39 -07:00
Florian Hahn e844f05397
[LoopUtils] Simplify addRuntimeCheck to return a single value.
This simplifies the return value of addRuntimeCheck from a pair of
instructions to a single `Value *`.

The existing users of addRuntimeChecks were ignoring the first element
of the pair, hence there is not reason to track FirstInst and return
it.

Additionally all users of addRuntimeChecks use the second returned
`Instruction *` just as `Value *`, so there is no need to return an
`Instruction *`. Therefore there is no need to create a redundant
dummy `and X, true` instruction any longer.

Effectively this change should not impact the generated code because the
redundant AND will be folded by later optimizations. But it is easy to
avoid creating it in the first place and it allows more accurately
estimating the cost of the runtime checks.
2021-10-18 18:03:09 +01:00
Gil Rapaport 1156bd4fc3 [LV] Record memory widening decisions (NFCI)
Record widening decisions for memory operations within the planned recipes and
use the recorded decisions in code-gen rather than querying the cost model.

Differential Revision: https://reviews.llvm.org/D110479
2021-10-18 18:03:35 +03:00
Sanjay Patel 2a3cc4d461 [Analysis] add utility function for unary shuffle mask creation
This is NFC-intended for the callers. Posting in case there are
other potential users that I missed.
I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 ( D111901 )

Differential Revision: https://reviews.llvm.org/D111891
2021-10-18 09:00:39 -04:00
Peter Waller c4603a8a43 [InstCombine][DebugInfo] Remove superflous assertion, add test
When this code was added, an unnecessary assertion slipped in which we
now hit in real code.

Add a test to defend against it firing again.
2021-10-18 11:00:01 +00:00
Max Kazantsev baad10c09e Revert "[NFC] [LoopPeel] Change the way DT is updated for loop exits"
This reverts commit fa16329ae0.

See comments in discussion. Merged by mistake, not entirely getting what
the problem was.
2021-10-18 17:14:11 +07:00
Max Kazantsev fa16329ae0 [NFC] [LoopPeel] Change the way DT is updated for loop exits
When peeling a loop, we assume that the latch has a `br` terminator and
that all loop exits are either terminated with an `unreachable` or have
a terminating deoptimize call. So when we peel off the 1st iteration, we
change the IDom of all loop exits to the peeled copy of
`NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support
loops with exits that are followed by a block with an `unreachable` or a
terminating deoptimize call, changing the exit's idom wouldn't be enough
and DT would be broken.

For example, let `Exit1` and `Exit2` are loop exits, and each of them
unconditionally branches to the same `unreachable` terminated block. So
neither of the exits dominates this unreachable block. If we change the
IDoms of the exits to some peeled loop block, we don't update the
dominators of the unreachable block. Currently we just don't get to the
peeling logic, saying that we can't peel such loops.

With this NFC we just insert edges from cloned exiting blocks to their
exits after peeling each iteration (we accumulate the insertion updates
and then after peeling apply the updates to DT).

This patch was a part of D110922.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev
2021-10-18 10:23:05 +07:00
Sanjay Patel a49f5386ce [InstCombine] generalize fold for mask-with-signbit-splat, part 2
This removes an over-specified fold. The more general transform
was added with:
727e642e97

There's a difference on an existing test that shows a potentially
unnecessary use limit on an icmp fold.

That fold is in InstCombinerImpl::foldICmpSubConstant(), and IIRC
there was some back-and-forth on it and similar folds because they
could cause analysis/passes (SCEV, LSR?) to miss optimizations.

Differential Revision: https://reviews.llvm.org/D111410
2021-10-15 17:11:29 -04:00
Sanjay Patel 727e642e97 [InstCombine] generalize fold for mask-with-signbit-splat
(iN X s>> (N-1)) & Y --> (X < 0) ? Y : 0

https://alive2.llvm.org/ce/z/qeYhdz

I was looking at a missing abs() transform and found my way to this
generalization of an existing fold that was added with D67799.
As discussed in that review, we want to make sure codegen handles
this difference well, and for all of the targets/types that I
spot-checked, it looks good.

I am leaving the existing fold in place in this commit because
it covers a potentially missing icmp fold, but I plan to remove
that as a follow-up commit as suggested during review.

Differential Revision: https://reviews.llvm.org/D111410
2021-10-15 16:25:48 -04:00
Florian Hahn 4a1d63d7d0
[VectorCombine] Add option to only run scalarization transforms.
This patch adds a pass option to only run transforms that scalarize
vector operations and do not create new vector instructions.

When running VectorCombine early in the pipeline introducing new vector
operations can have negative effects, like blocking loop or SLP
vectorization. To avoid regressions, restrict the early VectorCombine
run (when using -enable-matrix) to only perform scalarization and not
introduce new vector operations.

This is done as option to the pass directly, which is then set when
adding the pass to the pipeline. This is done for the new pass manager
only.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D111800
2021-10-15 20:35:58 +01:00
Stephen Tozer f5ed223b0f [DebugInfo] Limit the size of DIExpressions that we will salvage up to
Fixes: https://bugs.llvm.org/show_bug.cgi?id=51841

This patch places an arbitrary limit on the size of DIExpressions that
we will produce via salvaging, for performance reasons. This helps to
fix a performance issue observed in the bug above, in which debug values
would be salvaged hundreds of times, producing expressions with over
1000 elements and causing the compiler to hang. Limiting the size of
debug values that we will produce to 128 largely fixes this issue.

Reviewed By: dblaikie, jmorse

Differential Revision: https://reviews.llvm.org/D110332
2021-10-15 17:34:21 +01:00
Anton Afanasyev 7b07c01351 [InstCombine] Support arbitrary const shift amount for `lshr (sext i1 ...)`
Add lshr (sext i1 X to iN), C --> select (X, -1 >> C, 0) case. This expands
C == N-1 case to arbitrary C.

Fixes PR52078.

Reviewed By: spatel, RKSimon, lebedev.ri

Differential Revision: https://reviews.llvm.org/D111330
2021-10-15 13:39:13 +03:00
Kazu Hirata 81e9c90686 [llvm] Use llvm::is_contained (NFC) 2021-10-14 22:44:09 -07:00
Alexey Bataev 414abff1fe [SLP]Fix PR52090: clang crashes: Assertion `Index < Length && "Invalid index!"' failed.
Need to check that either Idx is UndefMaskElem and value is UndefValue
or Idx is valid and value is the same as the scalar value in the node.

Differential Revision: https://reviews.llvm.org/D111802
2021-10-14 14:26:29 -07:00
Nikita Popov 69853f9920 [IVUsers] Move preheader check into SCEVExpander
Rather than checking for loop nest preheaders upfront in IVUsers,
move this requirement into isSafeToExpand() from SCEVExpander.

Historically, LSR did not check whether SCEVs are safe to expand
and fully relied on IVUsers to validate this. Later, support for
non-expandable SCEVs was added via rigid formulas.

Checking this in isSafeToExpand() makes it more obvious what
exactly this check is guarding against, and avoids the awkward
loop nest scan.

This is a followup to https://reviews.llvm.org/D111493#3055286.

Differential Revision: https://reviews.llvm.org/D111681
2021-10-14 21:52:31 +02:00
Simon Pilgrim 13185f0154 [Transforms] eliminateDeadStores - remove unused variable. NFC.
The initial MemoryAccess *Current assignment is never used, and all other uses are initialized/used within the worklist loop (and not across multiple iterations) - so move the variable internal to the loop.

Fixes scan-build unused assignment warning.
2021-10-14 18:10:03 +01:00
Shoaib Meenai 6404f4b5af [InstCombine] Remove attributes after hoisting free above null check
If the parameter had been annotated as nonnull because of the null
check, we want to remove the attribute, since it may no longer apply and
could result in miscompiles if left. Similarly, we also want to remove
undef-implying attributes, since they may not apply anymore either.

Fixes PR52110.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D111515
2021-10-13 15:34:56 -07:00
Philip Reames 47d10b25f8 [instcombine] PRE freeze to only potentially posion/undef operand of phi
This extends the foldOpIntoPhi code used when visiting a freeze user of a phi to allow any non-undef/poison operand as opposed to only non-undef/poison constants.  This lets us hoist a freeze in the increment of an IV into the preheader in many cases.

Differential Revision: https://reviews.llvm.org/D111744
2021-10-13 13:55:54 -07:00
Sjoerd Meijer 67a58fa3a6 [FuncSpec] Don't run the solver if there's nothing to do
Even if there are no interesting functions, the SCCP solver would still run
before bailing. Now bail earlier, avoid running the solver for nothing.

Differential Revision: https://reviews.llvm.org/D111645
2021-10-13 19:05:19 +01:00
Arthur Eubanks 3628bb7436 Make various assume bundle data structures use uint64_t
Following D110451, we need to make sure to support 64 bit values.
2021-10-13 10:38:41 -07:00
Sanjay Patel 02928fcb8c [InstCombine] improve code comments; NFC 2021-10-13 10:40:44 -04:00
Sanjay Patel 905d170803 [InstCombine] allow matching vector splat constants in foldLogOpOfMaskedICmps()
This is NFC-intended for scalar code. There are still unnecessary
m_ConstantInt restrictions in surrounding code, so this is not a
complete fix.

This prevents regressions seen with a planned follow-on to D111410.
2021-10-13 10:15:26 -04:00
Philip Reames 6f34839407 [instcombine] propagate freeze through single use poison producing flag instruction
If we have an instruction which produces poison only when flags are specified on the instruction, then we know that freezing the operands and dropping flags is equivalent to freezing the result. If we know those flags don't result in any undefined behavior being executed, then there's no point in preserving the flags as we gain no knowledge by having them.

This patch extends the existing propagation logic which sinks freeze to single potential non-poison operands to allow dropping of flags when we know the freeze is the sole use of the instruction with poison flags.

The main value is that we tend to sink freezes towards the phi in IV cycles where the incoming value to the phi is the freeze of an IV increment. This will in turn (in a future patch), let us fold the freeze through the phi into the loop preheader. Motivated by eliminating need for CanonicalizeFreezeInLoops for the clearly profitable cases from onephi.ll test case in the test directory.

Differential Revision: https://reviews.llvm.org/D111675
2021-10-12 13:52:41 -07:00
Ayal Zaks 15692fd6b5 [LV] Fix 2nd crash for reverse interleaved groups under mask/fold-tail.
This patch fixes another crash revealed by PR51614:
when *deciding* to vectorize with masked interleave groups, check if the access
is reverse (which is currently not supported).

Differential Revision: https://reviews.llvm.org/D108900
2021-10-12 21:44:42 +03:00
Mircea Trofin ea4a6c8426 [Inline] Make sure the InlineAdvisor is correctly cleared.
If another inlining session came after a ModuleInlinerWrapperPass, the
advisor alanysis would still be cached, but its Result would be cleared.
We need to clear both.

This addresses PR52118

Differential Revision: https://reviews.llvm.org/D111586
2021-10-12 10:42:41 -07:00
Sanjay Patel 7a2949647a [InstCombine] propagate no-wrap flag through select-of-mul fold
This may not be obvious, but Alive2 agrees:
https://alive2.llvm.org/ce/z/Ld9qNT

If the mul has "nsw", then -1 * INT_MIN is poison, so the
negate can also have "nsw" because 0 - INT_MIN is poison.

If the mul has "nuw", then that means the "OtherOp" can only
be 0 or 1 (anything else multiplied by 0xfff... would wrap).
So the replacement negate must be "nsw" because it is either
"0-0" or "0-1".

This is another regression noticed with a planned follow-up
to D111410.
2021-10-12 12:57:20 -04:00
Hongtao Yu 098a0d8fbc [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.
This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

Not exactly like DbgIntrinsics, PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are:
- Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by avoiding pseudo probe from clobbering memory SSA
- Unblocked induction variable simpliciation
- Allow empty loop deletion by treating probe intrinsic isDroppable
- Some refactoring.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110847
2021-10-12 09:44:12 -07:00
Kerry McLaughlin 1439ef1a3f [LoopVectorize] Classify pointer induction updates as scalar only if they have one use
collectLoopScalars collects pointer induction updates in ScalarPtrs, assuming
that the instruction will be scalar after vectorization. This may crash later
in VPReplicateRecipe::execute() if there there is another user of the instruction
other than the Phi node which needs to be widened.

This changes collectLoopScalars so that if there are any other users of
Update other than a Phi node, it is not added to ScalarPtrs.

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D111294
2021-10-12 13:24:49 +01:00
Florian Hahn 40d85f16c4
[LoopPeel] Use any_of & contains instead of for & find.
Using contains was suggested in D108114, but I forgot to include it when
landing the patch.
2021-10-12 12:18:01 +01:00
Sjoerd Meijer fc0fa85171 [FuncSpec] Allow ConstExprs that are function pointers
This is a follow up of D110529 that disallowed constexprs. That change
introduced a regression as this also disallowed constexprs that are function
pointers, which is actually one of the motivating use cases that we do want to
support.

Differential Revision: https://reviews.llvm.org/D111567
2021-10-12 11:44:26 +01:00
Florian Hahn cd0ba9dc58
[LoopPeel] Peel if it turns invariant loads dereferenceable.
This patch adds a new cost heuristic that allows peeling a single
iteration off read-only loops, if the loop contains a load that

    1. is feeding an exit condition,
    2. dominates the latch,
    3. is not already known to be dereferenceable,
    4. and has a loop invariant address.

If all non-latch exits are terminated with unreachable, such loads
in the loop are guaranteed to be dereferenceable after peeling,
enabling hoisting/CSE'ing them.

This enables vectorization of loops with certain runtime-checks, like
multiple calls to `std::vector::at` if the vector is passed as pointer.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D108114
2021-10-12 11:42:28 +01:00
Alina Sbirlea f7ca54289c [LoopSimplifyCFG] Do not require MSSA. Continue to preserve if available.
LoopSimplifyCFG does not need MSSA, but should preserve it if it's available.

This is a legacy PM change, aimed to denoise the test changes in D109958.

Differential Revision: https://reviews.llvm.org/D111578
2021-10-11 14:27:15 -07:00
Sanjay Patel 59441c7329 [InstCombine] fold signbit check of X | (X -1)
There may be some other patterns like this or a generalization,
but this is an example that I noticed would definitely regress
with a planned follow-up to D111410.

https://alive2.llvm.org/ce/z/GVpQDb
2021-10-11 16:14:13 -04:00
Arthur Eubanks fbddf22ef7 [SCCP] Properly report changes when changing a pointer argument
Fixes one of the issues in PR51946.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D111277
2021-10-11 13:12:08 -07:00
Florian Hahn ab33427c86
[VPlan] Print live-in backedge taken count as part of plan.
At the moment, a VPValue is created for the backedge-taken count, which
is used by some recipes. To make it easier to identify the operands of
recipes using the backedge-taken count, print it at the beginning of the
VPlan if it is used.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D111298
2021-10-11 20:13:01 +01:00
Philip Reames 7f55209cee [SCEV] Extend trip count to avoid overflow by default
As a brief reminder, an "exit count" is the number of times the backedge executes before some event. It can be zero if we exit before the backedge is reached. A "trip count" is the number of times the loop header is entered if we branch into the loop. In general, TC = BTC + 1 and thus a zero trip count is ill defined

There is a cornercases which we don't handle well. Let's assume i8 for our examples to keep things simple. If BTC = 255, then the correct trip count is 256. However, 256 is not representable in i8.

In theory, code which needs to reason about trip counts is responsible for checking for this cornercase, and either bailing out, or handling it correctly. Historically, we don't have a great track record about actually doing so.

When reviewing D109676, I found myself asking a basic question. Was there any good reason to preserve the current wrap-to-zero behavior when converting from backedge taken counts to trip counts? After reviewing existing code, I could not find a single case which appears to correctly and precisely handle the overflow case.

This patch changes the default behavior to extend instead of wrap. That is, if the result might be 256, we return a value of i9 type to ensure we interpret the count correctly. I did leave the legacy behavior as an option since a) loop-flatten stops triggering if I extend due to weirdly specific pattern matching I didn't understand and b) we could reasonably use the mode if we'd externally established a lack of overflow.

I want to emphasize that this change is *not* NFC. There are two call sites (one in ScalarEvolution.cpp, one in LoopCacheAnalysis.cpp) which are switched to the extend semantics. The former appears imprecise (but correct) for a constant 255 BTC. The later appears incorrect, though I don't have a test case.

Differential Revision: https://reviews.llvm.org/D110587
2021-10-11 09:55:55 -07:00
hyeongyu kim 0aeb37324d [SimpleLoopUnswitch] Re-fix introduction of UB when hoisted condition may be undef or poison
https://bugs.llvm.org/show_bug.cgi?id=27506
https://bugs.llvm.org/show_bug.cgi?id=31652
https://bugs.llvm.org/show_bug.cgi?id=51043

Problems with SimpleLoopUnswitch cause the bug reports above.

```
while (...) {
  if (C) { A }
  else   { B }
}
Into:

C' = freeze(C)
if (C') {
  while (...) { A }
} else {
  while (...) { B }
}
```
This problem can be solved by adding a freeze on hoisted branches(above transform) and has been solved by D29015.
However, D29015 is now reverted by performance regression(2b5a897651)

It is not the first time that an added freeze has caused performance regression.
SimplifyCFG also had a problem with UB caused by branching-on-undef, which was solved by adding freeze to the branching condition. (D104569)
Performance regression occurred in D104569, and patches such as D105344 and D105392 were written to minimize it.

This patch will correct the SimpleLoopUnswitch as D104569 handles the SimplyCFG while minimizing performance loss by introducing patches like D105344 and D105392(This patch was rebased with the author's permission)

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D106041
2021-10-12 01:02:09 +09:00
David Sherwood 26b7d9d622 [LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:

  int r = a;
  for (int i = 0; i < n; i++) {
    if (src[i] > 3) {
      r = b;
    }
    src[i] += 2;
  }

We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.

The IR generated by clang typically looks like this:

  %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
  ...
  %pred = icmp ugt i32 %val, i32 3
  %phi.update = select i1 %pred, i32 %b, i32 %phi

We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
  Transforms/LoopVectorize/select-cmp-predicated.ll
  Transforms/LoopVectorize/select-cmp.ll

Differential Revision: https://reviews.llvm.org/D108136
2021-10-11 09:41:38 +01:00
Clement Courbet 6aaf1e7ea9 [LoopIdiom] Fix store size SCEV type.
We were using the type of the loop back edge count to represent the
store size. This failed for small loop counts (e.g. in the added test,
the loop count was an i2).

Use the index type instead.

Fixes PR52104.

Differential Revision: https://reviews.llvm.org/D111401
2021-10-11 09:39:06 +02:00
Dawid Jurczak 9e65929a8e [DSE] Re-enable calloc transformation with extra care (PR25892)
Transformation from malloc+memset to calloc is always correct and in many situations
it brings significant observable benefits in terms of execution speed and memory consumption [1][2].
Unfortunately there are cases when producing calloc cause performance drops [3].
As discussed here: https://reviews.llvm.org/D103009 it's possible to differentiate between those 2 scenarios.
If optimizer is able to prove that after malloc call it's _very_ likely to reach memset branch then after
calloc emission we shouldn't observe any performance hits. Therefore finding "null pointer check" pattern
before memset basic block sounds like good justification for performing transformation.
Also that method was already suggested by GCC folks [4]. Main reason for change is that for now
to be safe we check for post dominance relation which is way too conservative approach making transformation
"almost" disabled in practice. This patch tends to enable transformation again but with extra care.

[1] https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc
[2] https://vorpus.org/blog/why-does-calloc-exist/
[3] http://smalldatum.blogspot.com/2017/11/a-new-optimization-in-gcc-5x-and-mysql.html
[4] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83022

Differential Revision: https://reviews.llvm.org/D110021
2021-10-10 21:47:14 +02:00
Sanjay Patel 05281d95f2 [InstCombine] move fold for "(X-Y) == 0"; NFC
This consolidates related folds that all have a
similar use restriction that may not be necessary.
2021-10-10 11:26:03 -04:00
Sanjay Patel da210f5d34 [InstCombine] canonicalize "(C2 - Y) > C" as (Y + ~C2) < ~C
The test diffs show that we have better analysis/folds for 'add'
(although we should at least have the simplifications
independently, so we don't have the one-use restriction).

This is related to solving regressions that would appear in
transforms related to D111410, and that is part of a series
of enhancements that may eventually helpi solve PR34047.

https://alive2.llvm.org/ce/z/3tB9KG

  define i1 @src(i8 %x, i8 %C, i8 %C2) {
    %sub = sub nuw i8 %C2, %x
    %r = icmp slt i8 %sub, %C
    ret i1 %r
  }

  define i1 @tgt(i8 %x, i8 %C, i8 %C2) {
    %Cnot = xor i8 %C, -1
    %C2not = xor i8 %C2, -1
    %add = add nuw i8 %x, %C2not
    %r = icmp sgt i8 %add, %Cnot
    ret i1 %r
  }
2021-10-10 11:06:49 -04:00
Sanjay Patel acafde09a3 [InstCombine] enhance icmp with sub folds
There were 2 related but over-specified folds for:
C1 - X == C

One allowed multi-use but was limited to equal constants.
The other allowed different constants but disallowed multi-use.

This combines the 2 folds into a more general match.
The test diffs show the multi-use cases that were falling
through the cracks.

https://alive2.llvm.org/ce/z/4_hEt2

  define i1 @src(i8 %x, i8 %subC, i8 %C) {
    %s = sub i8 %subC, %x
    %r = icmp eq i8 %s, %C
    ret i1 %r
  }

  define i1 @tgt(i8 %x, i8 %subC, i8 %C) {
    %newC = sub i8 %subC, %C
    %isneg = icmp eq i8 %x, %newC
    ret i1 %isneg
  }
2021-10-09 11:39:49 -04:00
Dávid Bolvanský 943b304848 Fixed some errors detected by PVS Studio 2021-10-09 17:27:41 +02:00
Nikita Popov ea12adc169 [CanonicalizeFreeze] Drop IVUsers.h include (NFC)
Looking for users of IVUsers, this was a false positive. Only LSR
uses IVUsers.
2021-10-09 17:01:26 +02:00
Max Kazantsev 4c0da23663 [LoopDeletion] Support selects when symbolically evaluating 1st iteration
Adds support for selects for which we know value on the 1st iteration.

Differential Revision: https://reviews.llvm.org/D104111
Reviewed By: nikic
2021-10-09 14:47:44 +07:00
Arthur Eubanks 20a0c482e0 [LICM] Use Align instead of int 2021-10-08 18:26:15 -07:00
Nikita Popov e3129fb792 [LoopFlatten] Mark inner loop as deleted
If a loop is flattened, the inner loop is removed and the LPM
should be informed of this fact, so it can invalidate associated
analyses. To support this, we relax an assertion in LPMUpdater to
allow invalidating non-top-level loops when running in LoopNestMode,
as the pass does not know how exactly it will get scheduled.

Differential Revision: https://reviews.llvm.org/D111350
2021-10-08 23:12:15 +02:00
Andrew Browne 007d98f520 [DFSan] Fix warning: getArgsFunctionType defined but not used
Warning introduced in 61ec2148c5
2021-10-08 11:58:36 -07:00
Arthur Eubanks a3358fcff1 More followup type changes after 05392466 2021-10-08 11:51:36 -07:00
Andrew Browne 61ec2148c5 [DFSan] Remove -dfsan-args-abi support in favor of TLS.
ArgsABI was originally added in https://reviews.llvm.org/D965

Current benchmarking does not show a significant difference.
There is no need to maintain both ABIs.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D111097
2021-10-08 11:18:36 -07:00
Arthur Eubanks 9405217999 Revert "Recommit "[LoopPeel] Peel loops with deoptimizing exits""
This reverts commit d68b59f3eb.

This is causing crashes, see D110922 for details.
2021-10-08 10:53:23 -07:00
Philip Reames d694dd0f0d Add iterator range variants of isGuaranteedToTransferExecutionToSuccessor [mostly-nfc]
This factors out utilities for scanning a bounded block of instructions since we have this code repeated in a bunch of places.  The change to InlineFunction isn't strictly NFC as the limit mechanism there didn't handle debug instructions correctly.
2021-10-08 09:50:10 -07:00
Max Kazantsev d68b59f3eb Recommit "[LoopPeel] Peel loops with deoptimizing exits"
Removed obsolete DT verification that should not be there because the
strategy of DT updates has changed.

Differential Revision: https://reviews.llvm.org/D110922
2021-10-08 17:54:27 +07:00
Max Kazantsev 48a5a2d1af Revert "[LoopPeel] Peel loops with deoptimizing exits"
This reverts commit 8a959625c4.

Reported failures with LLVM_ENABLE_EXPENSIVE_CHECKS, need to investigate.
2021-10-08 16:07:59 +07:00
Jingu Kang 4c98070cce [LoopBoundSplit] Handle the case in which exiting block is loop header
Update the incoming value of phi nodes in header of post-loop correctly.

Differential Revision: https://reviews.llvm.org/D110060
2021-10-08 09:13:41 +01:00
Dawid Jurczak dd5991cc6f [LoopIdiom] Transform loop containing memcpy to memmove
The purpose of patch is to learn Loop Idiom Recognize pass how to recognize simple memmove patterns
in similar way like GCC does: https://godbolt.org/z/dKjGvTGff
It's follow-up of following change: https://reviews.llvm.org/D104464

Differential Revision: https://reviews.llvm.org/D107075
2021-10-08 09:56:01 +02:00
Max Kazantsev 8a959625c4 [LoopPeel] Peel loops with deoptimizing exits
Added support for peeling loops with "deoptimizing" exits -
such exits that it or any of its children (or any of their
children, etc) either has a @llvm.experimental.deoptimize call
prior to the terminating return instruction of this basic block
or is terminated with unreachable. All blocks in the the
sequence must have a single successor, maybe except for the last
one.

Previously we only checked the exit block for being deoptimizing.
Now we check if the last reachable block from the exit is deoptimizing.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D110922
Reviewed By: mkazantsev
2021-10-08 10:32:13 +07:00
Nikita Popov c5245dd339 [LoopFlatten] Mark loop analyses as preserved
LoopFlatten does preserve loop analyses (DT, LI and SCEV), but
currently doesn't mark them as preserved in the NewPM (they are
marked as preserved in the LegacyPM). I think this doesn't really
have an effect in the end because the loop pass adaptor will just
assume they're preserved anyway, but let's be explicit about this
for the sake of clarity.

Differential Revision: https://reviews.llvm.org/D111328
2021-10-07 21:56:49 +02:00
Sanjay Patel d95ebef4b8 [InstCombine] ease use check for fold of bitcasted extractelt to trunc
This helps with examples like:
https://llvm.org/PR52057
...but we need at least one more fold to fix that case.
2021-10-07 15:09:34 -04:00
Akira Hatanaka 392a2a554c Refactor code in ObjCARC.cpp. NFC
This is in preparation for another patch I'm planning to send later.
2021-10-07 11:25:01 -07:00
Bjorn Pettersson 7f93bb4a58 [LoopRotate] Forget SCEV values in RewriteUsesOfClonedInstructions
This patch fixes problems reported in PR51981.

When rotating a loop it isn't enough to just forget SCEV for that
loop nest. When rotating we might clone some instructions from the
old header into the preheader, and insert new PHI nodes to merge
values together. There could be users of the original value that are
updated to use the PHI result. And those users were not necessarily
depending on a PHI node earlier, so they weren't cleaned up when just
forgetting all SCEV:s for the loop nest. So we need to explicitly
forget those values to avoid invalid cached SCEV expressions.

Reviewed By: fhahn, mkazantsev

Differential Revision: https://reviews.llvm.org/D110813
2021-10-07 19:36:30 +02:00
Chris Jackson a61c0adba1 [DebugInfo][LSR] Limit the size of SCEV translated to DIExpression
SCEV-based salvaging will use excessive resources if it encounters
very long SCEV expressions. This patch places a limit on the length of
SCEV expression that salvaging will attempt to translate.

Reviewed by: Orlando

Differential Revision: https://reviews.llvm.org/D110558
2021-10-07 15:38:28 +00:00
Itay Bookstein 40ec1c0f16 [IR][NFC] Rename getBaseObject to getAliaseeObject
To better reflect the meaning of the now-disambiguated {GlobalValue,
GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction
(D109792), the function is renamed to getAliaseeObject.
2021-10-06 19:33:10 -07:00
Kuba Mracek 7329abf2f8 [GlobalDCE] In VFE, replace the whole 'sub' expression of unused relative-pointer-based vtable slots
Differential Revision: https://reviews.llvm.org/D109114
2021-10-06 15:55:55 -07:00
Arthur Eubanks 72dddce652 More size_t -> uint64_t fixes after 05392466
Fixes some bots where the two differ.
2021-10-06 15:13:47 -07:00
Arthur Eubanks ab7d421869 size_t -> uint64_t after 05392466
Fixes some bots where the two differ.
2021-10-06 13:52:04 -07:00
Arthur Eubanks 05392466f0 Reland [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 13:29:23 -07:00
Arthur Eubanks 569346f274 Revert "Reland [IR] Increase max alignment to 4GB"
This reverts commit 8d64314ffe.
2021-10-06 11:38:11 -07:00
Arthur Eubanks 1b76312e98 Update some types after D110451
To fix mismatched size_t vs uint64_t on some platforms.
2021-10-06 11:27:48 -07:00
Arthur Eubanks 8d64314ffe Reland [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 11:03:51 -07:00
Arthur Eubanks 72cf8b6044 Revert "[IR] Increase max alignment to 4GB"
This reverts commit df84c1fe78.

Breaks some bots
2021-10-06 10:21:35 -07:00
Arthur Eubanks df84c1fe78 [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 09:54:14 -07:00
Simon Pilgrim 0dcd2b40e6 [TTI] Remove default condition type and predicate arguments from getCmpSelInstrCost
We need to be better at exposing the comparison predicate to getCmpSelInstrCost calls as some targets (e.g. X86 SSE) have very different costs for different comparisons (PR48337), and we can't always rely on the optional Instruction argument.

This initial commit requires explicit condition type and predicate arguments. The next step will be to review a lot of the existing getCmpSelInstrCost calls which have used BAD_ICMP_PREDICATE even when the predicate is known.

Differential Revision: https://reviews.llvm.org/D111024
2021-10-06 15:40:35 +01:00
Sanjay Patel db231ebdb0 [InstCombine] fold fake vector extract to shift+trunc
We already handle more complicated cases like:
extelt (bitcast (inselt poison, X, 0)) --> trunc (lshr X)

But we missed this simpler pattern:
https://alive2.llvm.org/ce/z/D55h64 / https://alive2.llvm.org/ce/z/GKzzRq

This is part of solving:
https://llvm.org/PR52057

I made the transform depend on legal/desirable int type to avoid creating
a shift of an illegal type (for example i128). I'm not sure if that
restriction is actually necessary, but we can change that as a follow-up
if the backend can deal with integer ops on too-wide illegal types.

The pile of AVX512 test changes are all neutral AFAICT - the x86 backend
seems to know how to turn that into the expected "kmov" instructions.

Differential Revision: https://reviews.llvm.org/D111082
2021-10-06 08:12:05 -04:00
Simon Pilgrim 21661607ca [llvm] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine)
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
2021-10-06 12:04:30 +01:00
kpyzhov 7e390dfea7 [AMDGPU] Correction to 095c48fdf3.
Differential Revision: https://reviews.llvm.org/D110337
2021-10-05 21:47:25 -04:00
Sanjay Patel bc72baa047 [InstCombine] add folds for logical nand/nor
This is noted as a regression in:
https://llvm.org/PR52077
2021-10-05 18:31:20 -04:00
Mircea Trofin 7d541eb4d4 [inliner] Mandatory inlining decisions produce remarks
This also removes the need to disable the mandatory inlining phase in
tests.

In a departure from the previous remark, we don't output a 'cost' in
this case, because there's no such thing. We just report that inlining
happened because of the attribute.

Differential Revision: https://reviews.llvm.org/D110891
2021-10-05 14:01:25 -07:00
Sanjay Patel 668beb8ae8 [InstCombine] refactor folds of 'not' instructions; NFC
This removes repeated calls to m_Not, so hopefully a little
more efficient.

Also, we may need to enhance some of these blocks to allow
logical and/or (select of bools).
2021-10-05 16:36:57 -04:00
Alexey Bataev bebe702dbe [SLP]Detect reused scalars in all possible gathers for better vectorization cost.
Some initially gathered nodes missed the check for the reused scalars,
which leads to high gather cost. Such nodes still can be represented as
m gathers + shuffle instead of n gathers, where m < n.

Differential Revision: https://reviews.llvm.org/D111153
2021-10-05 09:43:03 -07:00
kpyzhov 095c48fdf3 [AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration.
The current way to detect hostcalls by looking for "ockl_hostcall_internal()" function in the module seems to be not reliable enough. The LTO may rename the "ockl_hostcall_internal()" function when an application is compiled with "-fgpu-rdc", and MetadataStreamer pass to fail to detect hostcalls, therefore it does not set the "hidden_hostcall_buffer" kernel argument.
This change adds a new module flag: hostcall that can be used to detect whether GPU functions use host calls for printf.

Differential revision: https://reviews.llvm.org/D110337
2021-10-05 09:56:04 -04:00
Sjoerd Meijer cdfc678572 [SCCPSolver] Fix use-after-free in markArgInFuncSpecialization
In SCCPSolver::markArgInFuncSpecialization, the ValueState map may be
reallocated *after* the initial ValueLatticeElement reference is grabbed, but
*before* its use in copy initialization. This causes a use-after-free.  To fix
this, this commit changes the behavior to create the new ValueLatticeElement
before assigning the old one to it.

Patch by: https://github.com/duck-37/

Differential Revision: https://reviews.llvm.org/D111112
2021-10-05 12:56:32 +01:00
wlei fb29d812e4 [CSSPGO] Rename the field of SampleContextFrame
Differential Revision: https://reviews.llvm.org/D110980
2021-10-04 19:06:59 -07:00
Arthur Eubanks 7f28b4d5b7 [LICM] Bail if checking a global/constant for invariant.start
When we check if a load is loop invariant by finding a dominating
invariant.start call, we strip bitcasts until we get to an i8* Value,
and look for an invariant.start use of the i8* Value.

We may accidentally end up at an i8 global and look at a global's uses,
which we shouldn't do in a loop pass. Although we could make this
logic work with globals, that's not currently intended.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D111098
2021-10-04 14:14:25 -07:00
Sanjay Patel 6a2a84c253 [InstCombine] add helper for "is desirable int type"; NFC
This splits out the logic from shouldChangeType() that
currently allows 8/16/32-bit transforms even if those
types are not listed as legal in the data layout.

This could be useful as a predicate for vector
insert/extract transforms.

Note that this leaves the subsequent checks in
shouldChangeType() unchanged. We may want to merge
the checks for i1 and/or "ToLegal" into "isDesirable",
but that may alter existing transforms.
2021-10-04 14:30:18 -04:00
Christopher Tetreault df1f03280c [SimpleLoopUnswitch] Allow threshold to be specified zero or more times
Differential Revision: https://reviews.llvm.org/D110594
2021-10-04 09:19:26 -07:00
Bjorn Pettersson 7f84fa4ad4 [TargetLibraryInfo] Refactor size_t checks in isValidProtoForLibFunc. NFC
In TargetLibraryInfoImpl::isValidProtoForLibFunc we no longer
need the IsSizeTTy lambda function and the SizeTTy object. Instead
we just follow the regular structure of checking for integer types
given an exepected number of bits.
2021-10-04 15:46:39 +02:00
Joseph Huber f074a6a041 [OpenMP] Add options to change Attributor max iterations in OpenMPOpt
This patch adds a new command line option `openmp-opt-max-iterations`
that controls the maximum number of iterations the attributor will run
for when compiling OpenMP target device code. This patch also adds a
remark to indicate when the attributor failed because it did not run
for enough iterations.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110749
2021-10-04 09:39:04 -04:00
Jingu Kang 4288b6520a [LoopBoundSplit] Use SCEVAddRecExpr instead of SCEV for AddRecSCEV (NFC)
Differential Revision: https://reviews.llvm.org/D109682
2021-10-04 10:17:44 +01:00
David Sherwood 28388645a3 [NFC] Simple tidy-up in LoopVectorizationCostModel::selectEpilogueVectorizationFactor
Avoid creating EpilogueVectorizationForceVF twice.
2021-10-04 10:14:22 +01:00
Jay Foad a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
Sanjay Patel f32c0fe8e5 [InstCombine] fold cast of right-shift if high bits are not demanded (3rd try)
The first two tries at this were reverted because they caused an
infinite loop in instcombine.
That should be fixed after a series of patches that ended with
removing the faulty opposing transform:
3fabd98e5b

Original commit message:
(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C

Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.

Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF

Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb

Differential Revision: https://reviews.llvm.org/D110170
2021-10-03 10:37:22 -04:00
Dávid Bolvanský 5f2f611880 Fixed more warnings in LLVM produced by -Wbitwise-instead-of-logical 2021-10-03 13:58:10 +02:00
hyeongyu kim cf284f6c5e [LSV] Change the default value of InstertElement to poison
This patch is changing the InsertElement's placeholder to poison without changing the LSV's behavior.

Regardless of whether `StoreTy` is FixedVectorType or not, the poison value will be overwritten with a different value.
Therefore, whether the InsertElement's placeholder is poison or undef will not affect the result of the program.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D111005
2021-10-03 17:57:34 +09:00
Daniil Suchkov 45bd8d9477 [SimpleLoopUnswitch] Don't unswitch constant conditions
Added an additional check for constants after simplification of
"select _, true, false" pattern. We need to prevent attempts to unswitch constant
conditions for two reasons:
a) Doing that doesn't make any sense, in the best case it will just burn
some compile time.
b) SimpleLoopUnswitch isn't designed to unswitch constant conditions
(due to (a)), so attempting that can cause miscompiles. The attached
testcase is an example of such miscompile.

Also added an assertion that'll make sure we aren't trying to replace
constants, so it will help us prevent such bugs in future. The assertion
from D110751 is another layer of protection against such cases.

Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D110752
2021-10-01 21:30:54 +00:00
Sanjay Patel 3fabd98e5b [InstCombine] fold (trunc (X>>C1)) << C to shift+mask directly
This is no-externally-visible-functional-difference-intended.
That is, the test diffs show identical instructions other than
name changes (those are included specifically to verify the logic).

The existing transforms created extra instructions and relied
on subsequent folds to get to the final result, but that could
conflict with other transforms like the proposed D110170 (and
caused that patch to be reverted twice so far because of infinite
combine loops).
2021-10-01 14:22:44 -04:00
Arthur Eubanks a7b4ce9cfd [NFC][AttributeList] Replace index_begin/end with an iterator
We expose the fact that we rely on unsigned wrapping to iterate through
all indexes. This can be confusing. Rather, keeping it as an
implementation detail through an iterator is less confusing and is less
code.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D110885
2021-10-01 10:17:41 -07:00
Kazu Hirata 4f0225f6d2 [Transforms] Migrate from getNumArgOperands to arg_size (NFC)
Note that getNumArgOperands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-10-01 09:57:40 -07:00
Kerry McLaughlin c1d46d3461 [SLPVectorizer] Fix crash in isShuffle with scalable vectors
D104809 changed `buildTree_rec` to check for extract element instructions
with scalable types. However, if the extract is extended or truncated,
these changes do not apply and we assert later on in isShuffle(), which
attempts to cast the type of the extract to FixedVectorType.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D110640
2021-10-01 10:56:44 +01:00
Krasimir Georgiev 685f1bfd0a Revert "[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns"
It appears to cause stage2 clang build failures, e.g.,
https://lab.llvm.org/buildbot/#/builders/74/builds/7145.

This reverts commit 1fb37334bd.
2021-10-01 11:39:43 +02:00
David Sherwood 1fb37334bd [LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:

  int r = a;
  for (int i = 0; i < n; i++) {
    if (src[i] > 3) {
      r = b;
    }
    src[i] += 2;
  }

We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.

The IR generated by clang typically looks like this:

  %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
  ...
  %pred = icmp ugt i32 %val, i32 3
  %phi.update = select i1 %pred, i32 %b, i32 %phi

We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
  Transforms/LoopVectorize/select-cmp-predicated.ll
  Transforms/LoopVectorize/select-cmp.ll

Differential Revision: https://reviews.llvm.org/D108136
2021-10-01 08:41:03 +01:00
Arnold Schwaighofer 2df2b27d94 [cora async] Cleanup undefined llvm.coro.async.resume
In situations where the coroutine function is not split we can just
replace the async.resume by null.

rdar://82591919

Differential Revision: https://reviews.llvm.org/D110191
2021-09-30 13:26:53 -07:00
Sanjay Patel 3fcb00df5d [InstCombine] restrict shift-trunc-shift fold to opposite direction shifts
This is NFCI because the pattern with 2 left-shifts should get
folded independently by smaller folds.

The motivation is to refine this block to avoid infinite loops
seen with D110170.
2021-09-30 15:06:13 -04:00
Adrian Prantl 9232ca4712 Improve the effectiveness of BDCE's debug info salvaging
This patch improves the effectiveness of BDCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!

Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.

This reapplies the previous patch with a fix for a use-after-free.

Differential Revision: https://reviews.llvm.org/D110568
2021-09-30 09:28:49 -07:00
Kazu Hirata f631173d80 [llvm] Migrate from arg_operands to args (NFC)
Note that arg_operands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-09-30 08:51:21 -07:00
Anna Thomas 6f2d01376d [LoopPredication] Remove unused variable
After rG452714f8f8037ff37f9358317651d1652e231db2, the Function `F` retrieved in LoopPredication is not used.
Remove this unused variable to stop some buildbots (ASAN, clang-ppc) from failing.
2021-09-30 10:40:47 -04:00
Anna Thomas 452714f8f8 [BPI] Keep BPI available in loop passes through LoopStandardAnalysisResults
This is analogous to D86156 (which preserves "lossy" BFI in loop
passes). Lossy means that the analysis preserved may not be up to date
with regards to new blocks that are added in loop passes, but BPI will
not contain stale pointers to basic blocks that are deleted by the loop
passes.

This is achieved through BasicBlockCallbackVH in BPI, which calls
eraseBlock that updates the data structures in BPI whenever a basic
block is deleted.

This patch does not have any changes in the upstream pipeline, since
none of the loop passes in the pipeline use BPI currently.
However, since BPI wasn't previously preserved in loop passes, the loop
predication pass was invoking BPI *on the entire
function* every time it ran in an LPM.  This caused massive compile time
in our downstream LPM invocation which contained loop predication.

See updated test with an invocation of a loop-pipeline containing loop
predication and -debug-pass turned ON.

Reviewed-By: asbirlea, modimo
Differential Revision: https://reviews.llvm.org/D110438
2021-09-30 10:27:05 -04:00
Joseph Huber c11ebfea6d [OpenMP][NFC] Fix linting messages in OpenMPOpt
Summary:
This patch addresses some linting messages I keep getting in my editor
when working on OpenMPOpt.
2021-09-29 16:07:33 -04:00
Joseph Huber 87ce7e65f2 [OpenMP] Add missing distribute definitions to AAKernelInfo
Summary:
The RTL functions added in https://reviews.llvm.org/D110429 were
mistakenly left out from the list of safe runtime calls in AAKernelInfo.
This patch adds them in.
2021-09-29 16:06:34 -04:00
Sjoerd Meijer 367df18050 [LoopFlatten] Bail if we can't perform flattening after IV widening
It can happen that after widening of the IV, flattening may not be possible,
e.g. when it is deemed unprofitable. We were not properly checking this, which
resulted in flattening being applied when it shouldn't, also leading to
incorrect results (miscompilation).

This should fix PR51980 (https://bugs.llvm.org/show_bug.cgi?id=51980)

Differential Revision: https://reviews.llvm.org/D110712
2021-09-29 19:53:34 +01:00
Sanjay Patel ea56dcb730 [InstCombine] fix miscompile from dropRedundantMaskingOfLeftShiftInput()
The test is from https://llvm.org/PR51351.

There are 2 related logic bugs from over-generalizing "lshr" to "any shr",
but I'm not sure how to expose the difference for "MaskC" because instsimplify
already folds ashr of -1.

I'll extend instsimplify to catch the MaskD pattern as a follow-up, but this
patch should be enough to avoid the miscompile.
2021-09-29 11:43:18 -04:00
Florian Hahn 0b4a4cc72d
[IndVarSimplify] Forget phi value after changing incoming value.
This fixes an issue exposed by D71539, where IndVarSimplify tries
to access an invalid cached SCEV expression after making changes to the
underlying PHI instruction earlier.

When changing the incoming value of a PHI, forget the cached SCEV for
the PHI.
2021-09-29 14:44:13 +01:00
Djordje Todorovic f8dfc35256 NFC: [Debugify] Fix a typo when checking variables in the original mode 2021-09-29 04:35:10 -07:00
Jinsong Ji 25c30324e9 [AIX] Change the linkage of profiling counter/data to be private
We generate symbols like `profc`/`profd` for each function, and put them into csects.
When there are weak functions,  we generate weak symbols for the functions as well,
with ELF (and some others),  linker (binder) will discard and only keep one copy of the weak symbols.

However, on AIX, the current binder can NOT discard the weak symbols if we put all of them into the same csect,
as binder can NOT discard a subset of a csect.

This creates a unique challenge for using those symbols to calculate some relative offsets.

This patch changed the linkage of `profc`/`profd` symbols to be private, so that all the profc/profd for each weak symbol will be *local* to objects, and all kept in the csect, so we won't have problem. Although only one of the counters will be used, all the pointer in the profd is correct.

The downside is that we won't be able to discard the duplicated counters and profile data,
but those can not be discarded even if we keep the weak linkage,
due to the binder limitation of not discarding a subsect of the csect either .

Reviewed By: Whitney, MaskRay

Differential Revision: https://reviews.llvm.org/D110422
2021-09-29 00:47:25 +00:00
Sanjay Patel 98fde3489a [InstCombine] reduce redundant code for shl-binop folds
This is NFCI (no-functional-change-intended), but there
are benign diffs possible with commutable ops as seen in
the test diffs.

The transforms were repeated for the commutative opcodes,
but that should not be necessary if we canonicalize the
patterns that we're matching. If both operands of the
binop match, that should get folded eventually.

The transform that starts with a mask op seems to
over-constrain the use checks, so that could be a
potential enhancement.
2021-09-28 17:06:45 -04:00
Nikita Popov abbbc480a1 Revert "Improve the effectiveness of BDCE's debug info salvaging"
This reverts commit f6954bf804.

This breaks the test-suite O3 build:

/home/nikic/llvm-test-suite/build-O3/tools/timeit --summary Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o.time /home/nikic/llvm-project/build/bin/clang++  -DNDEBUG  -O3   -w -Werror=date-time -save-stats=obj -save-stats=obj -std=c++11 -MD -MT Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o -MF Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o.d -o Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o -c ../Bitcode/Benchmarks/Halide/local_laplacian/local_laplacian.bc
While deleting: i64 %
Use still stuck around after Def is destroyed:  %12620 = mul i64 %12619, <badref>
clang++: /home/nikic/llvm-project/llvm/lib/IR/Value.cpp:103: llvm::Value::~Value(): Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' failed.
2021-09-28 21:52:27 +02:00
Adrian Prantl f6954bf804 Improve the effectiveness of BDCE's debug info salvaging
This patch improves the effectiveness of BDCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!

Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.

Differential Revision: https://reviews.llvm.org/D110568
2021-09-28 10:24:51 -07:00
Adrian Prantl 9637b045e6 Improve the effectiveness of ADCE's debug info salvaging
This patch improves the effectiveness of ADCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!

Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.

Differential Revision: https://reviews.llvm.org/D110462
2021-09-28 10:24:50 -07:00
Adrian Prantl 1b998a5f0c Add salvageDebugInfo support for truncating/extending ptr/int conversions.
This patch enables debug info salvaging for truncating/extending ptr
int conversions. The testcase uncovered a bug in adce, which is
addressed separately.

rdar://80227769

Differential Revision: https://reviews.llvm.org/D110461
2021-09-28 10:24:50 -07:00
Alex Richardson ebb3dc0833 [InstCombine] Fold ptrtoint(gep i8 null, x) -> x
This commit is the InstCombine follow-up to the previous constant-folding
change that enables noticeable optimizations for CHERI-enabled targets.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D110247
2021-09-28 17:57:37 +01:00
Alexey Bataev f701505c45 [SLP]Improve vectorization of phi nodes by trying wider vectors.
Try to improve vectorization of the PHI nodes by trying to vectorize
similar instructions at the size of the widest possible vectors, then
aggregating with compatible type PHIs and trying to vectoriza again and
only if this failed, try smaller sizes of the vector factors for
compatible PHI nodes. This restores performance of several benchmarks
after tuning of the fp/int conversion instructions costs.

Differential Revision: https://reviews.llvm.org/D108740
2021-09-28 07:20:36 -07:00
Sjoerd Meijer 0ea77502e2 [LoopFlatten] Updating Phi nodes after IV widening
In rG6a076fa9539e, a problem with updating the old/narrow phi nodes after IV
widening was introduced. If after widening of the IV the transformation is
*not* applied, the narrow phi node was incorrectly modified, which should only
happen if flattening happens. This can be seen in the added test widen-iv2.ll,
which incorrectly had 1 incoming value, but should have its original 2 incoming
values, which is now restored.

Differential Revision: https://reviews.llvm.org/D110234
2021-09-28 15:09:20 +01:00
Sanjay Patel 1f8bead678 [InstCombine] reduce code for swapped predicate; NFC 2021-09-28 10:00:35 -04:00