Commit Graph

20031 Commits

Author SHA1 Message Date
Kevin P. Neal 770c57898e [FPEnv][InstSimplify] Prepush more tests for D106362.
In working on D106362 I found that a few more tests were needed. I've
been asked to pre-push the tests for that ticket. This should complete
the tests needed for now.
2021-10-04 13:48:34 -04:00
Joseph Huber f074a6a041 [OpenMP] Add options to change Attributor max iterations in OpenMPOpt
This patch adds a new command line option `openmp-opt-max-iterations`
that controls the maximum number of iterations the attributor will run
for when compiling OpenMP target device code. This patch also adds a
remark to indicate when the attributor failed because it did not run
for enough iterations.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110749
2021-10-04 09:39:04 -04:00
Philip Reames f39978b84f [SCEV] Correctly propagate nowrap flags across scopes when folding invariant add through addrec
This fixes a violation of the wrap flag rules introduced in c4048d8f. This is an alternate fix to D106852.

The basic problem being fixed is that we infer a set of flags which is valid at some inner scope S1 (usually by correctly propagating them from IR), and then (incorrectly) extend them to a SCEV in scope S2 where S1 != S2. This is not in general safe per the wrap flags semantics recently defined.

In this patch, I include a simple inference step to handle the case where we can prove that S2 is the preheader of the loop S1, and that entry into S2 implies execution of S1. See the code for a more detailed explanation.

One worry I have with this patch is that I might be over-fitting what shows up in tests - and thus hiding negative impact we'd see in the real world. My best defense is that the rule used here very closely follows the one used to propagate the flags from IR to the inner add to start with, and thus if one is reasonable, so probably is the other. Curious what others think about that piece.

The test diffs are roughly as expected. Mostly analysis only, with two transform changes. Oddly, the result looks better in the loop-idiom test, and I don't understand the PPC output enough to have tell. Nothing terrible looking though. (For context, without the scope inference peephole, the test delta includes a couple of vectorization tests. Again, not super concerning, but slightly more so.)

Differential Revision: https://reviews.llvm.org/D109845
2021-10-03 15:19:33 -07:00
Sanjay Patel f32c0fe8e5 [InstCombine] fold cast of right-shift if high bits are not demanded (3rd try)
The first two tries at this were reverted because they caused an
infinite loop in instcombine.
That should be fixed after a series of patches that ended with
removing the faulty opposing transform:
3fabd98e5b

Original commit message:
(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C

Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.

Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF

Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb

Differential Revision: https://reviews.llvm.org/D110170
2021-10-03 10:37:22 -04:00
Sanjay Patel 88a9c1827e [InstCombine] add test for shl + demanded bits; NFC
This is a reduction of a test that would infinite loop with D110170.
2021-10-03 10:35:59 -04:00
Nikita Popov 3be4acbaa3 [InstSimplify] Add additional load from constant test (NFC)
This case does not get folded, because the GEP indexes too deeply
(to the i8), making the bitcast logic not apply (on the [8 x i8]).
2021-10-03 15:52:36 +02:00
hyeongyu kim cf284f6c5e [LSV] Change the default value of InstertElement to poison
This patch is changing the InsertElement's placeholder to poison without changing the LSV's behavior.

Regardless of whether `StoreTy` is FixedVectorType or not, the poison value will be overwritten with a different value.
Therefore, whether the InsertElement's placeholder is poison or undef will not affect the result of the program.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D111005
2021-10-03 17:57:34 +09:00
Philip Reames 2ca8a3f213 [SCEV] Stop blindly propagating flags from inbound geps to SCEV nodes
This fixes a violation of the wrap flag rules introduced in c4048d8f. This was also noted in the (very old) PR23527.

The issue being fixed is that we assume the inbound flag on any GEP assumes that all users of *any* gep (or add) which happens to map to that SCEV would also be UB if the (other) gep overflowed. That's simply not true.

In terms of the test diffs, I don't see anything seriously problematic. The lost flags are expected (given the semantic restriction on when its legal to tag the SCEV), and there are several cases where the previously inferred flags are unsound per the new semantics.

The only common trend I noticed when looking at the deltas is that by not considering branch on poison as immediate UB in ValueTracking, we do miss a few cases we could reclaim. We may be able to claw some of these back with the follow ideas mentioned in PR51817.

It's worth noting that most of the changes are analysis result only changes. The two transform changes are pretty minimal. In one case, we miss the opportunity to infer a nuw (correctly). In the other, we fail to fold an exit and produce a loop invariant form instead. This one is probably over-reduced as the program appears to be undefined in practice, and neither before or after exploits that.

Differential Revision: https://reviews.llvm.org/D109789
2021-10-01 16:30:44 -07:00
Daniil Suchkov 45bd8d9477 [SimpleLoopUnswitch] Don't unswitch constant conditions
Added an additional check for constants after simplification of
"select _, true, false" pattern. We need to prevent attempts to unswitch constant
conditions for two reasons:
a) Doing that doesn't make any sense, in the best case it will just burn
some compile time.
b) SimpleLoopUnswitch isn't designed to unswitch constant conditions
(due to (a)), so attempting that can cause miscompiles. The attached
testcase is an example of such miscompile.

Also added an assertion that'll make sure we aren't trying to replace
constants, so it will help us prevent such bugs in future. The assertion
from D110751 is another layer of protection against such cases.

Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D110752
2021-10-01 21:30:54 +00:00
Daniil Suchkov bdd52e8bae [Test] Add a test exposing a miscompile in SimpleLoopUnswitch.
The miscompile was introduced by 6b4b1dc6ec.
2021-10-01 21:30:54 +00:00
Sanjay Patel 3fabd98e5b [InstCombine] fold (trunc (X>>C1)) << C to shift+mask directly
This is no-externally-visible-functional-difference-intended.
That is, the test diffs show identical instructions other than
name changes (those are included specifically to verify the logic).

The existing transforms created extra instructions and relied
on subsequent folds to get to the final result, but that could
conflict with other transforms like the proposed D110170 (and
caused that patch to be reverted twice so far because of infinite
combine loops).
2021-10-01 14:22:44 -04:00
Sanjay Patel baac82b4cf [InstCombine] add tests for icmp of gep; NFC 2021-10-01 10:53:23 -04:00
Roman Lebedev 3a0643e9c2
[X86][Costmodel] Load/store i32/f32 Stride=2 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110755
2021-10-01 17:48:13 +03:00
Roman Lebedev b12aeaec9a
[X86][Costmodel] Load/store i32/f32 Stride=2 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/EM5Ean7bd - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/EM5Ean7bd - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `2`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110754
2021-10-01 17:48:13 +03:00
Roman Lebedev f44d9009c2
[X86][Costmodel] Load/store i32/f32 Stride=2 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/4rY96hnGT - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/vbo37Y3r9 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: =0.5`
So pick cost of `1`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110753
2021-10-01 17:48:13 +03:00
Matthew Devereau f085a9db8b [AArch64][SVE] Replace fmul, fadd and fsub LLVM IR instrinsics with LLVM IR binary ops
Replacing fmul and fadd instrinsics with their binary ops results
more succinct AArch64 SVE output, e.g.:

4:   65428041 	fmul	z1.h, p0/m, z1.h, z2.h
8:   65408020 	fadd	z0.h, p0/m, z0.h, z1.h
->
4:   65620020   fmla    z0.h, p0/m, z1.h, z2.h
2021-10-01 11:24:46 +01:00
Kerry McLaughlin c1d46d3461 [SLPVectorizer] Fix crash in isShuffle with scalable vectors
D104809 changed `buildTree_rec` to check for extract element instructions
with scalable types. However, if the extract is extended or truncated,
these changes do not apply and we assert later on in isShuffle(), which
attempts to cast the type of the extract to FixedVectorType.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D110640
2021-10-01 10:56:44 +01:00
Krasimir Georgiev 685f1bfd0a Revert "[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns"
It appears to cause stage2 clang build failures, e.g.,
https://lab.llvm.org/buildbot/#/builders/74/builds/7145.

This reverts commit 1fb37334bd.
2021-10-01 11:39:43 +02:00
David Sherwood 1fb37334bd [LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:

  int r = a;
  for (int i = 0; i < n; i++) {
    if (src[i] > 3) {
      r = b;
    }
    src[i] += 2;
  }

We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.

The IR generated by clang typically looks like this:

  %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
  ...
  %pred = icmp ugt i32 %val, i32 3
  %phi.update = select i1 %pred, i32 %b, i32 %phi

We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
  Transforms/LoopVectorize/select-cmp-predicated.ll
  Transforms/LoopVectorize/select-cmp.ll

Differential Revision: https://reviews.llvm.org/D108136
2021-10-01 08:41:03 +01:00
Arnold Schwaighofer 2df2b27d94 [cora async] Cleanup undefined llvm.coro.async.resume
In situations where the coroutine function is not split we can just
replace the async.resume by null.

rdar://82591919

Differential Revision: https://reviews.llvm.org/D110191
2021-09-30 13:26:53 -07:00
Florian Hahn 1fbdbb5595
Revert "Recommit "[SCEV] Look through single value PHIs." (take 2)"
This reverts commit 764d9aa979.

This patch exposed a few additional cases where SCEV expressions are not
properly invalidated.

See PR52024, PR52023.
2021-09-30 20:53:51 +01:00
Sanjay Patel 66c069d7d6 [InstCombine] add tests for shift-trunc-shift; NFC 2021-09-30 15:06:13 -04:00
Craig Topper 765348298c [CostModel] Update default cost model for sadd/ssub overflow to match TargetLowering
The expansion for these was updated in https://reviews.llvm.org/D47927 but the cost model was not adjusted.

I believe the cost model was also incorrect for the old expansion.
The expansion prior to D47927 used 3 icmps using LHS, RHS, and Result
to calculate theirs signs. Then 2 icmps to compare the signs. Followed
by an And. The previous cost model was using 3 icmps and 2 selects.
Digging back through git blame, those 2 selects in the cost model used to
be 2 icmps, but were changed in https://reviews.llvm.org/D90681

Differential Revision: https://reviews.llvm.org/D110739
2021-09-30 09:41:14 -07:00
Adrian Prantl 9232ca4712 Improve the effectiveness of BDCE's debug info salvaging
This patch improves the effectiveness of BDCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!

Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.

This reapplies the previous patch with a fix for a use-after-free.

Differential Revision: https://reviews.llvm.org/D110568
2021-09-30 09:28:49 -07:00
Anna Thomas 452714f8f8 [BPI] Keep BPI available in loop passes through LoopStandardAnalysisResults
This is analogous to D86156 (which preserves "lossy" BFI in loop
passes). Lossy means that the analysis preserved may not be up to date
with regards to new blocks that are added in loop passes, but BPI will
not contain stale pointers to basic blocks that are deleted by the loop
passes.

This is achieved through BasicBlockCallbackVH in BPI, which calls
eraseBlock that updates the data structures in BPI whenever a basic
block is deleted.

This patch does not have any changes in the upstream pipeline, since
none of the loop passes in the pipeline use BPI currently.
However, since BPI wasn't previously preserved in loop passes, the loop
predication pass was invoking BPI *on the entire
function* every time it ran in an LPM.  This caused massive compile time
in our downstream LPM invocation which contained loop predication.

See updated test with an invocation of a loop-pipeline containing loop
predication and -debug-pass turned ON.

Reviewed-By: asbirlea, modimo
Differential Revision: https://reviews.llvm.org/D110438
2021-09-30 10:27:05 -04:00
Bjorn Pettersson 3f8027fb67 [test] Update some test cases to use -passes when specifying the pipeline
This updates transform test cases for
  ADCE
  AddDiscriminators
  AggressiveInstCombine
  AlignmentFromAssumptions
  ArgumentPromotion
  BDCE
  CalledValuePropagation
  DCE
  Reg2Mem
  WholeProgramDevirt
to use the -passes syntax when specifying the pipeline.

Given that LLVM_ENABLE_NEW_PASS_MANAGER isn't set to off (which is
a deprecated feature) the updated test cases already used the new
pass manager, but they were using the legacy syntax when specifying
the passes to run. This patch can be seen as a step toward deprecating
that interface.

This patch also removes some redundant RUN lines. Here I am
referring to test cases that had multiple RUN lines verifying both
the legacy "-passname" syntax and the new "-passes=passname" syntax.
Since we switched the default pass manager to "new PM" both RUN lines
have verified the new PM version of the pass (more or less wasting
time running the same test twice), unless LLVM_ENABLE_NEW_PASS_MANAGER
is set to "off". It is assumed that it is enough to run these tests
with the new pass manager now.

Differential Revision: https://reviews.llvm.org/D108472
2021-09-29 21:51:08 +02:00
Sjoerd Meijer 367df18050 [LoopFlatten] Bail if we can't perform flattening after IV widening
It can happen that after widening of the IV, flattening may not be possible,
e.g. when it is deemed unprofitable. We were not properly checking this, which
resulted in flattening being applied when it shouldn't, also leading to
incorrect results (miscompilation).

This should fix PR51980 (https://bugs.llvm.org/show_bug.cgi?id=51980)

Differential Revision: https://reviews.llvm.org/D110712
2021-09-29 19:53:34 +01:00
Sanjay Patel 4414e2ad97 [InstSimplify] (-1 << x) s>> x --> -1
This was noticed in:
https://llvm.org/PR51351

https://alive2.llvm.org/ce/z/aLxunD
2021-09-29 13:03:12 -04:00
Sanjay Patel ea56dcb730 [InstCombine] fix miscompile from dropRedundantMaskingOfLeftShiftInput()
The test is from https://llvm.org/PR51351.

There are 2 related logic bugs from over-generalizing "lshr" to "any shr",
but I'm not sure how to expose the difference for "MaskC" because instsimplify
already folds ashr of -1.

I'll extend instsimplify to catch the MaskD pattern as a follow-up, but this
patch should be enough to avoid the miscompile.
2021-09-29 11:43:18 -04:00
Sanjay Patel d3e2067c7c [InstSimplify] add tests for (-1 << x) s>> x; NFC 2021-09-29 11:43:18 -04:00
Sanjay Patel ac4f30ac49 [InstCombine] add test for miscompile in dropRedundantMaskingOfLeftShiftInput(); NFC (PR51351) 2021-09-29 11:43:18 -04:00
Simon Pilgrim 17f1fc1e54 [TTI] BasicTTI::getInterleavedMemoryOpCost(): use getScalarizationOverhead()
getScalarizationOverhead() results in a somewhat better cost estimation than counting the insertion/extraction costs directly. Notably, this is still overestimating the costs.

Original Patch by: @lebedev.ri (Roman Lebedev)

Differential Revision: https://reviews.llvm.org/D110713
2021-09-29 16:41:53 +01:00
Florian Hahn 0b4a4cc72d
[IndVarSimplify] Forget phi value after changing incoming value.
This fixes an issue exposed by D71539, where IndVarSimplify tries
to access an invalid cached SCEV expression after making changes to the
underlying PHI instruction earlier.

When changing the incoming value of a PHI, forget the cached SCEV for
the PHI.
2021-09-29 14:44:13 +01:00
Sanjay Patel 98fde3489a [InstCombine] reduce redundant code for shl-binop folds
This is NFCI (no-functional-change-intended), but there
are benign diffs possible with commutable ops as seen in
the test diffs.

The transforms were repeated for the commutative opcodes,
but that should not be necessary if we canonicalize the
patterns that we're matching. If both operands of the
binop match, that should get folded eventually.

The transform that starts with a mask op seems to
over-constrain the use checks, so that could be a
potential enhancement.
2021-09-28 17:06:45 -04:00
Sanjay Patel 6c1a58fe51 [InstCombine] add multi-use tests for shl folds; NFC 2021-09-28 17:06:45 -04:00
Nikita Popov abbbc480a1 Revert "Improve the effectiveness of BDCE's debug info salvaging"
This reverts commit f6954bf804.

This breaks the test-suite O3 build:

/home/nikic/llvm-test-suite/build-O3/tools/timeit --summary Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o.time /home/nikic/llvm-project/build/bin/clang++  -DNDEBUG  -O3   -w -Werror=date-time -save-stats=obj -save-stats=obj -std=c++11 -MD -MT Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o -MF Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o.d -o Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o -c ../Bitcode/Benchmarks/Halide/local_laplacian/local_laplacian.bc
While deleting: i64 %
Use still stuck around after Def is destroyed:  %12620 = mul i64 %12619, <badref>
clang++: /home/nikic/llvm-project/llvm/lib/IR/Value.cpp:103: llvm::Value::~Value(): Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' failed.
2021-09-28 21:52:27 +02:00
Anna Thomas 03ce0841da Add profile count. Regenerate check lines. NFC
Function profile counts added to test cases. Regenerated test lines for
loop predication test.
2021-09-28 15:33:49 -04:00
Sanjay Patel 09c575e728 [InstCombine] add/move tests for shl with binop; NFC 2021-09-28 14:46:27 -04:00
Adrian Prantl f6954bf804 Improve the effectiveness of BDCE's debug info salvaging
This patch improves the effectiveness of BDCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!

Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.

Differential Revision: https://reviews.llvm.org/D110568
2021-09-28 10:24:51 -07:00
Adrian Prantl 9637b045e6 Improve the effectiveness of ADCE's debug info salvaging
This patch improves the effectiveness of ADCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!

Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.

Differential Revision: https://reviews.llvm.org/D110462
2021-09-28 10:24:50 -07:00
Adrian Prantl 1b998a5f0c Add salvageDebugInfo support for truncating/extending ptr/int conversions.
This patch enables debug info salvaging for truncating/extending ptr
int conversions. The testcase uncovered a bug in adce, which is
addressed separately.

rdar://80227769

Differential Revision: https://reviews.llvm.org/D110461
2021-09-28 10:24:50 -07:00
Paul Robinson 56e681afcc [TargetLibraryInfo] Pick new/delete calls by target
There are two sets of new/delete functions, one with Windows/MSVC
mangling and one with Itanium mangling. Mark one set or the other
as unavailable depending on the target.

Split the test malloc-free-delete.ll into three parts: malloc-free.dll
for the C API tests, new-delete-itanium.ll and new-delete-msvc.ll for
the target-specific new/delete tests.

Differential Revision: https://reviews.llvm.org/D110419
2021-09-28 10:10:25 -07:00
Alex Richardson ebb3dc0833 [InstCombine] Fold ptrtoint(gep i8 null, x) -> x
This commit is the InstCombine follow-up to the previous constant-folding
change that enables noticeable optimizations for CHERI-enabled targets.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D110247
2021-09-28 17:57:37 +01:00
Alex Richardson 9049a1c61e [ConstantFolding] Fold ptrtoint(gep i8 null, x) -> x
I was looking at some missed optimizations in CHERI-enabled targets and
noticed that we weren't removing vtable indirection for calls via known
pointers-to-members. The underlying reason for this is that we represent
pointers-to-function-members as {i8 addrspace(200)*, i64} and generate the
constant offsets using (gep i8 null, <index>). We use a constant GEP here
since inttoptr should be avoided for CHERI capabilities. The pointer-to-member
call uses ptrtoint to extract the index, and due to this missing fold we can't
infer the actual value loaded from the vtable.
This is the initial constant folding change for this pattern, I will add
an InstCombine fold as a follow-up.

We could fold all inbounds GEP to null (and therefore the ptrtoint to
zero) since zero is the only valid offset for an inbounds GEP. If the
offset is not zero, that GEP is poison and therefore returning 0 is valid
(https://alive2.llvm.org/ce/z/Gzb5iH). However, Clang currently generates
inbounds GEPs on NULL for hand-written offsetof() expressions, so this
could lead to miscompilations.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110245
2021-09-28 17:57:36 +01:00
Alex Richardson fc0051011e [InstCombine][ConstantFold] Baseline tests for ptrtoint(gep null, x)
Differential Revision: https://reviews.llvm.org/D110244
2021-09-28 17:57:36 +01:00
Alexey Bataev f701505c45 [SLP]Improve vectorization of phi nodes by trying wider vectors.
Try to improve vectorization of the PHI nodes by trying to vectorize
similar instructions at the size of the widest possible vectors, then
aggregating with compatible type PHIs and trying to vectoriza again and
only if this failed, try smaller sizes of the vector factors for
compatible PHI nodes. This restores performance of several benchmarks
after tuning of the fp/int conversion instructions costs.

Differential Revision: https://reviews.llvm.org/D108740
2021-09-28 07:20:36 -07:00
Sjoerd Meijer 0ea77502e2 [LoopFlatten] Updating Phi nodes after IV widening
In rG6a076fa9539e, a problem with updating the old/narrow phi nodes after IV
widening was introduced. If after widening of the IV the transformation is
*not* applied, the narrow phi node was incorrectly modified, which should only
happen if flattening happens. This can be seen in the added test widen-iv2.ll,
which incorrectly had 1 incoming value, but should have its original 2 incoming
values, which is now restored.

Differential Revision: https://reviews.llvm.org/D110234
2021-09-28 15:09:20 +01:00
Sanjay Patel 1a1aed8da8 [InstCombine] add tests for icmp-gep; NFC
We need more coverage for commuted and (un)signed preds to
verify that things behave as expected here. Currently, we
do not transform signed preds or non-inbounds geps.
2021-09-28 10:00:35 -04:00
Sanjay Patel 72d991c42e [InstCombine] add/move tests for icmp with gep operand(s); NFC 2021-09-28 09:40:52 -04:00
Alexey Bataev 8bacfb9bed [SLP]No need to schedule/check parent for extract{element/value} instruction.
The instruction extractelement/extractvalue are not required to
be scheduled since they only depend on the source vector/aggregate (with
constant indices), smae applies to the parent basic block checks.
Improves compile time and saves scheduling budget.

Differential Revision: https://reviews.llvm.org/D108703
2021-09-28 06:13:55 -07:00
Florian Hahn e2f6290e06
[VectorCombine] Discard ScalarizationResult state in early exit.
ScalarizationResult's destructor makes sure ToFreeze is not ignored if
set. Currently, scalarizeLoadExtract has an early exit if the index is
not safe directly. But when it is SafeWithFreeze, we need to discard the
state first, otherwise we hit the assert in the destructor.

Fixes PR51992.
2021-09-28 12:52:16 +01:00
Florian Hahn 764d9aa979
Recommit "[SCEV] Look through single value PHIs." (take 2)
This reverts commit 8fdac7cb7a.

The issue causing the revert has been fixed a while ago in 60b852092c.

Original message:

    Now that SCEVExpander can preserve LCSSA form,
    we do not have to worry about LCSSA form when
    trying to look through PHIs. SCEVExpander will take
    care of inserting LCSSA PHI nodes as required.

    This increases precision of the analysis in some cases.

    Reviewed By: mkazantsev, bmahjour

    Differential Revision: https://reviews.llvm.org/D71539
2021-09-28 10:32:17 +01:00
Anna Thomas 90fb73aa73 [LoopPred Test] Fix lld-x86_64-win BB failure
Need a more general CHECK line for testcase in 5df9112 for correctly
handling  lld-x86_64-win buildbot.
2021-09-27 21:28:46 -04:00
Anna Thomas 5df9112ce3 Reland "[LoopPredication] Add testcase showing BPI computation. NFC"
This relands commit 16a62d4f.
Relanded after fixing CHECK-LINES for opt pipeline output to be more
general (based on failures seen in buildbot).
2021-09-27 21:15:46 -04:00
Anna Thomas a0a9e3e05f Revert "[LoopPredication] Add testcase showing BPI computation. NFC"
This reverts commit 16a62d4f3d.

Needs some update to check lines to fix bb failure.
2021-09-27 17:08:57 -04:00
Anna Thomas 16a62d4f3d [LoopPredication] Add testcase showing BPI computation. NFC
Precommit testcase for D110438. Since we do not preserve BPI in loop
pass manager, we are forced to compute BPI everytime Loop predication is
invoked.
The patch referenced changes that behaviour by preserving lossy BPI for
loop passes.
2021-09-27 16:54:22 -04:00
Sanjay Patel b75ed244af [InstCombine] add tests for shl-of-sub; NFC 2021-09-27 14:56:01 -04:00
Sanjay Patel 623f93ed1c [InstCombine] add use check to shl transform
This bug was introduced with the refactoring in:
9075edc89b
...but there were no tests to detect it.
2021-09-27 14:10:26 -04:00
Sanjay Patel d992950078 [InstCombine] add tests for opposing shifts separated by trunc; NFC 2021-09-27 14:10:26 -04:00
Jameson Nash e27a6db529 Bad SLPVectorization shufflevector replacement, resulting in write to wrong memory location
We see that it might otherwise do:

  %10 = getelementptr {}**, <2 x {}***> %9, <2 x i32> <i32 10, i32 4>
  %11 = bitcast <2 x {}***> %10 to <2 x i64*>
...
  %27 = extractelement <2 x i64*> %11, i32 0
  %28 = bitcast i64* %27 to <2 x i64>*
  store <2 x i64> %22, <2 x i64>* %28, align 4, !tbaa !2

Which is an out-of-bounds store (the extractelement got offset 10
instead of offset 4 as intended). With the fix, we correctly generate
extractelement for i32 1 and generate correct code.

Differential Revision: https://reviews.llvm.org/D106613
2021-09-27 14:06:13 -04:00
Sanjay Patel 21429cf43a [InstCombine] generalize fold for (trunc (X u>> C1)) u>> C
This is another step towards trying to re-apply D110170
by eliminating conflicting transforms that cause infinite loops.
a47c8e40c7 was a previous patch in this direction.

The diffs here are mostly cosmetic, but intentional:
1. The existing code that would handle this pattern in FoldShiftByConstant()
   is limited to 'shl' only now. The formatting change to IsLeftShift shows
   that we could move several transforms into visitShl() directly for
   efficiency because they are not common shift transforms.

2. The tests are regenerated to show new instruction names to prove that
   we are getting (almost) identical logic results.

3. The one case where we differ ("trunc_sandwich_small_shift1") shows that
   we now use a narrow 'and' instruction. Previously, we relied on another
   transform to do that, but it is limited to legal types. That seems to
   be a legacy constraint from when IR analysis and codegen were less robust.

https://alive2.llvm.org/ce/z/JxyGA4

  declare void @llvm.assume(i1)

  define i8 @src(i32 %x, i32 %c0, i8 %c1) {
    ; The sum of the shifts must not overflow the source width.
    %z1 = zext i8 %c1 to i32
    %sum = add i32 %c0, %z1
    %ov = icmp ult i32 %sum, 32
    call void @llvm.assume(i1 %ov)

    %sh1 = lshr i32 %x, %c0
    %tr = trunc i32 %sh1 to i8
    %sh2 = lshr i8 %tr, %c1
    ret i8 %sh2
  }

  define i8 @tgt(i32 %x, i32 %c0, i8 %c1) {
    %z1 = zext i8 %c1 to i32
    %sum = add i32 %c0, %z1
    %maskc = lshr i8 -1, %c1

    %s = lshr i32 %x, %sum
    %t = trunc i32 %s to i8
    %a = and i8 %t, %maskc
    ret i8 %a
  }
2021-09-27 10:57:31 -04:00
Sjoerd Meijer eba76056a3 [FuncSpec] Don't specialise (or crash) on poison or constexpr values
Function specialization was crashing on poison values and constexpr values.
The problem is that these values are not added to the solver, so it crashes
when a lookup is performed for these values. This fixes that by not
specialising on these values. For poison that is obvious, but for constexpr
this is a change in behaviour. Thus, in one way this is a bit of a stopgap, but
specialising on constexpr values wasn't done very intentionally, and need some
more work and tests if we wanted to support this.

As a follow up, we need to look if the solver should exit more gracefully and
return a "don't know", or that it should really support these constexprs.

This should fix PR51600 (https://bugs.llvm.org/show_bug.cgi?id=51600).

Differential Revision: https://reviews.llvm.org/D110529
2021-09-27 14:58:53 +01:00
Sjoerd Meijer a588ae482b [LoopFlatten] Precommit new test widen-iv2.ll for D110234. 2021-09-27 14:37:44 +01:00
Jun Ma 3a998c06a8 Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."""
This reverts commit 8ba2adcf9e.
2021-09-27 20:39:05 +08:00
Florian Hahn 4b581e87df
[LV] Add tests where rt checks may make vectorization unprofitable.
Add a few additional tests which require a large number of runtime
checks for D109368.
2021-09-27 10:32:28 +01:00
Max Kazantsev e787678cef [Test] Add some simple tests where IndVars cannot remove a check in loop
Previously I've added tests that require context for inference, but it
seems tha SCEV can't prove same facts even when the context isn't required.
2021-09-27 12:12:51 +07:00
Sanjay Patel 6063e6b499 [InstCombine] move add after min/max intrinsic
This is another regression noted with the proposal to canonicalize
to the min/max intrinsics in D98152.

Here are Alive2 attempts to show correctness without specifying
exact constants:
https://alive2.llvm.org/ce/z/bvfCwh (smax)
https://alive2.llvm.org/ce/z/of7eqy (smin)
https://alive2.llvm.org/ce/z/2Xtxoh (umax)
https://alive2.llvm.org/ce/z/Rm4Ad8 (umin)
(if you comment out the assume and/or no-wrap, you should see failures)

The different output for the umin test is due to a fold added with
c4fc2cb5b2 :

// umin(x, 1) == zext(x != 0)

We probably want to adjust that, so it applies more generally
(umax --> sext or patterns where we can fold to select-of-constants).
Some folds that were ok when starting with cmp+select may increase
instruction count for the equivalent intrinsic, so we have to decide
if it's worth altering a min/max.

Differential Revision: https://reviews.llvm.org/D110038
2021-09-26 09:49:10 -04:00
Nikita Popov ba664d9066 [AA] Move earliest escape tracking from DSE to AA
This is a followup to D109844 (and alternative to D109907), which
integrates the new "earliest escape" tracking into AliasAnalysis.
This is done by replacing the pre-existing context-free capture
cache in AAQueryInfo with a replaceable (virtual) object with two
implementations: The SimpleCaptureInfo implements the previous
behavior (check whether object is captured at all), while
EarliestEscapeInfo implements the new behavior from DSE.

This combines the "earliest escape" analysis with the full power of
BasicAA: It subsumes the call handling from D109907, considers a
wider range of escape sources, and works with AA recursion. The
compile-time cost is slightly higher than with D109907.

Differential Revision: https://reviews.llvm.org/D110368
2021-09-25 22:40:41 +02:00
Nikita Popov 327bbbb10b [DSE] Make capture check more precise
It is sufficient that the object has not been captured before the
load that produces the pointer we're loading. A capture after that
can not affect the already loaded pointer.

This is small part of D110368 applied separately.
2021-09-25 22:23:19 +02:00
Simon Pilgrim 8c83bd3bd4 [CostModel][X86] Adjust vXi32 multiply costs if it can be performed using PMADDWD
Update the costs to match the codegen from combineMulToPMADDWD - not only can we use PMADDWD is its zero-extended, but also if its a constant or sign-extended from a vXi16 (which can be replaced with a zero-extension).
2021-09-25 16:28:48 +01:00
Simon Pilgrim 5a14edd8ed [InstCombine] Ensure shifts are in range for (X << C1) / C2 -> X fold.
We can get here before out of range shift amounts have been handled - limit to BW-2 for sdiv and BW-1 for udiv

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38078
2021-09-25 12:57:43 +01:00
Simon Pilgrim 993f3c61b3 [TTI] getUserCost - Ensure a vector insert/extract index is in unsigned 32-bit range
Otherwise fallback to the generic 'unknown index' path

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29050
2021-09-25 10:50:54 +01:00
Nikita Popov 5969e5743a [IR] Handle large element size when calculating GEP indices
This is a fix for the issue reported at
https://reviews.llvm.org/D110043#3019942:
The ElementSize is a uint64_t and as such may be larger than the
index space, or be negative in the index space. This is UB, but
shouldn't cause assertion failures.

We address this by detecting whether the size is too large and
use a zero index in that case (which is always conservatively
correct).

Differential Revision: https://reviews.llvm.org/D110437
2021-09-24 22:20:20 +02:00
Sanjay Patel a47c8e40c7 [InstCombine] fold lshr(trunc(lshr X, C1)) C2
Only the multi-use cases are changing here because there's
another fold that catches the simpler patterns.

But that other fold is the source of infinite loops when we
try to add D110170, so removing that is planned as a follow-up.

Attempt to show the general proof in Alive2:
https://alive2.llvm.org/ce/z/Ns1uS2

Note that the overshift fold-to-zero tests are not
currently handled by instsimplify. If they were, we
could assert that the shift amount sum is less than
the source bitwidth.
2021-09-24 15:44:07 -04:00
Nikita Popov 7774166499 [DSE] Add additional capture tests (NFC)
These test other escape sources and the case of multiple
underlying objects.
2021-09-24 21:13:29 +02:00
Simon Pilgrim bdee805b32 [ConstantFold] ConstantFoldGetElementPtr - use APInt::isNegative() instead of getSExtValue() to support big ints
Fixes fuzz test: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39197
2021-09-24 18:18:53 +01:00
Simon Pilgrim 36eb6c0134 [SCCP] Regenerate bigint test checks 2021-09-24 18:18:53 +01:00
Hans Wennborg 1e9afab875 Re-apply "[JumpThreading] Ignore free instructions"
It seems the crashes we saw wasn't caused by this (see comments on the review).

> This is basically D108837 but for jump threading. Free instructions
> should be ignored for the threading decision. JumpThreading already
> skips some free instructions (like pointer bitcasts), but does not
> skip various free intrinsics -- in fact, it currently gives them a
> fairly large cost of 2.
>
> Differential Revision: https://reviews.llvm.org/D110290

This reverts commit 4604695d7c.
2021-09-24 18:52:30 +02:00
Florian Hahn 6f28fb7081
Recommit "[DSE] Track earliest escape, use for loads in isReadClobber."
This reverts the revert commit df56fc6ebb.

This version of the patch adjusts the location where the EarliestEscapes
cache is cleared when an instruction gets removed. The earliest escaping
instruction does not have to be a memory instruction.

It could be a ptrtoint instruction like in the added test
@earliest_escape_ptrtoint, which subsequently gets removed. We need to
invalidate the EarliestEscape entry referring to the ptrtoint when
deleting it.

This fixes the crash mentioned in
https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6
2021-09-24 17:13:27 +01:00
Sanjay Patel 638a4147fc [InstCombine] add tests for lshr-trunc-lshr; NFC 2021-09-24 11:38:19 -04:00
Sanjay Patel 3c5500907b Revert "[InstCombine] fold cast of right-shift if high bits are not demanded (2nd try)"
This reverts commit bb9333c350.

This exposes another existing bug that causes an infinite loop as shown in
D110170
...so reverting while I look at another fix.
2021-09-24 10:47:35 -04:00
Hans Wennborg 4604695d7c Revert "[JumpThreading] Ignore free instructions"
It caused compiler crashes, see comment on the code review for repro.

> This is basically D108837 but for jump threading. Free instructions
> should be ignored for the threading decision. JumpThreading already
> skips some free instructions (like pointer bitcasts), but does not
> skip various free intrinsics -- in fact, it currently gives them a
> fairly large cost of 2.
>
> Differential Revision: https://reviews.llvm.org/D110290

This reverts commit 1e3c6fc7cb.
2021-09-24 16:14:53 +02:00
Nico Weber df56fc6ebb Revert "[DSE] Track earliest escape, use for loads in isReadClobber."
This reverts commit 5ce89279c0.
Makes clang crash, see comments on https://reviews.llvm.org/D109844
2021-09-24 09:57:59 -04:00
Nikita Popov 1e3c6fc7cb [JumpThreading] Ignore free instructions
This is basically D108837 but for jump threading. Free instructions
should be ignored for the threading decision. JumpThreading already
skips some free instructions (like pointer bitcasts), but does not
skip various free intrinsics -- in fact, it currently gives them a
fairly large cost of 2.

Differential Revision: https://reviews.llvm.org/D110290
2021-09-23 18:28:36 +02:00
Simon Pilgrim c931d35216 [CostModel][X86] Increase i64 mul cost from 1 to 2
Only the most recent cpus support really 1cy 64-bit multiplies, and the X64 cost table represents a realistic worst case. The 1cy value was also discouraging vectorization when most vXi64 PMULDQ expansions aren't actually slower than scalarization.

Noticed while investigating PR51436.
2021-09-23 14:48:21 +01:00
Sanjay Patel bb9333c350 [InstCombine] fold cast of right-shift if high bits are not demanded (2nd try)
The 1st try at this was reverted because it caused an infinite loop in instcombine.
That should be fixed after:
1cd6b44f26

(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C

Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.

Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF

Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb

Differential Revision: https://reviews.llvm.org/D110170
2021-09-23 09:41:37 -04:00
Florian Hahn 5ce89279c0
[DSE] Track earliest escape, use for loads in isReadClobber.
At the moment, DSE only considers whether a pointer may be captured at
all in a function. This leads to cases where we fail to remove stores to
local objects because we do not check if they escape before potential
read-clobbers or after.

Doing context-sensitive escape queries in isReadClobber has been removed
a while ago in d1a1cce5b1 to save compile-time. See PR50220 for more
context.

This patch introduces a new capture tracker, which keeps track of the
'earliest' capture. An instruction A is considered earlier than instruction
B, if A dominates B. If 2 escapes do not dominate each other, the
terminator of the common dominator is chosen. If not all uses cannot be
analyzed, the earliest escape is set to the first instruction in the
function entry block.

If the query instruction dominates the earliest escape and is not in a
cycle, then pointer does not escape before the query instruction.

This patch uses this information when checking if a load of a loaded
underlying object may alias a write to a stack object. If the stack
object does not escape before the load, they do not alias.

I will share a follow-up patch to also use the information for call
instructions to fix PR50220.

In terms of compile-time, the impact is low in general,
    NewPM-O3: +0.05%
    NewPM-ReleaseThinLTO: +0.05%
    NewPM-ReleaseLTO-g: +0.03

with the largest change being tramp3d-v4 (+0.30%)
http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Compared to always computing the capture information on demand, we get
the following benefits from the caching:
NewPM-O3: -0.03%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.04%

The biggest speedup is tramp3d-v4 (-0.21%).
http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Overall there is a small, but noticeable benefit from caching. I am not
entirely sure if the speedups warrant the extra complexity of caching.
The way the caching works also means that we might miss a few cases, as
it is less precise. Also, there may be a better way to cache things.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109844
2021-09-23 12:45:05 +01:00
Alex Richardson 05663dc146 [InstSimplify] Don't lose inbounds when simplifying a GEP
I noticed this while working on a (ptrtoint (gep null, x)) -> x fold.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D110168
2021-09-23 09:25:06 +01:00
Johannes Doerfert c6457dcae8 [OpenMP][FIX] Be more deliberate about invalidating the AAKernelInfo state
This patch fixes a problem when the AAKernelInfo state was invalidated,
e.g., due to `optnone` for a kernel, but not all parts indicated the
invalidation properly. We further eliminate most full state
invalidations as they should never be necessary.

Differential Revision: https://reviews.llvm.org/D109468
2021-09-23 00:04:30 -05:00
Johannes Doerfert 57822c3f4f [OpenMP][NFC] Repair test that contained nested kernels
The benchmark contained (partially) nested kernels, something we do not
generate nor support.
2021-09-23 00:04:29 -05:00
Johannes Doerfert 92280ae3d8 [OpenMP][NFC] Rerun the test check update script on all OpenMP-Opt tests 2021-09-23 00:04:29 -05:00
Johannes Doerfert 5e835ecb6d [OpenMP][NFC] Precommit test that exposes a bug in our optnone handling 2021-09-23 00:04:29 -05:00
Usman Nadeem 3b12282b0e [AArch64][SVE][InstCombine] Eliminate redundant chains of tuple get/set
Differential Revision: https://reviews.llvm.org/D109667

Change-Id: I06a3c28e3658ecda109a3a1b73265828274ab2ea
2021-09-22 20:59:46 -07:00
Shilei Tian 423d34f74a [OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit`
This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110279
2021-09-22 17:16:41 -04:00
Sanjay Patel 1cd6b44f26 [InstCombine] add one-use check to shift-shift transform
We don't want to create extra instructions, and this
could infinite loop with the proposed transform in D110170.
2021-09-22 16:31:12 -04:00
Sanjay Patel 55aa4e92f7 [InstCombine] add test for shift-shift with extra use; NFC 2021-09-22 16:31:12 -04:00
Nikita Popov d8e1203f91 [JumpThreading] Add test with free instructions (NFC)
Which demonstrates that "free" instructions can prevent jump
threading.
2021-09-22 22:29:39 +02:00
Sanjay Patel a85d7a56c7 [ValueTracking] fix isOnlyUsedInZeroEqualityComparison with no users
This is another problem exposed by:
https://bugs.llvm.org/PR50836
2021-09-22 15:01:53 -04:00
Sanjay Patel c240169ff2 [Analysis] improve function matching for strlen libcall
The return type of strlen is size_t, not just any integer.

This is a partial fix for an example based on:
https://llvm.org/PR50836

There's another bug here because we can still crash
processing a real strlen or something that looks like it.
2021-09-22 13:50:12 -04:00
Arthur Eubanks e7249e4acf [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest
When determining whether to fold branches to a common destination by
merging two blocks, SimplifyCFG will count the number of instructions to
be moved into the first basic block. However, there's no reason to count
free instructions like bitcasts and other similar instructions.

This resolves missed branch foldings with -fstrict-vtable-pointers in
llvm-test-suite's lambda benchmark.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D108837
2021-09-22 09:52:37 -07:00
Hongtao Yu d9b511d8e8 [CSSPGO] Set PseudoProbeInserter as a default pass.
Currenlty PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e, through the linker or llc), where user has always to pass -pseudo-probe-for-profiling explictly. I'm making the pass a default pass that requires no command line arg to trigger, but will be actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110209
2021-09-22 09:09:48 -07:00
Alexey Bataev 173dd896db [SLP][NFC]Add a test to show an issue with incorrectly extracted
pointers.
2021-09-22 09:02:13 -07:00
hyeongyu kim 98e96663f6 [InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (3/3)
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineVectorOps.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D110230
2021-09-23 00:48:24 +09:00
Shilei Tian ca999f7191 [OpenMP][Offloading] Use bitset to indicate execution mode instead of value
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - indicates generic mode
- 2 - SPMD mode execution with generic mode semantics

We are going to add support for SIMD execution mode. It will be come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible.

This patch changes to bitset based solution to encode execution mode. Each
position is:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)

In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future after we add the support for
SIMD mode, `0b1xx` will be in SIMD mode.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110029
2021-09-22 11:40:52 -04:00
hyeongyu kim ec8311444a [InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (2/3)
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineCompares and InstructionCombining.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D110227
2021-09-23 00:14:50 +09:00
hyeongyu kim e5aaf03326 [InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (1/3)
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineCasts.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D110226
2021-09-22 23:18:51 +09:00
Alexey Bataev b6d10beb50 [SLP][NFC]Rename function in the test for better matching of the
transformation.
2021-09-22 05:51:18 -07:00
Florian Hahn a7c6471a85
[Passes] Run vector-combine early with -fenable-matrix.
IR with matrix intrinsics is likely to also contain large vector
operations, which can benefit from early simplifications.

This is the last step in a series of changes to improve code-gen for
code using matrix subscript operators with the C/C++ matrix extension in
CLang, like

    using matrix_t = double __attribute__((matrix_type(15, 15)));

    void foo(unsigned i, matrix_t &A, matrix_t &B) {
      for (unsigned j = 0; j < 4; ++j)
        for (unsigned k = 0; k < i; k++)
          B[k][j] -= A[k][j] * B[i][j];
    }

https://clang.godbolt.org/z/6dKxK1Ed7

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D102496
2021-09-22 12:48:32 +01:00
Sanjay Patel c6013f71a4 Revert "[InstCombine] fold cast of right-shift if high bits are not demanded"
This reverts commit 2f6b07316f.

This caused several bots to hit an infinite loop at stage 2,
so it needs to be reverted while figuring out how to fix that.
2021-09-22 07:45:21 -04:00
Yi Kong d0746f2e9b Don't fold (select C, (gep Ptr, Idx), Ptr) if C is vector but Idx is scalar
The folding rule (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C,
Idx, 0)) creates a malformed SELECT IR if C is a vector while Idx is scalar.

  SELECT VecC, ScalarIdx, 0

We could splat Idx to a vector but it defeats the purpose of
optimisation. Don't apply the folding rule in this case.

This fixes a regression from commit d561b6fbdb.
2021-09-22 18:11:33 +08:00
Simon Pilgrim 41492d77ba [LoopVectorize][X86] Add operands to make it more obvious what line the CHECK concerns
As we're checking the cost debug analysis these should match the original IR line - so we shouldn't have any variable naming issues.

I'm investigating v4i32 mul -> PMADDDW costs handling (for PR47437) and these CHECK lines were proving tricky to keep track of
2021-09-22 10:08:32 +01:00
Florian Hahn 300870a95c
[VectorCombine] Switch to using a worklist.
This patch updates VectorCombine to use a worklist to allow iterative
simplifications where a combine enables other combines.

Suggested in D100302.

The main use case at the moment is foldSingleElementStore and
scalarizeLoadExtract working together to improve scalarization.

Note that we now also do not run SimplifyInstructionsInBlock on the
whole function if there have been changes. This means we fail to
remove/simplify instructions not related to any of the vector combines.
IMO this is fine, as simplifying the whole function seems more like a
workaround for not tracking the changed instructions.

Compile-time impact looks neutral:
NewPM-O3: +0.02%
NewPM-ReleaseThinLTO: -0.00%
NewPM-ReleaseLTO-g: -0.02%

http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D110171
2021-09-22 09:54:58 +01:00
Sanjay Patel 2f6b07316f [InstCombine] fold cast of right-shift if high bits are not demanded
(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C

Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.

Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF

Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb

Differential Revision: https://reviews.llvm.org/D110170
2021-09-21 16:09:08 -04:00
Antonio Frighetto 43d6991c2a [IR] Look through bitcast in hasFnAttribute()
A logic incompleteness may lead MemorySSA to be too conservative
in its results. Specifically, when dealing with a call of kind
`call i32 bitcast (i1 (i1)* @test to i32 (i32)*)(i32 %1)`, where
the function `test` is declared with readonly attribute, the
bitcast is not looked through, obscuring function attributes. Hence,
some methods of CallBase (e.g., doesNotReadMemory) could provide
suboptimal results.

Differential Revision: https://reviews.llvm.org/D109888
2021-09-21 21:57:02 +02:00
Nikita Popov f2fa6ad047 [MergeICmps] Don't reorder unmerged comparisons
MergeICmps will currently sort (by offset) all comparisons in a chain,
including those that do not get merged. This is problematic in two ways:

 * We may end up moving the original first block into the middle of
   the chain, in which case the "extra work" instructions will also
   be in the middle of the chain, resulting in invalid IR
   (reported in https://reviews.llvm.org/D108782#3005583).
 * Reordering branches is generally not legal, because it may
   introduce branch on poison, which is UB (PR51845). The merging
   done by MergeICmps is legal as long as we assume that memcmp()
   works on frozen memory, but the reordering of unmerged comparisons
   is definitely incorrect (without inserting freeze instructions),
   so we should avoid it.

There are easier ways to fix the first issue, but I figured it was
worthwhile to do this properly to also fix the second one. What we
now do is to restore the original relative order of (potentially
merged) comparisons.

I took the liberty of dropping the MERGEICMPS_DOT_ON functionality,
because it would be more awkward to implement now (as the before and
after representation is different) and it doesn't seem terribly
useful nowadays.

Differential Revision: https://reviews.llvm.org/D110024
2021-09-21 21:22:12 +02:00
Owen Anderson b5fbbdd202 Teach InstCombine to eliminate malloc-realloc-free triplets.
Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D109988
2021-09-21 18:07:49 +00:00
Danila Malyutin 78b51c7a2c [LSR] Make sure that Factor fits into Base type
Fixes pr42770

Differential Revision: https://reviews.llvm.org/D108772
2021-09-21 20:50:50 +03:00
Ayal Zaks ab6a69dfea [LV] Fix crash for reverse interleaved loads with gap under fold-tail.
This patch fixes the crash found by PR51614:
whenever doing tail folding, interleave groups must be considered under mask.

Another fix D108900 follows for targets that support masked loads and stores:
when *deciding* to vectorize with masked interleave groups, check if the access
is reverse - which is currently not supported; rather than (only) asserting when
computing cost and generating code.

Differential Revision: https://reviews.llvm.org/D108891
2021-09-21 20:13:32 +03:00
Dávid Bolvanský c0fdfc9af2 [InstCombine] powi(x, y) * powi(x, z) -> powi(x, y + z)
We already have pow(x, y) * pow(x, z) -> pow(x, y + z) transformation, but we are missing same transformation for powi (power is integer).

Requires reassoc.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D109954
2021-09-21 18:20:46 +02:00
Sanjay Patel 08ef71ca92 [InstCombine] move/add tests for trunc-of-lshr; NFC
Planning to reframe a proposed transform in terms of
demanded bits as suggested in D110170.
The new tests end with an 'or'.
2021-09-21 12:11:25 -04:00
Florian Hahn 5131037ea9
[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange.
isValidAssumeForContext can provide better results with access to the
dominator tree in some cases. This patch adjusts computeConstantRange to
allow passing through a dominator tree.

The use VectorCombine is updated to pass through the DT to enable
additional scalarization.

Note that similar APIs like computeKnownBits already accept optional dominator
tree arguments.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110175
2021-09-21 16:54:47 +01:00
Anna Thomas 69921f6f45 [InstCombine] Improve TryToSinkInstruction with multiple uses
This patch allows sinking an instruction which can have multiple uses in a
single user. We were previously over-restrictive by looking for exactly one use,
rather than one user.

Also added an API for retrieving a unique undroppable user.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109700
2021-09-21 10:04:04 -04:00
Sanjay Patel af1c5312d7 [InstCombine] add tests for mask-shift with trunc; NFC 2021-09-21 09:41:41 -04:00
Florian Hahn ea27dd7497
[VectorCombine] Add tests which require DT to use info from assumes. 2021-09-21 13:07:06 +01:00
Simon Pilgrim fc8f1e4419 [InstCombine] foldConstantInsEltIntoShuffle - bail if we fail to find constant element (PR51824)
If getAggregateElement() returns null for any element, early out as otherwise we will assert when creating a new constant vector

Fixes PR51824 + ; OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38057
2021-09-21 13:01:09 +01:00
David Stenberg 7b4cc09b14 [LowerConstantIntrinsics] Fix heap-use-after-free bug in worklist
This fixes PR51730, a heap-use-after-free bug in
replaceConditionalBranchesOnConstant().

With the attached reproducer we were left with a function looking
something like this after replaceAndRecursivelySimplify():

  [...]

  cont2.i:
    br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i

  handler.type_mismatch3.i:
    %3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ]
    unreachable

  cont4.i:
    unreachable

  [...]

with both the branch instruction and PHI node being in the worklist. As
a result of replacing the branch instruction with an unconditional
branch, the PHI node in %handler.type_mismatch3.i would be removed. This
then resulted in a heap-use-after-free bug due to accessing that removed
PHI node in the next worklist iteration.

This is solved by using a value handle worklist. I am a unsure if this
is the most idiomatic solution. Another solution could have been to
produce a worklist just containing the interesting branch instructions,
but I thought that it perhaps was a bit cleaner to keep all worklist
filtering in the loop that does the rewrites.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D109221
2021-09-21 11:33:07 +02:00
Max Kazantsev 2c7d5fbc9e [SCEV] Generalize implication when signedness of FoundPred doesn't matter
The implication logic for two values that are both negative or non-negative
says that it doesn't matter whether their predicate is signed and unsigned,
but only flips unsigned into signed for further inference. This patch adds
support for flipping a signed predicate into unsigned as well.

Differential Revision: https://reviews.llvm.org/D109959
Reviewed By: nikic
2021-09-21 11:17:56 +07:00
Max Kazantsev 073b254cff [SimplifyCFG] Redirect switch cases that lead to UB into an unreachable block
When following a case of a switch instruction is guaranteed to lead to
UB, we can safely break these edges and redirect those cases into a newly
created unreachable block. As result, CFG will become simpler and we can
remove some of Phi inputs to make further analyzes easier.

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D109428
Reviewed By: lebedev.ri
2021-09-21 10:45:19 +07:00
Usman Nadeem f417d9d821 [InstCombine] Eliminate vector reverse if all inputs/outputs to an instruction are reverses
Differential Revision: https://reviews.llvm.org/D109808

Change-Id: I1a10d2bc33acbe0ea353c6cb3d077851391fe73e
2021-09-20 18:32:24 -07:00
Nikita Popov dd0226561e [IR] Add helper to convert offset to GEP indices
We implement logic to convert a byte offset into a sequence of GEP
indices for that offset in a number of places. This patch adds a
DataLayout::getGEPIndicesForOffset() method, which implements the
core logic. I've updated SROA, ConstantFolding and InstCombine to
use it, and there's a few more places where it looks relevant.

Differential Revision: https://reviews.llvm.org/D110043
2021-09-20 20:18:16 +02:00
Florian Hahn 963d3a22b3
[DSE] Add additional tests to cover review comments.
Adds additional tests following comments from D109844.

Also removes unusued in.ptr arguments and places in the call tests that
used loads instead of a getval call.
2021-09-20 17:06:04 +01:00
Alexey Bataev bc69dd62c0 [SLP]Improve graph reordering.
Reworked reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reordarable nodes (loads, stores,
extractelements,extractvalues) and then fully rebuilding the graph in
the best order. This was not effecient, since it required an extra
memory and time for building/rebuilding tree, double the use of the
scheduling budget, which could lead to missing vectorization due to
exausted scheduling resources.

Patch provide 2-way approach for graph reodering problem. At first, all
reordering is done in-place, it doe not required tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to bottom) rotates the whole graph, similarly to the previous
implementation. Compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle smaller subgraph when buildiong operands for the graph
nodes with lasrger vectorization factor, we can rotate just subgraph,
not the whole graph.

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reorder in the best fashion, they are reordered and their user too.
It allows to remove double shuffles to the same ordering of the operands in
many cases and just reorder the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows to remove extra shuffle because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020
2021-09-20 08:42:19 -07:00
David Sherwood f988f68064 [Analysis] Add support for vscale in computeKnownBitsFromOperator
In ValueTracking.cpp we use a function called
computeKnownBitsFromOperator to determine the known bits of a value.
For the vscale intrinsic if the function contains the vscale_range
attribute we can use the maximum and minimum values of vscale to
determine some known zero and one bits. This should help to improve
code quality by allowing certain optimisations to take place.

Tests added here:

  Transforms/InstCombine/icmp-vscale.ll

Differential Revision: https://reviews.llvm.org/D109883
2021-09-20 15:01:59 +01:00
Max Kazantsev e9d34c5429 [NFC] Add assert and test showing that revert of D109596 wasn't justified
All transforms of IndVars have prerequisite requirement of LCSSA and LoopSimplify
form and rely on it. Added test that shows that this actually stands.
2021-09-20 12:01:12 +07:00
Max Kazantsev 471217cff8 Revert "Revert "[IndVars] Replace PHIs if loop exits on 1st iteration""
This reverts commit 6fec6552f5.

The patch was reverted on incorrect claim that this patch may break LCSSA form
when the loop is not in a simplify form. All IndVars' transform insure that
the loop is in simplify and LCSSA form, so if it wasn't broken before this
transform, it will also not be broken after it.
2021-09-20 12:01:10 +07:00
Max Kazantsev def15c5fb6 [SCEV] Support negative values in signed/unsigned predicate reasoning
There is a piece of logic that uses the fact that signed and unsigned
versions of the same predicate are equivalent when both values are
non-negative. It's also true when both of them are negative.

Differential Revision: https://reviews.llvm.org/D109957
Reviewed By: nikic
2021-09-20 11:26:33 +07:00
Chris Jackson 5ba8020326 [DebugInfo][LSR] Emit shorter expressions from scev-based salvaging
The scev-based salvaging for LSR can sometimes produce unnecessarily
verbose expressions. This patch adds logic to detect when the value to
be recovered and the induction variable differ by only a constant
offset. Then, the expression to derive the current iteration count can
be omitted from the dbg.value in favour of the offset.

Reviewed by: aprantl

Differential Revision: https://reviews.llvm.org/D109044
2021-09-19 21:41:44 +01:00
Sanjay Patel 9555d1edb0 [InstCombine] add/adjust tests for min/max intrinsics; NFC
If we transform these, we have to propagate no-wrap/undef carefully.
2021-09-19 10:10:37 -04:00
Nikita Popov abe21da670 [Tests] Fix noalias metadata in one more test
Missed this one in 80110aafa0. This
is another test mixing up alias scopes and alias scope lists.
2021-09-18 21:17:05 +02:00
Nikita Popov 80110aafa0 [Tests] Fix incorrect noalias metadata
Mostly this fixes cases where !noalias or !alias.scope were passed
a scope rather than a scope list. In some cases I opted to drop
the metadata entirely instead, because it is not really relevant
to the test.
2021-09-18 20:51:00 +02:00
Usman Nadeem d841c72e09 Precommit tests for D109807 "[InstCombine] Narrow type of logical operation chains when possible"
Change-Id: Iae9bf18619e4926301a866c7e2bd38ced524804e
2021-09-18 11:28:49 -07:00
Joseph Huber 27905eeb89 [Attributor] Change AAExecutionDomain to check intrinsic edges
The AAExecutionDomain instance checks if a BB is executed by the main
thread only. Currently, this only checks the `__kmpc_kernel_init` call
for generic regions to indicate the path taken by the main thread. In
the new runtime, we want to be able to detect basic blocks even in SPMD
mode. For this we enable it to check thread-ID intrinsics being compared
to zero as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109849
2021-09-17 19:51:38 -04:00
Joseph Huber fec2927e07 [OpenMP] Add NoSync attributes to alloc / free shared RTL calls
This patch adds the `nosync` attribute to the `__kmpc_alloc_shared` and
`__kmpc_free_shared` runtime library calls. This allows code analysis to
know that these functins dont contain any barriers. This will help
optimizations reason about the CFG of blocks containing these calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109995
2021-09-17 19:50:13 -04:00
Usman Nadeem 757384abff [AArch64][SVE][InstCombine] Fold redundant zip1/2(uzp1/2) operations
zip1(uzp1(A, B), uzp2(A, B)) --> A
    zip2(uzp1(A, B), uzp2(A, B)) --> B

Differential Revision: https://reviews.llvm.org/D109666

Change-Id: I4a6578db2fcef9ff71ad0e77b9fe08354e6dbfcd
2021-09-17 15:24:46 -07:00
Sanjay Patel 6da3503602 [InstCombine] add tests for min/max intrinsics with offset operand; NFC 2021-09-17 16:36:46 -04:00
Dávid Bolvanský d01e0c8c66 [NFC] Precommit tests for D109954 2021-09-17 21:59:35 +02:00
Hongtao Yu c5fafc1e73 [CSSPGO] Tweakes to lower pseudo probe runtime overhead
A couple tweaks to

1. allow more thinlto importing by excluding probe intrinsics from IR size in module summary

2. Allow general default attributes (nofree nosync nounwind) for pseudo probe intrinsic. Without those attributes, pseudo probes will be basically treated as unknown calls which will in turn block their containing functions from annotated with those attributes.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D109976
2021-09-17 12:28:09 -07:00
Alexey Bataev 2b0b1d5319 [SLP][NFC]Add a test for reorder of alt shuffle operands. 2021-09-17 10:42:45 -07:00
Sanjay Patel 41ff7612b3 [InstCombine] allow splat vectors for narrowing masked fold
Mostly cosmetic diffs, but the use of m_APInt matches splat constants.
2021-09-17 11:24:16 -04:00
Sanjay Patel 3a587ed20f [InstCombine] add vector tests for 'and' folds; NFC 2021-09-17 11:24:16 -04:00
Max Kazantsev 690f76958a [Test] Add simple test where IndVars fails to remove checks on negative values 2021-09-17 15:40:32 +07:00
Florian Hahn bdafe3124c
[DSE] Add test cases with stores to objects before they escape.
Test cases where stores to local objects can be removed because the
object does not escape before calls that may read/write to memory.

Includes test from PR50220.
2021-09-17 09:10:53 +01:00
Max Kazantsev 74fa174f33 [Test] One more missing opportunity on IndVars check removal 2021-09-17 14:52:15 +07:00
Sjoerd Meijer 97cc678cc4 [FuncSpec] Specialising on addresses of const global values.
This introduces an option to allow specialising on the address of global
values. This option is off by default because it is likely not that profitable
to do so and needs more investigation. Before, we were specialising on addresses
and thus this changes the default behaviour.

Differential Revision: https://reviews.llvm.org/D109775
2021-09-17 08:07:05 +01:00
Christudasan Devadasan 167ff5280d [GlobalOpt] Do not shrink global to bool for an unfavorable AS
Do not call `TryToShrinkGlobalToBoolean` for address spaces
that don't allow initializers. It inserts an initializer value
while shrinking to bool. Used the target hook introduced with
D109337 to skip this call for the restricted address spaces.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D109823
2021-09-16 23:13:30 -04:00
Daniil Suchkov fe950cba8f Update LoopPredication test to fix buildbot failure.
This patch updates tests added in 5f2b7879f1.
2021-09-16 23:37:59 +00:00
Daniil Suchkov 0e36288318 [LoopPredication] Report changes correctly when attempting loop exit predication
To make the IR easier to analyze, this pass makes some minor transformations.
After that, even if it doesn't decide to optimize anything, it can't report that
it changed nothing and preserved all the analyses.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D109855
2021-09-16 22:49:55 +00:00
Daniil Suchkov 5f2b7879f1 NFC. Add tests exposing missing analysis invalidation in LoopPredication. 2021-09-16 22:49:55 +00:00
Jon Roelofs 4b19e7dfae [LoopIdiomRecognize][Remarks] Track loop-strided store to/from blocks
Differential revision: https://reviews.llvm.org/D109929
2021-09-16 15:46:26 -07:00
Arthur Eubanks d49cb5b303 [SimplifyCFG] Add bonus when seeing vector ops to branch fold to common dest
This makes some tests in vector-reductions-logical.ll more stable when
applying D108837.

The cost of branching is higher when vector ops are involved due to
potential SLP transformations.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D108935
2021-09-16 10:50:36 -07:00
Dávid Bolvanský a4a426c9e0 [InstCombine] Added llvm.powi optimizations
If power is even:
powi(-x, p) -> powi(x, p)
powi(fabs(x), p) -> powi(x, p)
powi(copysign(x, y), p) -> powi(x, p)
2021-09-16 19:42:21 +02:00
Dávid Bolvanský c0afb00924 [NFC] Added tests for llvm.powi optimizations 2021-09-16 19:42:21 +02:00
Sjoerd Meijer 2a1ac2e318 [FuncSpec] Add force flag to test case to trigger the transform. NFC. 2021-09-16 17:48:13 +01:00
Florian Hahn 2f97ff8e7b
[SLP] Add additional memory versioning tests. 2021-09-16 13:31:14 +01:00
Anton Afanasyev 6a5f49a1ac [AggressiveInstCombine] Add `{insert/extract}element` to `TruncInstCombine` DAG
Alive2 for `{insert/extract}element`: https://alive2.llvm.org/ce/z/hwy_E-

Actually, no one file of test suite is touched by this change,
which means that is rare pattern not generated by frontend. But
it's worth being in place.

Differential Revision: https://reviews.llvm.org/D109236
2021-09-16 11:24:31 +03:00
Anton Afanasyev 8371a4c9d5 [Test][AggressiveInstCombine] Add test for truncation of vector instructions
Precommit test for D109236
2021-09-16 11:24:30 +03:00
Sjoerd Meijer a4e437e3c9 [FuncSpec] Add a test for specialising on a non-constant global argument. NFC. 2021-09-16 09:17:39 +01:00
Sam Parker c98a8a09b5 [HardwareLoops] Loop guard intrinsic to recognise zext
If a loop count was initially represented by a 32b unsigned int in C
then the hardware-loop pass can recognise the loop guard and insert
the llvm.test.set.loop.iterations intrinsic. If this was instead a
unsigned short/char then clang inserts a zext instruction to expand
the loop count to an i32. This patch adds the necessary pattern
matching to enable the use of lvm.test.set.loop.iterations in those
cases.

Patch by: sherwin-dc

Differential Revision: https://reviews.llvm.org/D109631
2021-09-16 08:33:16 +01:00
Owen Anderson 68079ef0eb Teach SimplifyCFG to fold switches into lookup tables in more cases.
In particular, it couldn't handle cases where lookup table constant
expressions involved bitcasts. This does not seem to come up
frequently in C++, but comes up reasonably often in Rust via
`#[derive(Debug)]`.

Originally reported by pcwalton.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109565
2021-09-15 22:07:08 +00:00
Anna Thomas f9e4aebe4a Revert "[InstCombine] Improve TryToSinkInstruction with multiple uses"
This reverts commit 4ac4e52189.
There are couple of test failures, which needs update of the test cases.

Doing a clean revert and will recommit the change along with fixed
testcases.
2021-09-15 18:03:11 -04:00
Anna Thomas 4ac4e52189 [InstCombine] Improve TryToSinkInstruction with multiple uses
This patch allows sinking an instruction which can have multiple uses in a
single user. We were previously over-restrictive by looking for exactly one use,
rather than one user.

Also, the API for retrieving undroppable user has been updated accordingly since
in both usecases (Attributor and InstCombine), we seem to care about the user,
rather than the use.

Reviewed-By: nikic

Differential Revision: https://reviews.llvm.org/D109700
2021-09-15 20:39:38 +00:00
Sanjay Patel e5a32d720e [InstCombine] move extend after insertelement if both operands are extended
I was wondering how instcombine does on the examples in D109236,
and we're missing a basic transform:

inselt (ext X), (ext Y), Index --> ext (inselt X, Y, Index)

https://alive2.llvm.org/ce/z/z2aBu9

Note that there are several possible extensions of this fold
(see TODO comments).

Differential Revision: https://reviews.llvm.org/D109537
2021-09-15 14:38:03 -04:00
Anna Thomas 36ef65adc3 [InstCombine] Update test checks through autogeneration, add more tests. NFC
Updated check lines.
Tests precommitted from D109700.
2021-09-15 16:20:30 +00:00
Max Kazantsev c78ed20784 [Test] Add a test showing missing opportunities in branch deletion by indvars 2021-09-15 22:17:10 +07:00
Alexey Bataev 446e11fa29 [SLP][NFC]Add a test for tiny tree with stores and with not
same/alternate instructions.
2021-09-15 08:07:01 -07:00
Filipp Zhinkin f5d8952356 [InstCombine] Transform X == 0 ? 0 : X * Y --> X * freeze(Y)
Enabled mul folding optimization that was previously disabled
by being incorrect.
To preserve correctness, mul's operand that is not compared
with zero in select's condition is now frozen.

Related bug: https://bugs.llvm.org/show_bug.cgi?id=51286

Correctness:
https://alive2.llvm.org/ce/z/bHef7J
https://alive2.llvm.org/ce/z/QcR7sf
https://alive2.llvm.org/ce/z/vvBLzt
https://alive2.llvm.org/ce/z/jGDXgq
https://alive2.llvm.org/ce/z/3Pe8Z4
https://alive2.llvm.org/ce/z/LGga8M
https://alive2.llvm.org/ce/z/CTG5fs

Differential Revision: https://reviews.llvm.org/D108408
2021-09-15 09:04:06 -04:00
Sanjay Patel be1028053e [PhaseOrdering] add tests for PR47023; NFC 2021-09-15 08:44:04 -04:00
Simon Pilgrim 0767e43d87 [CostModel][X86] Adjust bitreverse/ctpop/ctlz/cttz AVX2+ costs based on llvm-mca reports
Based off the worse case numbers generated by D103695, the AVX2/512 bit reversing/counting costs were higher than necessary (based off instruction counts instead of actual throughput).
2021-09-15 13:04:40 +01:00
Florian Hahn 05c120823b
[DSE] Add capture-before test cases with loads.
Add a set of test cases where redundant stores may be removable,
depending on whether a local allocation gets captured before performing
a load.
2021-09-15 11:13:35 +01:00
David Green 61cc873a8e [LV] Recognize intrinsic min/max reductions
This extends the reduction logic in the vectorizer to handle intrinsic
versions of min and max, both the floating point variants already
created by instcombine under fastmath and the integer variants from
D98152.

As a bonus this allows us to match a chain of min or max operations into
a single reduction, similar to how add/mul/etc work.

Differential Revision: https://reviews.llvm.org/D109645
2021-09-15 10:45:50 +01:00
David Green bddfbf91ed [LV] Min/max intrinsic reduction test cases. 2021-09-15 09:56:19 +01:00
Florian Hahn e90d55e1c9
[VPlan] Support sinking recipes with uniform users outside sink target.
This is a first step towards addressing the last remaining limitation of
the VPlan version of sinkScalarOperands: the legacy version can
partially sink operands. For example, if a GEP has uniform users outside
the sink target block, then the legacy version will sink all scalar
GEPs, other than the one for lane 0.

This patch works towards addressing this case in the VPlan version by
detecting such cases and duplicating the sink candidate. All users
outside of the sink target will be updated to use the uniform clone.

Note that this highlights an issue with VPValue naming. If we duplicate
a replicate recipe, they will share the same underlying IR value and
both VPValues will have the same name ir<%gep>.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104254
2021-09-15 09:21:39 +01:00
Philip Reames d4e03bccd4 regen an autogened test which is stale 2021-09-14 18:42:23 -07:00
Matt Arsenault 88146230e1 SeparateConstOffsetFromGEP: Fix stack overflow in unreachable code
ConstantOffsetExtractor::Find was infinitely recursing on the add
referencing itself.
2021-09-14 19:49:38 -04:00
Matt Arsenault fdd9761dd1 Attributor: Fix crash on undef in !callees 2021-09-14 19:49:34 -04:00
Florian Hahn e248d69036
Recommit "[LAA] Support pointer phis in loop by analyzing each incoming pointer."
SCEV does not look through non-header PHIs inside the loop. Such phis
can be analyzed by adding separate accesses for each incoming pointer
value.

This results in 2 more loops vectorized in SPEC2000/186.crafty and
avoids regressions when sinking instructions before vectorizing.

Fixes PR50296, PR50288.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D102266
2021-09-14 11:19:12 +01:00
David Green 5a6dfbb8cd [ARM] Teach DemandedVectorElts about VMOVN lanes
The class of instructions that write to narrow top/bottom lanes only
demand the even or odd elements of the input lanes. Which means that a
pair of VMOVNT; VMOVNB demands no lanes from the original input. This
teaches that to instcombine from the target hooks available through
ARMTTIImpl.

Differential Revision: https://reviews.llvm.org/D109325
2021-09-14 11:05:31 +01:00
Arthur Eubanks 096d9814aa [opt] Remove some legacy PM flags
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109664
2021-09-13 15:50:03 -07:00
Kuba Mracek e80ee4cbd9 [GlobalDCE] In VFE support for relative pointers, allow GEP references to the base symbol
This is for Swift VFE support. In some vtable forms that Swift emits, the "base" of a relative pointer is not the global symbol itself directly, but a GEP into it -- so the pointer is relative to a particular field in the global. So getPointerAtOffset() needs to be able to see through the GEP and allow it in a SUB expression, to correctly recognize the offset as a vtable slot.

Differential Revision: https://reviews.llvm.org/D109169
2021-09-13 15:22:11 -07:00
Jon Chesterfield 6775ad2025 [openmp] Apply test change from D109500 2021-09-13 18:36:29 +01:00
Jon Chesterfield bfcf979978 Revert "[openmp] Fix 51647, corrupt bitcode on amdgpu"
This reverts commit d5c049a3f6.
Going to re-commit it in pieces for easier application to 13
2021-09-13 18:25:07 +01:00
Philip Reames 6fec6552f5 Revert "[IndVars] Replace PHIs if loop exits on 1st iteration"
This reverts commit 5a6dfb27ca.  See original review for why.
2021-09-13 10:11:18 -07:00
Philip Reames 5746c76f3f Revert "[IndVars] Break backedge and replace PHIs if loop exits on 1st iteration"
This reverts commit d9ca444835.  See review for why.
2021-09-13 10:10:49 -07:00
dpalermo d5c049a3f6 [openmp] Fix 51647, corrupt bitcode on amdgpu
Patch by @dpalermo

The corrupt bitcode reported in https://bugs.llvm.org/show_bug.cgi?id=51647 seems to be a result of a later pass changing the workfn variable to addrspace(5) (thread private, on the stack). That seems reasonable for an alloca without an address space so it's an open question why that can crash the bitcode reader.

This change puts it in the thread private address space to begin with which means whatever misfired further down the pipeline does not break it. That matches the codegen from clang where stack variables are always annotated (5) and then addrspace cast prior to following use.

This therefore patches around whatever unsuccessfully moved the alloca variable to addrspace(5). That solves the problem of openmp opt producing code that crashes the bitcode reader. It should be possible to create a minimal repro for the underlying bug based on some handwritten IR that uses an alloca in a generic address space.

Reviewed By: ronlieb, jdoerfert, dpalermo-phab

Differential Revision: https://reviews.llvm.org/D109500
2021-09-13 15:24:48 +01:00
Florian Hahn 4b342268c0
[VPlan] Add test that requires duplicating recipe for sinking. 2021-09-13 14:21:20 +01:00
Florian Hahn c24fc37e47
[VectorCombine] Support AND/UREM indices that require freezing.
38b098be66 limited scalarization to indices that are known non-poison.
For certain patterns that restrict the range of an index, we can insert
a freeze of the original value, to prevent propagation of poison.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107580
2021-09-13 11:21:45 +01:00
Jingu Kang 2a26d47a2d [LoopBoundSplit] Check the start value of split cond AddRec
After transformation, we assume the split condition of the pre-loop is always
true. In order to guarantee it, we need to check the start value of the split
cond AddRec satisfies the split condition.

Differential Revision: https://reviews.llvm.org/D109354
2021-09-13 10:32:35 +01:00
Max Kazantsev 7e337d8ba2 [Test] Add more sophisticated tests for switch UB opt
Optimizer is being too smart with existing tests, and the transform
gets concealed by following transforms.
2021-09-13 15:28:23 +07:00
Simon Pilgrim 6d970e83fa [InstCombine] Add PR51784 test cases 2021-09-13 08:36:43 +01:00
Max Kazantsev d9ca444835 [IndVars] Break backedge and replace PHIs if loop exits on 1st iteration
Implement TODO in optimizeLoopExits. Now if we have proved that some loop exit
is taken on 1st iteration, we make all branches in the following exiting blocks
always branch out of the loop and their conditions simplified away.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D108910
Reviewed By: lebedev.ri
2021-09-13 11:30:55 +07:00
Max Kazantsev 5a6dfb27ca [IndVars] Replace PHIs if loop exits on 1st iteration
This is a part of D108910.
We replace all loop PHIs with values coming from the loop preheader if
we proved that backedge is never taken.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D109596
Reviewed By: lebedev.ri
2021-09-13 10:50:33 +07:00
Florian Hahn 368af7558e
[VPlan] Fix crash caused by not updating all users properly.
Users of VPValues are managed in a vector, so we need to be more
careful when iterating over users while updating them. For now, just
copy them.

Fixes 51798.
2021-09-12 18:10:53 +01:00
Sanjay Patel 3a126134d3 [InstCombine] remove casts from splat-a-bit pattern
https://alive2.llvm.org/ce/z/_AivbM

This case seems clear since we can reduce instruction count
and avoid an intermediate type change, but we might want to
use mask-and-compare for other sequences.

Currently, we can generate more instructions on some related
patterns by trying to use bit-hacks instead of mask+cmp, so
something is not behaving as expected.
2021-09-12 09:18:14 -04:00
Sanjay Patel 75e8eb2b10 [InstCombine] update code/test comments; NFC
Follow-up for post-commit suggestion on:
28afaed691

The comments were partly copied from the original
code, but not updated to match the new code.
2021-09-11 10:53:53 -04:00
Sanjay Patel 28afaed691 [InstCombine] fold sub of min/max intrinsics with invertible ops
This is a translation of the existing code to handle the intrinsics
and another step towards D98152.

https://alive2.llvm.org/ce/z/jA7eBC

This pattern is already handled by underlying folds if there are
less uses, so the minimal tests in this case have extra uses.

The larger cmyk tests show the motivation - when combined with
other folds, we invert a larger sequence and eliminate 'not' ops.
2021-09-11 09:18:46 -04:00
Usman Nadeem ab111e982f Revert "Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation""
This reverts commit eee7d225de.
Effectively relanding 98c37247d8
after fixing the failing tests.

Change-Id: I5d7461aeb820a2d5f1895457d824a8de4d316ee5
2021-09-10 18:11:24 -07:00
Johannes Doerfert 99ea8ac9f1 Reapply "[OpenMP] Group side-effects to improve guarding efficiency"
This reapplies ca134c3963, effectively
reverting commit d2f206e0af.

Minor test changes to make the test pass.
2021-09-10 15:22:57 -05:00
Johannes Doerfert c09fbbdcfb Reapply "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals""
This reapplies commit 7dbba3376f, or, put
differently, this reverts commit d9a8d20827.

The test now requires the amdgpu and nvptx backend explicitly as it
won't work without properly.
2021-09-10 15:22:56 -05:00
Usman Nadeem eee7d225de Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation"
This reverts commit 98c37247d8.
2021-09-10 13:01:48 -07:00
Usman Nadeem 98c37247d8 [AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation
Differential Revision: https://reviews.llvm.org/D109118

Change-Id: I47adc1984a54bea02bf5a0a767b765afe7e16aa3
2021-09-10 12:52:14 -07:00
Sanjay Patel 188375f478 [InstCombine] add tests for sub of min/max intrinsics; NFC 2021-09-10 14:53:05 -04:00
Joseph Huber 9e2fc0ba37 [OpenMP] Check OpenMP assumptions on call-sites as well
This patch adds functionality to check assumption attributes on call
sites as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109376
2021-09-10 14:52:47 -04:00
Anton Afanasyev 54d8ebbbfd [AggressiveInstCombine] Add `udiv` and `urem` instrs to TruncInstCombine DAG
Add `udiv` and `urem` instructions to the DAG post-dominated by `trunc`,
allowing TruncInstCombine to reduce bitwidth of expressions containing these
instructions. It is sufficient to require that all truncated bits of both
operands are zeros: https://alive2.llvm.org/ce/z/yiithn
(`urem` case is identical).

Differential Revision: https://reviews.llvm.org/D109515
2021-09-10 20:29:08 +03:00
Anton Afanasyev ea7b2c147f [Test][AggressiveInstCombine] Add test for `udiv` and `urem`
Precommit test for D109515
2021-09-10 20:29:08 +03:00
Johannes Doerfert d2f206e0af Revert "[OpenMP] Group side-effects to improve guarding efficiency"
This reverts commit ca134c3963.

There seems to be a problem with the tests, investigating now:
  https://lab.llvm.org/buildbot/#/builders/61/builds/14574
2021-09-10 12:24:00 -05:00
Johannes Doerfert d9a8d20827 Revert "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals"
This reverts commit 7dbba3376f.

There seems to be a problem with the tests, investigating now:
  https://lab.llvm.org/buildbot/#/builders/61/builds/14574
2021-09-10 12:23:08 -05:00
Johannes Doerfert 7dbba3376f [GlobalOpt][FIX] Do not embed initializers into AS!=0 globals
Not all address spaces support initializers for globals and we can
therefore not set them without checking if they are allowed. This
patch adds a hook into TTI to check if an AS allows non-undef
initializers. We disable it for all but address space 0 by default,
NVPTX and AMDGPU targets allow all but address space 3.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D109337
2021-09-10 12:08:50 -05:00
Johannes Doerfert ca134c3963 [OpenMP] Group side-effects to improve guarding efficiency
When we guard side-effects as part of SPMDzation we do it for
consecutive instructions that need guarding. This patch will try to
reorder guarded side-effects in a block to decrease the number of
guarded regions we need. It does not use any smarts, e.g., alias
analysis, to move side-effects over non-interfering reads. Instead,
it only moves side-effects downwards to the next guarded side-effect
if there was nothing in between that could have possibly be affected.

Reviewed By: ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D109070
2021-09-10 12:08:48 -05:00
Nikita Popov 90ec6dff86 [OpaquePtr] Forbid mixing typed and opaque pointers
Currently, opaque pointers are supported in two forms: The
-force-opaque-pointers mode, where all pointers are opaque and
typed pointers do not exist. And as a simple ptr type that can
coexist with typed pointers.

This patch removes support for the mixed mode. You either get
typed pointers, or you get opaque pointers, but not both. In the
(current) default mode, using ptr is forbidden. In -opaque-pointers
mode, all pointers are opaque.

The motivation here is that the mixed mode introduces additional
issues that don't exist in fully opaque mode. D105155 is an example
of a design problem. Looking at D109259, it would probably need
additional work to support mixed mode (e.g. to generate GEPs for
typed base but opaque result). Mixed mode will also end up
inserting many casts between i8* and ptr, which would require
significant additional work to consistently avoid.

I don't think the mixed mode is particularly valuable, as it
doesn't align with our end goal. The only thing I've found it to
be moderately useful for is adding some opaque pointer tests in
between typed pointer tests, but I think we can live without that.

Differential Revision: https://reviews.llvm.org/D109290
2021-09-10 15:18:23 +02:00
Filipp Zhinkin 745f82b8d9 [InstCombine] add tests for X == 0 ? 0 : X * Y ; NFC
These are the tests for D108408 with current baseline results.
2021-09-10 09:06:48 -04:00
Max Kazantsev 6b69cc09b7 [Test][NFC] Regenerate checks in test 2021-09-10 18:46:10 +07:00
Sjoerd Meijer 6a076fa953 [LoopFlatten] Make the analysis more robust after IV widening
LoopFlatten wasn't triggering on this motivating case after IV widening:

  void foo(int *A, int N, int M) {
    for (int i = 0; i < N; ++i)
      for (int j = 0; j < M; ++j)
        f(A[i*M+j]);
  }

The reason was that the old induction phi nodes were getting in the way. These
narrow and dead induction phis are not always trivially dead, and having both
the narrow and wide IVs confused the analysis and caused it to bail. This adds
some extra bookkeeping for these old phis, so we can filter them out when
checks on phi nodes are performed. Other clean up passes will get rid of these
old phis and increment instructions.

As this was one of the motivating examples from the beginning, it was
surprising this wasn't triggering from C/C++ code. It looks like the IR and CFG
is just slightly different.

Differential Revision: https://reviews.llvm.org/D109309
2021-09-10 12:34:04 +01:00
Rosie Sumpter 9d1bea9c88 [SVE][LoopVectorize] Optimise code generated by widenPHIInstruction
For SVE, when scalarising the PHI instruction the whole vector part is
generated as opposed to creating instructions for each lane for fixed-
width vectors. However, in some cases the lane values may be needed
later (e.g for a load instruction) so we still need to calculate
these values to avoid extractelement being called on the vector part.

Differential Revision: https://reviews.llvm.org/D109445
2021-09-10 11:58:04 +01:00
Sjoerd Meijer 4f9217c519 [FuncSpec] Don't specialise call sites that have the MinSize attribute set
The MinSize attribute can be attached to both the callee and the caller
in the callsite. Function specialisation was already skipped for function
declarations (callees) with MinSize. This also skips specialisations for
the callsite when it has MinSize set.

Differential Revision: https://reviews.llvm.org/D109441
2021-09-10 09:01:45 +01:00
Max Kazantsev 09d0fa3bbe [Test] Add tests showing missed opportunity for SimplifyCFG for switches
Patch by Dmitry Bakunevich!
2021-09-10 11:16:13 +07:00
Nikita Popov af382b9383 [IR] Handle constant expressions in containsUndefinedElement()
If the constant is a constant expression, then getAggregateElement()
will return null. Guard against this before calling HasFn().
2021-09-09 22:04:12 +02:00
Sanjay Patel 3cb5aa8622 [InstCombine] add tests for insertelement with cast ops; NFC 2021-09-09 14:58:39 -04:00
Sanjay Patel 97a4e7b7ff [InstCombine] remove a buggy set of zext-icmp transforms
The motivating case is an infinite loop shown with a reduced test from:
https://llvm.org/PR51762

To solve this, I'm proposing we delete the most obviously broken part of this code.

The bug example shows a fundamental problem: we ask computeKnownBits if a transform
will be profitable, alter the code by creating new instructions, then rely on
computeKnownBits to return the same answer to actually eliminate instructions.

But there's no guarantee that the results will be the same between the 1st and 2nd
calls. In the infinite loop example, we get different answers, so we add
instructions that conflict with some other transform, and we're stuck.

There's at least one other problem visible in the test diff for
`@zext_or_masked_bit_test_uses`: the code doesn't check uses properly, so we can
end up with extra instructions created.

Last, it's not clear if this set of transforms actually improves analysis or
codegen. I spot-checked a few targets and don't see a clear win:
https://godbolt.org/z/x87EWovso

If we do see a regression from this change, codegen seems like the right place to
add a cmp -> bit-hack fold.

If this is too big of a step, we could limit the computeKnownBits calls by not
passing a context instruction and/or limiting the recursion. I checked that those
would stop the infinite loop for PR51762, but that won't guarantee that some other
example does not fall into the same loop.

Differential Revision: https://reviews.llvm.org/D109440
2021-09-09 08:49:39 -04:00
Roman Lebedev 909cba9699
[SimplifyCFG] performBranchToCommonDestFolding(): require block-closed SSA form for bonus instructions (PR51125)
I can't seem to wrap my head around the proper fix here,
we should be fine without this requirement, iff we can form this form,
but the naive attempt (https://reviews.llvm.org/D106317) has failed.
So just to unblock the release, put up a restriction.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51125
2021-09-09 12:28:09 +03:00
Jun Ma 8ba2adcf9e Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""
Differential Revision: https://reviews.llvm.org/D106056
2021-09-09 16:53:33 +08:00
Nikita Popov 6dfdc6bfd2 [SROA] Support opaque pointers
Make the following changes in order to support opaque pointers in SROA:

 * Generate i8 GEPs for opaque pointers.
 * Explicitly enforce that promotable allocas only have stores of
   the alloca type -- previously this was implicitly enforced.
 * Replace a check for pointer element type with load/store type.

Differential Revision: https://reviews.llvm.org/D109259
2021-09-08 22:25:44 +02:00
Arthur Eubanks b493124ae2 [MemorySSA] Support invariant.group metadata
The implementation is mostly copied from MemDepAnalysis. We want to look
at all loads and stores to the same pointer operand. Bitcasts and zero
GEPs of a pointer are considered the same pointer value. We choose the
most dominating instruction.

Since updating MemorySSA with invariant.group is non-trivial, for now
handling of invariant.group is not cached in any way, so it's part of
the walker. The number of loads/stores with invariant.group is small for
now anyway. We can revisit if this actually noticeably affects compile
times.

To avoid invariant.group affecting optimized uses, we need to have
optimizeUsesInBlock() not use invariant.group in any way.

Co-authored-by: Piotr Padlewski <prazek@google.com>

Reviewed By: asbirlea, nikic, Prazek

Differential Revision: https://reviews.llvm.org/D109134
2021-09-08 13:06:12 -07:00
Akira Hatanaka dea6f71af0 [ObjC][ARC] Use the addresses of the ARC runtime functions instead of
integer 0/1 for the operand of bundle "clang.arc.attachedcall"

https://reviews.llvm.org/D102996 changes the operand of bundle
"clang.arc.attachedcall". This patch makes changes to llvm that are
needed to handle the new IR.

This should make it easier to understand what the IR is doing and also
simplify some of the passes as they no longer have to translate the
integer values to the runtime functions.

Differential Revision: https://reviews.llvm.org/D103000
2021-09-08 11:58:03 -07:00
Joseph Huber 6b9a3ec3a2 [OpenMP] Do not SPMDize generic regions with no parallel
This patch changes SPMDization to not trigger for regions with no
parallelism. Otherwise, this will introduce unnecessary barriers that
will slow the single-threaded region down.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109438
2021-09-08 14:33:15 -04:00
Andrew Litteken c172f1ad39 [IROutliner] Adding supports for multiple exits
When we start outlining across branches, there is the possibility that we will have two different blocks with different output locations, or a single branch that goes to two blocks outside of the region that is being outlined. While the CodeExtractor provides most of the mechanisms by using the return value of the extracted function as the input to a switch statement to correctly branch to the correct location, we need special handling for different output schemas to each location.

This is done by repeating the existing storing scheme for each different exit block. We have a map from the return values used, to the basic block that is used to store the outputs for that particular exit block within the outlined function. Then if needed, we create a switch statement for each return block to branch to the correct set of stored outputs.

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106993
2021-09-08 08:58:07 -07:00
Sanjay Patel b041b613e6 [InstCombine] add test for zext with 'or' op; NFC 2021-09-08 09:58:06 -04:00
Sanjay Patel 5639946d89 [InstCombine] remove unnecessary instructions from test; NFC 2021-09-08 09:58:06 -04:00
Sjoerd Meijer a1e8b754eb [FuncSpec] Fix test case: only run funcspec and not any other passes. NFC. 2021-09-08 12:40:58 +01:00
Fraser Cormack 7fb66d4035 [MemCpyOpt] Fix a variety of scalable-type crashes
This patch fixes a variety of crashes resulting from the `MemCpyOptPass`
casting `TypeSize` to a constant integer, whether implicitly or
explicitly.

Since the `MemsetRanges` requires a constant size to work, all but one
of the fixes in this patch simply involve skipping the various
optimizations for scalable types as cleanly as possible.

The optimization of `byval` parameters, however, has been updated to
work on scalable types in theory. In practice, this optimization is only
valid when the length of the `memcpy` is known to be larger than the
scalable type size, which is currently never the case. This could
perhaps be done in the future using the `vscale_range` attribute.

Some implicit casts have been left as they were, under the knowledge
they are only called on aggregate types. These should never be
scalably-sized.

Reviewed By: nikic, tra

Differential Revision: https://reviews.llvm.org/D109329
2021-09-08 11:21:36 +01:00
Max Kazantsev 29d054bf12 [SimplifyCFG] Preserve knowledge about guarding condition by adding assume
This improvement adds "assume" after removal of branch basing on UB in successor block.

Consider the following example:

```
pred:
  x = ...
  cond = x > 10
  br cond, bb, other.succ

bb:
  phi [nullptr, pred], ... // other possible preds
  load(phi) // UB if we came from pred

other.succ:
  // here we know that x <= 10, but this knowledge is lost
  // after the branch is turned to unconditional unless we
  // preserve it with assume.
```

If we remove the branch basing on knowledge about UB in a successor block,
then the fact that x <= 10 is other.succ might be lost if this condition is
not inferrable from any dominating condition. To preserve this knowledge, we
can add assume intrinsic with (possibly inverted) branch condition.

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D109054
Reviewed By: lebedev.ri
2021-09-08 14:05:17 +07:00
Yuanfang Chen 79c00d3f54 [NPM] Make AddDiscriminators pass required
This is to make sure the pass is not skipped at O0 where optnone is
applied to functions by default.
2021-09-07 17:02:24 -07:00
Sanjay Patel a3c1669b17 [InstCombine] fold icmp equality with 'or' mask ops
This could go either direction since the instruction
count is the same either way, but there are a few
reasons to prefer this:
1. We already do the related transform with 'and'
   (see just above the new code).
2. We try (too hard) to compensate for not having this
   and possibly other folds in transformZExtICmp(),
   and that leads to bugs like https://llvm.org/PR51762 .
3. Codegen looks better across a variety of targets.

https://alive2.llvm.org/ce/z/uEgn4P
2021-09-07 16:34:00 -04:00
Sanjay Patel 9565457aad [InstCombine] add tests for icmp with 'or' ops; NFC 2021-09-07 16:34:00 -04:00
Nikita Popov 58db5f6e95 [ConstFold] Support opaque pointers in constexpr GEPs
Support opaque pointers in SymbolicallyEvaluateGEP() by using the
value type of a GlobalValue base or falling back to i8 if there
isn't one. We don't unconditionally generate i8 GEPs here because
that would lose inrange attribues, and because some optimizations
on globals currently rely on GEP types (e.g. the globals SROA
mentioned in the comment).

Differential Revision: https://reviews.llvm.org/D109297
2021-09-07 20:50:29 +02:00
Roman Lebedev 35fa7b8ad8
Reland "[InstCombine] Recognize `((x * y) s/ x) !=/== y` as an signed multiplication overflow check (PR48769)"
This reverts commit 91f7a4fff7,
relanding commit 13ec913bdf.

The original commit was reverted because of (essentially)
https://bugs.llvm.org/show_bug.cgi?id=35922
which has now been addressed by d0eeb64be5.
2021-09-07 21:03:52 +03:00
Dávid Bolvanský 3b5f318f5d [InstCombine] ror/rol(X, RotAmt) == C --> X == rol/ror(C, RotAmt) (PR51567)
```
----------------------------------------
define i1 @src(i32 %0) {
%1:
  %2 = fshl i32 %0, i32 %0, i32 25
  %3 = icmp eq i32 %2, 5
  ret i1 %3
}
=>
define i1 @tgt(i32 %0) {
%1:
  %2 = icmp eq i32 %0, 640
  ret i1 %2
}
Transformation seems to be correct!
```

https://alive2.llvm.org/ce/z/GdY8Jm

Solves PR51567

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D109283
2021-09-07 18:04:58 +02:00
Andrew Litteken 81d3ac0cf2 [IROutliner] Adding outlining for single entry/single exit multiblock regions
Using the similarity found from the IRSimilarity Identifier, we take regions with structural similarity, and deduplicate them into a separate function. The Code Extractor is able to provide most of this functionality.

For simplicity, we start by only outlining regions with a single entry and single exit branch, this reduces the complexity in handling phi nodes outside the region, and handling many sets of outputs for each of the different exit blocks.

Reviewer: paquette

Differential Revision: https://reviews.llvm.org/D106990
2021-09-07 08:51:54 -07:00
Sanjay Patel 761835521c [InstCombine] add tests for smear-a-set-bit; NFC
Possible follow-ups from patterns discussed in D109155.
2021-09-07 11:42:33 -04:00
Jingu Kang 61d8e27193 [test] precommit a test for D109354 2021-09-07 16:02:53 +01:00
Anton Afanasyev d1f9b21677 [AggressiveInstCombine] Add `AssumptionCache` to aggressive instcombine
Add support for @llvm.assume() to TruncInstCombine allowing
optimizations based on these intrinsics while computing known bits.
2021-09-07 16:45:00 +03:00