llvm-project

Commit Graph

Author	SHA1	Message	Date
Kevin P. Neal	770c57898e	[FPEnv][InstSimplify] Prepush more tests for D106362. In working on D106362 I found that a few more tests were needed. I've been asked to pre-push the tests for that ticket. This should complete the tests needed for now.	2021-10-04 13:48:34 -04:00
Joseph Huber	f074a6a041	[OpenMP] Add options to change Attributor max iterations in OpenMPOpt This patch adds a new command line option `openmp-opt-max-iterations` that controls the maximum number of iterations the attributor will run for when compiling OpenMP target device code. This patch also adds a remark to indicate when the attributor failed because it did not run for enough iterations. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110749	2021-10-04 09:39:04 -04:00
Philip Reames	f39978b84f	[SCEV] Correctly propagate nowrap flags across scopes when folding invariant add through addrec This fixes a violation of the wrap flag rules introduced in `c4048d8f`. This is an alternate fix to D106852. The basic problem being fixed is that we infer a set of flags which is valid at some inner scope S1 (usually by correctly propagating them from IR), and then (incorrectly) extend them to a SCEV in scope S2 where S1 != S2. This is not in general safe per the wrap flags semantics recently defined. In this patch, I include a simple inference step to handle the case where we can prove that S2 is the preheader of the loop S1, and that entry into S2 implies execution of S1. See the code for a more detailed explanation. One worry I have with this patch is that I might be over-fitting what shows up in tests - and thus hiding negative impact we'd see in the real world. My best defense is that the rule used here very closely follows the one used to propagate the flags from IR to the inner add to start with, and thus if one is reasonable, so probably is the other. Curious what others think about that piece. The test diffs are roughly as expected. Mostly analysis only, with two transform changes. Oddly, the result looks better in the loop-idiom test, and I don't understand the PPC output enough to tell. Nothing terrible looking though. (For context, without the scope inference peephole, the test delta includes a couple of vectorization tests. Again, not super concerning, but slightly more so.) Differential Revision: https://reviews.llvm.org/D109845	2021-10-03 15:19:33 -07:00
Sanjay Patel	f32c0fe8e5	[InstCombine] fold cast of right-shift if high bits are not demanded (3rd try) The first two tries at this were reverted because they caused an infinite loop in instcombine. That should be fixed after a series of patches that ended with removing the faulty opposing transform: `3fabd98e5b` Original commit message: (masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C Narrowing the shift should be better for analysis and can lead to follow-on transforms as shown. Attempt at a general proof in Alive2: https://alive2.llvm.org/ce/z/tRnnSF Here are a couple of the specific tests: https://alive2.llvm.org/ce/z/bCnTp- https://alive2.llvm.org/ce/z/TfaHnb Differential Revision: https://reviews.llvm.org/D110170	2021-10-03 10:37:22 -04:00
Sanjay Patel	88a9c1827e	[InstCombine] add test for shl + demanded bits; NFC This is a reduction of a test that would infinite loop with D110170.	2021-10-03 10:35:59 -04:00
Nikita Popov	3be4acbaa3	[InstSimplify] Add additional load from constant test (NFC) This case does not get folded, because the GEP indexes too deeply (to the i8), making the bitcast logic not apply (on the [8 x i8]).	2021-10-03 15:52:36 +02:00
hyeongyu kim	cf284f6c5e	[LSV] Change the default value of InsertElement to poison This patch changes the InsertElement's placeholder to poison without changing the LSV's behavior. Regardless of whether `StoreTy` is FixedVectorType or not, the poison value will be overwritten with a different value. Therefore, whether the InsertElement's placeholder is poison or undef will not affect the result of the program. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D111005	2021-10-03 17:57:34 +09:00
Philip Reames	2ca8a3f213	[SCEV] Stop blindly propagating flags from inbound geps to SCEV nodes This fixes a violation of the wrap flag rules introduced in `c4048d8f`. This was also noted in the (very old) PR23527. The issue being fixed is that we assume the inbound flag on any GEP implies that all users of any gep (or add) which happens to map to that SCEV would also be UB if the (other) gep overflowed. That's simply not true. In terms of the test diffs, I don't see anything seriously problematic. The lost flags are expected (given the semantic restriction on when it's legal to tag the SCEV), and there are several cases where the previously inferred flags are unsound per the new semantics. The only common trend I noticed when looking at the deltas is that by not considering branch on poison as immediate UB in ValueTracking, we do miss a few cases we could reclaim. We may be able to claw some of these back with the follow-up ideas mentioned in PR51817. It's worth noting that most of the changes are analysis result only changes. The two transform changes are pretty minimal. In one case, we miss the opportunity to infer a nuw (correctly). In the other, we fail to fold an exit and produce a loop invariant form instead. This one is probably over-reduced as the program appears to be undefined in practice, and neither before nor after exploits that. Differential Revision: https://reviews.llvm.org/D109789	2021-10-01 16:30:44 -07:00
Daniil Suchkov	45bd8d9477	[SimpleLoopUnswitch] Don't unswitch constant conditions Added an additional check for constants after simplification of "select _, true, false" pattern. We need to prevent attempts to unswitch constant conditions for two reasons: a) Doing that doesn't make any sense, in the best case it will just burn some compile time. b) SimpleLoopUnswitch isn't designed to unswitch constant conditions (due to (a)), so attempting that can cause miscompiles. The attached testcase is an example of such miscompile. Also added an assertion that'll make sure we aren't trying to replace constants, so it will help us prevent such bugs in future. The assertion from D110751 is another layer of protection against such cases. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D110752	2021-10-01 21:30:54 +00:00
Daniil Suchkov	bdd52e8bae	[Test] Add a test exposing a miscompile in SimpleLoopUnswitch. The miscompile was introduced by `6b4b1dc6ec`.	2021-10-01 21:30:54 +00:00
Sanjay Patel	3fabd98e5b	[InstCombine] fold (trunc (X>>C1)) << C to shift+mask directly This is no-externally-visible-functional-difference-intended. That is, the test diffs show identical instructions other than name changes (those are included specifically to verify the logic). The existing transforms created extra instructions and relied on subsequent folds to get to the final result, but that could conflict with other transforms like the proposed D110170 (and caused that patch to be reverted twice so far because of infinite combine loops).	2021-10-01 14:22:44 -04:00
Sanjay Patel	baac82b4cf	[InstCombine] add tests for icmp of gep; NFC	2021-10-01 10:53:23 -04:00
Roman Lebedev	3a0643e9c2	[X86][Costmodel] Load/store i32/f32 Stride=2 VF=8 interleaving costs The only sched models for CPUs that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. For store we have: https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: =2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110755	2021-10-01 17:48:13 +03:00
Roman Lebedev	b12aeaec9a	[X86][Costmodel] Load/store i32/f32 Stride=2 VF=4 interleaving costs The only sched models for CPUs that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/EM5Ean7bd - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0` So pick cost of `2`. For store we have: https://godbolt.org/z/EM5Ean7bd - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `2`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110754	2021-10-01 17:48:13 +03:00
Roman Lebedev	f44d9009c2	[X86][Costmodel] Load/store i32/f32 Stride=2 VF=2 interleaving costs The only sched models for CPUs that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/4rY96hnGT - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0` So pick cost of `2`. For store we have: https://godbolt.org/z/vbo37Y3r9 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: =0.5` So pick cost of `1`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110753	2021-10-01 17:48:13 +03:00
Matthew Devereau	f085a9db8b	[AArch64][SVE] Replace fmul, fadd and fsub LLVM IR intrinsics with LLVM IR binary ops Replacing fmul and fadd intrinsics with their binary ops results in more succinct AArch64 SVE output, e.g.: 4: 65428041 fmul z1.h, p0/m, z1.h, z2.h 8: 65408020 fadd z0.h, p0/m, z0.h, z1.h -> 4: 65620020 fmla z0.h, p0/m, z1.h, z2.h	2021-10-01 11:24:46 +01:00
Kerry McLaughlin	c1d46d3461	[SLPVectorizer] Fix crash in isShuffle with scalable vectors D104809 changed `buildTree_rec` to check for extract element instructions with scalable types. However, if the extract is extended or truncated, these changes do not apply and we assert later on in isShuffle(), which attempts to cast the type of the extract to FixedVectorType. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D110640	2021-10-01 10:56:44 +01:00
Krasimir Georgiev	685f1bfd0a	Revert "[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns" It appears to cause stage2 clang build failures, e.g., https://lab.llvm.org/buildbot/#/builders/74/builds/7145. This reverts commit `1fb37334bd`.	2021-10-01 11:39:43 +02:00
David Sherwood	1fb37334bd	[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns This patch adds further support for vectorisation of loops that involve selecting an integer value based on a previous comparison. Consider the following C++ loop: int r = a; for (int i = 0; i < n; i++) { if (src[i] > 3) { r = b; } src[i] += 2; } We should be able to vectorise this loop because all we are doing is selecting between two states - 'a' and 'b' - both of which are loop invariant. This just involves building a vector of values that contain either 'a' or 'b', where the final reduced value will be 'b' if any lane contains 'b'. The IR generated by clang typically looks like this: %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ] ... %pred = icmp ugt i32 %val, i32 3 %phi.update = select i1 %pred, i32 %b, i32 %phi We already detect min/max patterns, which also involve a select + cmp. However, with the min/max patterns we are selecting loaded values (and hence loop variant) in the loop. In addition we only support certain cmp predicates. This patch adds a new pattern matching function (isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp. We only support selecting values that are integer and loop invariant, however we can support any kind of compare - integer or float. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-select-cmp.ll Transforms/LoopVectorize/select-cmp-predicated.ll Transforms/LoopVectorize/select-cmp.ll Differential Revision: https://reviews.llvm.org/D108136	2021-10-01 08:41:03 +01:00
Arnold Schwaighofer	2df2b27d94	[coro async] Cleanup undefined llvm.coro.async.resume In situations where the coroutine function is not split we can just replace the async.resume by null. rdar://82591919 Differential Revision: https://reviews.llvm.org/D110191	2021-09-30 13:26:53 -07:00
Florian Hahn	1fbdbb5595	Revert "Recommit "[SCEV] Look through single value PHIs." (take 2)" This reverts commit `764d9aa979`. This patch exposed a few additional cases where SCEV expressions are not properly invalidated. See PR52024, PR52023.	2021-09-30 20:53:51 +01:00
Sanjay Patel	66c069d7d6	[InstCombine] add tests for shift-trunc-shift; NFC	2021-09-30 15:06:13 -04:00
Craig Topper	765348298c	[CostModel] Update default cost model for sadd/ssub overflow to match TargetLowering The expansion for these was updated in https://reviews.llvm.org/D47927 but the cost model was not adjusted. I believe the cost model was also incorrect for the old expansion. The expansion prior to D47927 used 3 icmps using LHS, RHS, and Result to calculate theirs signs. Then 2 icmps to compare the signs. Followed by an And. The previous cost model was using 3 icmps and 2 selects. Digging back through git blame, those 2 selects in the cost model used to be 2 icmps, but were changed in https://reviews.llvm.org/D90681 Differential Revision: https://reviews.llvm.org/D110739	2021-09-30 09:41:14 -07:00
Adrian Prantl	9232ca4712	Improve the effectiveness of BDCE's debug info salvaging This patch improves the effectiveness of BDCE's debug info salvaging by processing the instructions in reverse order and delaying dropAllReferences until after debug info salvaging. This allows salvaging of entire chains of deleted instructions! Previously we would remove all references from an instruction, which would make it impossible to use that instruction to salvage a later instruction in the instruction stream, because its operands were already removed. This reapplies the previous patch with a fix for a use-after-free. Differential Revision: https://reviews.llvm.org/D110568	2021-09-30 09:28:49 -07:00
Anna Thomas	452714f8f8	[BPI] Keep BPI available in loop passes through LoopStandardAnalysisResults This is analogous to D86156 (which preserves "lossy" BFI in loop passes). Lossy means that the analysis preserved may not be up to date with regards to new blocks that are added in loop passes, but BPI will not contain stale pointers to basic blocks that are deleted by the loop passes. This is achieved through BasicBlockCallbackVH in BPI, which calls eraseBlock that updates the data structures in BPI whenever a basic block is deleted. This patch does not have any changes in the upstream pipeline, since none of the loop passes in the pipeline use BPI currently. However, since BPI wasn't previously preserved in loop passes, the loop predication pass was invoking BPI on the entire function every time it ran in an LPM. This caused massive compile time in our downstream LPM invocation which contained loop predication. See updated test with an invocation of a loop-pipeline containing loop predication and -debug-pass turned ON. Reviewed-By: asbirlea, modimo Differential Revision: https://reviews.llvm.org/D110438	2021-09-30 10:27:05 -04:00
Bjorn Pettersson	3f8027fb67	[test] Update some test cases to use -passes when specifying the pipeline This updates transform test cases for ADCE AddDiscriminators AggressiveInstCombine AlignmentFromAssumptions ArgumentPromotion BDCE CalledValuePropagation DCE Reg2Mem WholeProgramDevirt to use the -passes syntax when specifying the pipeline. Given that LLVM_ENABLE_NEW_PASS_MANAGER isn't set to off (which is a deprecated feature) the updated test cases already used the new pass manager, but they were using the legacy syntax when specifying the passes to run. This patch can be seen as a step toward deprecating that interface. This patch also removes some redundant RUN lines. Here I am referring to test cases that had multiple RUN lines verifying both the legacy "-passname" syntax and the new "-passes=passname" syntax. Since we switched the default pass manager to "new PM" both RUN lines have verified the new PM version of the pass (more or less wasting time running the same test twice), unless LLVM_ENABLE_NEW_PASS_MANAGER is set to "off". It is assumed that it is enough to run these tests with the new pass manager now. Differential Revision: https://reviews.llvm.org/D108472	2021-09-29 21:51:08 +02:00
Sjoerd Meijer	367df18050	[LoopFlatten] Bail if we can't perform flattening after IV widening It can happen that after widening of the IV, flattening may not be possible, e.g. when it is deemed unprofitable. We were not properly checking this, which resulted in flattening being applied when it shouldn't, also leading to incorrect results (miscompilation). This should fix PR51980 (https://bugs.llvm.org/show_bug.cgi?id=51980) Differential Revision: https://reviews.llvm.org/D110712	2021-09-29 19:53:34 +01:00
Sanjay Patel	4414e2ad97	[InstSimplify] (-1 << x) s>> x --> -1 This was noticed in: https://llvm.org/PR51351 https://alive2.llvm.org/ce/z/aLxunD	2021-09-29 13:03:12 -04:00
Sanjay Patel	ea56dcb730	[InstCombine] fix miscompile from dropRedundantMaskingOfLeftShiftInput() The test is from https://llvm.org/PR51351. There are 2 related logic bugs from over-generalizing "lshr" to "any shr", but I'm not sure how to expose the difference for "MaskC" because instsimplify already folds ashr of -1. I'll extend instsimplify to catch the MaskD pattern as a follow-up, but this patch should be enough to avoid the miscompile.	2021-09-29 11:43:18 -04:00
Sanjay Patel	d3e2067c7c	[InstSimplify] add tests for (-1 << x) s>> x; NFC	2021-09-29 11:43:18 -04:00
Sanjay Patel	ac4f30ac49	[InstCombine] add test for miscompile in dropRedundantMaskingOfLeftShiftInput(); NFC (PR51351)	2021-09-29 11:43:18 -04:00
Simon Pilgrim	17f1fc1e54	[TTI] BasicTTI::getInterleavedMemoryOpCost(): use getScalarizationOverhead() getScalarizationOverhead() results in a somewhat better cost estimation than counting the insertion/extraction costs directly. Notably, this is still overestimating the costs. Original Patch by: @lebedev.ri (Roman Lebedev) Differential Revision: https://reviews.llvm.org/D110713	2021-09-29 16:41:53 +01:00
Florian Hahn	0b4a4cc72d	[IndVarSimplify] Forget phi value after changing incoming value. This fixes an issue exposed by D71539, where IndVarSimplify tries to access an invalid cached SCEV expression after making changes to the underlying PHI instruction earlier. When changing the incoming value of a PHI, forget the cached SCEV for the PHI.	2021-09-29 14:44:13 +01:00
Sanjay Patel	98fde3489a	[InstCombine] reduce redundant code for shl-binop folds This is NFCI (no-functional-change-intended), but there are benign diffs possible with commutable ops as seen in the test diffs. The transforms were repeated for the commutative opcodes, but that should not be necessary if we canonicalize the patterns that we're matching. If both operands of the binop match, that should get folded eventually. The transform that starts with a mask op seems to over-constrain the use checks, so that could be a potential enhancement.	2021-09-28 17:06:45 -04:00
Sanjay Patel	6c1a58fe51	[InstCombine] add multi-use tests for shl folds; NFC	2021-09-28 17:06:45 -04:00
Nikita Popov	abbbc480a1	Revert "Improve the effectiveness of BDCE's debug info salvaging" This reverts commit `f6954bf804`. This breaks the test-suite O3 build: /home/nikic/llvm-test-suite/build-O3/tools/timeit --summary Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o.time /home/nikic/llvm-project/build/bin/clang++ -DNDEBUG -O3 -w -Werror=date-time -save-stats=obj -save-stats=obj -std=c++11 -MD -MT Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o -MF Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o.d -o Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o -c ../Bitcode/Benchmarks/Halide/local_laplacian/local_laplacian.bc While deleting: i64 % Use still stuck around after Def is destroyed: %12620 = mul i64 %12619, <badref> clang++: /home/nikic/llvm-project/llvm/lib/IR/Value.cpp:103: llvm::Value::~Value(): Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' failed.	2021-09-28 21:52:27 +02:00
Anna Thomas	03ce0841da	Add profile count. Regenerate check lines. NFC Function profile counts added to test cases. Regenerated test lines for loop predication test.	2021-09-28 15:33:49 -04:00
Sanjay Patel	09c575e728	[InstCombine] add/move tests for shl with binop; NFC	2021-09-28 14:46:27 -04:00
Adrian Prantl	f6954bf804	Improve the effectiveness of BDCE's debug info salvaging This patch improves the effectiveness of BDCE's debug info salvaging by processing the instructions in reverse order and delaying dropAllReferences until after debug info salvaging. This allows salvaging of entire chains of deleted instructions! Previously we would remove all references from an instruction, which would make it impossible to use that instruction to salvage a later instruction in the instruction stream, because its operands were already removed. Differential Revision: https://reviews.llvm.org/D110568	2021-09-28 10:24:51 -07:00
Adrian Prantl	9637b045e6	Improve the effectiveness of ADCE's debug info salvaging This patch improves the effectiveness of ADCE's debug info salvaging by processing the instructions in reverse order and delaying dropAllReferences until after debug info salvaging. This allows salvaging of entire chains of deleted instructions! Previously we would remove all references from an instruction, which would make it impossible to use that instruction to salvage a later instruction in the instruction stream, because its operands were already removed. Differential Revision: https://reviews.llvm.org/D110462	2021-09-28 10:24:50 -07:00
Adrian Prantl	1b998a5f0c	Add salvageDebugInfo support for truncating/extending ptr/int conversions. This patch enables debug info salvaging for truncating/extending ptr int conversions. The testcase uncovered a bug in adce, which is addressed separately. rdar://80227769 Differential Revision: https://reviews.llvm.org/D110461	2021-09-28 10:24:50 -07:00
Paul Robinson	56e681afcc	[TargetLibraryInfo] Pick new/delete calls by target There are two sets of new/delete functions, one with Windows/MSVC mangling and one with Itanium mangling. Mark one set or the other as unavailable depending on the target. Split the test malloc-free-delete.ll into three parts: malloc-free.ll for the C API tests, new-delete-itanium.ll and new-delete-msvc.ll for the target-specific new/delete tests. Differential Revision: https://reviews.llvm.org/D110419	2021-09-28 10:10:25 -07:00
Alex Richardson	ebb3dc0833	[InstCombine] Fold ptrtoint(gep i8 null, x) -> x This commit is the InstCombine follow-up to the previous constant-folding change that enables noticeable optimizations for CHERI-enabled targets. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D110247	2021-09-28 17:57:37 +01:00
Alex Richardson	9049a1c61e	[ConstantFolding] Fold ptrtoint(gep i8 null, x) -> x I was looking at some missed optimizations in CHERI-enabled targets and noticed that we weren't removing vtable indirection for calls via known pointers-to-members. The underlying reason for this is that we represent pointers-to-function-members as {i8 addrspace(200)*, i64} and generate the constant offsets using (gep i8 null, <index>). We use a constant GEP here since inttoptr should be avoided for CHERI capabilities. The pointer-to-member call uses ptrtoint to extract the index, and due to this missing fold we can't infer the actual value loaded from the vtable. This is the initial constant folding change for this pattern, I will add an InstCombine fold as a follow-up. We could fold all inbounds GEP to null (and therefore the ptrtoint to zero) since zero is the only valid offset for an inbounds GEP. If the offset is not zero, that GEP is poison and therefore returning 0 is valid (https://alive2.llvm.org/ce/z/Gzb5iH). However, Clang currently generates inbounds GEPs on NULL for hand-written offsetof() expressions, so this could lead to miscompilations. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D110245	2021-09-28 17:57:36 +01:00
Alex Richardson	fc0051011e	[InstCombine][ConstantFold] Baseline tests for ptrtoint(gep null, x) Differential Revision: https://reviews.llvm.org/D110244	2021-09-28 17:57:36 +01:00
Alexey Bataev	f701505c45	[SLP]Improve vectorization of phi nodes by trying wider vectors. Try to improve vectorization of the PHI nodes by first vectorizing similar instructions at the size of the widest possible vectors, then aggregating with compatible type PHIs and trying to vectorize again, and only if this fails, trying smaller sizes of the vector factors for compatible PHI nodes. This restores performance of several benchmarks after tuning of the fp/int conversion instructions costs. Differential Revision: https://reviews.llvm.org/D108740	2021-09-28 07:20:36 -07:00
Sjoerd Meijer	0ea77502e2	[LoopFlatten] Updating Phi nodes after IV widening In rG6a076fa9539e, a problem with updating the old/narrow phi nodes after IV widening was introduced. If after widening of the IV the transformation is not applied, the narrow phi node was incorrectly modified, which should only happen if flattening happens. This can be seen in the added test widen-iv2.ll, which incorrectly had 1 incoming value, but should have its original 2 incoming values, which is now restored. Differential Revision: https://reviews.llvm.org/D110234	2021-09-28 15:09:20 +01:00
Sanjay Patel	1a1aed8da8	[InstCombine] add tests for icmp-gep; NFC We need more coverage for commuted and (un)signed preds to verify that things behave as expected here. Currently, we do not transform signed preds or non-inbounds geps.	2021-09-28 10:00:35 -04:00
Sanjay Patel	72d991c42e	[InstCombine] add/move tests for icmp with gep operand(s); NFC	2021-09-28 09:40:52 -04:00
Alexey Bataev	8bacfb9bed	[SLP]No need to schedule/check parent for extract{element/value} instruction. The extractelement/extractvalue instructions are not required to be scheduled since they only depend on the source vector/aggregate (with constant indices); the same applies to the parent basic block checks. Improves compile time and saves scheduling budget. Differential Revision: https://reviews.llvm.org/D108703	2021-09-28 06:13:55 -07:00
Florian Hahn	e2f6290e06	[VectorCombine] Discard ScalarizationResult state in early exit. ScalarizationResult's destructor makes sure ToFreeze is not ignored if set. Currently, scalarizeLoadExtract has an early exit if the index is not safe directly. But when it is SafeWithFreeze, we need to discard the state first, otherwise we hit the assert in the destructor. Fixes PR51992.	2021-09-28 12:52:16 +01:00
Florian Hahn	764d9aa979	Recommit "[SCEV] Look through single value PHIs." (take 2) This reverts commit `8fdac7cb7a`. The issue causing the revert has been fixed a while ago in `60b852092c`. Original message: Now that SCEVExpander can preserve LCSSA form, we do not have to worry about LCSSA form when trying to look through PHIs. SCEVExpander will take care of inserting LCSSA PHI nodes as required. This increases precision of the analysis in some cases. Reviewed By: mkazantsev, bmahjour Differential Revision: https://reviews.llvm.org/D71539	2021-09-28 10:32:17 +01:00
Anna Thomas	90fb73aa73	[LoopPred Test] Fix lld-x86_64-win BB failure Need a more general CHECK line for testcase in `5df9112` for correctly handling lld-x86_64-win buildbot.	2021-09-27 21:28:46 -04:00
Anna Thomas	5df9112ce3	Reland "[LoopPredication] Add testcase showing BPI computation. NFC" This relands commit `16a62d4f`. Relanded after fixing CHECK-LINES for opt pipeline output to be more general (based on failures seen in buildbot).	2021-09-27 21:15:46 -04:00
Anna Thomas	a0a9e3e05f	Revert "[LoopPredication] Add testcase showing BPI computation. NFC" This reverts commit `16a62d4f3d`. Needs some update to check lines to fix bb failure.	2021-09-27 17:08:57 -04:00
Anna Thomas	16a62d4f3d	[LoopPredication] Add testcase showing BPI computation. NFC Precommit testcase for D110438. Since we do not preserve BPI in loop pass manager, we are forced to compute BPI every time Loop predication is invoked. The patch referenced changes that behaviour by preserving lossy BPI for loop passes.	2021-09-27 16:54:22 -04:00
Sanjay Patel	b75ed244af	[InstCombine] add tests for shl-of-sub; NFC	2021-09-27 14:56:01 -04:00
Sanjay Patel	623f93ed1c	[InstCombine] add use check to shl transform This bug was introduced with the refactoring in: `9075edc89b` ...but there were no tests to detect it.	2021-09-27 14:10:26 -04:00
Sanjay Patel	d992950078	[InstCombine] add tests for opposing shifts separated by trunc; NFC	2021-09-27 14:10:26 -04:00
Jameson Nash	e27a6db529	Bad SLPVectorization shufflevector replacement, resulting in write to wrong memory location We see that it might otherwise do: %10 = getelementptr {}, <2 x {}> %9, <2 x i32> <i32 10, i32 4> %11 = bitcast <2 x {}*> %10 to <2 x i64> ... %27 = extractelement <2 x i64> %11, i32 0 %28 = bitcast i64 %27 to <2 x i64>* store <2 x i64> %22, <2 x i64>* %28, align 4, !tbaa !2 Which is an out-of-bounds store (the extractelement got offset 10 instead of offset 4 as intended). With the fix, we correctly generate extractelement for i32 1 and generate correct code. Differential Revision: https://reviews.llvm.org/D106613	2021-09-27 14:06:13 -04:00
Sanjay Patel	21429cf43a	[InstCombine] generalize fold for (trunc (X u>> C1)) u>> C This is another step towards trying to re-apply D110170 by eliminating conflicting transforms that cause infinite loops. `a47c8e40c7` was a previous patch in this direction. The diffs here are mostly cosmetic, but intentional: 1. The existing code that would handle this pattern in FoldShiftByConstant() is limited to 'shl' only now. The formatting change to IsLeftShift shows that we could move several transforms into visitShl() directly for efficiency because they are not common shift transforms. 2. The tests are regenerated to show new instruction names to prove that we are getting (almost) identical logic results. 3. The one case where we differ ("trunc_sandwich_small_shift1") shows that we now use a narrow 'and' instruction. Previously, we relied on another transform to do that, but it is limited to legal types. That seems to be a legacy constraint from when IR analysis and codegen were less robust. https://alive2.llvm.org/ce/z/JxyGA4 declare void @llvm.assume(i1) define i8 @src(i32 %x, i32 %c0, i8 %c1) { ; The sum of the shifts must not overflow the source width. %z1 = zext i8 %c1 to i32 %sum = add i32 %c0, %z1 %ov = icmp ult i32 %sum, 32 call void @llvm.assume(i1 %ov) %sh1 = lshr i32 %x, %c0 %tr = trunc i32 %sh1 to i8 %sh2 = lshr i8 %tr, %c1 ret i8 %sh2 } define i8 @tgt(i32 %x, i32 %c0, i8 %c1) { %z1 = zext i8 %c1 to i32 %sum = add i32 %c0, %z1 %maskc = lshr i8 -1, %c1 %s = lshr i32 %x, %sum %t = trunc i32 %s to i8 %a = and i8 %t, %maskc ret i8 %a }	2021-09-27 10:57:31 -04:00
Sjoerd Meijer	eba76056a3	[FuncSpec] Don't specialise (or crash) on poison or constexpr values Function specialization was crashing on poison values and constexpr values. The problem is that these values are not added to the solver, so it crashes when a lookup is performed for these values. This fixes that by not specialising on these values. For poison that is obvious, but for constexpr this is a change in behaviour. Thus, in one way this is a bit of a stopgap, but specialising on constexpr values wasn't done very intentionally, and needs some more work and tests if we wanted to support this. As a follow up, we need to look at whether the solver should exit more gracefully and return a "don't know", or whether it should really support these constexprs. This should fix PR51600 (https://bugs.llvm.org/show_bug.cgi?id=51600). Differential Revision: https://reviews.llvm.org/D110529	2021-09-27 14:58:53 +01:00
Sjoerd Meijer	a588ae482b	[LoopFlatten] Precommit new test widen-iv2.ll for D110234.	2021-09-27 14:37:44 +01:00
Jun Ma	3a998c06a8	Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""" This reverts commit `8ba2adcf9e`.	2021-09-27 20:39:05 +08:00
Florian Hahn	4b581e87df	[LV] Add tests where rt checks may make vectorization unprofitable. Add a few additional tests which require a large number of runtime checks for D109368.	2021-09-27 10:32:28 +01:00
Max Kazantsev	e787678cef	[Test] Add some simple tests where IndVars cannot remove a check in loop Previously I've added tests that require context for inference, but it seems that SCEV can't prove the same facts even when the context isn't required.	2021-09-27 12:12:51 +07:00
Sanjay Patel	6063e6b499	[InstCombine] move add after min/max intrinsic This is another regression noted with the proposal to canonicalize to the min/max intrinsics in D98152. Here are Alive2 attempts to show correctness without specifying exact constants: https://alive2.llvm.org/ce/z/bvfCwh (smax) https://alive2.llvm.org/ce/z/of7eqy (smin) https://alive2.llvm.org/ce/z/2Xtxoh (umax) https://alive2.llvm.org/ce/z/Rm4Ad8 (umin) (if you comment out the assume and/or no-wrap, you should see failures) The different output for the umin test is due to a fold added with `c4fc2cb5b2` : // umin(x, 1) == zext(x != 0) We probably want to adjust that, so it applies more generally (umax --> sext or patterns where we can fold to select-of-constants). Some folds that were ok when starting with cmp+select may increase instruction count for the equivalent intrinsic, so we have to decide if it's worth altering a min/max. Differential Revision: https://reviews.llvm.org/D110038	2021-09-26 09:49:10 -04:00
Nikita Popov	ba664d9066	[AA] Move earliest escape tracking from DSE to AA This is a followup to D109844 (and alternative to D109907), which integrates the new "earliest escape" tracking into AliasAnalysis. This is done by replacing the pre-existing context-free capture cache in AAQueryInfo with a replaceable (virtual) object with two implementations: The SimpleCaptureInfo implements the previous behavior (check whether object is captured at all), while EarliestEscapeInfo implements the new behavior from DSE. This combines the "earliest escape" analysis with the full power of BasicAA: It subsumes the call handling from D109907, considers a wider range of escape sources, and works with AA recursion. The compile-time cost is slightly higher than with D109907. Differential Revision: https://reviews.llvm.org/D110368	2021-09-25 22:40:41 +02:00
Nikita Popov	327bbbb10b	[DSE] Make capture check more precise It is sufficient that the object has not been captured before the load that produces the pointer we're loading. A capture after that can not affect the already loaded pointer. This is a small part of D110368 applied separately.	2021-09-25 22:23:19 +02:00
Simon Pilgrim	8c83bd3bd4	[CostModel][X86] Adjust vXi32 multiply costs if it can be performed using PMADDWD Update the costs to match the codegen from combineMulToPMADDWD - not only can we use PMADDWD if it's zero-extended, but also if it's a constant or sign-extended from a vXi16 (which can be replaced with a zero-extension).	2021-09-25 16:28:48 +01:00
Simon Pilgrim	5a14edd8ed	[InstCombine] Ensure shifts are in range for (X << C1) / C2 -> X fold. We can get here before out of range shift amounts have been handled - limit to BW-2 for sdiv and BW-1 for udiv Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38078	2021-09-25 12:57:43 +01:00
Simon Pilgrim	993f3c61b3	[TTI] getUserCost - Ensure a vector insert/extract index is in unsigned 32-bit range Otherwise fallback to the generic 'unknown index' path Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29050	2021-09-25 10:50:54 +01:00
Nikita Popov	5969e5743a	[IR] Handle large element size when calculating GEP indices This is a fix for the issue reported at https://reviews.llvm.org/D110043#3019942: The ElementSize is a uint64_t and as such may be larger than the index space, or be negative in the index space. This is UB, but shouldn't cause assertion failures. We address this by detecting whether the size is too large and use a zero index in that case (which is always conservatively correct). Differential Revision: https://reviews.llvm.org/D110437	2021-09-24 22:20:20 +02:00
Sanjay Patel	a47c8e40c7	[InstCombine] fold lshr(trunc(lshr X, C1)) C2 Only the multi-use cases are changing here because there's another fold that catches the simpler patterns. But that other fold is the source of infinite loops when we try to add D110170, so removing that is planned as a follow-up. Attempt to show the general proof in Alive2: https://alive2.llvm.org/ce/z/Ns1uS2 Note that the overshift fold-to-zero tests are not currently handled by instsimplify. If they were, we could assert that the shift amount sum is less than the source bitwidth.	2021-09-24 15:44:07 -04:00
Nikita Popov	7774166499	[DSE] Add additional capture tests (NFC) These test other escape sources and the case of multiple underlying objects.	2021-09-24 21:13:29 +02:00
Simon Pilgrim	bdee805b32	[ConstantFold] ConstantFoldGetElementPtr - use APInt::isNegative() instead of getSExtValue() to support big ints Fixes fuzz test: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39197	2021-09-24 18:18:53 +01:00
Simon Pilgrim	36eb6c0134	[SCCP] Regenerate bigint test checks	2021-09-24 18:18:53 +01:00
Hans Wennborg	1e9afab875	Re-apply "[JumpThreading] Ignore free instructions" It seems the crashes we saw weren't caused by this (see comments on the review). > This is basically D108837 but for jump threading. Free instructions > should be ignored for the threading decision. JumpThreading already > skips some free instructions (like pointer bitcasts), but does not > skip various free intrinsics -- in fact, it currently gives them a > fairly large cost of 2. > > Differential Revision: https://reviews.llvm.org/D110290 This reverts commit `4604695d7c`.	2021-09-24 18:52:30 +02:00
Florian Hahn	6f28fb7081	Recommit "[DSE] Track earliest escape, use for loads in isReadClobber." This reverts the revert commit `df56fc6ebb`. This version of the patch adjusts the location where the EarliestEscapes cache is cleared when an instruction gets removed. The earliest escaping instruction does not have to be a memory instruction. It could be a ptrtoint instruction like in the added test @earliest_escape_ptrtoint, which subsequently gets removed. We need to invalidate the EarliestEscape entry referring to the ptrtoint when deleting it. This fixes the crash mentioned in https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6	2021-09-24 17:13:27 +01:00
Sanjay Patel	638a4147fc	[InstCombine] add tests for lshr-trunc-lshr; NFC	2021-09-24 11:38:19 -04:00
Sanjay Patel	3c5500907b	Revert "[InstCombine] fold cast of right-shift if high bits are not demanded (2nd try)" This reverts commit `bb9333c350`. This exposes another existing bug that causes an infinite loop as shown in D110170 ...so reverting while I look at another fix.	2021-09-24 10:47:35 -04:00
Hans Wennborg	4604695d7c	Revert "[JumpThreading] Ignore free instructions" It caused compiler crashes, see comment on the code review for repro. > This is basically D108837 but for jump threading. Free instructions > should be ignored for the threading decision. JumpThreading already > skips some free instructions (like pointer bitcasts), but does not > skip various free intrinsics -- in fact, it currently gives them a > fairly large cost of 2. > > Differential Revision: https://reviews.llvm.org/D110290 This reverts commit `1e3c6fc7cb`.	2021-09-24 16:14:53 +02:00
Nico Weber	df56fc6ebb	Revert "[DSE] Track earliest escape, use for loads in isReadClobber." This reverts commit `5ce89279c0`. Makes clang crash, see comments on https://reviews.llvm.org/D109844	2021-09-24 09:57:59 -04:00
Nikita Popov	1e3c6fc7cb	[JumpThreading] Ignore free instructions This is basically D108837 but for jump threading. Free instructions should be ignored for the threading decision. JumpThreading already skips some free instructions (like pointer bitcasts), but does not skip various free intrinsics -- in fact, it currently gives them a fairly large cost of 2. Differential Revision: https://reviews.llvm.org/D110290	2021-09-23 18:28:36 +02:00
Simon Pilgrim	c931d35216	[CostModel][X86] Increase i64 mul cost from 1 to 2 Only the most recent cpus support really 1cy 64-bit multiplies, and the X64 cost table represents a realistic worst case. The 1cy value was also discouraging vectorization when most vXi64 PMULDQ expansions aren't actually slower than scalarization. Noticed while investigating PR51436.	2021-09-23 14:48:21 +01:00
Sanjay Patel	bb9333c350	[InstCombine] fold cast of right-shift if high bits are not demanded (2nd try) The 1st try at this was reverted because it caused an infinite loop in instcombine. That should be fixed after: `1cd6b44f26` (masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C Narrowing the shift should be better for analysis and can lead to follow-on transforms as shown. Attempt at a general proof in Alive2: https://alive2.llvm.org/ce/z/tRnnSF Here are a couple of the specific tests: https://alive2.llvm.org/ce/z/bCnTp- https://alive2.llvm.org/ce/z/TfaHnb Differential Revision: https://reviews.llvm.org/D110170	2021-09-23 09:41:37 -04:00
Florian Hahn	5ce89279c0	[DSE] Track earliest escape, use for loads in isReadClobber. At the moment, DSE only considers whether a pointer may be captured at all in a function. This leads to cases where we fail to remove stores to local objects because we do not check if they escape before potential read-clobbers or after. Doing context-sensitive escape queries in isReadClobber was removed a while ago in `d1a1cce5b1` to save compile-time. See PR50220 for more context. This patch introduces a new capture tracker, which keeps track of the 'earliest' capture. An instruction A is considered earlier than instruction B, if A dominates B. If 2 escapes do not dominate each other, the terminator of the common dominator is chosen. If not all uses can be analyzed, the earliest escape is set to the first instruction in the function entry block. If the query instruction dominates the earliest escape and is not in a cycle, then the pointer does not escape before the query instruction. This patch uses this information when checking if a load of a loaded underlying object may alias a write to a stack object. If the stack object does not escape before the load, they do not alias. I will share a follow-up patch to also use the information for call instructions to fix PR50220. In terms of compile-time, the impact is low in general, NewPM-O3: +0.05% NewPM-ReleaseThinLTO: +0.05% NewPM-ReleaseLTO-g: +0.03 with the largest change being tramp3d-v4 (+0.30%) http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions Compared to always computing the capture information on demand, we get the following benefits from the caching: NewPM-O3: -0.03% NewPM-ReleaseThinLTO: -0.08% NewPM-ReleaseLTO-g: -0.04% The biggest speedup is tramp3d-v4 (-0.21%). http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions Overall there is a small, but noticeable benefit from caching. I am not entirely sure if the speedups warrant the extra complexity of caching. The way the caching works also means that we might miss a few cases, as it is less precise. Also, there may be a better way to cache things. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109844	2021-09-23 12:45:05 +01:00
Alex Richardson	05663dc146	[InstSimplify] Don't lose inbounds when simplifying a GEP I noticed this while working on a (ptrtoint (gep null, x)) -> x fold. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D110168	2021-09-23 09:25:06 +01:00
Johannes Doerfert	c6457dcae8	[OpenMP][FIX] Be more deliberate about invalidating the AAKernelInfo state This patch fixes a problem when the AAKernelInfo state was invalidated, e.g., due to `optnone` for a kernel, but not all parts indicated the invalidation properly. We further eliminate most full state invalidations as they should never be necessary. Differential Revision: https://reviews.llvm.org/D109468	2021-09-23 00:04:30 -05:00
Johannes Doerfert	57822c3f4f	[OpenMP][NFC] Repair test that contained nested kernels The benchmark contained (partially) nested kernels, something we do not generate nor support.	2021-09-23 00:04:29 -05:00
Johannes Doerfert	92280ae3d8	[OpenMP][NFC] Rerun the test check update script on all OpenMP-Opt tests	2021-09-23 00:04:29 -05:00
Johannes Doerfert	5e835ecb6d	[OpenMP][NFC] Precommit test that exposes a bug in our optnone handling	2021-09-23 00:04:29 -05:00
Usman Nadeem	3b12282b0e	[AArch64][SVE][InstCombine] Eliminate redundant chains of tuple get/set Differential Revision: https://reviews.llvm.org/D109667 Change-Id: I06a3c28e3658ecda109a3a1b73265828274ab2ea	2021-09-22 20:59:46 -07:00
Shilei Tian	423d34f74a	[OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit` This is a follow-up of D110029, which uses bitset to indicate execution mode. This patch makes the changes in the function call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110279	2021-09-22 17:16:41 -04:00
Sanjay Patel	1cd6b44f26	[InstCombine] add one-use check to shift-shift transform We don't want to create extra instructions, and this could infinite loop with the proposed transform in D110170.	2021-09-22 16:31:12 -04:00
Sanjay Patel	55aa4e92f7	[InstCombine] add test for shift-shift with extra use; NFC	2021-09-22 16:31:12 -04:00
Nikita Popov	d8e1203f91	[JumpThreading] Add test with free instructions (NFC) Which demonstrates that "free" instructions can prevent jump threading.	2021-09-22 22:29:39 +02:00
Sanjay Patel	a85d7a56c7	[ValueTracking] fix isOnlyUsedInZeroEqualityComparison with no users This is another problem exposed by: https://bugs.llvm.org/PR50836	2021-09-22 15:01:53 -04:00
Sanjay Patel	c240169ff2	[Analysis] improve function matching for strlen libcall The return type of strlen is size_t, not just any integer. This is a partial fix for an example based on: https://llvm.org/PR50836 There's another bug here because we can still crash processing a real strlen or something that looks like it.	2021-09-22 13:50:12 -04:00
Arthur Eubanks	e7249e4acf	[SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest When determining whether to fold branches to a common destination by merging two blocks, SimplifyCFG will count the number of instructions to be moved into the first basic block. However, there's no reason to count free instructions like bitcasts and other similar instructions. This resolves missed branch foldings with -fstrict-vtable-pointers in llvm-test-suite's lambda benchmark. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D108837	2021-09-22 09:52:37 -07:00
Hongtao Yu	d9b511d8e8	[CSSPGO] Set PseudoProbeInserter as a default pass. Currently PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e., through the linker or llc), where the user always has to pass -pseudo-probe-for-profiling explicitly. I'm making the pass a default pass that requires no command line arg to trigger, but will be actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D110209	2021-09-22 09:09:48 -07:00
Alexey Bataev	173dd896db	[SLP][NFC]Add a test to show an issue with incorrectly extracted pointers.	2021-09-22 09:02:13 -07:00
hyeongyu kim	98e96663f6	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (3/3) This patch is for fixing potential shufflevector-related bugs like D93818. As in D93818, this patch changes shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineVectorOps. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110230	2021-09-23 00:48:24 +09:00
Shilei Tian	ca999f7191	[OpenMP][Offloading] Use bitset to indicate execution mode instead of value The execution mode of a kernel is stored in a global variable, whose value means: - 0 - SPMD mode - 1 - indicates generic mode - 2 - SPMD mode execution with generic mode semantics We are going to add support for SIMD execution mode. It will come with another execution mode, such as SIMD-generic mode. As a result, this value-based indicator is not flexible. This patch changes to a bitset-based solution to encode execution mode. Each position is: [0] - generic mode [1] - SPMD mode [2] - SIMD mode (will be added later) In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode execution with generic mode semantics. In the future after we add the support for SIMD mode, `0b1xx` will be in SIMD mode. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110029	2021-09-22 11:40:52 -04:00
hyeongyu kim	ec8311444a	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (2/3) This patch is for fixing potential shufflevector-related bugs like D93818. As in D93818, this patch changes shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineCompares and InstructionCombining. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110227	2021-09-23 00:14:50 +09:00
hyeongyu kim	e5aaf03326	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (1/3) This patch is for fixing potential shufflevector-related bugs like D93818. As in D93818, this patch changes shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineCasts. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110226	2021-09-22 23:18:51 +09:00
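The move from a value-based indicator to a bitset in the commit above can be illustrated with a small model. This is a sketch under stated assumptions: the `ExecMode` class, the helper names, and the `LEGACY_TO_BITSET` table are hypothetical and are not the identifiers used in the OpenMP runtime; only the bit positions and the old numeric values come from the commit message.

```python
from enum import IntFlag


class ExecMode(IntFlag):
    """Hypothetical model of the bitset encoding described in the commit."""
    GENERIC = 0x1                  # bit [0]
    SPMD = 0x2                     # bit [1]
    GENERIC_SPMD = GENERIC | SPMD  # 0x3: SPMD execution, generic semantics


# Old value-based encoding from the commit message, mapped to the new bitset.
LEGACY_TO_BITSET = {
    0: ExecMode.SPMD,
    1: ExecMode.GENERIC,
    2: ExecMode.GENERIC_SPMD,
}


def is_spmd(mode: ExecMode) -> bool:
    """True when the SPMD bit is set, with or without generic semantics."""
    return bool(mode & ExecMode.SPMD)


def has_generic_semantics(mode: ExecMode) -> bool:
    """True when the generic bit is set."""
    return bool(mode & ExecMode.GENERIC)


if __name__ == "__main__":
    for legacy, mode in LEGACY_TO_BITSET.items():
        print(legacy, int(mode), is_spmd(mode), has_generic_semantics(mode))
```

With a bitset, a later mode such as SIMD only needs a new bit, and existing predicates like `is_spmd` keep working unchanged.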
Alexey Bataev	b6d10beb50	[SLP][NFC]Rename function in the test for better matching of the transformation.	2021-09-22 05:51:18 -07:00
Florian Hahn	a7c6471a85	[Passes] Run vector-combine early with -fenable-matrix. IR with matrix intrinsics is likely to also contain large vector operations, which can benefit from early simplifications. This is the last step in a series of changes to improve code-gen for code using matrix subscript operators with the C/C++ matrix extension in Clang, like using matrix_t = double __attribute__((matrix_type(15, 15))); void foo(unsigned i, matrix_t &A, matrix_t &B) { for (unsigned j = 0; j < 4; ++j) for (unsigned k = 0; k < i; k++) B[k][j] -= A[k][j] * B[i][j]; } https://clang.godbolt.org/z/6dKxK1Ed7 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D102496	2021-09-22 12:48:32 +01:00
Sanjay Patel	c6013f71a4	Revert "[InstCombine] fold cast of right-shift if high bits are not demanded" This reverts commit `2f6b07316f`. This caused several bots to hit an infinite loop at stage 2, so it needs to be reverted while figuring out how to fix that.	2021-09-22 07:45:21 -04:00
Yi Kong	d0746f2e9b	Don't fold (select C, (gep Ptr, Idx), Ptr) if C is vector but Idx is scalar The folding rule (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) creates a malformed SELECT IR if C is a vector while Idx is scalar. SELECT VecC, ScalarIdx, 0 We could splat Idx to a vector but it defeats the purpose of optimisation. Don't apply the folding rule in this case. This fixes a regression from commit `d561b6fbdb`.	2021-09-22 18:11:33 +08:00
Simon Pilgrim	41492d77ba	[LoopVectorize][X86] Add operands to make it more obvious what line the CHECK concerns As we're checking the cost debug analysis these should match the original IR line - so we shouldn't have any variable naming issues. I'm investigating v4i32 mul -> PMADDWD costs handling (for PR47437) and these CHECK lines were proving tricky to keep track of	2021-09-22 10:08:32 +01:00
Florian Hahn	300870a95c	[VectorCombine] Switch to using a worklist. This patch updates VectorCombine to use a worklist to allow iterative simplifications where a combine enables other combines. Suggested in D100302. The main use case at the moment is foldSingleElementStore and scalarizeLoadExtract working together to improve scalarization. Note that we now also do not run SimplifyInstructionsInBlock on the whole function if there have been changes. This means we fail to remove/simplify instructions not related to any of the vector combines. IMO this is fine, as simplifying the whole function seems more like a workaround for not tracking the changed instructions. Compile-time impact looks neutral: NewPM-O3: +0.02% NewPM-ReleaseThinLTO: -0.00% NewPM-ReleaseLTO-g: -0.02% http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions Reviewed By: spatel, lebedev.ri Differential Revision: https://reviews.llvm.org/D110171	2021-09-22 09:54:58 +01:00
Sanjay Patel	2f6b07316f	[InstCombine] fold cast of right-shift if high bits are not demanded (masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C Narrowing the shift should be better for analysis and can lead to follow-on transforms as shown. Attempt at a general proof in Alive2: https://alive2.llvm.org/ce/z/tRnnSF Here are a couple of the specific tests: https://alive2.llvm.org/ce/z/bCnTp- https://alive2.llvm.org/ce/z/TfaHnb Differential Revision: https://reviews.llvm.org/D110170	2021-09-21 16:09:08 -04:00
Antonio Frighetto	43d6991c2a	[IR] Look through bitcast in hasFnAttribute() A logic incompleteness may lead MemorySSA to be too conservative in its results. Specifically, when dealing with a call of kind `call i32 bitcast (i1 (i1)* @test to i32 (i32)*)(i32 %1)`, where the function `test` is declared with readonly attribute, the bitcast is not looked through, obscuring function attributes. Hence, some methods of CallBase (e.g., doesNotReadMemory) could provide suboptimal results. Differential Revision: https://reviews.llvm.org/D109888	2021-09-21 21:57:02 +02:00
Nikita Popov	f2fa6ad047	[MergeICmps] Don't reorder unmerged comparisons MergeICmps will currently sort (by offset) all comparisons in a chain, including those that do not get merged. This is problematic in two ways: * We may end up moving the original first block into the middle of the chain, in which case the "extra work" instructions will also be in the middle of the chain, resulting in invalid IR (reported in https://reviews.llvm.org/D108782#3005583). * Reordering branches is generally not legal, because it may introduce branch on poison, which is UB (PR51845). The merging done by MergeICmps is legal as long as we assume that memcmp() works on frozen memory, but the reordering of unmerged comparisons is definitely incorrect (without inserting freeze instructions), so we should avoid it. There are easier ways to fix the first issue, but I figured it was worthwhile to do this properly to also fix the second one. What we now do is to restore the original relative order of (potentially merged) comparisons. I took the liberty of dropping the MERGEICMPS_DOT_ON functionality, because it would be more awkward to implement now (as the before and after representation is different) and it doesn't seem terribly useful nowadays. Differential Revision: https://reviews.llvm.org/D110024	2021-09-21 21:22:12 +02:00
Owen Anderson	b5fbbdd202	Teach InstCombine to eliminate malloc-realloc-free triplets. Reviewed By: majnemer Differential Revision: https://reviews.llvm.org/D109988	2021-09-21 18:07:49 +00:00
Danila Malyutin	78b51c7a2c	[LSR] Make sure that Factor fits into Base type Fixes pr42770 Differential Revision: https://reviews.llvm.org/D108772	2021-09-21 20:50:50 +03:00
Ayal Zaks	ab6a69dfea	[LV] Fix crash for reverse interleaved loads with gap under fold-tail. This patch fixes the crash found by PR51614: whenever doing tail folding, interleave groups must be considered under mask. Another fix D108900 follows for targets that support masked loads and stores: when deciding to vectorize with masked interleave groups, check if the access is reverse - which is currently not supported; rather than (only) asserting when computing cost and generating code. Differential Revision: https://reviews.llvm.org/D108891	2021-09-21 20:13:32 +03:00
Dávid Bolvanský	c0fdfc9af2	[InstCombine] powi(x, y) * powi(x, z) -> powi(x, y + z) We already have the pow(x, y) * pow(x, z) -> pow(x, y + z) transformation, but we are missing the same transformation for powi (power is integer). Requires reassoc. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D109954	2021-09-21 18:20:46 +02:00
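The algebraic identity behind the powi fold above holds exactly in rational arithmetic; in floating point the two sides can round differently, which is consistent with the commit requiring reassoc. The sketch below checks the exact identity only. The helper name `powi_product_matches` is hypothetical, and `Fraction` stands in for an idealized, rounding-free number type rather than for LLVM's floating-point semantics.

```python
from fractions import Fraction
from itertools import product


def powi_product_matches(x: Fraction, y: int, z: int) -> bool:
    """Check x**y * x**z == x**(y + z) exactly, for nonzero x."""
    return x ** y * x ** z == x ** (y + z)


if __name__ == "__main__":
    bases = [Fraction(3, 2), Fraction(-7, 5), Fraction(2)]
    exponents = range(-4, 5)
    ok = all(powi_product_matches(x, y, z)
             for x, y, z in product(bases, exponents, exponents))
    print(ok)
```

Zero is excluded as a base because negative exponents would divide by zero in this exact model.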
Sanjay Patel	08ef71ca92	[InstCombine] move/add tests for trunc-of-lshr; NFC Planning to reframe a proposed transform in terms of demanded bits as suggested in D110170. The new tests end with an 'or'.	2021-09-21 12:11:25 -04:00
Florian Hahn	5131037ea9	[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange. isValidAssumeForContext can provide better results with access to the dominator tree in some cases. This patch adjusts computeConstantRange to allow passing through a dominator tree. The use in VectorCombine is updated to pass through the DT to enable additional scalarization. Note that similar APIs like computeKnownBits already accept optional dominator tree arguments. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D110175	2021-09-21 16:54:47 +01:00
Anna Thomas	69921f6f45	[InstCombine] Improve TryToSinkInstruction with multiple uses This patch allows sinking an instruction which can have multiple uses in a single user. We were previously over-restrictive by looking for exactly one use, rather than one user. Also added an API for retrieving a unique undroppable user. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109700	2021-09-21 10:04:04 -04:00
Sanjay Patel	af1c5312d7	[InstCombine] add tests for mask-shift with trunc; NFC	2021-09-21 09:41:41 -04:00
Florian Hahn	ea27dd7497	[VectorCombine] Add tests which require DT to use info from assumes.	2021-09-21 13:07:06 +01:00
Simon Pilgrim	fc8f1e4419	[InstCombine] foldConstantInsEltIntoShuffle - bail if we fail to find constant element (PR51824) If getAggregateElement() returns null for any element, early out as otherwise we will assert when creating a new constant vector Fixes PR51824 and OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38057	2021-09-21 13:01:09 +01:00
David Stenberg	7b4cc09b14	[LowerConstantIntrinsics] Fix heap-use-after-free bug in worklist This fixes PR51730, a heap-use-after-free bug in replaceConditionalBranchesOnConstant(). With the attached reproducer we were left with a function looking something like this after replaceAndRecursivelySimplify(): [...] cont2.i: br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i handler.type_mismatch3.i: %3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ] unreachable cont4.i: unreachable [...] with both the branch instruction and PHI node being in the worklist. As a result of replacing the branch instruction with an unconditional branch, the PHI node in %handler.type_mismatch3.i would be removed. This then resulted in a heap-use-after-free bug due to accessing that removed PHI node in the next worklist iteration. This is solved by using a value handle worklist. I am unsure if this is the most idiomatic solution. Another solution could have been to produce a worklist just containing the interesting branch instructions, but I thought that it perhaps was a bit cleaner to keep all worklist filtering in the loop that does the rewrites. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D109221	2021-09-21 11:33:07 +02:00
Max Kazantsev	2c7d5fbc9e	[SCEV] Generalize implication when signedness of FoundPred doesn't matter The implication logic for two values that are both negative or non-negative says that it doesn't matter whether their predicate is signed or unsigned, but only flips unsigned into signed for further inference. This patch adds support for flipping a signed predicate into unsigned as well. Differential Revision: https://reviews.llvm.org/D109959 Reviewed By: nikic	2021-09-21 11:17:56 +07:00
Max Kazantsev	073b254cff	[SimplifyCFG] Redirect switch cases that lead to UB into an unreachable block When following a case of a switch instruction is guaranteed to lead to UB, we can safely break these edges and redirect those cases into a newly created unreachable block. As a result, the CFG will become simpler and we can remove some of the Phi inputs to make further analyses easier. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D109428 Reviewed By: lebedev.ri	2021-09-21 10:45:19 +07:00
Usman Nadeem	f417d9d821	[InstCombine] Eliminate vector reverse if all inputs/outputs to an instruction are reverses Differential Revision: https://reviews.llvm.org/D109808 Change-Id: I1a10d2bc33acbe0ea353c6cb3d077851391fe73e	2021-09-20 18:32:24 -07:00
Nikita Popov	dd0226561e	[IR] Add helper to convert offset to GEP indices We implement logic to convert a byte offset into a sequence of GEP indices for that offset in a number of places. This patch adds a DataLayout::getGEPIndicesForOffset() method, which implements the core logic. I've updated SROA, ConstantFolding and InstCombine to use it, and there's a few more places where it looks relevant. Differential Revision: https://reviews.llvm.org/D110043	2021-09-20 20:18:16 +02:00
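The core idea of turning a byte offset into a sequence of indices, as described in the commit above, can be sketched for the simple case of nested arrays. This is an illustration only and not the logic of DataLayout::getGEPIndicesForOffset(): the function names are hypothetical, struct types and negative offsets are ignored, and the element sizes are supplied by hand instead of coming from a data layout.

```python
from typing import List, Tuple


def nested_array_sizes(elem_size: int, dims: List[int]) -> List[int]:
    """Byte sizes of the whole type and each nested element, outermost first.

    For [4 x [3 x i32]] with elem_size=4 and dims=[4, 3] this yields
    [48, 12, 4].
    """
    sizes = [elem_size]
    for dim in reversed(dims):
        sizes.append(sizes[-1] * dim)
    return list(reversed(sizes))


def indices_for_offset(offset: int, sizes: List[int]) -> Tuple[List[int], int]:
    """Peel one index per nesting level; return the indices and leftover bytes."""
    indices = []
    for size in sizes:
        idx = offset // size
        indices.append(idx)
        offset -= idx * size
    return indices, offset


if __name__ == "__main__":
    sizes = nested_array_sizes(4, [4, 3])
    print(indices_for_offset(28, sizes))
```

A nonzero leftover means the offset does not land on an element boundary at the innermost level.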
Florian Hahn	963d3a22b3	[DSE] Add additional tests to cover review comments. Adds additional tests following comments from D109844. Also removes unused in.ptr arguments and places in the call tests that used loads instead of a getval call.	2021-09-20 17:06:04 +01:00
Alexey Bataev	bc69dd62c0	[SLP]Improve graph reordering. Reworked reordering algorithm. Originally, the compiler just tried to detect the most common order in the reorderable nodes (loads, stores, extractelements, extractvalues) and then fully rebuilt the graph in the best order. This was not efficient, since it required extra memory and time for building/rebuilding the tree and doubled the use of the scheduling budget, which could lead to missing vectorization due to exhausted scheduling resources. The patch provides a 2-way approach for the graph reordering problem. At first, all reordering is done in-place; it does not require tree deleting/rebuilding, it just rotates the scalars/orders/reuses masks in the graph node. The first step (top-to-bottom) rotates the whole graph, similarly to the previous implementation. The compiler counts the number of the most used orders of the graph nodes with the same vectorization factor and then rotates the subgraph with the given vectorization factor to the most used order, if it is not empty. It then repeats the same procedure for the subgraphs with the smaller vectorization factor. We can do this because we still need to reshuffle the smaller subgraph when building operands for the graph nodes with the larger vectorization factor, so we can rotate just the subgraph, not the whole graph. The second step (bottom-to-top) scans through the leaves and tries to detect the users of the leaves which can be reordered. If the leaves can be reordered in the best fashion, they are reordered and their user too. It allows removing double shuffles to the same ordering of the operands in many cases and just reordering the user operations instead. Plus, it moves the final shuffles closer to the top of the graph and in many cases allows removing an extra shuffle because the same procedure is repeated again and we can again merge some reordering masks and reorder user nodes instead of the operands. Also, the patch improves the cost model for gathering of loads, which improves the x264 benchmark in some cases. Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264, +3% for 508.namd, improves most of the other benchmarks. The compile and link time are almost the same, though in some cases it should be better (we're not doing an extra instruction scheduling anymore) + we may vectorize more code for the large basic blocks again because of saving scheduling budget. Differential Revision: https://reviews.llvm.org/D105020	2021-09-20 08:42:19 -07:00
David Sherwood	f988f68064	[Analysis] Add support for vscale in computeKnownBitsFromOperator In ValueTracking.cpp we use a function called computeKnownBitsFromOperator to determine the known bits of a value. For the vscale intrinsic if the function contains the vscale_range attribute we can use the maximum and minimum values of vscale to determine some known zero and one bits. This should help to improve code quality by allowing certain optimisations to take place. Tests added here: Transforms/InstCombine/icmp-vscale.ll Differential Revision: https://reviews.llvm.org/D109883	2021-09-20 15:01:59 +01:00
Max Kazantsev	e9d34c5429	[NFC] Add assert and test showing that revert of D109596 wasn't justified All transforms of IndVars have prerequisite requirement of LCSSA and LoopSimplify form and rely on it. Added test that shows that this actually stands.	2021-09-20 12:01:12 +07:00
Max Kazantsev	471217cff8	Revert "Revert "[IndVars] Replace PHIs if loop exits on 1st iteration"" This reverts commit `6fec6552f5`. The patch was reverted on the incorrect claim that it may break LCSSA form when the loop is not in simplify form. All IndVars transforms ensure that the loop is in simplify and LCSSA form, so if it wasn't broken before this transform, it will also not be broken after it.	2021-09-20 12:01:10 +07:00
Max Kazantsev	def15c5fb6	[SCEV] Support negative values in signed/unsigned predicate reasoning There is a piece of logic that uses the fact that signed and unsigned versions of the same predicate are equivalent when both values are non-negative. It's also true when both of them are negative. Differential Revision: https://reviews.llvm.org/D109957 Reviewed By: nikic	2021-09-20 11:26:33 +07:00
Chris Jackson	5ba8020326	[DebugInfo][LSR] Emit shorter expressions from scev-based salvaging The scev-based salvaging for LSR can sometimes produce unnecessarily verbose expressions. This patch adds logic to detect when the value to be recovered and the induction variable differ by only a constant offset. Then, the expression to derive the current iteration count can be omitted from the dbg.value in favour of the offset. Reviewed by: aprantl Differential Revision: https://reviews.llvm.org/D109044	2021-09-19 21:41:44 +01:00
Sanjay Patel	9555d1edb0	[InstCombine] add/adjust tests for min/max intrinsics; NFC If we transform these, we have to propagate no-wrap/undef carefully.	2021-09-19 10:10:37 -04:00
Nikita Popov	abe21da670	[Tests] Fix noalias metadata in one more test Missed this one in `80110aafa0`. This is another test mixing up alias scopes and alias scope lists.	2021-09-18 21:17:05 +02:00
Nikita Popov	80110aafa0	[Tests] Fix incorrect noalias metadata Mostly this fixes cases where !noalias or !alias.scope were passed a scope rather than a scope list. In some cases I opted to drop the metadata entirely instead, because it is not really relevant to the test.	2021-09-18 20:51:00 +02:00
Usman Nadeem	d841c72e09	Precommit tests for D109807 "[InstCombine] Narrow type of logical operation chains when possible" Change-Id: Iae9bf18619e4926301a866c7e2bd38ced524804e	2021-09-18 11:28:49 -07:00
Joseph Huber	27905eeb89	[Attributor] Change AAExecutionDomain to check intrinsic edges The AAExecutionDomain instance checks if a BB is executed by the main thread only. Currently, this only checks the `__kmpc_kernel_init` call for generic regions to indicate the path taken by the main thread. In the new runtime, we want to be able to detect basic blocks even in SPMD mode. For this we enable it to check thread-ID intrinsics being compared to zero as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109849	2021-09-17 19:51:38 -04:00
Joseph Huber	fec2927e07	[OpenMP] Add NoSync attributes to alloc / free shared RTL calls This patch adds the `nosync` attribute to the `__kmpc_alloc_shared` and `__kmpc_free_shared` runtime library calls. This allows code analysis to know that these functions don't contain any barriers. This will help optimizations reason about the CFG of blocks containing these calls. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109995	2021-09-17 19:50:13 -04:00
Usman Nadeem	757384abff	[AArch64][SVE][InstCombine] Fold redundant zip1/2(uzp1/2) operations zip1(uzp1(A, B), uzp2(A, B)) --> A zip2(uzp1(A, B), uzp2(A, B)) --> B Differential Revision: https://reviews.llvm.org/D109666 Change-Id: I4a6578db2fcef9ff71ad0e77b9fe08354e6dbfcd	2021-09-17 15:24:46 -07:00
Sanjay Patel	6da3503602	[InstCombine] add tests for min/max intrinsics with offset operand; NFC	2021-09-17 16:36:46 -04:00
Dávid Bolvanský	d01e0c8c66	[NFC] Precommit tests for D109954	2021-09-17 21:59:35 +02:00
Hongtao Yu	c5fafc1e73	[CSSPGO] Tweaks to lower pseudo probe runtime overhead A couple of tweaks to 1. allow more thinlto importing by excluding probe intrinsics from IR size in module summary 2. allow general default attributes (nofree nosync nounwind) for pseudo probe intrinsic. Without those attributes, pseudo probes will be basically treated as unknown calls which will in turn block their containing functions from being annotated with those attributes. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D109976	2021-09-17 12:28:09 -07:00
Alexey Bataev	2b0b1d5319	[SLP][NFC]Add a test for reorder of alt shuffle operands.	2021-09-17 10:42:45 -07:00
Sanjay Patel	41ff7612b3	[InstCombine] allow splat vectors for narrowing masked fold Mostly cosmetic diffs, but the use of m_APInt matches splat constants.	2021-09-17 11:24:16 -04:00
Sanjay Patel	3a587ed20f	[InstCombine] add vector tests for 'and' folds; NFC	2021-09-17 11:24:16 -04:00
Max Kazantsev	690f76958a	[Test] Add simple test where IndVars fails to remove checks on negative values	2021-09-17 15:40:32 +07:00
Florian Hahn	bdafe3124c	[DSE] Add test cases with stores to objects before they escape. Test cases where stores to local objects can be removed because the object does not escape before calls that may read/write to memory. Includes test from PR50220.	2021-09-17 09:10:53 +01:00
Max Kazantsev	74fa174f33	[Test] One more missing opportunity on IndVars check removal	2021-09-17 14:52:15 +07:00
Sjoerd Meijer	97cc678cc4	[FuncSpec] Specialising on addresses of const global values. This introduces an option to allow specialising on the address of global values. This option is off by default because it is likely not that profitable to do so and needs more investigation. Before, we were specialising on addresses and thus this changes the default behaviour. Differential Revision: https://reviews.llvm.org/D109775	2021-09-17 08:07:05 +01:00
Christudasan Devadasan	167ff5280d	[GlobalOpt] Do not shrink global to bool for an unfavorable AS Do not call `TryToShrinkGlobalToBoolean` for address spaces that don't allow initializers. It inserts an initializer value while shrinking to bool. Used the target hook introduced with D109337 to skip this call for the restricted address spaces. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D109823	2021-09-16 23:13:30 -04:00
Daniil Suchkov	fe950cba8f	Update LoopPredication test to fix buildbot failure. This patch updates tests added in `5f2b7879f1`.	2021-09-16 23:37:59 +00:00
Daniil Suchkov	0e36288318	[LoopPredication] Report changes correctly when attempting loop exit predication To make the IR easier to analyze, this pass makes some minor transformations. After that, even if it doesn't decide to optimize anything, it can't report that it changed nothing and preserved all the analyses. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D109855	2021-09-16 22:49:55 +00:00
Daniil Suchkov	5f2b7879f1	NFC. Add tests exposing missing analysis invalidation in LoopPredication.	2021-09-16 22:49:55 +00:00
Jon Roelofs	4b19e7dfae	[LoopIdiomRecognize][Remarks] Track loop-strided store to/from blocks Differential revision: https://reviews.llvm.org/D109929	2021-09-16 15:46:26 -07:00
Arthur Eubanks	d49cb5b303	[SimplifyCFG] Add bonus when seeing vector ops to branch fold to common dest This makes some tests in vector-reductions-logical.ll more stable when applying D108837. The cost of branching is higher when vector ops are involved due to potential SLP transformations. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D108935	2021-09-16 10:50:36 -07:00
Dávid Bolvanský	a4a426c9e0	[InstCombine] Added llvm.powi optimizations If power is even: powi(-x, p) -> powi(x, p) powi(fabs(x), p) -> powi(x, p) powi(copysign(x, y), p) -> powi(x, p)	2021-09-16 19:42:21 +02:00
Dávid Bolvanský	c0afb00924	[NFC] Added tests for llvm.powi optimizations	2021-09-16 19:42:21 +02:00
Sjoerd Meijer	2a1ac2e318	[FuncSpec] Add force flag to test case to trigger the transform. NFC.	2021-09-16 17:48:13 +01:00
Florian Hahn	2f97ff8e7b	[SLP] Add additional memory versioning tests.	2021-09-16 13:31:14 +01:00
Anton Afanasyev	6a5f49a1ac	[AggressiveInstCombine] Add `{insert/extract}element` to `TruncInstCombine` DAG Alive2 for `{insert/extract}element`: https://alive2.llvm.org/ce/z/hwy_E- Actually, not a single file of the test suite is touched by this change, which means this is a rare pattern not generated by the frontend. But it's worth having in place. Differential Revision: https://reviews.llvm.org/D109236	2021-09-16 11:24:31 +03:00
Anton Afanasyev	8371a4c9d5	[Test][AggressiveInstCombine] Add test for truncation of vector instructions Precommit test for D109236	2021-09-16 11:24:30 +03:00
Sjoerd Meijer	a4e437e3c9	[FuncSpec] Add a test for specialising on a non-constant global argument. NFC.	2021-09-16 09:17:39 +01:00
Sam Parker	c98a8a09b5	[HardwareLoops] Loop guard intrinsic to recognise zext If a loop count was initially represented by a 32b unsigned int in C then the hardware-loop pass can recognise the loop guard and insert the llvm.test.set.loop.iterations intrinsic. If this was instead an unsigned short/char then clang inserts a zext instruction to expand the loop count to an i32. This patch adds the necessary pattern matching to enable the use of llvm.test.set.loop.iterations in those cases. Patch by: sherwin-dc Differential Revision: https://reviews.llvm.org/D109631	2021-09-16 08:33:16 +01:00
Owen Anderson	68079ef0eb	Teach SimplifyCFG to fold switches into lookup tables in more cases. In particular, it couldn't handle cases where lookup table constant expressions involved bitcasts. This does not seem to come up frequently in C++, but comes up reasonably often in Rust via `#[derive(Debug)]`. Originally reported by pcwalton. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109565	2021-09-15 22:07:08 +00:00
Anna Thomas	f9e4aebe4a	Revert "[InstCombine] Improve TryToSinkInstruction with multiple uses" This reverts commit `4ac4e52189`. There are a couple of test failures, which need updates to the test cases. Doing a clean revert; will recommit the change along with fixed test cases.	2021-09-15 18:03:11 -04:00
Anna Thomas	4ac4e52189	[InstCombine] Improve TryToSinkInstruction with multiple uses This patch allows sinking an instruction which can have multiple uses in a single user. We were previously over-restrictive by looking for exactly one use, rather than one user. Also, the API for retrieving undroppable user has been updated accordingly since in both usecases (Attributor and InstCombine), we seem to care about the user, rather than the use. Reviewed-By: nikic Differential Revision: https://reviews.llvm.org/D109700	2021-09-15 20:39:38 +00:00
Sanjay Patel	e5a32d720e	[InstCombine] move extend after insertelement if both operands are extended I was wondering how instcombine does on the examples in D109236, and we're missing a basic transform: inselt (ext X), (ext Y), Index --> ext (inselt X, Y, Index) https://alive2.llvm.org/ce/z/z2aBu9 Note that there are several possible extensions of this fold (see TODO comments). Differential Revision: https://reviews.llvm.org/D109537	2021-09-15 14:38:03 -04:00
Anna Thomas	36ef65adc3	[InstCombine] Update test checks through autogeneration, add more tests. NFC Updated check lines. Tests precommitted from D109700.	2021-09-15 16:20:30 +00:00
Max Kazantsev	c78ed20784	[Test] Add a test showing missing opportunities in branch deletion by indvars	2021-09-15 22:17:10 +07:00
Alexey Bataev	446e11fa29	[SLP][NFC]Add a test for tiny tree with stores and with not same/alternate instructions.	2021-09-15 08:07:01 -07:00
Filipp Zhinkin	f5d8952356	[InstCombine] Transform X == 0 ? 0 : X * Y --> X * freeze(Y) Enabled mul folding optimization that was previously disabled by being incorrect. To preserve correctness, mul's operand that is not compared with zero in select's condition is now frozen. Related bug: https://bugs.llvm.org/show_bug.cgi?id=51286 Correctness: https://alive2.llvm.org/ce/z/bHef7J https://alive2.llvm.org/ce/z/QcR7sf https://alive2.llvm.org/ce/z/vvBLzt https://alive2.llvm.org/ce/z/jGDXgq https://alive2.llvm.org/ce/z/3Pe8Z4 https://alive2.llvm.org/ce/z/LGga8M https://alive2.llvm.org/ce/z/CTG5fs Differential Revision: https://reviews.llvm.org/D108408	2021-09-15 09:04:06 -04:00
Sanjay Patel	be1028053e	[PhaseOrdering] add tests for PR47023; NFC	2021-09-15 08:44:04 -04:00
Simon Pilgrim	0767e43d87	[CostModel][X86] Adjust bitreverse/ctpop/ctlz/cttz AVX2+ costs based on llvm-mca reports Based off the worst case numbers generated by D103695, the AVX2/512 bit reversing/counting costs were higher than necessary (based off instruction counts instead of actual throughput).	2021-09-15 13:04:40 +01:00
Florian Hahn	05c120823b	[DSE] Add capture-before test cases with loads. Add a set of test cases where redundant stores may be removable, depending on whether a local allocation gets captured before performing a load.	2021-09-15 11:13:35 +01:00
David Green	61cc873a8e	[LV] Recognize intrinsic min/max reductions This extends the reduction logic in the vectorizer to handle intrinsic versions of min and max, both the floating point variants already created by instcombine under fastmath and the integer variants from D98152. As a bonus this allows us to match a chain of min or max operations into a single reduction, similar to how add/mul/etc work. Differential Revision: https://reviews.llvm.org/D109645	2021-09-15 10:45:50 +01:00
David Green	bddfbf91ed	[LV] Min/max intrinsic reduction test cases.	2021-09-15 09:56:19 +01:00
Florian Hahn	e90d55e1c9	[VPlan] Support sinking recipes with uniform users outside sink target. This is a first step towards addressing the last remaining limitation of the VPlan version of sinkScalarOperands: the legacy version can partially sink operands. For example, if a GEP has uniform users outside the sink target block, then the legacy version will sink all scalar GEPs, other than the one for lane 0. This patch works towards addressing this case in the VPlan version by detecting such cases and duplicating the sink candidate. All users outside of the sink target will be updated to use the uniform clone. Note that this highlights an issue with VPValue naming. If we duplicate a replicate recipe, they will share the same underlying IR value and both VPValues will have the same name ir<%gep>. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104254	2021-09-15 09:21:39 +01:00
Philip Reames	d4e03bccd4	regen an autogened test which is stale	2021-09-14 18:42:23 -07:00
Matt Arsenault	88146230e1	SeparateConstOffsetFromGEP: Fix stack overflow in unreachable code ConstantOffsetExtractor::Find was infinitely recursing on the add referencing itself.	2021-09-14 19:49:38 -04:00
Matt Arsenault	fdd9761dd1	Attributor: Fix crash on undef in !callees	2021-09-14 19:49:34 -04:00
Florian Hahn	e248d69036	Recommit "[LAA] Support pointer phis in loop by analyzing each incoming pointer." SCEV does not look through non-header PHIs inside the loop. Such phis can be analyzed by adding separate accesses for each incoming pointer value. This results in 2 more loops vectorized in SPEC2000/186.crafty and avoids regressions when sinking instructions before vectorizing. Fixes PR50296, PR50288. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D102266	2021-09-14 11:19:12 +01:00
David Green	5a6dfbb8cd	[ARM] Teach DemandedVectorElts about VMOVN lanes The class of instructions that write to narrow top/bottom lanes only demand the even or odd elements of the input lanes. Which means that a pair of VMOVNT; VMOVNB demands no lanes from the original input. This teaches that to instcombine from the target hooks available through ARMTTIImpl. Differential Revision: https://reviews.llvm.org/D109325	2021-09-14 11:05:31 +01:00
Arthur Eubanks	096d9814aa	[opt] Remove some legacy PM flags Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109664	2021-09-13 15:50:03 -07:00
Kuba Mracek	e80ee4cbd9	[GlobalDCE] In VFE support for relative pointers, allow GEP references to the base symbol This is for Swift VFE support. In some vtable forms that Swift emits, the "base" of a relative pointer is not the global symbol itself directly, but a GEP into it -- so the pointer is relative to a particular field in the global. So getPointerAtOffset() needs to be able to see through the GEP and allow it in a SUB expression, to correctly recognize the offset as a vtable slot. Differential Revision: https://reviews.llvm.org/D109169	2021-09-13 15:22:11 -07:00
Jon Chesterfield	6775ad2025	[openmp] Apply test change from D109500	2021-09-13 18:36:29 +01:00
Jon Chesterfield	bfcf979978	Revert "[openmp] Fix 51647, corrupt bitcode on amdgpu" This reverts commit `d5c049a3f6`. Going to re-commit it in pieces for easier application to 13	2021-09-13 18:25:07 +01:00
Philip Reames	6fec6552f5	Revert "[IndVars] Replace PHIs if loop exits on 1st iteration" This reverts commit `5a6dfb27ca`. See original review for why.	2021-09-13 10:11:18 -07:00
Philip Reames	5746c76f3f	Revert "[IndVars] Break backedge and replace PHIs if loop exits on 1st iteration" This reverts commit `d9ca444835`. See review for why.	2021-09-13 10:10:49 -07:00
dpalermo	d5c049a3f6	[openmp] Fix 51647, corrupt bitcode on amdgpu Patch by @dpalermo The corrupt bitcode reported in https://bugs.llvm.org/show_bug.cgi?id=51647 seems to be a result of a later pass changing the workfn variable to addrspace(5) (thread private, on the stack). That seems reasonable for an alloca without an address space so it's an open question why that can crash the bitcode reader. This change puts it in the thread private address space to begin with which means whatever misfired further down the pipeline does not break it. That matches the codegen from clang where stack variables are always annotated (5) and then addrspace cast prior to following use. This therefore patches around whatever unsuccessfully moved the alloca variable to addrspace(5). That solves the problem of openmp opt producing code that crashes the bitcode reader. It should be possible to create a minimal repro for the underlying bug based on some handwritten IR that uses an alloca in a generic address space. Reviewed By: ronlieb, jdoerfert, dpalermo-phab Differential Revision: https://reviews.llvm.org/D109500	2021-09-13 15:24:48 +01:00
Florian Hahn	4b342268c0	[VPlan] Add test that requires duplicating recipe for sinking.	2021-09-13 14:21:20 +01:00
Florian Hahn	c24fc37e47	[VectorCombine] Support AND/UREM indices that require freezing. `38b098be66` limited scalarization to indices that are known non-poison. For certain patterns that restrict the range of an index, we can insert a freeze of the original value, to prevent propagation of poison. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D107580	2021-09-13 11:21:45 +01:00
Jingu Kang	2a26d47a2d	[LoopBoundSplit] Check the start value of split cond AddRec After transformation, we assume the split condition of the pre-loop is always true. In order to guarantee it, we need to check the start value of the split cond AddRec satisfies the split condition. Differential Revision: https://reviews.llvm.org/D109354	2021-09-13 10:32:35 +01:00
Max Kazantsev	7e337d8ba2	[Test] Add more sophisticated tests for switch UB opt Optimizer is being too smart with existing tests, and the transform gets concealed by following transforms.	2021-09-13 15:28:23 +07:00
Simon Pilgrim	6d970e83fa	[InstCombine] Add PR51784 test cases	2021-09-13 08:36:43 +01:00
Max Kazantsev	d9ca444835	[IndVars] Break backedge and replace PHIs if loop exits on 1st iteration Implement TODO in optimizeLoopExits. Now if we have proved that some loop exit is taken on 1st iteration, we make all branches in the following exiting blocks always branch out of the loop and their conditions simplified away. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D108910 Reviewed By: lebedev.ri	2021-09-13 11:30:55 +07:00
Max Kazantsev	5a6dfb27ca	[IndVars] Replace PHIs if loop exits on 1st iteration This is a part of D108910. We replace all loop PHIs with values coming from the loop preheader if we proved that backedge is never taken. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D109596 Reviewed By: lebedev.ri	2021-09-13 10:50:33 +07:00
Florian Hahn	368af7558e	[VPlan] Fix crash caused by not updating all users properly. Users of VPValues are managed in a vector, so we need to be more careful when iterating over users while updating them. For now, just copy them. Fixes 51798.	2021-09-12 18:10:53 +01:00
Sanjay Patel	3a126134d3	[InstCombine] remove casts from splat-a-bit pattern https://alive2.llvm.org/ce/z/_AivbM This case seems clear since we can reduce instruction count and avoid an intermediate type change, but we might want to use mask-and-compare for other sequences. Currently, we can generate more instructions on some related patterns by trying to use bit-hacks instead of mask+cmp, so something is not behaving as expected.	2021-09-12 09:18:14 -04:00
Sanjay Patel	75e8eb2b10	[InstCombine] update code/test comments; NFC Follow-up for post-commit suggestion on: `28afaed691` The comments were partly copied from the original code, but not updated to match the new code.	2021-09-11 10:53:53 -04:00
Sanjay Patel	28afaed691	[InstCombine] fold sub of min/max intrinsics with invertible ops This is a translation of the existing code to handle the intrinsics and another step towards D98152. https://alive2.llvm.org/ce/z/jA7eBC This pattern is already handled by underlying folds if there are less uses, so the minimal tests in this case have extra uses. The larger cmyk tests show the motivation - when combined with other folds, we invert a larger sequence and eliminate 'not' ops.	2021-09-11 09:18:46 -04:00
Usman Nadeem	ab111e982f	Revert "Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation"" This reverts commit `eee7d225de`. Effectively relanding `98c37247d8` after fixing the failing tests. Change-Id: I5d7461aeb820a2d5f1895457d824a8de4d316ee5	2021-09-10 18:11:24 -07:00
Johannes Doerfert	99ea8ac9f1	Reapply "[OpenMP] Group side-effects to improve guarding efficiency" This reapplies `ca134c3963`, effectively reverting commit `d2f206e0af`. Minor test changes to make the test pass.	2021-09-10 15:22:57 -05:00
Johannes Doerfert	c09fbbdcfb	Reapply "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals" This reapplies commit `7dbba3376f`, or, put differently, this reverts commit `d9a8d20827`. The test now explicitly requires the amdgpu and nvptx backends, as it won't work properly without them.	2021-09-10 15:22:56 -05:00
Usman Nadeem	eee7d225de	Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation" This reverts commit `98c37247d8`.	2021-09-10 13:01:48 -07:00
Usman Nadeem	98c37247d8	[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation Differential Revision: https://reviews.llvm.org/D109118 Change-Id: I47adc1984a54bea02bf5a0a767b765afe7e16aa3	2021-09-10 12:52:14 -07:00
Sanjay Patel	188375f478	[InstCombine] add tests for sub of min/max intrinsics; NFC	2021-09-10 14:53:05 -04:00
Joseph Huber	9e2fc0ba37	[OpenMP] Check OpenMP assumptions on call-sites as well This patch adds functionality to check assumption attributes on call sites as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109376	2021-09-10 14:52:47 -04:00
Anton Afanasyev	54d8ebbbfd	[AggressiveInstCombine] Add `udiv` and `urem` instrs to TruncInstCombine DAG Add `udiv` and `urem` instructions to the DAG post-dominated by `trunc`, allowing TruncInstCombine to reduce bitwidth of expressions containing these instructions. It is sufficient to require that all truncated bits of both operands are zeros: https://alive2.llvm.org/ce/z/yiithn (`urem` case is identical). Differential Revision: https://reviews.llvm.org/D109515	2021-09-10 20:29:08 +03:00
Anton Afanasyev	ea7b2c147f	[Test][AggressiveInstCombine] Add test for `udiv` and `urem` Precommit test for D109515	2021-09-10 20:29:08 +03:00
Johannes Doerfert	d2f206e0af	Revert "[OpenMP] Group side-effects to improve guarding efficiency" This reverts commit `ca134c3963`. There seems to be a problem with the tests, investigating now: https://lab.llvm.org/buildbot/#/builders/61/builds/14574	2021-09-10 12:24:00 -05:00
Johannes Doerfert	d9a8d20827	Revert "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals" This reverts commit `7dbba3376f`. There seems to be a problem with the tests, investigating now: https://lab.llvm.org/buildbot/#/builders/61/builds/14574	2021-09-10 12:23:08 -05:00
Johannes Doerfert	7dbba3376f	[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals Not all address spaces support initializers for globals and we can therefore not set them without checking if they are allowed. This patch adds a hook into TTI to check if an AS allows non-undef initializers. We disable it for all but address space 0 by default, NVPTX and AMDGPU targets allow all but address space 3. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D109337	2021-09-10 12:08:50 -05:00
Johannes Doerfert	ca134c3963	[OpenMP] Group side-effects to improve guarding efficiency When we guard side-effects as part of SPMDzation we do it for consecutive instructions that need guarding. This patch will try to reorder guarded side-effects in a block to decrease the number of guarded regions we need. It does not use any smarts, e.g., alias analysis, to move side-effects over non-interfering reads. Instead, it only moves side-effects downwards to the next guarded side-effect if there was nothing in between that could possibly have been affected. Reviewed By: ggeorgakoudis Differential Revision: https://reviews.llvm.org/D109070	2021-09-10 12:08:48 -05:00
Nikita Popov	90ec6dff86	[OpaquePtr] Forbid mixing typed and opaque pointers Currently, opaque pointers are supported in two forms: The -force-opaque-pointers mode, where all pointers are opaque and typed pointers do not exist. And as a simple ptr type that can coexist with typed pointers. This patch removes support for the mixed mode. You either get typed pointers, or you get opaque pointers, but not both. In the (current) default mode, using ptr is forbidden. In -opaque-pointers mode, all pointers are opaque. The motivation here is that the mixed mode introduces additional issues that don't exist in fully opaque mode. D105155 is an example of a design problem. Looking at D109259, it would probably need additional work to support mixed mode (e.g. to generate GEPs for typed base but opaque result). Mixed mode will also end up inserting many casts between i8* and ptr, which would require significant additional work to consistently avoid. I don't think the mixed mode is particularly valuable, as it doesn't align with our end goal. The only thing I've found it to be moderately useful for is adding some opaque pointer tests in between typed pointer tests, but I think we can live without that. Differential Revision: https://reviews.llvm.org/D109290	2021-09-10 15:18:23 +02:00
Filipp Zhinkin	745f82b8d9	[InstCombine] add tests for X == 0 ? 0 : X * Y ; NFC These are the tests for D108408 with current baseline results.	2021-09-10 09:06:48 -04:00
Max Kazantsev	6b69cc09b7	[Test][NFC] Regenerate checks in test	2021-09-10 18:46:10 +07:00
Sjoerd Meijer	6a076fa953	[LoopFlatten] Make the analysis more robust after IV widening LoopFlatten wasn't triggering on this motivating case after IV widening: void foo(int *A, int N, int M) { for (int i = 0; i < N; ++i) for (int j = 0; j < M; ++j) f(A[i*M+j]); } The reason was that the old induction phi nodes were getting in the way. These narrow and dead induction phis are not always trivially dead, and having both the narrow and wide IVs confused the analysis and caused it to bail. This adds some extra bookkeeping for these old phis, so we can filter them out when checks on phi nodes are performed. Other clean up passes will get rid of these old phis and increment instructions. As this was one of the motivating examples from the beginning, it was surprising this wasn't triggering from C/C++ code. It looks like the IR and CFG is just slightly different. Differential Revision: https://reviews.llvm.org/D109309	2021-09-10 12:34:04 +01:00
Rosie Sumpter	9d1bea9c88	[SVE][LoopVectorize] Optimise code generated by widenPHIInstruction For SVE, when scalarising the PHI instruction the whole vector part is generated as opposed to creating instructions for each lane for fixed-width vectors. However, in some cases the lane values may be needed later (e.g for a load instruction) so we still need to calculate these values to avoid extractelement being called on the vector part. Differential Revision: https://reviews.llvm.org/D109445	2021-09-10 11:58:04 +01:00
Sjoerd Meijer	4f9217c519	[FuncSpec] Don't specialise call sites that have the MinSize attribute set The MinSize attribute can be attached to both the callee and the caller in the callsite. Function specialisation was already skipped for function declarations (callees) with MinSize. This also skips specialisations for the callsite when it has MinSize set. Differential Revision: https://reviews.llvm.org/D109441	2021-09-10 09:01:45 +01:00
Max Kazantsev	09d0fa3bbe	[Test] Add tests showing missed opportunity for SimplifyCFG for switches Patch by Dmitry Bakunevich!	2021-09-10 11:16:13 +07:00
Nikita Popov	af382b9383	[IR] Handle constant expressions in containsUndefinedElement() If the constant is a constant expression, then getAggregateElement() will return null. Guard against this before calling HasFn().	2021-09-09 22:04:12 +02:00
Sanjay Patel	3cb5aa8622	[InstCombine] add tests for insertelement with cast ops; NFC	2021-09-09 14:58:39 -04:00
Sanjay Patel	97a4e7b7ff	[InstCombine] remove a buggy set of zext-icmp transforms The motivating case is an infinite loop shown with a reduced test from: https://llvm.org/PR51762 To solve this, I'm proposing we delete the most obviously broken part of this code. The bug example shows a fundamental problem: we ask computeKnownBits if a transform will be profitable, alter the code by creating new instructions, then rely on computeKnownBits to return the same answer to actually eliminate instructions. But there's no guarantee that the results will be the same between the 1st and 2nd calls. In the infinite loop example, we get different answers, so we add instructions that conflict with some other transform, and we're stuck. There's at least one other problem visible in the test diff for `@zext_or_masked_bit_test_uses`: the code doesn't check uses properly, so we can end up with extra instructions created. Last, it's not clear if this set of transforms actually improves analysis or codegen. I spot-checked a few targets and don't see a clear win: https://godbolt.org/z/x87EWovso If we do see a regression from this change, codegen seems like the right place to add a cmp -> bit-hack fold. If this is too big of a step, we could limit the computeKnownBits calls by not passing a context instruction and/or limiting the recursion. I checked that those would stop the infinite loop for PR51762, but that won't guarantee that some other example does not fall into the same loop. Differential Revision: https://reviews.llvm.org/D109440	2021-09-09 08:49:39 -04:00
Roman Lebedev	909cba9699	[SimplifyCFG] performBranchToCommonDestFolding(): require block-closed SSA form for bonus instructions (PR51125) I can't seem to wrap my head around the proper fix here, we should be fine without this requirement, iff we can form this form, but the naive attempt (https://reviews.llvm.org/D106317) has failed. So just to unblock the release, put up a restriction. Fixes https://bugs.llvm.org/show_bug.cgi?id=51125	2021-09-09 12:28:09 +03:00
Jun Ma	8ba2adcf9e	Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."" Differential Revision: https://reviews.llvm.org/D106056	2021-09-09 16:53:33 +08:00
Nikita Popov	6dfdc6bfd2	[SROA] Support opaque pointers Make the following changes in order to support opaque pointers in SROA: * Generate i8 GEPs for opaque pointers. * Explicitly enforce that promotable allocas only have stores of the alloca type -- previously this was implicitly enforced. * Replace a check for pointer element type with load/store type. Differential Revision: https://reviews.llvm.org/D109259	2021-09-08 22:25:44 +02:00
Arthur Eubanks	b493124ae2	[MemorySSA] Support invariant.group metadata The implementation is mostly copied from MemDepAnalysis. We want to look at all loads and stores to the same pointer operand. Bitcasts and zero GEPs of a pointer are considered the same pointer value. We choose the most dominating instruction. Since updating MemorySSA with invariant.group is non-trivial, for now handling of invariant.group is not cached in any way, so it's part of the walker. The number of loads/stores with invariant.group is small for now anyway. We can revisit if this actually noticeably affects compile times. To avoid invariant.group affecting optimized uses, we need to have optimizeUsesInBlock() not use invariant.group in any way. Co-authored-by: Piotr Padlewski <prazek@google.com> Reviewed By: asbirlea, nikic, Prazek Differential Revision: https://reviews.llvm.org/D109134	2021-09-08 13:06:12 -07:00
Akira Hatanaka	dea6f71af0	[ObjC][ARC] Use the addresses of the ARC runtime functions instead of integer 0/1 for the operand of bundle "clang.arc.attachedcall" https://reviews.llvm.org/D102996 changes the operand of bundle "clang.arc.attachedcall". This patch makes changes to llvm that are needed to handle the new IR. This should make it easier to understand what the IR is doing and also simplify some of the passes as they no longer have to translate the integer values to the runtime functions. Differential Revision: https://reviews.llvm.org/D103000	2021-09-08 11:58:03 -07:00
Joseph Huber	6b9a3ec3a2	[OpenMP] Do not SPMDize generic regions with no parallel This patch changes SPMDization to not trigger for regions with no parallelism. Otherwise, this will introduce unnecessary barriers that will slow the single-threaded region down. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109438	2021-09-08 14:33:15 -04:00
Andrew Litteken	c172f1ad39	[IROutliner] Adding supports for multiple exits When we start outlining across branches, there is the possibility that we will have two different blocks with different output locations, or a single branch that goes to two blocks outside of the region that is being outlined. While the CodeExtractor provides most of the mechanisms by using the return value of the extracted function as the input to a switch statement to branch to the correct location, we need special handling for different output schemas to each location. This is done by repeating the existing storing scheme for each different exit block. We have a map from the return values used to the basic block that is used to store the outputs for that particular exit block within the outlined function. Then, if needed, we create a switch statement for each return block to branch to the correct set of stored outputs. Reviewers: paquette Differential Revision: https://reviews.llvm.org/D106993	2021-09-08 08:58:07 -07:00
Sanjay Patel	b041b613e6	[InstCombine] add test for zext with 'or' op; NFC	2021-09-08 09:58:06 -04:00
Sanjay Patel	5639946d89	[InstCombine] remove unnecessary instructions from test; NFC	2021-09-08 09:58:06 -04:00
Sjoerd Meijer	a1e8b754eb	[FuncSpec] Fix test case: only run funcspec and not any other passes. NFC.	2021-09-08 12:40:58 +01:00
Fraser Cormack	7fb66d4035	[MemCpyOpt] Fix a variety of scalable-type crashes This patch fixes a variety of crashes resulting from the `MemCpyOptPass` casting `TypeSize` to a constant integer, whether implicitly or explicitly. Since the `MemsetRanges` requires a constant size to work, all but one of the fixes in this patch simply involve skipping the various optimizations for scalable types as cleanly as possible. The optimization of `byval` parameters, however, has been updated to work on scalable types in theory. In practice, this optimization is only valid when the length of the `memcpy` is known to be larger than the scalable type size, which is currently never the case. This could perhaps be done in the future using the `vscale_range` attribute. Some implicit casts have been left as they were, under the knowledge they are only called on aggregate types. These should never be scalably-sized. Reviewed By: nikic, tra Differential Revision: https://reviews.llvm.org/D109329	2021-09-08 11:21:36 +01:00
Max Kazantsev	29d054bf12	[SimplifyCFG] Preserve knowledge about guarding condition by adding assume This improvement adds "assume" after removal of branch based on UB in successor block. Consider the following example: ``` pred: x = ... cond = x > 10 br cond, bb, other.succ bb: phi [nullptr, pred], ... // other possible preds load(phi) // UB if we came from pred other.succ: // here we know that x <= 10, but this knowledge is lost // after the branch is turned to unconditional unless we // preserve it with assume. ``` If we remove the branch based on knowledge about UB in a successor block, then the fact that x <= 10 in other.succ might be lost if this condition is not inferrable from any dominating condition. To preserve this knowledge, we can add an assume intrinsic with the (possibly inverted) branch condition. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D109054 Reviewed By: lebedev.ri	2021-09-08 14:05:17 +07:00
Yuanfang Chen	79c00d3f54	[NPM] Make AddDiscriminators pass required This is to make sure the pass is not skipped at O0 where optnone is applied to functions by default.	2021-09-07 17:02:24 -07:00
Sanjay Patel	a3c1669b17	[InstCombine] fold icmp equality with 'or' mask ops This could go either direction since the instruction count is the same either way, but there are a few reasons to prefer this: 1. We already do the related transform with 'and' (see just above the new code). 2. We try (too hard) to compensate for not having this and possibly other folds in transformZExtICmp(), and that leads to bugs like https://llvm.org/PR51762 . 3. Codegen looks better across a variety of targets. https://alive2.llvm.org/ce/z/uEgn4P	2021-09-07 16:34:00 -04:00
Sanjay Patel	9565457aad	[InstCombine] add tests for icmp with 'or' ops; NFC	2021-09-07 16:34:00 -04:00
Nikita Popov	58db5f6e95	[ConstFold] Support opaque pointers in constexpr GEPs Support opaque pointers in SymbolicallyEvaluateGEP() by using the value type of a GlobalValue base or falling back to i8 if there isn't one. We don't unconditionally generate i8 GEPs here because that would lose inrange attributes, and because some optimizations on globals currently rely on GEP types (e.g. the globals SROA mentioned in the comment). Differential Revision: https://reviews.llvm.org/D109297	2021-09-07 20:50:29 +02:00
Roman Lebedev	35fa7b8ad8	Reland "[InstCombine] Recognize `((x * y) s/ x) !=/== y` as an signed multiplication overflow check (PR48769)" This reverts commit `91f7a4fff7`, relanding commit `13ec913bdf`. The original commit was reverted because of (essentially) https://bugs.llvm.org/show_bug.cgi?id=35922 which has now been addressed by `d0eeb64be5`.	2021-09-07 21:03:52 +03:00
Dávid Bolvanský	3b5f318f5d	[InstCombine] ror/rol(X, RotAmt) == C --> X == rol/ror(C, RotAmt) (PR51567) ``` ---------------------------------------- define i1 @src(i32 %0) { %1: %2 = fshl i32 %0, i32 %0, i32 25 %3 = icmp eq i32 %2, 5 ret i1 %3 } => define i1 @tgt(i32 %0) { %1: %2 = icmp eq i32 %0, 640 ret i1 %2 } Transformation seems to be correct! ``` https://alive2.llvm.org/ce/z/GdY8Jm Solves PR51567 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D109283	2021-09-07 18:04:58 +02:00
Andrew Litteken	81d3ac0cf2	[IROutliner] Adding outlining for single entry/single exit multiblock regions Using the similarity found from the IRSimilarity Identifier, we take regions with structural similarity, and deduplicate them into a separate function. The Code Extractor is able to provide most of this functionality. For simplicity, we start by outlining only regions with a single entry and a single exit branch; this reduces the complexity of handling phi nodes outside the region and of handling many sets of outputs for each of the different exit blocks. Reviewer: paquette Differential Revision: https://reviews.llvm.org/D106990	2021-09-07 08:51:54 -07:00
Sanjay Patel	761835521c	[InstCombine] add tests for smear-a-set-bit; NFC Possible follow-ups from patterns discussed in D109155.	2021-09-07 11:42:33 -04:00
Jingu Kang	61d8e27193	[test] precommit a test for D109354	2021-09-07 16:02:53 +01:00
Anton Afanasyev	d1f9b21677	[AggressiveInstCombine] Add `AssumptionCache` to aggressive instcombine Add support for @llvm.assume() to TruncInstCombine allowing optimizations based on these intrinsics while computing known bits.	2021-09-07 16:45:00 +03:00
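
The rotate fold described in commit 3b5f318f5d above (`ror/rol(X, RotAmt) == C --> X == rol/ror(C, RotAmt)`) can be checked numerically. The following is a minimal Python sketch, not LLVM code: the helper names `rotl`, `rotr`, and `fold_is_sound` are illustrative assumptions, and `fshl` with both value operands equal is modeled as a plain left rotate. It reproduces the constant from the commit's example (rotating 5 right by 25 bits in 32-bit arithmetic) and exhaustively checks the equivalence at a small bit width.

```python
def rotl(x: int, amt: int, bits: int = 32) -> int:
    """Rotate x left by amt within a bits-wide unsigned integer."""
    amt %= bits
    mask = (1 << bits) - 1
    return ((x << amt) | (x >> (bits - amt))) & mask


def rotr(x: int, amt: int, bits: int = 32) -> int:
    """Rotate x right by amt, expressed as a left rotate by the complement."""
    return rotl(x, bits - (amt % bits), bits)


def fold_is_sound(bits: int) -> bool:
    """Exhaustively check rotl(x, amt) == c  <=>  x == rotr(c, amt).

    Hypothetical helper for illustration; only practical for small widths.
    """
    for amt in range(bits):
        for x in range(1 << bits):
            for c in range(1 << bits):
                before = rotl(x, amt, bits) == c   # original comparison
                after = x == rotr(c, amt, bits)    # folded comparison
                if before != after:
                    return False
    return True


if __name__ == "__main__":
    # Example from the commit: fshl(x, x, 25) == 5 becomes x == 640.
    print(rotr(5, 25))        # 640
    print(fold_is_sound(6))   # exhaustive check over 6-bit values
```

The fold works because rotation is a bijection on fixed-width integers, so the rotate can be moved from the variable side to the constant side and evaluated at compile time.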


20031 Commits