llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	97c473ad39	[SLP] rename variable to not be misleading; NFC The reduction matching was probably only dealing with binops when it was written, but we have now generalized it to handle select and intrinsics too, so assert on that too.	2021-07-07 14:40:21 -04:00
Arnold Schwaighofer	2937f8d148	[coro async] Cap the alignment of spilled values (vs. allocas) at the max frame alignment Before this patch we would normally use the ABI alignment which can be to high for the context alginment. For spilled values we don't need ABI alignment, since the frame entry's address is not escaped. rdar://79664965 Differential Revision: https://reviews.llvm.org/D105288	2021-07-07 08:06:25 -07:00
Philip Reames	723144665b	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 4) Resubmit after the following changes: * Fix a latent bug related to unrolling with required epilogue (see `e49d65f`). I believe this is the cause of the prior PPC buildbot failure. * Disable non-latch exits for epilogue vectorization to be safe (`9ffa90d`) * Split out assert movement (`600624a`) to reduce churn if this gets reverted again. Previous commit message (try 3) Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll Previous commit message... This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-07-07 07:44:35 -07:00
Dylan Fleming	7215dcfe36	[SVE] Fix ShuffleVector cast<FixedVectorType> in truncateToMinimalBitwidths Depends on D104239 Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105341	2021-07-07 15:30:10 +01:00
Arnold Schwaighofer	033de11150	[coro async] Move code to proper switch While upstreaming patches this code somehow was applied to the wrong switch statement. Differential Revision: https://reviews.llvm.org/D105504	2021-07-07 06:19:08 -07:00
Max Kazantsev	19885c7adf	[NFC] Remove duplicate function calls Removed repeated call of L->getHeader(). Now using previously stored return value. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D105535 Reviewed By: mkazantsev	2021-07-07 17:02:36 +07:00
Dylan Fleming	7586b47fb6	[SVE] Fix cast<FixedVectorType> in truncateToMinimalBitwidths Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D104239	2021-07-07 09:58:05 +01:00
Johannes Doerfert	168a9234d7	[Attributor][FIX] Replace uses first, then values Before we replaced value by registering all their uses. However, as we replace a value old uses become stale. We now replace values explicitly and keep track of "new values" when doing so to avoid replacing only uses in stale/old values but not their replacements.	2021-07-06 22:43:51 -05:00
Johannes Doerfert	aa3768278d	[Attributor] Introduce a helper function to deal with undef + none We often need to deal with the value lattice that contains none and undef as special values. A simple helper makes this much nicer. Differential Revision: https://reviews.llvm.org/D103857	2021-07-06 22:41:21 -05:00
Johannes Doerfert	fc82409b5c	[Attributor] Simplify operands inside of simplification AAs first When we do simplification via AAPotentialValues or AAValueConstantRange we need to simplify the operands of an instruction we deconstruct first. This does not only improve the result, see for example range.ll, but is required as we allow outside AAs to provide simplification rules via callbacks. If we do ignore the simplification rules and base other simplifications on the IR instead we can create an inconsistent state.	2021-07-06 22:41:18 -05:00
Eli Friedman	7ac1c7bead	Recommit [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out how to repair it to test what it was actually trying to test. Recommitting with fix to MemoryDepChecker::isDependent. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 12:16:05 -07:00
Eli Friedman	a6d081b2cb	Revert "[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers." This reverts commit `74d6ce5d5f`. Seeing crashes on buildbots in MemoryDepChecker::isDependent.	2021-07-06 11:17:13 -07:00
Philip Reames	9ffa90d6c2	[LV] Disable epilogue vectorization for non-latch exits When skimming through old review discussion, I noticed a post commit comment on an earlier patch which had gone unaddressed. Better late (4 months), than never right? I'm not aware of an active problem with the combination of non-latch exits and epilogue vectorization, but the interaction was not considered and I'm not modivated to make epilogue vectorization work with early exits. If there were a bug in the interaction, it would be pretty hard to hit right now (as we canonicalize towards bottom tested loops), but an upcoming change to allow multiple exit loops will greatly increase the chance for error. Thus, let's play it safe for now.	2021-07-06 10:57:10 -07:00
Philip Reames	600624a103	[LoopVersion] Move an assert [nfc-ish]	2021-07-06 10:57:10 -07:00
Eli Friedman	74d6ce5d5f	[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out how to repair it to test what it was actually trying to test. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 10:54:41 -07:00
Arnold Schwaighofer	846a530e7d	Fix coro lowering of single predecessor phis Code assumes that uses of single predecessor phis are not live accross suspend points. Cleanup any single predecessor phis preceeding the code making this assumption. rdar://76020301 Differential Revision: https://reviews.llvm.org/D105488	2021-07-06 10:22:25 -07:00
Alexey Bataev	4e1a0684f1	[SLP]Fix non-determinism in PHI sorting. Compare type IDs and DFS numbering for basic block instead of addresses to fix non-determinism. Differential Revision: https://reviews.llvm.org/D105031	2021-07-06 08:45:45 -07:00
Arnold Schwaighofer	130ea3ceb4	Use swift mangling for resume functions The resume partial functions generated for swift suspend points will now use a Swift mangling suffix. Await resume partial functions will use the suffix 'TQ'[0-9]+'_' (e.g "...TQ0_") and suspend resume partial functions will use the suffix 'TY'[0-9]+'_' (e.g "...TY1_"). Reviewed By: nate_chandler Differential Revision: https://reviews.llvm.org/D104144	2021-07-06 08:27:46 -07:00
Florian Hahn	ef0d147cdc	Recommit "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups. This reverts commit `706bbfb35b`. The committed version moves the definition of VPReductionPHIRecipe out of an ifdef only intended for ::print helpers. This should resolve the build failures that caused the revert	2021-07-06 14:15:42 +01:00
Kerry McLaughlin	a7512401e5	[LV] Prevent vectorization with unsupported element types. This patch adds a TTI function, isElementTypeLegalForScalableVector, to query whether it is possible to vectorize a given element type. This is called by isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if any of the instruction types in the loop are unsupported, e.g: int foo(__int128_t* ptr, int N) #pragma clang loop vectorize_width(4, scalable) for (int i=0; i<N; ++i) ptr[i] = ptr[i] + 42; This example currently crashes if we attempt to vectorize since i128 is not a supported type for scalable vectorization. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102253	2021-07-06 13:06:21 +01:00
Florian Hahn	706bbfb35b	Revert "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups This reverts commit `3fed6d443f`, `bbcbf21ae6` and `6c3451cd76`. The changes causing build failures with certain configurations, e.g. https://lab.llvm.org/buildbot/#/builders/67/builds/3365/steps/6/logs/stdio lib/libLLVMVectorize.a(LoopVectorize.cpp.o): In function `llvm::VPRecipeBuilder::tryToCreateWidenRecipe(llvm::Instruction, llvm::ArrayRef<llvm::VPValue>, llvm::VFRange&, std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan> >&) [clone .localalias.8]': LoopVectorize.cpp:(.text._ZN4llvm15VPRecipeBuilder22tryToCreateWidenRecipeEPNS_11InstructionENS_8ArrayRefIPNS_7VPValueEEERNS_7VFRangeERSt10unique_ptrINS_5VPlanESt14default_deleteISA_EE+0x63b): undefined reference to `vtable for llvm::VPReductionPHIRecipe' collect2: error: ld returned 1 exit status	2021-07-06 12:10:03 +01:00
Florian Hahn	3fed6d443f	[VPlan] Mark overriden function in VPWidenPHIRecipe as virtual. VPReductionRecipe overrides those implementations. Mark them as virtual in the VPWidenPHIRecipe to unbreak build in certain configurations.	2021-07-06 12:00:41 +01:00
Florian Hahn	bbcbf21ae6	[VPlan] Add destructor to VPReductionRecipe to unbreak build. Attempt to unbreak https://lab.llvm.org/buildbot/#/builders/67/builds/3363/steps/6/logs/stdio	2021-07-06 11:41:20 +01:00
Florian Hahn	6c3451cd76	[VPlan] Add VPReductionPHIRecipe (NFC). This patch is a first step towards splitting up VPWidenPHIRecipe into separate recipes for the 3 distinct cases they model: 1. reduction phis, 2. first-order recurrence phis, 3. pointer induction phis. This allows untangling the code generation and allows us to reduce the reliance on LoopVectorizationCostModel during VPlan code generation. Discussed/suggested in D100102, D100113, D104197. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104989	2021-07-06 11:25:28 +01:00
Kerry McLaughlin	17b701c43c	[LV] Collect a list of all element types found in the loop (NFC) Splits `getSmallestAndWidestTypes` into two functions, one of which now collects a list of all element types found in the loop (`ElementTypesInLoop`). This ensures we do not have to iterate over all instructions in the loop again in other places, such as in D102253 which disables scalable vectorization of a loop if any of the instructions use invalid types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105437	2021-07-06 10:37:41 +01:00
Akira Hatanaka	28fe9afdba	[ObjC][ARC] Prevent moving objc_retain calls past objc_release calls that release the retained object This patch fixes what looks like a longstanding bug in ARC optimizer where it reverses the order of objc_retain calls and objc_release calls that retain and release the same object. The code in ARC optimizer that is responsible for code motion takes the following steps: 1. Traverse the CFG bottom-up and determine how far up objc_release calls can be moved. Determine the insertion points for the objc_release calls, but don't actually move them. 2. Traverse the CFG top-down and determine how far down objc_retain calls can be moved. Determine the insertion points for the objc_retain calls, but don't actually move them. 3. Try to move the objc_retain and objc_release calls if they can't be removed. The problem is that the insertion points for the objc_retain calls are determined in step 2 without taking into consideration the insertion points for objc_release calls determined in step 1, so the order of an objc_retain call and an objc_release call can be reversed, which is incorrect, even though each step is correct in isolation. To fix this bug, this patch teaches the top-down traversal step to take into consideration the insertion points for objc_release calls determined in the bottom-up traversal step. Code motion for an objc_retain call is disabled if there is a possibility that it can be moved past an objc_release call that releases the retained object. rdar://79292791 Differential Revision: https://reviews.llvm.org/D104953	2021-07-05 12:16:15 -07:00
Sanjay Patel	40b752d28d	[InstCombine] fold icmp slt/sgt of offset value with constant This follows up patches for the unsigned siblings: `0c400e8953` `c7b658aeb5` We are translating an offset signed compare to its unsigned equivalent when one end of the range is at the limit (zero or unsigned max). (X + C2) >s C --> X <u (SMAX - C) (if C == C2 - 1) (X + C2) <s C --> X >u (C ^ SMAX) (if C == C2) This probably does not show up much in IR derived from C/C++ source because that would likely have 'nsw', and we have folds for that already. As with the previous unsigned transforms, the folds could be generalized to handle non-constant patterns: https://alive2.llvm.org/ce/z/Y8Xrrm ; sgt define i1 @src(i8 %a, i8 %c) { %c2 = add i8 %c, 1 %t = add i8 %a, %c2 %ov = icmp sgt i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c) { %c_off = sub i8 127, %c ; SMAX %ov = icmp ult i8 %a, %c_off ret i1 %ov } https://alive2.llvm.org/ce/z/c8uhnk ; slt define i1 @src(i8 %a, i8 %c) { %t = add i8 %a, %c %ov = icmp slt i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c) { %c_offnot = xor i8 %c, 127 ; SMAX %ov = icmp ugt i8 %a, %c_offnot ret i1 %ov }	2021-07-05 10:08:31 -04:00
Caroline Concatto	b868a2d2c6	[SLPVectorizer] Fix crash in vectorizeChainsInBlock for scalable vector. The function vectorizeChainsInBlock does not support scalable vector, because function like canReuseExtract and isCommutative in the code path assert with scalable vectors. This patch avoids vectorizing blocks that have extract instructions with scalable vector.. Differential Revision: https://reviews.llvm.org/D104809	2021-07-05 12:43:41 +01:00
Stephen Tozer	14b62f7e2f	[DebugInfo] CGP+HWasan: Handle dbg.values with duplicate location ops This patch fixes an issue which occurred in CodeGenPrepare and HWAddressSanitizer, which both at some point create a map of Old->New instructions and update dbg.value uses of these. They did this by iterating over the dbg.value's location operands, and if an instance of the old instruction was found, replaceVariableLocationOp would be called on that dbg.value. This would cause an error if the same operand appeared multiple times as a location operand, as the first call to replaceVariableLocationOp would update all uses of the old instruction, invalidating the old iterator and eventually hitting an assertion. This has been fixed by no longer iterating over the dbg.value's location operands directly, but by first collecting them into a set and then iterating over that, ensuring that we never attempt to replace a duplicated operand multiple times. Differential Revision: https://reviews.llvm.org/D105129	2021-07-05 10:35:19 +01:00
Nikita Popov	a213f735d8	[IR] Deprecate GetElementPtrInst::CreateInBounds without element type This API is not compatible with opaque pointers, the method accepting an explicit pointer element type should be used instead. Thankfully there were few in-tree users. The BPF case still ends up using the pointer element type for now and needs something like D105407 to avoid doing so.	2021-07-04 16:49:30 +02:00
Paul Walker	287d39dd5a	[NFC] Fix a few whitespace issues and typos.	2021-07-04 11:49:58 +01:00
Nikita Popov	fabc17192e	[IRBuilder] Add type argument to CreateMaskedLoad/Gather Same as other CreateLoad-style APIs, these need an explicit type argument to support opaque pointers. Differential Revision: https://reviews.llvm.org/D105395	2021-07-04 12:17:59 +02:00
Roman Lebedev	fc150cecd7	[SimplifyCFG] simplifyUnreachable(): erase instructions iff they are guaranteed to transfer execution to unreachable This replaces the current ad-hoc implementation, by syncing the code from InstCombine's implementation in `InstCombinerImpl::visitUnreachableInst()`, with one exception that here in SimplifyCFG we are allowed to remove EH instructions. Effectively, this now allows SimplifyCFG to remove calls (iff they won't throw and will return), arithmetic/logic operations, etc. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D105374	2021-07-03 10:45:44 +03:00
Fangrui Song	252a1eecc0	[ThinLTO] Respect ClearDSOLocalOnDeclarations for unimported functions D74751 added `ClearDSOLocalOnDeclarations` and dropped dso_local for isDeclarationForLinker `GlobalValue`s. It missed a case for imported declarations (`doImportAsDefinition` is false while `isPerformingImport` is true). This can lead to a linker error for a default visibility symbol in `ld.lld -shared`. When `ClearDSOLocalOnDeclarations` is true, we check `isPerformingImport() && !doImportAsDefinition(&GV)` along with `GV.isDeclarationForLinker()`. The new condition checks an imported declaration. This patch fixes a `LLVMPolly.so` link error using a trunk clang -DLLVM_ENABLE_LTO=Thin. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D104986	2021-07-02 17:08:25 -07:00
Roman Lebedev	53fef0b293	[NFCI][SimplifyCFG] simplifyUnreachable(): Use poison constant to represent the result of unreachable instrs Mimics similar change for InstCombine: `ce192ced2b` / D104602 All these uses are in blocks that aren't reachable from function's entry, and said blocks are removed by SimplifyCFG itself, so we can't really test this change.	2021-07-02 22:11:52 +03:00
Heejin Ahn	51fecd17bb	[InstCombine] Don't combine PHI before catchswitch This tries to bail out if the PHI is in a `catchswitch` BB in InstCombine. A PHI cannot be combined into a non-PHI instruction if it is in a `catchswitch` BB, because `catchswitch` BB cannot have any non-PHI instruction other than `catchswitch` itself. The given test case started crashing after D98058. Reviewed By: lebedev.ri, rnk Differential Revision: https://reviews.llvm.org/D105309	2021-07-02 12:10:24 -07:00
Roman Lebedev	da81ec6158	[SimplifyCFG] Volatile memory operations do not trap Somewhat related to D105338. While it is up for discussion whether or not volatile store traps, so far there has been no complaints that volatile load/cmpxchg/atomicrmw also may trap. And even if simplifycfg currently concervatively believes that to be the case, instcombine does not: https://godbolt.org/z/5vhv4K5b8 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D105343	2021-07-02 21:47:44 +03:00
Alexey Bataev	7f7e4aed21	[SLP][NFC]Refactor findLaneForValue and make it static member, NFC, by V.Dmitriev. Reduces number of arguments	2021-07-02 10:30:13 -07:00
Jon Roelofs	37b6e03c18	[Intrinsics] Make MemCpyInlineInst a MemCpyInst This opens up more optimization opportunities in passes that already handle MemCpyInst's. Differential revision: https://reviews.llvm.org/D105247	2021-07-02 10:25:24 -07:00
Roman Lebedev	13e35ac124	[NFC][InstCombine] visitUnreachableInst(): enhance comments somewhat	2021-07-02 17:30:01 +03:00
Roman Lebedev	dadedc99e9	[InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable In the original review D87149 it was mentioned that this approach was tried, and it lead to infinite combine loops, but i'm not seeing anything like that now, neither in the `check-llvm`, nor on some codebases i tried. This is a recommit of `d9d65527c2`, which i immediately reverted because i have messed up something during branch switch, and `597ccc92ce` accidentally ended up being pushed, which was very much not the intention. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D105339	2021-07-02 17:20:21 +03:00
Roman Lebedev	24d271bb18	Revert "https://godbolt.org/z/5vhv4K5b8 " This reverts commit `597ccc92ce`.	2021-07-02 17:17:55 +03:00
Roman Lebedev	93a1642763	Revert "[NFCI][InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable" This reverts commit `d9d65527c2`.	2021-07-02 17:17:47 +03:00
Roman Lebedev	d9d65527c2	[NFCI][InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable In the original review D87149 it was mentioned that this approach was tried, and it lead to infinite combine loops, but i'm not seeing anything like that now, neither in the `check-llvm`, nor on some codebases i tried. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D105339	2021-07-02 17:17:03 +03:00
Roman Lebedev	597ccc92ce	https://godbolt.org/z/5vhv4K5b8	2021-07-02 17:16:19 +03:00
Nico Weber	a92964779c	Revert "[InstrProfiling] Use external weak reference for bias variable" This reverts commit `33a7b4d9d8`. Breaks check-profile on macOS, see comments on https://reviews.llvm.org/D105176	2021-07-02 09:05:12 -04:00
Florian Hahn	a3ca578eb9	[Matrix] Fix crash during fusion if the same load is re-used. This patch fixes a crash when the same load is used for both operands of a fuseable multiply.	2021-07-02 14:00:17 +01:00
Alexey Bataev	28ac873bcb	[SLP]Fix gathering of the scalars by not ignoring UndefValues. The compiler should not ignore UndefValue when gathering the scalars, otherwise the resulting code may be less defined than the original one. Also, grouped scalars to insert them at first to reduce the analysis in further passes. Differential Revision: https://reviews.llvm.org/D105275	2021-07-02 04:46:48 -07:00
Florian Hahn	7655061cc6	[Matrix] Hoist address computation before multiply to enable fusion. If the store address does not dominate the matrix multiply, try to hoist address computation instructions without side-effects and/or memory reads before the multiply, to allow fusion. Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D105193	2021-07-02 09:52:11 +01:00
Evgeniy Brevnov	9568811cb8	[NFC][DSE]Change 'do-while' to 'for' loop to simplify code structure With 'for' loop there is is a single place where 'Current' is adjusted. It helps to avoid copy paste and makes a bit easy to understand overall loop controll flow. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D101044	2021-07-02 10:00:47 +07:00
Craig Topper	066524ea54	[ScalarizeMaskedMemIntrin][SelectionDAGBuilder] Use the element type to calculate alignment for gather/scatter when alignment operand is 0. Previously we used the vector type, but we're loading/storing invididual elements so I think only element alignment should matter. Noticed while looking at the code for something else so I don't have a test case. Differential Revision: https://reviews.llvm.org/D105220	2021-07-01 19:08:47 -07:00
Petr Hosek	33a7b4d9d8	[InstrProfiling] Use external weak reference for bias variable We need the compiler generated variable to override the weak symbol of the same name inside the profile runtime, but using LinkOnceODRLinkage results in weak symbol being emitted which leads to an issue where the linker might choose either of the weak symbols potentially disabling the runtime counter relocation. This change replaces the use of weak definition inside the runtime with an external weak reference to address the issue. We also place the compiler generated symbol inside a COMDAT group so dead definition can be garbage collected by the linker. Differential Revision: https://reviews.llvm.org/D105176	2021-07-01 15:25:31 -07:00
Philip Reames	955f125899	[instcombine] Fold overflow check using overflow intrinsic to comparison This follows up to D104665 (which added umulo handling alongside the existing uaddo case), and generalizes for the remaining overflow intrinsics. I went to add analogous handling to LVI, and discovered that LVI already had a more general implementation. Instead, we can port was LVI does to instcombine. (For context, LVI uses makeExactNoWrapRegion to constrain the value 'x' in blocks reached after a branch on the condition `op.with.overflow(x, C).overflow`.) Differential Revision: https://reviews.llvm.org/D104932	2021-07-01 09:41:55 -07:00
Arnold Schwaighofer	4a361f5209	[coro async] Add support for specifying which parameter is swiftself in async resume functions Differential Revision: https://reviews.llvm.org/D104147	2021-07-01 07:33:15 -07:00
David Sherwood	51b4ab26ca	[NFC] Add new setDebugLocFromInst that uses the class Builder by default In lots of places we were calling setDebugLocFromInst and passing in the same Builder member variable found in InnerLoopVectorizer. I personally found this confusing so I've changed the interface to take an Optional<IRBuilder<> *> and we can now pass in None when we want to use the class member variable. Differential Revision: https://reviews.llvm.org/D105100	2021-07-01 14:23:34 +01:00
Roman Lebedev	333d3a3cdf	[NFC][PassBuilder] addVectorPasses(): clarify that 'IsLTO' is actually 'IsFullLTO' I.e. it will be `false` for thin lto.	2021-07-01 10:09:24 +03:00
Chuanqi Xu	51fbd18706	[Coroutine] Recommit Add statistics for the number of elided coroutine Now we lack a benchmark to measure the performance change for each commit. Since coro elide is the main optimization in coroutine module, I wonder it may be an estimation to count the number of elided coroutine in private code bases. e.g., for a certain commit, if we found that the number of elided goes down, we could find it before the commit check-in. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D105095	2021-07-01 11:01:28 +08:00
Sanjay Patel	0c400e8953	[InstCombine] fold icmp ult of offset value with constant This is one sibling of the fold added with `c7b658aeb5` . (X + C2) <u C --> X >s ~C2 (if C == C2 + SMIN) I'm still not sure how to describe it best, but we're translating 2 constants from an unsigned range comparison to signed because that eliminates the offset (add) op. This could be extended to handle the more general (non-constant) pattern too: https://alive2.llvm.org/ce/z/K-fMBf define i1 @src(i8 %a, i8 %c2) { %t = add i8 %a, %c2 %c = add i8 %c2, 128 ; SMIN %ov = icmp ult i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c2) { %not_c2 = xor i8 %c2, -1 %ov = icmp sgt i8 %a, %not_c2 ret i1 %ov }	2021-06-30 19:00:12 -04:00
Xun Li	822b92aae4	[Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened Relevant discussion can be found at: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148197.html In the existing design, An SCC that contains a coroutine will go through the folloing passes: Inliner -> CoroSplitPass (fake) -> FunctionSimplificationPipeline -> Inliner -> CoroSplitPass (real) -> FunctionSimplificationPipeline The first CoroSplitPass doesn't do anything other than putting the SCC back to the queue so that the entire pipeline can repeat. As you can see, we run Inliner twice on the SCC consecutively without doing any real split, which is unnecessary and likely unintended. What we really wanted is this: Inliner -> FunctionSimplificationPipeline -> CoroSplitPass -> FunctionSimplificationPipeline (note that we don't really need to run Inliner again on the ramp function after split). Hence the way we do it here is to move CoroSplitPass to the end of the CGSCC pipeline, make it once for real, insert the newly generated SCCs (the clones) back to the pipeline so that they can be optimized, and also add a function simplification pipeline after CoroSplit to optimize the post-split ramp function. This approach also conforms to how the new pass manager works instead of relying on an adhoc post split cleanup, making it ready for full switch to new pass manager eventually. By looking at some of the changes to the tests, we can already observe that this changes allows for more optimizations applied to coroutines. Reviewed By: aeubanks, ChuanqiXu Differential Revision: https://reviews.llvm.org/D95807	2021-06-30 11:38:14 -07:00
Sanjay Patel	c7b658aeb5	[InstCombine] fold icmp of offset value with constant There must be a better way to describe this pattern in words? (X + C2) >u C --> X <s -C2 (if C == C2 + SMAX) This could be extended to handle the more general (non-constant) pattern too: https://alive2.llvm.org/ce/z/rdfNFP define i1 @src(i8 %a, i8 %c1) { %t = add i8 %a, %c1 %c2 = add i8 %c1, 127 ; SMAX %ov = icmp ugt i8 %t, %c2 ret i1 %ov } define i1 @tgt(i8 %a, i8 %c1) { %neg_c1 = sub i8 0, %c1 %ov = icmp slt i8 %a, %neg_c1 ret i1 %ov } The pattern was noticed as a by-product of D104932.	2021-06-30 13:37:31 -04:00
Philip Reames	c4fc2cb5b2	[instcombine] umin(x, 1) == zext(x != 0) We already implemented this for the select form, but the intrinsic form was missing. Note that this doesn't change poison behavior as 1 is non-poison, and the optimized form is still poison exactly when x is.	2021-06-30 10:20:01 -07:00
Joseph Huber	ecabc6684f	[OpenMP] Change analysis remarks to not emit on cold functions The remarks will trigger on some functions that are marked cold, such as the `__muldc3` intrinsic functions. Change the remarks to avoid these functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105196	2021-06-30 11:54:24 -04:00
Nico Weber	db86e5c914	Revert "[Coroutine] Add statistics for the number of elided coroutine" This reverts commit `1d9539cf49`. Test fails in LLVM_ENABLE_ASSERTIONS=OFF builds (such as regular release builds).	2021-06-30 10:22:45 -04:00
Joseph Huber	0edb87773b	[OpenMP] Add additional remarks for OpenMPOpt This patch adds additional remarks, suggesting the use of `noescape` for failed globalization and indicating when internalization failed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105150	2021-06-30 09:49:25 -04:00
David Sherwood	7b7b5b5a26	[NFC] Rename shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating a IRBuilder stack variable with the same name as the class member.	2021-06-30 11:11:49 +01:00
Chuanqi Xu	801c2b9bba	[FuncSpec] Add an option to specializing literal constant Now the option is off by default. Since we are not sure if this option would make the compile time increase aggressively. Although we tested it on SPEC2017, we may need to test more to make it on by default. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D104365	2021-06-30 11:26:44 +08:00
Chuanqi Xu	1d9539cf49	[Coroutine] Add statistics for the number of elided coroutine Now we lack a benchmark to measure the performance change for each commit. Since coro elide is the main optimization in coroutine module, I wonder it may be an estimation to count the number of elided coroutine in private code bases. e.g., for a certain commit, if we found that the number of elided goes down, we could find it before the commit check-in. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D105095	2021-06-30 11:20:53 +08:00
Jianzhou Zhao	ae6648cee0	[dfsan] Expose dfsan_get_track_origins to get origin tracking status This allows application code checks if origin tracking is on before printing out traces. -dfsan-track-origins can be 0,1,2. The current code only distinguishes 1 and 2 in compile time, but not at runtime. Made runtime distinguish 1 and 2 too. Reviewed By: browneee Differential Revision: https://reviews.llvm.org/D105128	2021-06-29 20:32:39 +00:00
Nikita Popov	c4de78e91c	[SanitizerCoverage] Fix global type check with opaque pointers The code was previously relying on the fact that an incorrectly typed global would result in the insertion of a BitCast constant expression. With opaque pointers, this is no longer the case, so we should check the type explicitly.	2021-06-29 20:32:14 +02:00
Alexey Bataev	129ae515fb	[INSTCOMBINE] Transform reduction(shuffle V, poison, unique_mask) to reduction(V). After SLP + LTO we may have have reduction(shuffle V, poison, mask). This can be simplified to just reduction(V) if the mask is only for single vector and just all elements from this vector are permuted, without reusing, replacing with undefs and/or other values, etc. Differential Revision: https://reviews.llvm.org/D105053	2021-06-29 10:02:38 -07:00
Philip Reames	e49d65f36d	[LV] Fix bug when unrolling (only) a loop with non-latch exit If we unroll a loop in the vectorizer (without vectorizing), and the cost model requires a epilogue be generated for correctness, the code generation must actually do so. The included test case on an unmodified opt will access memory one past the expected bound. As a result, this patch is fixing a latent miscompile. Differential Revision: https://reviews.llvm.org/D103700	2021-06-29 08:04:26 -07:00
Johannes Doerfert	7af91a2b8f	[Attributor][NFCI] Make the state of AAValueSimplify explicit As we have done with other states we want the AAValueSimplify state to be explicit to use it more easily in our helpers.	2021-06-29 09:38:22 -05:00
Johannes Doerfert	dcbe58d94c	[Attributor][NFCI] Remove unneeded namespace	2021-06-29 09:38:20 -05:00
Johannes Doerfert	457bd5c8d5	[Attributor] Teach AAPotentialValues about constant select conditions There was a TODO but now we actually check if the select condition is assumed constant and only look at the relevant operand.	2021-06-29 09:38:18 -05:00
Johannes Doerfert	8dc9bb6d85	[Attributor][NFC] Clang format	2021-06-29 09:38:15 -05:00
Johannes Doerfert	a33e128012	[InstCombine] Gracefully handle an alloca outside the alloca-AS While we might eventually want to disallow allocas that do not have the alloca-AS set, it seems undesirable to crash on them. Add a cast when required so that we can support such allocas (at least here). Differential Revision: https://reviews.llvm.org/D104866	2021-06-29 09:38:13 -05:00
David Sherwood	9de63367d8	Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable" This reverts commit `9dde514162`.	2021-06-29 15:20:22 +01:00
David Sherwood	9dde514162	[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating a IRBuilder stack variable with the same name as the class member.	2021-06-29 14:34:30 +01:00
David Sherwood	8a3365fba2	Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable" This reverts commit `dcfc2c3fac`.	2021-06-29 14:04:42 +01:00
Florian Hahn	47215e1c62	[LV] Fix crash when target instruction for sinking is dead. This patch fixes a crash when the target instruction for sinking is dead. In that case, no recipe is created and trying to get the recipe for it results in a crash. To ensure all sink targets are alive, find & use the first previous alive instruction. Note that the case where the sink source is dead is already handled. Found by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35320 Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104603	2021-06-29 13:31:22 +01:00
David Sherwood	303b6d5e98	[LoopVectorize] Add support for scalable vectorization of invariant stores Previously in setCostBasedWideningDecision if we encountered an invariant store we just assumed that we could scalarize the store and called getUniformMemOpCost to get the associated cost. However, for scalable vectors this is not an option because it is not currently possibly to scalarize the store. At the moment we crash in VPReplicateRecipe::execute when trying to scalarize the store. Therefore, I have changed setCostBasedWideningDecision so that if we are storing a scalable vector out to a uniform address and the target supports scatter instructions, then we should use those instead. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-inv-store.ll Differential Revision: https://reviews.llvm.org/D104624	2021-06-29 11:56:09 +01:00
Roman Lebedev	6cf6f6f65f	[NFC][InstCombine] foldAggregateConstructionIntoAggregateReuse(): cast to Instruction eagerly In all of these, the value must be an instruction for us to succeed anyway, so change it to maybe hopefully make further changes more straight-forward.	2021-06-29 13:29:18 +03:00
David Sherwood	dcfc2c3fac	[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating a IRBuilder stack variable with the same name as the class member.	2021-06-29 09:14:35 +01:00
Sanjay Patel	9d0bf7699c	[InstCombine] don't try to fold a constant expression that can trap (PR50906) We could use a bigger hammer and bail out on any constant expression, but there's a regression test that appears to validly do the transform (although it may not have been intending to check that optimization).	2021-06-28 17:00:21 -04:00
Joseph Huber	57ad2e1067	[OpenMP] Prevent OpenMPOpt from internalizing uncalled functions Currently OpenMPOpt will only check if a function is a kernel before deciding not to internalize it. Any uncalled function that gets internalized will be trivially dead in the module so this is unnnecessary. Depends on D102423 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104890	2021-06-28 16:47:53 -04:00
Nikita Popov	7ac0442fe5	[SanitizerCoverage] Support opaque pointers Pass element type rather than pointer type to some functions, so we know which type to use for the global variables.	2021-06-28 22:18:42 +02:00
Arnold Schwaighofer	3dee1e8a84	[coro] Fix rematerializable instruction sinking to coro.suspend blocks There is a constraint that coro.suspend instructions need to be in their own blocks. The coro split pass initially creates IR that obeys this constraint (which is later checked). Sinking rematerializable instructions into these blocks breaks that constraint. Instead rematerialize in the predecessor block to the suspend's single predecessor block. Differential Revision: https://reviews.llvm.org/D104051	2021-06-28 09:37:45 -07:00
Nico Weber	540b4a5fb3	Revert "[DebugInfo] Enable variadic debug value salvaging" This reverts commit `adace79652`. Still breaks things, see comment on https://reviews.llvm.org/D91722	2021-06-28 11:25:09 -04:00
Reshabh Sharma	ae983de6cc	[InferAddressSpaces] NFC: For noop IntToPtr/PtrToInt pair cast to operator instead of PtrToInt Compiler crashes at an assertion while casting operands to PtrToIntInst at some cases when ptrtoint is present as an explicit operand to inttoptr. Explicit instruction operator as operand can not be casted to an Instruction. This patch replaces cast from PtrToInst to Operator which are later checked for constant expressions. Differential Revision: https://reviews.llvm.org/D105002	2021-06-28 19:24:26 +05:30
Joseph Huber	13b2fba239	[OpenMP][NFC] Fix typo in OpenMPOpt	2021-06-28 09:49:14 -04:00
Joseph Huber	4024087731	[OpenMP][NFC] Fix missing argument	2021-06-28 09:15:01 -04:00
Joseph Huber	4a6bd8e3e7	[OpenMP] Increase attributor iterations on the GPU Increase the number of attributor iterations on a GPU target. I forgot to change this in D104416. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104920	2021-06-28 08:50:49 -04:00
Kerry McLaughlin	f99672568f	[LoopVectorize] Fix strict reductions where VF = 1 Currently we will allow loops with a fixed width VF of 1 to vectorize if the -enable-strict-reductions flag is set. However, the loop vectorizer will not use ordered reductions if `VF.isScalar()` and the resulting vectorized loop will be out of order. This patch removes `VF.isVector()` when checking if ordered reductions should be used. Also, instead of converting the FAdds to reductions if the VF = 1, operands of the FAdds are changed such that the order is preserved. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D104533	2021-06-28 11:27:10 +01:00
Florian Hahn	80aa7e147e	[VPlan] Merge predicated-triangle regions, after sinking. Sinking scalar operands into predicated-triangle regions may allow merging regions. This patch adds a VPlan-to-VPlan transform that tries to merge predicate-triangle regions after sinking. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100260	2021-06-28 11:10:38 +01:00
Max Kazantsev	d58514d41c	[LSR][NFC] Make sure that after the canonicalization the formula is canonical	2021-06-28 12:50:04 +07:00
Max Kazantsev	7c73c2ede8	[LoopDeletion] Benefit from branches by undef conditions when symbolically executing 1st iteration We can exploit branches by `undef` condition. Frankly, the LangRef says that such branches are UB, so we can assume that all outgoing edges of such blocks are dead. However, from practical perspective, we know that this is not supported correctly in some other places. So we are being conservative about it. Branch by undef is treated in the following way: - If it is a loop-exiting branch, we always assume it exits the loop; - If not, we arbitrarily assume it takes `true` value. Differential Revision: https://reviews.llvm.org/D104689 Reviewed By: nikic	2021-06-28 11:39:46 +07:00
Nikita Popov	e81702912e	[DSE] Preserve address space Preserve address space when inserting i8* cast.	2021-06-27 20:26:00 +02:00
Nikita Popov	9aa951e80e	[MemCpyOpt] Preserve address space Preserve address space when generating the cast to i8*.	2021-06-27 20:21:19 +02:00
Nikita Popov	f00941e061	[DSE] Support opaque pointers For the start shortening optimization, always use a i8 type for the GEP, as it is a raw offset calculation. Handling of non-i8* memset/memcpy arguments requires insertion of casts. These cases were previously miscompiled, as the offset calculation was performed on the wrong type.	2021-06-27 17:41:40 +02:00
Nikita Popov	f025053977	[MemCpyOpt] Handle unusual memcpy element type Apparently, it is legal to use memcpy/memset with pointer types other than i8. Prior to `81fcdae68c` this case was silently miscompiled, as the i8 offset calculation was performed on some other type. Now it would crash due to a type mismatch. Fix this by inserting an explicit bitcast to i8.	2021-06-27 16:21:44 +02:00

1 2 3 4 5 ...

27868 Commits