llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	6abce17fc2	[VPlan] Use Exiting-block instead of Exit-block terminology (NFC). In LLVM's common loop terminology, an exit block is a block outside a loop with a predecessor inside the loop. An exiting block is a block inside the loop which branches to an exit block outside the loop. This patch updates a few places where VPlan was using ExitBlock for a block exiting a region. Those instances have been updated to use ExitingBlock. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D126173	2022-05-28 21:16:05 +01:00
eopXD	6a84579243	[LSR][TTI][PowerPC][SystemZ][X86] Add const-ness to TTI::isLSRCostLess. NFC Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D126350	2022-05-27 15:22:23 -07:00
Sanjay Patel	b5b6aa4d53	[InstCombine] fold multiply by signbit-splat to cmp+select (ashr i32 X, 31) * C --> (X < 0) ? -C : 0 https://alive2.llvm.org/ce/z/G8u9SS With a constant operand, this is an improvement in IR and codegen (where it can be converted to a mask op). Without a constant operand, we would have to negate the operand, so that is probably better left to the backend. This is similar but not the same optimization that is requested in #55618.	2022-05-27 11:54:19 -04:00
Sanjay Patel	5a6e085757	[InstCombine] reduce code duplication; NFC	2022-05-27 11:54:19 -04:00
Enna1	52992f136b	Add !nosanitize to FixedMetadataKinds This patch adds !nosanitize metadata to FixedMetadataKinds.def, !nosanitize indicates that LLVM should not insert any sanitizer instrumentation. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D126294	2022-05-27 09:46:13 +08:00
Arthur Eubanks	36096c2b38	[NFC][JumpThreading] Remove InsertFreezeWhenUnfoldingSelect pass parameter All callers pass true. select-unfold-freeze.ll is now a subset of select.ll so delete it. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126501	2022-05-26 16:13:34 -07:00
Sanjay Patel	c4c750058f	[InstCombine] fold mul of signbit directly to X < 0 ? Y : 0 This is effectively NFC (intentionally no test diffs) because we already have the related fold that converts the 'and' pattern to select. So this is just an efficiency improvement.	2022-05-26 16:19:15 -04:00
Sanjay Patel	49f8b05137	[InstCombine] fold icmp equality with sdiv and SMIN This extends the fold from D126410 / `3952c905ef` to allow for the only case where it works with signed division: https://alive2.llvm.org/ce/z/k7_ypu (X s/ Y) == SMIN --> (X == SMIN) && (Y == 1) (X s/ Y) != SMIN --> (X != SMIN) || (Y != 1) This is another improvement based on #55695.	2022-05-26 16:19:15 -04:00
Sanjay Patel	ed5be1523f	[InstCombine] reduce code duplication in icmp+div folds; NFC	2022-05-26 16:19:15 -04:00
Owen Anderson	939a43461b	Revert "Replace the custom linked list in LeaderTableEntry with TinyPtrVector." This reverts commit `1e91149844`. Pending further discussion.	2022-05-26 09:50:36 -07:00
Nikita Popov	c8eb83f2d0	[ControlHeightReduction] Use logical and Use logical instead of bitwise and to combine conditions, to avoid propagating poison from a later condition if an earlier one is already false. This avoids introducing branch on poison. Differential Revision: https://reviews.llvm.org/D125898	2022-05-26 18:03:35 +02:00
Alexey Bataev	7b809c30b9	[SLP]Improve compile time, NFC. Patch improves compile time. For function calls that cannot be vectorized, create a unique group for each such call instead of a subgroup. This prevents them from being grouped into subgroups and avoids attempts to vectorize them. Also looks through cast operands to try to check their groups/subgroups. Reduces the number of vectorization attempts. No changes in the statistics for SPEC2017/2006/llvm-test-suite. Differential Revision: https://reviews.llvm.org/D126476	2022-05-26 08:40:59 -07:00
Alexey Bataev	120d52b0ef	[SLP]Fix PR55653: emit undefs where required, not poison. Need to handle a corner case correctly, if all elements are Undefs/Poisons, need to emit actual values, not just poisons. Differential Revision: https://reviews.llvm.org/D126298	2022-05-26 08:38:50 -07:00
Alex Zhikhartsev	8b0d763474	[DFAJumpThreading] Relax analysis to handle unpredictable initial values Responding to a feature request from the Rust community: https://github.com/rust-lang/rust/issues/80630 void foo(X) { for (...) switch (X) case A X = B case B X = C } Even though the initial switch value is non-constant, the switch statement can still be threaded: the initial value will hit the switch statement but the rest of the state changes will proceed by jumping unconditionally. The early predictability check is relaxed to allow unpredictable values anywhere, but later, after the paths through the switch statement have been enumerated, no non-constant state values are allowed along the paths. Any state value not along a path will be an initial switch value, which can be safely ignored. Differential Revision: https://reviews.llvm.org/D124394	2022-05-26 11:29:54 -04:00
Simon Pilgrim	14258d6fb5	[SLP] Move canVectorizeLoads implementation to simplify the diff in D105986. NFC.	2022-05-26 15:23:58 +01:00
Alexey Bataev	9139d484d4	[SLP]Fix crash on reordering of ScatterVectorize nodes. ScatterVectorize nodes should be handled the same way as gathers in the reorderBottomToTop function, since we can simply reorder the loads in this node. Because of that, such nodes need to be included in the list of gathered nodes to fix the compiler crash. Differential Revision: https://reviews.llvm.org/D126378	2022-05-26 06:25:58 -07:00
Sanjay Patel	3952c905ef	[InstCombine] fold icmp equality with udiv and large constant With large compare constant: (X u/ Y) == C --> (X == C) && (Y == 1) (X u/ Y) != C --> (X != C) || (Y != 1) https://alive2.llvm.org/ce/z/EhKwh6 There are various potential missing icmp (div) transforms shown here: https://github.com/llvm/llvm-project/issues/55695 This is a generalization for part of the udiv + equality. I didn't check in detail, but some of those may only make sense as codegen transforms. This results in one extra instruction in IR, but it is better for analysis, and looks much better in codegen on all targets that I tried. Differential Revision: https://reviews.llvm.org/D126410	2022-05-26 09:08:47 -04:00
Florian Hahn	f96aa493f0	[SimpleLoopUnswitch] Always skip trivial select and set condition. When updating the branch instruction outside the loop during non-trivial unswitching, always skip trivial selects and update the condition. Otherwise we might create invalid IR, because the trivial select is inside the loop, while the condition is outside the loop. Fixes #55697.	2022-05-26 09:46:24 +01:00
Florian Hahn	390c0ac28d	[LV] Fix indentation in tryToCreateWidenRecipe (NFC).	2022-05-26 08:53:34 +01:00
Owen Anderson	1e91149844	Replace the custom linked list in LeaderTableEntry with TinyPtrVector. The purpose of the custom linked list was to optimize for the case of a single-element list. It turns out that TinyPtrVector handles the same basic scenario even better, reducing the size of LeaderTableEntry by 33%, and requiring only log2(N) allocations as the size of the list grows. The only downside is that we have to store the Value's and BasicBlock's in separate vectors, which is slightly awkward in a few cases. Fortunately that ends up being entirely encapsulated inside helper functions. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D125205	2022-05-25 23:52:44 -07:00
Serguei Katkov	c2eccc67ce	[GuardWidening] Remove nuw/nsw flags for hoisted instructions When we hoist instructions over a guard we must clear these flags, because they might be implied by that guard and so only make sense after it. As an example of the bug caused by the current behavior: L is known to be in a range, say [0, 100). c1 = x u< L; guard(c1); x1 = add x, 1; c2 = x1 u< L; guard(c2). Based on guard(c1) we can say that x1 = add nuw nsw x, 1. After guard widening we get: c1 = x u< L; x1 = add nuw nsw x, 1; c2 = x1 u< L; c = and c1, c2; guard(c). Now, based on the fact that x + 1 < L and x >= 0 (because x + 1 is nuw), we can prove that x + 1 u< L implies x u< L, so we can just remove c1: x1 = add nuw nsw x, 1; c2 = x1 u< L; guard(c2). But that is not correct, because the value x == -1 will pass. Reviewed By: mkazantsev Subscribers: llvm-commits, nikic Differential Revision: https://reviews.llvm.org/D126354	2022-05-26 13:20:55 +07:00
serge-sans-paille	fb67d683db	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `7030654296` detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D126417	2022-05-26 08:12:34 +02:00
Chenbing Zheng	1486a9c9fe	[InstCombine] [NFC] refactor foldXorOfICmps Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126268	2022-05-26 11:07:18 +08:00
Chenbing Zheng	41aab93afc	[InstCombine] bitcast(logic(bitcast(X), bitcast(Y))) -> bitcast'(logic(bitcast'(X), Y)) This patch lifts the foldBitCastBitwiseLogic restriction that the destination must have an integer element type, and eliminates one bitcast by doing the logic op in the type of the input that has an integer element type. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D126184	2022-05-26 10:23:44 +08:00
Alexey Bataev	3bf5c2c8ec	[SLP]Do not try to generate ScatterVectorize if it will be scalarized. SLP should build ScatterVectorize nodes only if they actually end up with masked gather rather than with scalarization. In the second scenario better to build a gather node. Differential Revision: https://reviews.llvm.org/D126379	2022-05-25 14:25:07 -07:00
Alexey Bataev	10f41a2147	[SLP]Fix PR55688: Miscompile due to incorrect nuw/nsw handling. Need to use all ReductionOps when propagating flags for the reduction ops, otherwise transformation is not correct. Plus, need to drop nuw/nsw flags. Differential Revision: https://reviews.llvm.org/D126371	2022-05-25 13:59:06 -07:00
David Sherwood	87936c7b13	[LoopVectorize] Fix assertion failure in fixReduction when tail-folding When compiling the attached new test in scalable-reductions-tf.ll we were hitting this assertion in fixReduction: Assertion `isa<PHINode>(U) && "Reduction exit must feed Phi's or select" The loop contains a reduction and an intermediate store of the reduction value. When vectorising with tail-folding the contents of 'U' in the assertion above happened to be a scatter_store. It turns out that we were still creating a widen recipe for the invariant store, despite knowing that we can actually sink it. The simplest fix is to change buildVPlanWithVPRecipes so that we look for invariant stores before attempting to widen them. Differential Revision: https://reviews.llvm.org/D126295	2022-05-25 11:46:32 +01:00
Florian Hahn	c6e45ea074	[VPlan] Exit earlier when trying to widen with scalar VFs. This simplifies the code a bit, suggested in D124718. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D125029	2022-05-25 11:05:23 +01:00
Florian Hahn	1ba42dd04b	[VPlan] Use MapVector for LiveOuts for deterministic iteration. During code-gen, we iterate over the LiveOuts and the differences in iteration order can cause slightly different outputs.	2022-05-25 09:30:02 +01:00
Chenbing Zheng	269e3f7369	[InstCombine] [NFC] Move transforms for truncated shifts into narrowBinOp Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D126056	2022-05-25 10:21:39 +08:00
Martin Sebor	46c0ec9df4	[InstCombine] Fold memrchr calls with sequences of identical bytes. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123631	2022-05-24 17:00:11 -06:00
Vasileios Porpodas	9df0568b07	[SLP] Fix crash caused by reorderBottomToTop(). The crash is caused by incorrect order set by reorderBottomToTop(), which happens when it is reordering a TreeEntry which has a user that has already been reordered earlier. Please see the detailed description in the lit test. Differential Revision: https://reviews.llvm.org/D126099	2022-05-24 12:24:19 -07:00
Sanjay Patel	05527b68a0	[InstCombine] fold more shuffles with FP<->Int cast operands shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask) This extends the transform added with `0353c2c996`. If the shuffle reduces vector length, the transform reduces the width of the cast, so that should be a win for most codegen (if not, it can be inverted).	2022-05-24 15:11:38 -04:00
Nikita Popov	e6e0eb3bc8	[InstCombine] Strip bitcasts in GEP diff fold Bitcasts were stripped in one case, but not the other. Of course, this no longer really matters with opaque pointers, but as I went through the trouble of tracking this down, we may as well remove one typed vs opaque pointer optimization discrepancy.	2022-05-24 16:12:01 +02:00
Nikita Popov	b2a13d3e2d	[InstCombine] Use IRBuilder in freeze pushing transform (PR55619) Use IRBuilder so that the newly created freeze instructions automatically gets inserted back into the IC worklist. The changed worklist processing order leads to some cosmetic differences in tests. Fixes https://github.com/llvm/llvm-project/issues/55619.	2022-05-24 15:48:28 +02:00
Alexey Bataev	f9c806ae5c	[SLP][NFC]Make isFirstInsertElement a strict weak ordering comparator. To be used correctly in a sort-like function, the isFirstInsertElement function must follow the strict weak ordering rule, i.e. isFirstInsertElement(IE1, IE1) should return false.	2022-05-24 06:02:42 -07:00
Nikita Popov	a7c079aaa2	[InstCombine] Support logical and in masked icmp fold Most of the folds implemented in this function work fine with logical operations. We only need to be careful for the cases that work on non-constant masks, where the RHS operand shouldn't be poison. This is a conservative implementation that bails out of illegal transforms, but we could also change these to insert freeze instead.	2022-05-24 11:16:33 +02:00
Nikita Popov	5abaabed22	[InstCombine] Use m_APInt() in asymmetric masked icmp fold This is mostly intended as code cleanup, but it does also add support for splat vectors to this fold.	2022-05-24 10:57:28 +02:00
Nikita Popov	c0e06c7448	[InstCombine] Handle logical and/or in recursive and/or of icmps fold The and/or of icmps fold is also applied in reassociated form. However, this currently only happens for bitwise and of bitwise and, but not for bitwise and of logical and (or other combinations, but this is the one being addressed here). We can do this for bitwise+logical combinations as well, but need to be a bit careful about which of the resulting ands are logical: https://alive2.llvm.org/ce/z/WYSjGh https://alive2.llvm.org/ce/z/guxYnz https://alive2.llvm.org/ce/z/S5SYxY https://alive2.llvm.org/ce/z/2rAWeW	2022-05-24 10:13:10 +02:00
Nikita Popov	81c648a3d9	[LoopUnroll] Freeze tripcount rather than condition This is a followup to D125754. We introduce two branches, one before the unrolled loop and one before the epilogue (and similar for the prologue case). The previous patch only froze the condition on the first branch. Rather than independently freezing the second condition, this patch instead freezes TripCount and bases BECount on it. These are the two quantities involved in the conditions, and this ensures that both work on a consistent, non-poisonous trip count. Differential Revision: https://reviews.llvm.org/D125896	2022-05-24 09:42:39 +02:00
Hendrik Greving	4f93d5cc1d	[BasicBlockUtils] Do not move loop metadata if outer loop header. Fixes a bug preventing moving the loop's metadata to an outer loop's header, which happens if the loop's exit is also the header of an outer loop. Adjusts test for above. Fixes #55416. Differential Revision: https://reviews.llvm.org/D125574	2022-05-23 16:39:54 -07:00
Alexey Bataev	319a722f6f	[SLP][NFC]Improve compile time, NFC. Builds UserIgnore list only once as a SmallDenseSet without rebuilding it between the runs, iterate over gathers instead list of reduction ops, do some checks in the buildTree_rec only if the corresponding containers are not empty.	2022-05-23 12:15:27 -07:00
Sanjay Patel	e8c20d995b	[IR] add and use pattern match specialization for sqrt intrinsic; NFC This was included in D126190 originally, but it's independent and a useful change for readability.	2022-05-23 14:16:30 -04:00
Benjamin Kramer	2f2ca30d0a	Fix an unused variable warning in no-asserts build mode	2022-05-23 19:53:40 +02:00
Nikita Popov	f45c1e436e	[InstCombine] Change operand order in recursive and/or of icmps fold The order obviously doesn't matter for bitwise and/or, but would matter for logical and/or, so change it to preserve the original order.	2022-05-23 17:29:33 +02:00
Jingu Kang	bb82f74612	Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"" This reverts commit `42ebfa8269`. The commit from https://reviews.llvm.org/D125918 has fixed the stage 2 build failure. Differential Revision: https://reviews.llvm.org/D118979	2022-05-23 16:15:45 +01:00
Alexey Bataev	2ac5ebedea	[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly. SLP vectorizer emits extracts for externally used vectorized scalars and estimates the cost for each such extract. But in many cases these scalars are input for insertelement instructions, forming buildvector, and instead of extractelement/insertelement pair we can emit/cost estimate shuffle(s) cost and generate series of shuffles, which can be further optimized. Tested using test-suite (+SPEC2017), the tests passed, SLP was able to generate/vectorize more instructions in many cases and it allowed to reduce number of re-vectorization attempts (where we could try to vectorize buildvector insertelements again and again). Differential Revision: https://reviews.llvm.org/D107966	2022-05-23 07:06:45 -07:00
Sanjay Patel	1ebad988b1	[InstCombine] fold icmp of zext bool based on limited range X <u (zext i1 Y) --> (X == 0) && Y https://alive2.llvm.org/ce/z/avQDRY This is a generalization of `4069cccf3b` based on the post-commit suggestion. This also adds the i1 type check and tests that were missing from the earlier attempt; that commit caused several bot fails and was reverted. Differential Revision: https://reviews.llvm.org/D126171	2022-05-23 09:59:21 -04:00
Nikita Popov	45226d04f0	[InstCombine] Reuse icmp of and/or folds for logical and/or Similarly to a change recently done for fcmps, add a flag that indicates whether the and/or is logical to foldAndOrOfICmps, and reuse the function when folding logical and/or. We were already calling some parts of it, but this gives us a clearer indication of which parts may need poison-safe variants, and would also allow to fold combinations of bitwise and logical and/or. This change should be close to NFC, because all folds this enables were either already called previously, or can make use of implied poison reasoning.	2022-05-23 15:37:07 +02:00
Peter Waller	ade47bdc31	[LV] Improve register pressure estimate at high VFs Previously, `getRegUsageForType` was implemented using `getTypeLegalizationCost`. `getRegUsageForType` is used by the loop vectorizer to estimate the register pressure caused by using a vector type. However, `getTypeLegalizationCost` currently only appears to understand splitting and not scalarization, so significantly underestimates the register requirements. Instead, use `getNumRegisters`, which understands when scalarization can occur (via computeRegisterProperties). This was discovered while investigating D118979 (Set maximum VF with shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the loop vectorizer previously ends up costing an v128i1 as 2 v64i* registers where it actually occupies 128 i32 registers. I'm sending this patch early for comment, I'm still doing some sanity checking with LNT. I note that getRegisterClassForType appears to return VectorRC even though the type in question (large vNi1 types) end up occupying scalar registers. That might be worth fixing too. Differential Revision: https://reviews.llvm.org/D125918	2022-05-23 07:57:45 +00:00
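
Several InstCombine entries above state integer folds in shorthand. As a rough sanity check, and not a substitute for the linked alive2 proofs, the following Python sketch brute-forces four of them. The helper names, the 6-bit width (chosen only to keep the exhaustive search quick), and the reading of "large constant" as "sign bit set" are assumptions made for illustration and are not taken from the commits. Inputs for which division is undefined in LLVM IR (zero divisor, SMIN s/ -1) are skipped.

```python
from itertools import product

# Hypothetical small bit width so every input can be enumerated quickly.
BITS = 6
MASK = (1 << BITS) - 1
SMIN = -(1 << (BITS - 1))
SMAX = (1 << (BITS - 1)) - 1


def to_signed(v):
    """Wrap an arbitrary Python int to a BITS-wide two's-complement value."""
    v &= MASK
    return v - (1 << BITS) if v > SMAX else v


def sdiv(x, y):
    """Signed division that truncates toward zero, like LLVM's sdiv."""
    q = abs(x) // abs(y)
    return -q if (x < 0) != (y < 0) else q


def check_signbit_splat_mul():
    # (ashr X, BITS-1) * C  -->  (X < 0) ? -C : 0
    for x, c in product(range(SMIN, SMAX + 1), repeat=2):
        splat = x >> (BITS - 1)  # arithmetic shift: -1 if x < 0, else 0
        before = to_signed(splat * c)
        after = to_signed(-c) if x < 0 else 0
        if before != after:
            return False
    return True


def check_udiv_eq_large_const():
    # (X u/ Y) == C  -->  (X == C) && (Y == 1)
    # Assumption: "large" is taken to mean C has its sign bit set.
    for x, y, c in product(range(MASK + 1), range(1, MASK + 1),
                           range(SMAX + 1, MASK + 1)):
        if ((x // y) == c) != (x == c and y == 1):
            return False
    return True


def check_sdiv_eq_smin():
    # (X s/ Y) == SMIN  -->  (X == SMIN) && (Y == 1)
    for x, y in product(range(SMIN, SMAX + 1), repeat=2):
        if y == 0 or (x == SMIN and y == -1):
            continue  # undefined in LLVM IR, so not part of the claim
        if (sdiv(x, y) == SMIN) != (x == SMIN and y == 1):
            return False
    return True


def check_ult_zext_bool():
    # X <u (zext i1 Y)  -->  (X == 0) && Y
    for x, y in product(range(MASK + 1), (0, 1)):
        if (x < y) != (x == 0 and y == 1):
            return False
    return True
```

Each function returns True when the rewritten form agrees with the original on every enumerated input. The `!=` / `||` variants listed in the log are the logical negations of the equality forms, so they follow from the same checks.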

30598 Commits