Commit Graph

572 Commits

Author SHA1 Message Date
Nikita Popov 95e5f28337 [InstCombine] Remove redundant/bogus mul_with_overflow combines
As pointed out in D60518, folding mulo(%x, undef) to {undef, undef}
isn't correct. As a correct version of this already exists in
InstructionSimplify (bd8056ef32/lib/Analysis/InstructionSimplify.cpp (L4750-L4757)), this is
just dead code. Drop it together with the mul(%x, 0) -> {0, false}
fold that is also already handled by InstSimplify.
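
A minimal IR sketch of the fold that remains in InstSimplify (the
function name is illustrative):

declare { i8, i1 } @llvm.umul.with.overflow.i8(i8, i8)

define { i8, i1 } @mulo_zero(i8 %x) {
  %m = call { i8, i1 } @llvm.umul.with.overflow.i8(i8 %x, i8 0)
  ret { i8, i1 } %m
}
; InstSimplify folds the call to {0, false}:
;   ret { i8, i1 } { i8 0, i1 false }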

Differential Revision: https://reviews.llvm.org/D60649

llvm-svn: 358339
2019-04-13 19:43:35 +00:00
Nikita Popov 0a8228fd28 [InstCombine] Handle ssubo always overflow
Following D60483 and D60497, this adds support for AlwaysOverflows
handling for ssubo. This is the last case we can handle right now.

Differential Revision: https://reviews.llvm.org/D60518

llvm-svn: 358100
2019-04-10 16:32:15 +00:00
Nikita Popov ef23e88480 [InstCombine] Handle saddo always overflow
Followup to D60483: Handle AlwaysOverflow conditions for saddo as
well.

Differential Revision: https://reviews.llvm.org/D60497

llvm-svn: 358095
2019-04-10 16:18:01 +00:00
Nikita Popov 09020ec2a7 [InstCombine] Handle usubo always overflow
Check AlwaysOverflow condition for usubo. The implementation is the
same as the existing handling for uaddo and umulo. Handling for saddo
and ssubo will follow (smulo doesn't have the necessary ValueTracking
support).
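
A sketch of the kind of case this enables (illustrative function name
and known-bits setup; assumes ValueTracking can bound both operands):

declare { i8, i1 } @llvm.usub.with.overflow.i8(i8, i8)

define { i8, i1 } @usubo_always(i8 %a, i8 %b) {
  %x = and i8 %a, 7                ; %x known to be in [0, 7]
  %y = or i8 %b, 16                ; %y known to be >= 16
  %s = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 %x, i8 %y)
  ret { i8, i1 } %s
}
; %x u< %y always holds, so the subtraction always wraps and the
; overflow bit can be folded to true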

Differential Revision: https://reviews.llvm.org/D60483

llvm-svn: 358052
2019-04-10 07:10:53 +00:00
Nikita Popov 596cbeb705 [InstCombine] Directly call computeOverflow methods in OptimizeOverflowCheck; NFC
Instead of using the willOverflow helpers. This makes it easier to
extend handling of AlwaysOverflows.

llvm-svn: 358051
2019-04-10 07:10:44 +00:00
Nikita Popov eda3b9326e [InstCombine] Restructure OptimizeOverflowCheck; NFC
Change the code to always handle the unsigned+signed cases together
with the same basic structure for add/sub/mul. The simple folds are
always handled first and then the ValueTracking overflow checks are
used.

llvm-svn: 358025
2019-04-09 18:32:28 +00:00
Luqman Aden 8911c5be46 [InstCombine] Combine no-wrap sub and icmp w/ constant.
Teach InstCombine the transformation `(icmp P (sub nuw|nsw C2, Y), C) -> (icmp swap(P) Y, C2-C)`
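
For example (a minimal sketch; the constants are illustrative):

define i1 @src(i8 %y) {
  %s = sub nuw i8 10, %y           ; nuw implies %y u<= 10
  %c = icmp ult i8 %s, 3           ; 10 - %y u< 3
  ret i1 %c
}
; becomes (swap ult to ugt, compare %y against C2-C = 10-3):
define i1 @tgt(i8 %y) {
  %c = icmp ugt i8 %y, 7
  ret i1 %c
}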

Reviewers: majnemer, apilipenko, sanjoy, spatel, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: dmgreen, lebedev.ri, nikic, hiraditya, JDevlieghere, jfb, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59916

llvm-svn: 357674
2019-04-04 07:08:30 +00:00
Sanjay Patel 781d883862 [InstCombine] Fix crashing from (icmp (bitcast ([su]itofp X)), Y)
This fixes a class of bugs introduced by D44367, 
which transforms various cases of icmp (bitcast ([su]itofp X)), Y to icmp X, Y. 
If the bitcast is between vector types with a different number of elements, 
the current code will produce bad IR along the lines of: icmp <N x i32> ..., <M x i32> <...>.

This patch suppresses the transform if the bitcast changes the number of vector elements.
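
A sketch of the kind of input that used to trigger the bad IR (types
are illustrative):

%f = sitofp <4 x i32> %x to <4 x float>
%b = bitcast <4 x float> %f to <2 x i64>
%c = icmp slt <2 x i64> %b, zeroinitializer
; folding this to a compare of %x would pit a <4 x i32> value against
; a <2 x i64> constant, which is invalid IR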

Patch by: @AndrewScheidecker (Andrew Scheidecker)

Differential Revision: https://reviews.llvm.org/D57871

llvm-svn: 353467
2019-02-07 21:12:01 +00:00
Sanjay Patel e7f46c3db3 [InstCombine] refactor folds for (icmp (bitcast X), Y); NFCI
llvm-svn: 353462
2019-02-07 20:54:09 +00:00
Sanjay Patel 68bc5fb0ad [InstCombine] X | C == C --> (X & ~C) == 0
We should canonicalize to one of these forms,
and compare-with-zero could be more conducive
to follow-on transforms. This also leads to
generally better codegen as shown in PR40611:
https://bugs.llvm.org/show_bug.cgi?id=40611
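
For example (a minimal sketch with an illustrative constant):

define i1 @src(i8 %x) {
  %o = or i8 %x, 7
  %c = icmp eq i8 %o, 7            ; X | 7 == 7
  ret i1 %c
}
; becomes a compare with zero:
define i1 @tgt(i8 %x) {
  %a = and i8 %x, -8               ; X & ~7
  %c = icmp eq i8 %a, 0
  ret i1 %c
}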

llvm-svn: 353313
2019-02-06 16:43:54 +00:00
James Y Knight 7976eb5838 [opaque pointer types] Pass function types to CallInst creation.
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57170

llvm-svn: 352909
2019-02-01 20:43:25 +00:00
Nikita Popov 6515db205a [InstCombine] Simplify cttz/ctlz + icmp ugt/ult
Followup to D55745, this time handling comparisons with ugt and ult
predicates (which are the canonical forms for non-equality predicates).

For ctlz we can convert into a simple icmp, for cttz we can convert
into a mask check.
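
A sketch of both cases (widths and thresholds are illustrative):

declare i8 @llvm.ctlz.i8(i8, i1)
declare i8 @llvm.cttz.i8(i8, i1)

define i1 @ctlz_ugt(i8 %x) {
  %z = call i8 @llvm.ctlz.i8(i8 %x, i1 false)
  %c = icmp ugt i8 %z, 3           ; at least 4 leading zeros
  ret i1 %c
}
; becomes: %c = icmp ult i8 %x, 16

define i1 @cttz_ugt(i8 %x) {
  %z = call i8 @llvm.cttz.i8(i8 %x, i1 false)
  %c = icmp ugt i8 %z, 3           ; at least 4 trailing zeros
  ret i1 %c
}
; becomes: %m = and i8 %x, 15
;          %c = icmp eq i8 %m, 0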

Differential Revision: https://reviews.llvm.org/D56355

llvm-svn: 351645
2019-01-19 09:56:01 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Nikita Popov 20853a7807 [InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check
Checking whether a number has a certain number of trailing / leading
zeros means checking whether it is of the form XXXX1000 / 0001XXXX,
which can be done with an and+icmp.

Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next
step, this can be extended to non-equality predicates.
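
For example (a sketch with an illustrative trailing-zero count):

declare i8 @llvm.cttz.i8(i8, i1)

define i1 @cttz_eq_3(i8 %x) {
  %z = call i8 @llvm.cttz.i8(i8 %x, i1 false)
  %c = icmp eq i8 %z, 3            ; %x has the form XXXX1000
  ret i1 %c
}
; becomes an and+icmp on the low bits:
;   %m = and i8 %x, 15
;   %c = icmp eq i8 %m, 8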

Differential Revision: https://reviews.llvm.org/D55745

llvm-svn: 349530
2018-12-18 19:59:50 +00:00
Nikita Popov 36e03ac6ee [InstCombine] Fix negative GEP offset evaluation for 32-bit pointers
This fixes https://bugs.llvm.org/show_bug.cgi?id=39908.

The evaluateGEPOffsetExpression() function simplifies GEP offsets for
use in comparisons against zero, basically by converting X*Scale+Offset==0
to X+Offset/Scale==0 if Scale divides Offset. However, before this is done,
Offset is masked down to the pointer size. This results in incorrect
results for negative Offsets, because we basically end up dividing the
32-bit offset *zero* extended to 64 bits (rather than sign extended).
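
For example (illustrative numbers): with 32-bit pointers, an Offset of
-4 is masked to 0xFFFFFFFC; zero extended to 64 bits that is
4294967292, so dividing by a Scale of 4 yields 1073741823 instead of
the intended -1.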

Fix this by explicitly sign extending the truncated value.

Differential Revision: https://reviews.llvm.org/D55449

llvm-svn: 348987
2018-12-12 23:19:03 +00:00
Roman Lebedev 98cb1216a6 [InstCombine] foldICmpWithLowBitMaskedVal(): don't miscompile -1 vector elts
I was finally able to quantify what I thought was missing in the fix:
it was vector constants. If we have a scalar (and %x, -1),
it will be instsimplified before we reach this code,
but if it is a vector, we may still have a -1 element.

Thus, we want to avoid the fold if *at least one* element is -1.
Or in other words, ignoring the undef elements, no sign bits
should be set. Thus, m_NonNegative().

A follow-up for rL348181
https://bugs.llvm.org/show_bug.cgi?id=39861

llvm-svn: 348462
2018-12-06 08:14:24 +00:00
Sanjay Patel baffae91b2 [InstCombine] simplify icmps with same operands based on dominating cmp
The tests here are based on the motivating cases from D54827.

More background:
1. We don't get these cases in general with SimplifyCFG because the root
   of the pattern match is an icmp, not a branch. I'm not sure how often
   we encounter this pattern vs. the seemingly more likely case with 
   branches, but I don't see evidence to leave the minimal pattern
   unoptimized.

2. This has a chance of increasing compile-time because we're using a
   ValueTracking call to handle the match. The motivating cases could be
   handled with a simpler pair of calls to isImpliedTrueByMatchingCmp/
   isImpliedFalseByMatchingCmp, but I saw that we have a more 
   comprehensive wrapper around those, so we might as well use it here
   unless there's evidence that it's significantly slower.

3. Ideally, we'd handle the fold to constants in InstSimplify, but as
   with the existing code here, we could extend this to handle cases
   where the result is not a constant, but a new combined predicate.
   That would mean splitting the logic across the 2 passes and possibly
   duplicating the pattern-matching cost.

4. As mentioned in D54827, this seems like the kind of thing that should
   be handled in Correlated Value Propagation, but that pass is currently
   limited to dealing with instructions with constant operands, so extending
   this bit of InstCombine is the smallest/easiest way to get these patterns 
   optimized.
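
A minimal sketch of the pattern (names are illustrative):

define i1 @dominated(i32 %x, i32 %y) {
entry:
  %c1 = icmp sgt i32 %x, %y
  br i1 %c1, label %t, label %f
t:
  %c2 = icmp sge i32 %x, %y        ; dominated by the true edge of %c1
  ret i1 %c2                       ; sgt implies sge, so this folds to true
f:
  ret i1 false
}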

llvm-svn: 348367
2018-12-05 15:04:00 +00:00
Sanjay Patel d23b5ed857 [InstCombine] rearrange foldICmpWithDominatingICmp; NFC
Move it out from under the constant check, reorder
predicates, add comments. This makes it easier to
extend to handle the non-constant case.

llvm-svn: 348284
2018-12-04 17:44:24 +00:00
Sanjay Patel a40bf9fff7 [InstCombine] add helper for icmp with dominator; NFC
There's a potential small enhancement to this code that could
solve the cases currently under proposal in D54827 via SimplifyCFG.

Whether instcombine should be doing this kind of semi-non-local
analysis in the first place is an open question, but separating
the logic out can only help if/when we decide to move it to a
different pass. 

AFAICT, any proposal to do this in SimplifyCFG could also be seen 
as an overreach + it would be incomplete to start the fold from a 
branch rather than an icmp.

There's another question here about the code for processUGT_ADDCST_ADD().
That part may be completely dead after rL234638?

llvm-svn: 348273
2018-12-04 15:35:17 +00:00
Roman Lebedev 7bf2fed167 [InstCombine] foldICmpWithLowBitMaskedVal(): disable 2 faulty folds.
These two folds are invalid for this non-constant pattern
when the mask ends up being all-ones:
https://rise4fun.com/Alive/9au
https://rise4fun.com/Alive/UcQM

Fixes https://bugs.llvm.org/show_bug.cgi?id=39861

llvm-svn: 348181
2018-12-03 20:07:58 +00:00
Sanjay Patel 57a08b3343 [InstCombine] propagate FMF for fcmp+fabs folds
By morphing the instruction rather than deleting and creating a new one,
we retain fast-math-flags and potentially other metadata (profile info?).

llvm-svn: 346331
2018-11-07 16:15:01 +00:00
Sanjay Patel bb521e63af [InstCombine] peek through fabs() when checking isnan()
That should be the end of the missing cases for this fold.
See earlier patches in this series:
rL346321
rL346324

llvm-svn: 346327
2018-11-07 15:44:26 +00:00
Sanjay Patel fa5f146872 [InstCombine] add folds for fcmp Pred fabs(X), 0.0
Similar to rL346321, we had folds for the ordered
versions of these compares already, so add the
unordered siblings for completeness.

llvm-svn: 346324
2018-11-07 15:33:03 +00:00
Sanjay Patel 76faf5145d [InstCombine] add fold for fabs(X) u< 0.0
The sibling fold for 'oge' --> 'ord' was already here,
but this half was missing. 

The result of fabs() must be positive or nan, so asking 
if the result is negative or nan is the same as asking 
if the result is nan.
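
A minimal IR sketch of the new half (the function name is
illustrative):

declare float @llvm.fabs.f32(float)

define i1 @fabs_ult(float %x) {
  %f = call float @llvm.fabs.f32(float %x)
  %c = fcmp ult float %f, 0.0
  ret i1 %c
}
; %f is never negative, so 'u< 0.0' holds only in the unordered (nan)
; case; the compare folds to an isnan check:
;   %c = fcmp uno float %x, 0.0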

This is another step towards fixing:
https://bugs.llvm.org/show_bug.cgi?id=39475

llvm-svn: 346321
2018-11-07 15:11:32 +00:00
Sanjay Patel d1172a0c20 [IR] add optional parameter for copying IR flags to compare instructions
As shown, this is used to eliminate redundant code in InstCombine,
and there are more cases where we should be using this pattern, but
we're currently unintentionally dropping flags. 

llvm-svn: 346282
2018-11-07 00:00:42 +00:00
Sanjay Patel 724014adde [InstCombine] allow vector types for fcmp+fpext fold
llvm-svn: 346245
2018-11-06 17:20:20 +00:00
Sanjay Patel 46bf3922c1 [InstCombine] propagate fast-math-flags when folding fcmp+fpext, part 2
llvm-svn: 346242
2018-11-06 16:45:27 +00:00
Sanjay Patel 7c3ee4da42 [InstCombine] rearrange code for fcmp+fpext; NFCI
llvm-svn: 346241
2018-11-06 16:37:35 +00:00
Sanjay Patel 1b85f00201 [InstCombine] propagate fast-math-flags when folding fcmp+fpext
llvm-svn: 346240
2018-11-06 16:23:03 +00:00
Sanjay Patel 2fd5b0ebfb [InstCombine] propagate fast-math-flags when folding fcmp+fneg, part 2
llvm-svn: 346238
2018-11-06 15:58:57 +00:00
Sanjay Patel 05e70fb978 [InstCombine] reduce code; NFC
llvm-svn: 346235
2018-11-06 15:53:58 +00:00
Sanjay Patel 70282a0501 [InstCombine] propagate fast-math-flags when folding fcmp+fneg
This is another part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

This might be enough to fix that particular issue, but as noted
with the FIXME, we're still dropping FMF on other folds around here.

llvm-svn: 346234
2018-11-06 15:49:45 +00:00
Sanjay Patel c26fd1e772 [InstCombine] canonicalize -0.0 to +0.0 in fcmp
As stated in IEEE-754 and discussed in:
https://bugs.llvm.org/show_bug.cgi?id=38086
...the sign of zero does not affect any FP compare predicate.
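
For example (a minimal sketch):

define i1 @negzero(float %x) {
  %c = fcmp oeq float %x, -0.0
  ret i1 %c
}
; canonicalizes to the same compare against +0.0:
;   %c = fcmp oeq float %x, 0.0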

Known regressions were fixed with:
rL346097 (D54001)
rL346143

The transform will help reduce pattern-matching complexity to solve:
https://bugs.llvm.org/show_bug.cgi?id=39475
...as well as improve CSE and codegen (a zero constant is almost always
easier to produce than 0x80..00).

llvm-svn: 346147
2018-11-05 17:26:42 +00:00
Sanjay Patel 1c254c6716 [InstCombine] refactor fabs+fcmp fold; NFC
Also, remove/replace/minimize/enhance the tests for this fold.
The code drops FMF, so it needs more tests and at least 1 fix.

llvm-svn: 345734
2018-10-31 16:34:43 +00:00
Sanjay Patel b9fe3fbb57 [InstCombine] add assertion that InstSimplify has folded a fabs+fcmp; NFC
The 'OLT' case was updated at rL266175, so I assume it was just an
oversight that 'UGE' was not included because that patch handled
both predicates in InstSimplify.

llvm-svn: 345727
2018-10-31 15:31:45 +00:00
Sanjay Patel 85cba3b6fb [InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative
This re-raises some of the open questions about how to apply and use fast-math-flags in IR from PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
...but given the current implementation (no FMF on casts), this is likely the only way to predicate the 
transform.
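
A minimal sketch of the fold (using fabs to make X known non-negative;
the function name is illustrative):

declare float @llvm.fabs.f32(float)

define i1 @nnan_oge(float %a) {
  %x = call float @llvm.fabs.f32(float %a)
  %c = fcmp nnan oge float %x, 0.0 ; %x is neither nan (nnan) nor negative
  ret i1 %c
}
; folds to: ret i1 true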

This is part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

Differential Revision: https://reviews.llvm.org/D53874

llvm-svn: 345725
2018-10-31 14:57:23 +00:00
Sanjay Patel 4c39dfc91e [InstCombine] use 'match' to reduce code; NFC
llvm-svn: 345647
2018-10-30 20:52:25 +00:00
Sanjay Patel 68a61cb07c [InstCombine] use getFltSemantics() instead of duplicating it; NFC
llvm-svn: 345613
2018-10-30 16:21:56 +00:00
Sanjay Patel 05aadf885d [InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try
Re-trying r344082 because it unintentionally included extra diffs.

Original commit message:
icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be
regressions unless we add more trunc folds as we're doing here
for vectors.

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {

  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r

}

Here's a sampling of the vector codegen for that case using
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vandps  LCPI0_0(%rip), %xmm0, %xmm0
vxorps  %xmm1, %xmm1, %xmm1
vpcmpeqd        %xmm1, %xmm0, %xmm0
vblendvps       %xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vblendvps       %xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vpbroadcastd    LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd       %zmm1, %zmm0, %k1
vblendmps       %zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vpslld  $31, %xmm0, %xmm0
vptestmd        %zmm0, %zmm0, %k1
vblendmps       %zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge   v0.4s, v1.4s, v0.4s
zip1    v1.4s, v0.4s, v0.4s
zip2    v0.4s, v0.4s, v0.4s
orr     v0.16b, v1.16b, v0.16b
movi    v1.4s, #1
and     v0.16b, v0.16b, v1.16b
cmeq    v0.4s, v0.4s, #0
bsl     v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge   v0.4s, v1.4s, v0.4s
zip1    v1.4s, v0.4s, v0.4s
zip2    v0.4s, v0.4s, v0.4s
orr     v0.16b, v1.16b, v0.16b
bsl     v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747

llvm-svn: 344181
2018-10-10 20:47:46 +00:00
Sanjay Patel 58fc00d0bc revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization
This commit accidentally included the diffs from D53057.

llvm-svn: 344178
2018-10-10 20:39:39 +00:00
Sanjay Patel e9ca7ea3e5 [InstCombine] reverse 'trunc X to <N x i1>' canonicalization
icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be 
regressions unless we add more trunc folds as we're doing here 
for vectors.

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r
}

Here's a sampling of the vector codegen for that case using 
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vandps	LCPI0_0(%rip), %xmm0, %xmm0
vxorps	%xmm1, %xmm1, %xmm1
vpcmpeqd	%xmm1, %xmm0, %xmm0
vblendvps	%xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vblendvps	%xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpbroadcastd	LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd	%zmm1, %zmm0, %k1
vblendmps	%zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpslld	$31, %xmm0, %xmm0
vptestmd	%zmm0, %zmm0, %k1
vblendmps	%zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
movi	v1.4s, #1
and	v0.16b, v0.16b, v1.16b
cmeq	v0.4s, v0.4s, #0
bsl	v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
bsl	v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747

llvm-svn: 344082
2018-10-09 21:26:01 +00:00
Jesper Antonsson c954b86391 [InstCombine] Handle vector compares in foldGEPIcmp(), take 2
Summary:
This is a continuation of the fix for PR34627 "InstCombine assertion
at vector gep/icmp folding". (I just realized bugpoint had fuzzed the
original test for me, so I had fixed another trigger of the same
assert in adjacent code in InstCombine.)

This patch avoids optimizing an icmp (to look only at the base
pointers) when the resulting icmp would have a different type.

The patch adds a testcase and also cleans up and shrinks the
pre-existing test for the adjacent assert trigger.

Reviewers: lebedev.ri, majnemer, spatel

Reviewed By: lebedev.ri

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52494

llvm-svn: 343486
2018-10-01 14:59:25 +00:00
Sanjay Patel c3f50ff92e [InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0)
When C is not zero and infinities are not allowed, (C / X) > 0 is a
sign test. Depending on the sign of C, the predicate must be swapped.

E.g.:
  foo(double X) {
    if ((-2.0 / X) <= 0) ...
  }
 =>
  foo(double X) {
    if (X >= 0) ...
  }

Patch by: @marels (Martin Elshuber)

Differential Revision: https://reviews.llvm.org/D51942

llvm-svn: 343228
2018-09-27 15:59:24 +00:00
Jesper Antonsson 719fa055d0 [InstCombine] Handle vector compares in foldGEPIcmp()
Summary:
This is to fix PR38984 "InstCombine assertion at vector gep/icmp folding":
https://bugs.llvm.org/show_bug.cgi?id=38984

Reviewers: majnemer, spatel, lattner, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D52263

llvm-svn: 342647
2018-09-20 13:37:28 +00:00
Roman Lebedev f50023d37c [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((-1 << y) >> y) mask
Summary:
The last low-bit-mask-pattern-producing pattern I can think of.

https://rise4fun.com/Alive/UGzE <- non-canonical
But we cannot canonicalize it because of extra uses.

https://bugs.llvm.org/show_bug.cgi?id=38123

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52148

llvm-svn: 342548
2018-09-19 13:35:46 +00:00
Roman Lebedev ca2bdb03d6 [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((1 << y)+(-1)) mask
Summary:
Same as to D52146.
`((1 << y)+(-1))` is simply a non-canonical version of `~(-1 << y)`: https://rise4fun.com/Alive/0vl
We cannot canonicalize it due to the extra uses. But we can handle it here.

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52147

llvm-svn: 342547
2018-09-19 13:35:40 +00:00
Roman Lebedev 183a465dc6 [InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask
Summary:
Two folds are happening here:
1. https://rise4fun.com/Alive/oaFX
2. And then `foldICmpWithHighBitMask()` (D52001): https://rise4fun.com/Alive/wsP4

This change doesn't just add the handling for eq/ne predicates;
it actually builds upon the previous `foldICmpWithLowBitMaskedVal()` work,
so **all** the 16 fold variants* are immediately supported.

I'm indeed only testing these two predicates.
I do not feel like re-proving all 16 folds*, because they were already proven
for the general case of a constant with all-ones in low bits. So as long as
the mask produces all-ones in low bits, I'm pretty sure the fold is valid.

But if required, I can re-prove them; let me know.

* eq/ne are commutative: 4 folds; ult/ule/ugt/uge are not commutative
(the commuted variant is InstSimplified): 4 folds; slt/sle/sgt/sge are
not commutative: 4 folds. 12 folds in total.

https://bugs.llvm.org/show_bug.cgi?id=38123
https://bugs.llvm.org/show_bug.cgi?id=38708
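
A sketch of the canonical mask pattern being handled (illustrative
width; the exact output predicate depends on canonicalization):

define i1 @low_bit_mask(i8 %x, i8 %y) {
  %t0 = shl i8 -1, %y
  %mask = xor i8 %t0, -1           ; ~(-1 << y): the %y low bits set
  %t1 = and i8 %mask, %x
  %c = icmp eq i8 %t1, %x          ; %x has no bits above the mask
  ret i1 %c
}
; since %mask has all-ones in the low bits, this is equivalent to an
; unsigned compare such as icmp ule i8 %x, %mask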

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52146

llvm-svn: 342546
2018-09-19 13:35:27 +00:00
Roman Lebedev 1b7fc87020 [InstCombine] Inefficient pattern for high-bits checking 3 (PR38708)
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only n bits wide (where n is a variable).
There are many ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations I haven't thought of...)

The last (as far as I know) such pattern, non-canonical due to the extra use.
https://godbolt.org/z/aCMsPk
https://rise4fun.com/Alive/I6f

https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52062

llvm-svn: 342321
2018-09-15 12:04:13 +00:00
Roman Lebedev 6dc87004fa [InstCombine] Inefficient pattern for high-bits checking 2 (PR38708)
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only n bits wide (where n is a variable).
There are many ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations I haven't thought of...)

More complicated, canonical pattern:
https://rise4fun.com/Alive/uhA

We do need to have two `switch()`'es like this,
to not mismatch the swappable predicates.

https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52001

llvm-svn: 342173
2018-09-13 20:33:12 +00:00
Roman Lebedev 75404fb9f8 [InstCombine] Inefficient pattern for high-bits checking (PR38708)
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only `n` bits wide (where `n` is a variable).
There are **many** ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations I haven't thought of...)

Let's handle the second variant first, since it is much simpler.
https://rise4fun.com/Alive/LYjY
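
A sketch of one such check (one of several equivalent forms; names
are illustrative):

define i1 @fits_in_n_bits(i32 %x, i32 %n) {
  %hi = lshr i32 %x, %n            ; assumes %n u< 32 so the shift is defined
  %c = icmp eq i32 %hi, 0          ; no bits set at or above position %n
  ret i1 %c
}
; true iff %x is non-negative and its value fits in the low %n bits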

https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51985

llvm-svn: 342067
2018-09-12 18:19:43 +00:00