llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	46bf3922c1	[InstCombine] propagate fast-math-flags when folding fcmp+fpext, part 2 llvm-svn: 346242	2018-11-06 16:45:27 +00:00
Sanjay Patel	7c3ee4da42	[InstCombine] rearrange code for fcmp+fpext; NFCI llvm-svn: 346241	2018-11-06 16:37:35 +00:00
Sanjay Patel	1b85f00201	[InstCombine] propagate fast-math-flags when folding fcmp+fpext llvm-svn: 346240	2018-11-06 16:23:03 +00:00
Sanjay Patel	2fd5b0ebfb	[InstCombine] propagate fast-math-flags when folding fcmp+fneg, part 2 llvm-svn: 346238	2018-11-06 15:58:57 +00:00
Sanjay Patel	05e70fb978	[InstCombine] reduce code; NFC llvm-svn: 346235	2018-11-06 15:53:58 +00:00
Sanjay Patel	70282a0501	[InstCombine] propagate fast-math-flags when folding fcmp+fneg This is another part of solving PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475 This might be enough to fix that particular issue, but as noted with the FIXME, we're still dropping FMF on other folds around here. llvm-svn: 346234	2018-11-06 15:49:45 +00:00
Simon Pilgrim	c1da5f757e	[InstCombine] Ensure nested shifts are in range (OSS-Fuzz #9880 ) llvm-svn: 346225	2018-11-06 11:28:22 +00:00
Sanjay Patel	1440107821	[InstSimplify] fold select (fcmp X, Y), X, Y This is NFCI for InstCombine because it calls InstSimplify, so I left the tests for this transform there. As noted in the code comment, we can allow this fold more often by using FMF and/or value tracking. llvm-svn: 346169	2018-11-05 21:51:39 +00:00
Sanjay Patel	c26fd1e772	[InstCombine] canonicalize -0.0 to +0.0 in fcmp As stated in IEEE-754 and discussed in: https://bugs.llvm.org/show_bug.cgi?id=38086 ...the sign of zero does not affect any FP compare predicate. Known regressions were fixed with: rL346097 (D54001) rL346143 The transform will help reduce pattern-matching complexity to solve: https://bugs.llvm.org/show_bug.cgi?id=39475 ...as well as improve CSE and codegen (a zero constant is almost always easier to produce than 0x80..00). llvm-svn: 346147	2018-11-05 17:26:42 +00:00
Sanjay Patel	87aa10062c	[InstCombine] loosen FP 0.0 constraint for fcmp+select substitution It looks like we correctly removed edge cases with 0.0 from D50714, but we were a bit conservative because getBinOpIdentity() doesn't distinguish between +0.0 and -0.0 and 'nsz' is effectively always true for fcmp (see discussion in: https://bugs.llvm.org/show_bug.cgi?id=38086 Without this change, we would get regressions by canonicalizing to +0.0 in all fcmp, and that's a step towards solving: https://bugs.llvm.org/show_bug.cgi?id=39475 llvm-svn: 346143	2018-11-05 16:50:44 +00:00
Volkan Keles	3ca146d083	[InstCombine] Combine nested min/max intrinsics with constants Reviewers: arsenm, spatel Reviewed By: spatel Subscribers: lebedev.ri, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D53774 llvm-svn: 345751	2018-10-31 17:50:52 +00:00
Sanjay Patel	1c254c6716	[InstCombine] refactor fabs+fcmp fold; NFC Also, remove/replace/minimize/enhance the tests for this fold. The code drops FMF, so it needs more tests and at least 1 fix. llvm-svn: 345734	2018-10-31 16:34:43 +00:00
Sanjay Patel	b9fe3fbb57	[InstCombine] add assertion that InstSimplify has folded a fabs+fcmp; NFC The 'OLT' case was updated at rL266175, so I assume it was just an oversight that 'UGE' was not included because that patch handled both predicates in InstSimplify. llvm-svn: 345727	2018-10-31 15:31:45 +00:00
Sanjay Patel	85cba3b6fb	[InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative This re-raises some of the open questions about how to apply and use fast-math-flags in IR from PR38086: https://bugs.llvm.org/show_bug.cgi?id=38086 ...but given the current implementation (no FMF on casts), this is likely the only way to predicate the transform. This is part of solving PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475 Differential Revision: https://reviews.llvm.org/D53874 llvm-svn: 345725	2018-10-31 14:57:23 +00:00
Sanjay Patel	4c39dfc91e	[InstCombine] use 'match' to reduce code; NFC llvm-svn: 345647	2018-10-30 20:52:25 +00:00
Quentin Colombet	900678227c	[InstCombine] Teach the move free before null test opti how to deal with noop casts InstCombine features an optimization that essentially replaces: if (a) free(a) into: free(a) Right now, this optimization is gated by the minsize attribute and therefore we only perform it if we can prove that we are going to be able to eliminate the branch and the destination block. However when casts are involved the optimization would fail to apply, because the optimization was not smart enough to realize that it is possible to also move the casts away from the destination block and that is harmless to the performance since they are just noops. E.g., foo(int a) if (a) free((char)a) Wouldn't be optimized by instcombine, because - We would refuse to hoist the `bitcast i32* %a to i8` in the source block - We would fail to see that `bitcast i32* %a to i8` and %a are the same value. This patch fixes both these problems: - It teaches the pattern matching of the comparison how to look through casts. - It checks that whether the additional instruction in the destination block can be hoisted and are harmless performance-wise. - It hoists all the code of the destination block in the source block. Differential Revision: D53356 llvm-svn: 345644	2018-10-30 20:51:04 +00:00
Sanjay Patel	68a61cb07c	[InstCombine] use getFltSemantics() instead of duplicating it; NFC llvm-svn: 345613	2018-10-30 16:21:56 +00:00
Sanjay Patel	b12e410082	[InstCombine] try to turn shuffle into insertelement shuffle (insert ?, Scalar, IndexC), V1, Mask --> insert V1, Scalar, IndexC' The motivating case is at least a couple of steps away: I noticed that SLPVectorizer does not analyze shuffles as well as sequences of insert/extract in PR34724: https://bugs.llvm.org/show_bug.cgi?id=34724 ...so SLP may fail to vectorize when source code has shuffles to start with or instcombine has converted insert/extract to shuffles. Independent of that, an insertelement is always a simpler op for IR analysis vs. a shuffle, so we should transform to insert when possible. I don't think there's any codegen concern here - if a target can't insert a scalar directly to some fixed element in a vector (x86?), then this should get expanded to the insert+shuffle that we started with. Differential Revision: https://reviews.llvm.org/D53507 llvm-svn: 345607	2018-10-30 15:26:39 +00:00
Cameron McInally	384a74b0e6	[FPEnv] Last BinaryOperator::isFNeg(...) to m_FNeg(...) changes Replacing BinaryOperator::isFNeg(...) to avoid regressions when we separate FNeg from the FSub IR instruction. Differential Revision: https://reviews.llvm.org/D53650 llvm-svn: 345295	2018-10-25 18:09:33 +00:00
Gabor Buella	1f6ca0ba15	Add -instcombine-code-sinking option Reviewers: craig.topper, andrew.w.kaylor, efriedma Reviewed By: craig.topper, andrew.w.kaylor, efriedma Differential Revision: https://reviews.llvm.org/D52709 llvm-svn: 345248	2018-10-25 08:32:29 +00:00
Sanjay Patel	3b206305fd	[InstCombine] try harder to form select from logic ops (2nd try) The original patch was committed here: rL344609 ...and reverted: rL344612 ...because it did not properly check/test data types before calling ComputeNumSignBits(). The tests that caused bot failures for the previous commit are over-reaching front-end tests that run the entire -O optimizer pipeline: Clang :: CodeGen/builtins-systemz-zvector.c Clang :: CodeGen/builtins-systemz-zvector2.c I've added a negative test here to ensure coverage for that case. The new early exit check also tests the type of the 'B' parameter, so we don't waste time on matching if either value is unsuitable. Original commit message: This is part of solving PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 The patterns shown here are a special case of something that we already convert to select. Using ComputeNumSignBits() catches that case (but not the more complicated motivating patterns yet). The backend has hooks/logic to convert back to logic ops if that's better for the target. llvm-svn: 345149	2018-10-24 15:17:56 +00:00
Sanjay Patel	95790c546f	[InstCombine] use 'match' to simplify code There's probably some vector-with-undef-element pattern that shows an improvement, so this is probably not quite 'NFC'. This is the last step towards removing the fake binop queries for not/neg. Ie, there are no more uses of those functions in trunk. Fneg should follow. llvm-svn: 345050	2018-10-23 16:54:28 +00:00
Sanjay Patel	747feb28e4	[InstCombine] use 'match' to handle vectors and simplify code This is another step towards completely removing the fake binop queries for not/neg/fneg. llvm-svn: 345036	2018-10-23 15:05:12 +00:00
Sanjay Patel	ad76c682c7	[InstCombine] swap select profile metadata when swapping select ops llvm-svn: 345034	2018-10-23 14:43:31 +00:00
Sanjay Patel	0522b0da31	[InstCombine] use 'match' to simplify code; NFC llvm-svn: 344855	2018-10-20 17:15:57 +00:00
Sanjay Patel	ec572ade20	[InstCombine] make code more flexible with lambda; NFC I couldn't tell from svn history when these checks were added, but it pre-dates the split of instcombine into its own directory at rL92459. The motivation for changing the check is partly shown by the code in PR34724: https://bugs.llvm.org/show_bug.cgi?id=34724 There are also existing regression tests for SLPVectorizer with sequences of extract+insert that are likely assumed to become shuffles by the vectorizer cost models. llvm-svn: 344854	2018-10-20 16:58:27 +00:00
Sanjay Patel	729c4362cf	[InstCombine] add explanatory comment for strange vector logic; NFC llvm-svn: 344852	2018-10-20 16:25:55 +00:00
Thomas Lively	c339250e12	[InstCombine] InstCombine and InstSimplify for minimum and maximum Summary: Depends on D52765 Reviewers: aheejin, dschuff Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52766 llvm-svn: 344799	2018-10-19 19:01:26 +00:00
Sanjay Patel	70daf85bc2	[InstCombine] use m_Neg() in dyn_castNegVal() to match vectors with undef elts llvm-svn: 344793	2018-10-19 17:54:53 +00:00
Fangrui Song	2e83b2e9ee	Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC llvm-svn: 344774	2018-10-19 06:12:02 +00:00
Mikael Holmen	e3605d0f70	Add a emitUnaryFloatFnCall version that fetches the function name from TLI Summary: In several places in the code we use the following pattern: if (hasUnaryFloatFn(&TLI, Ty, LibFunc_tan, LibFunc_tanf, LibFunc_tanl)) { [...] Value Res = emitUnaryFloatFnCall(X, TLI.getName(LibFunc_tan), B, Attrs); [...] } In short, we check if there is a lib-function for a certain type, and then we _always_ fetch the name of the "double" version of the lib function and construct a call to the appropriate function, that we just checked exists, using that "double" name as a basis. This is of course a problem in cases where the target doesn't support the "double" version, but e.g. only the "float" version. In that case TLI.getName(LibFunc_tan) returns "", and emitUnaryFloatFnCall happily appends an "f" to "", and we erroneously end up with a call to a function called "f". To solve this, the above pattern is changed to if (hasUnaryFloatFn(&TLI, Ty, LibFunc_tan, LibFunc_tanf, LibFunc_tanl)) { [...] Value Res = emitUnaryFloatFnCall(X, &TLI, LibFunc_tan, LibFunc_tanf, LibFunc_tanl, B, Attrs); [...] } I.e instead of first fetching the name of the "double" version and then letting emitUnaryFloatFnCall() add the final "f" or "l", we let emitUnaryFloatFnCall() fetch the right name from TLI. Reviewers: eli.friedman, efriedma Reviewed By: efriedma Subscribers: efriedma, bjope, llvm-commits Differential Revision: https://reviews.llvm.org/D53370 llvm-svn: 344725	2018-10-18 06:27:53 +00:00
Sanjay Patel	bb3dd34e62	revert rL344609: [InstCombine] try harder to form select from logic ops I noticed a missing check and added it at rL344610, but there actually are codegen tests that will fail without that, so I'll edit those and submit a fixed patch with more tests. llvm-svn: 344612	2018-10-16 15:26:08 +00:00
Sanjay Patel	f6a7c8b1fc	[InstCombine] make sure type is integer before calling ComputeNumSignBits llvm-svn: 344610	2018-10-16 14:44:50 +00:00
Sanjay Patel	0c48c977b8	[InstCombine] try harder to form select from logic ops This is part of solving PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 The patterns shown here are a special case of something that we already convert to select. Using ComputeNumSignBits() catches that case (but not the more complicated motivating patterns yet). The backend has hooks/logic to convert back to logic ops if that's better for the target. llvm-svn: 344609	2018-10-16 14:35:21 +00:00
Chandler Carruth	edb12a838a	[TI removal] Make variables declared as `TerminatorInst` and initialized by `getTerminator()` calls instead be declared as `Instruction`. This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked). llvm-svn: 344502	2018-10-15 10:04:59 +00:00
Sanjay Patel	7181146c6c	[InstCombine] combine a shuffle and an extract subvector shuffle This is part of the missing IR-level folding noted in D52912. This should be ok as a canonicalization because the new shuffle mask can't be any more complicated than the existing shuffle mask. If there's some target where the shorter vector shuffle is not legal, it should just end up expanding to something like the pair of shuffles that we're starting with here. Differential Revision: https://reviews.llvm.org/D53037 llvm-svn: 344476	2018-10-14 15:25:06 +00:00
Sanjay Patel	47579b21e2	[InstCombine] fix complexity canonicalization with fake unary vector ops This is a preliminary step to avoid regressions when we add an actual 'fneg' instruction to IR. See D52934 and D53205. llvm-svn: 344458	2018-10-13 16:15:37 +00:00
Amara Emerson	54f60255a2	[InstCombine] Fix SimplifyLibCalls erasing an instruction while IC still had references to it. InstCombine keeps a worklist and assumes that optimizations don't eraseFromParent() the instruction, which SimplifyLibCalls violates. This change adds a new callback to SimplifyLibCalls to let clients specify their own hander for erasing actions. Differential Revision: https://reviews.llvm.org/D52729 llvm-svn: 344251	2018-10-11 14:51:11 +00:00
David Green	8066198442	[InstCombine] Demand bits of UMin This is the umin alternative to the umax code from rL344237. We use DeMorgans law on the umax case to bring us to the same thing on umin, but using countLeadingOnes, not countLeadingZeros. Differential Revision: https://reviews.llvm.org/D53036 llvm-svn: 344239	2018-10-11 11:28:27 +00:00
David Green	30c0e98b9c	[InstCombine] Demand bits of UMax Use the demanded bits of umax(A,C) to prove we can just use A so long as the lowest non-zero bit of DemandMask is higher than the highest non-zero bit of C Differential Revision: https://reviews.llvm.org/D53033 llvm-svn: 344237	2018-10-11 11:04:09 +00:00
Sanjay Patel	05aadf885d	[InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try Re-trying r344082 because it unintentionally included extra diffs. Original commit message: icmp ne (and X, 1), 0 --> trunc X to N x i1 Ideally, we'd do the same for scalars, but there will likely be regressions unless we add more trunc folds as we're doing here for vectors. The motivating vector case is from PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) { %c = fcmp ole <4 x float> %x, %y %s = sext <4 x i1> %c to <4 x i32> %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1> %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3> %cond = or <4 x i32> %s1, %s2 %condtr = trunc <4 x i32> %cond to <4 x i1> %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w ret <4 x float> %r } Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch): AVX before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vandps LCPI0_0(%rip), %xmm0, %xmm0 vxorps %xmm1, %xmm1, %xmm1 vpcmpeqd %xmm1, %xmm0, %xmm0 vblendvps %xmm0, %xmm3, %xmm2, %xmm0 AVX after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vblendvps %xmm0, %xmm2, %xmm3, %xmm0 AVX512f before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1] vptestnmd %zmm1, %zmm0, %k1 vblendmps %zmm3, %zmm2, %zmm0 {%k1} AVX512f after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpslld $31, %xmm0, %xmm0 vptestmd %zmm0, %zmm0, %k1 vblendmps %zmm2, %zmm3, %zmm0 {%k1} AArch64 before: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b movi v1.4s, #1 and v0.16b, v0.16b, v1.16b cmeq v0.4s, v0.4s, #0 bsl v0.16b, v3.16b, v2.16b AArch64 after: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b bsl v0.16b, v2.16b, v3.16b PowerPC-le before: xvcmpgesp 34, 35, 34 vspltisw 0, 1 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxlxor 35, 35, 35 xxland 34, 0, 32 vcmpequw 2, 2, 3 xxsel 34, 36, 37, 34 PowerPC-le after: xvcmpgesp 34, 35, 34 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxsel 34, 37, 36, 0 Differential Revision: https://reviews.llvm.org/D52747 llvm-svn: 344181	2018-10-10 20:47:46 +00:00
Sanjay Patel	58fc00d0bc	revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization This commit accidentally included the diffs from D53057. llvm-svn: 344178	2018-10-10 20:39:39 +00:00
Sanjay Patel	e9ca7ea3e5	[InstCombine] reverse 'trunc X to <N x i1>' canonicalization icmp ne (and X, 1), 0 --> trunc X to N x i1 Ideally, we'd do the same for scalars, but there will likely be regressions unless we add more trunc folds as we're doing here for vectors. The motivating vector case is from PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) { %c = fcmp ole <4 x float> %x, %y %s = sext <4 x i1> %c to <4 x i32> %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1> %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3> %cond = or <4 x i32> %s1, %s2 %condtr = trunc <4 x i32> %cond to <4 x i1> %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w ret <4 x float> %r } Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch): AVX before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vandps LCPI0_0(%rip), %xmm0, %xmm0 vxorps %xmm1, %xmm1, %xmm1 vpcmpeqd %xmm1, %xmm0, %xmm0 vblendvps %xmm0, %xmm3, %xmm2, %xmm0 AVX after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vblendvps %xmm0, %xmm2, %xmm3, %xmm0 AVX512f before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1] vptestnmd %zmm1, %zmm0, %k1 vblendmps %zmm3, %zmm2, %zmm0 {%k1} AVX512f after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpslld $31, %xmm0, %xmm0 vptestmd %zmm0, %zmm0, %k1 vblendmps %zmm2, %zmm3, %zmm0 {%k1} AArch64 before: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b movi v1.4s, #1 and v0.16b, v0.16b, v1.16b cmeq v0.4s, v0.4s, #0 bsl v0.16b, v3.16b, v2.16b AArch64 after: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b bsl v0.16b, v2.16b, v3.16b PowerPC-le before: xvcmpgesp 34, 35, 34 vspltisw 0, 1 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxlxor 35, 35, 35 xxland 34, 0, 32 vcmpequw 2, 2, 3 xxsel 34, 36, 37, 34 PowerPC-le after: xvcmpgesp 34, 35, 34 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxsel 34, 37, 36, 0 Differential Revision: https://reviews.llvm.org/D52747 llvm-svn: 344082	2018-10-09 21:26:01 +00:00
Sanjay Patel	88194dfe1a	[InstCombine] make helper function 'static'; NFC llvm-svn: 344056	2018-10-09 15:29:26 +00:00
Neil Henning	57f5d0a885	[IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle. The IRBuilder CreateIntrinsic method wouldn't allow you to specify the types that you wanted the intrinsic to be mangled with. To fix this I've: - Added an ArrayRef<Type > member to both CreateIntrinsic overloads. - Used that array to pass into the Intrinsic::getDeclaration call. - Added a CreateUnaryIntrinsic to replace the most common use of CreateIntrinsic where the type was auto-deduced from operand 0. - Added a bunch more unit tests to test CreateIntrinsic calls that weren't being tested (including the FMF flag that wasn't checked). This was suggested as part of the AMDGPU specific atomic optimizer review (https://reviews.llvm.org/D51969). Differential Revision: https://reviews.llvm.org/D52087 llvm-svn: 343962	2018-10-08 10:32:33 +00:00
Ewan Crawford	fa120cbdbc	[InstCombine] Fix incongruous GEP type addrspace Currently running the @insertelem_after_gep function below through the InstCombine pass with opt produces invalid IR. Input: ``` define void @insertelem_after_gep(<16 x i32>* %t0) { %t1 = bitcast <16 x i32>* %t0 to [16 x i32]* %t2 = addrspacecast [16 x i32]* %t1 to [16 x i32] addrspace(3)* %t3 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(3)* %t2, i64 0, i64 0 %t4 = insertelement <16 x i32 addrspace(3)> undef, i32 addrspace(3) %t3, i32 0 call void @extern_vec_pointers_func(<16 x i32 addrspace(3)> %t4) ret void } ``` Output: ``` define void @insertelem_after_gep(<16 x i32> %t0) { %t3 = getelementptr inbounds <16 x i32>, <16 x i32>* %t0, i64 0, i64 0 %t4 = insertelement <16 x i32 addrspace(3)> undef, i32 addrspace(3) %t3, i32 0 call void @my_extern_func(<16 x i32 addrspace(3)> %t4) ret void } ``` Which although causes no complaints when produced, isn't valid IR as the insertelement use of the %t3 GEP expects an address space. ``` opt: /tmp/bad.ll:52:73: error: '%t3' defined with type 'i32' but expected 'i32 addrspace(3)' %t4 = insertelement <16 x i32 addrspace(3)> undef, i32 addrspace(3)* %t3, i32 0 ``` I've fixed this by adding an addrspacecast after the GEP in the InstCombine pass, and including a check for this type mismatch to the verifier. Reviewers: spatel, lebedev.ri Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52294 llvm-svn: 343956	2018-10-08 08:40:45 +00:00
Sanjay Patel	3436dc2923	[InstCombine] drop poison flags in SimplifyVectorDemandedElts We established the (unfortunately complicated) rules for UB/poison propagation with vector ops in: D48893 D48987 D49047 It's clear from the affected tests that we are potentially creating poison where none existed before the transforms. For add/sub/mul, the answer is simple: just drop the flags because the extra undef vector lanes are generally more valuable for analysis and codegen. llvm-svn: 343819	2018-10-04 21:36:50 +00:00
Sanjay Patel	9d6688e38d	[InstCombine] reduce code duplication in SimplifyDemandedVectorElts; NFCI llvm-svn: 343806	2018-10-04 19:12:07 +00:00
Sanjay Patel	3746e11abe	[InstCombine] allow bitcast to/from FP for vector insert/extract transform This is a follow-up to rL343482 / D52439. This was a pattern that initially caused the commit to be reverted because the transform requires a bitcast as shown here. llvm-svn: 343794	2018-10-04 16:25:05 +00:00
Sanjay Patel	cafdeb1aa6	[InstCombine] allow SimplifyDemandedVectorElts to work with FP binops We're a long way from D50992 and D51553, but this is where we have to start. We weren't back-propagating undefs into binop constant values for anything but add/sub/mul/and/or/xor. This is likely because we have to be careful about not introducing UB/poison with div/rem/shift. But I suspect we already are getting the poison part wrong for add/sub/mul (although it may not be possible to expose the bug currently because we use SimplifyDemandedVectorElts from a limited set of opcodes). See the discussion/implementation from D48987 and D49047. This patch just enables functionality for FP ops because those do not have UB/poison potential. llvm-svn: 343727	2018-10-03 21:44:59 +00:00
Sanjay Patel	306f14ceb8	[InstCombine] clean up foldVectorBinop(); NFC 1. Fix include ordering. 2. Improve variable name (width is bitwidth not number-of-elements). 3. Add local Opcode variable to reduce code duplication. llvm-svn: 343694	2018-10-03 15:46:03 +00:00
Sanjay Patel	79dceb2903	[InstCombine] name change: foldShuffledBinop -> foldVectorBinop; NFC This function will deal with more than shuffles with D50992, and I have another potential per-element fold that could live here. llvm-svn: 343692	2018-10-03 15:20:58 +00:00
David Green	1e44c3b62c	[InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A This is an attempt to get out of a local-minimum that instcombine currently gets stuck in. We essentially combine two optimisations at once, ~a - ~b = b-a and min(~a, ~b) = ~max(a, b), only doing the transform if the result is at least neutral. This involves using IsFreeToInvert, which has been expanded a little to include selects that can be easily inverted. This is trying to fix PR35875, using the ideas from Sanjay. It is a large improvement to one of our rgb to cmy kernels. Differential Revision: https://reviews.llvm.org/D52177 llvm-svn: 343569	2018-10-02 09:48:34 +00:00
Jesper Antonsson	c954b86391	[InstCombine] Handle vector compares in foldGEPIcmp(), take 2 Summary: This is a continuation of the fix for PR34627 "InstCombine assertion at vector gep/icmp folding". (I just realized bugpoint had fuzzed the original test for me, so I had fixed another trigger of the same assert in adjacent code in InstCombine.) This patch avoids optimizing an icmp (to look only at the base pointers) when the resulting icmp would have a different type. The patch adds a testcase and also cleans up and shrinks the pre-existing test for the adjacent assert trigger. Reviewers: lebedev.ri, majnemer, spatel Reviewed By: lebedev.ri Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52494 llvm-svn: 343486	2018-10-01 14:59:25 +00:00
Sanjay Patel	31b07198f1	[InstCombine] try to convert vector insert+extract to trunc; 2nd try This was originally committed at rL343407, but reverted at rL343458 because it crashed trying to handle a case where the destination type is FP. This version of the patch adds a check for that possibility. Tests added at rL343480. Original commit message: This transform is requested for the backend in: https://bugs.llvm.org/show_bug.cgi?id=39016 ...but I figured it was worth doing in IR too, and it's probably easier to implement here, so that's this patch. In the simplest case, we are just truncating a scalar value. If the extract index doesn't correspond to the LSBs of the scalar, then we have to shift-right before the truncate. Endian-ness makes this tricky, but hopefully the ASCII-art helps visualize the transform. Differential Revision: https://reviews.llvm.org/D52439 llvm-svn: 343482	2018-10-01 14:40:00 +00:00
Hans Wennborg	a60aa91374	Revert r343407 "[InstCombine] try to convert vector insert+extract to trunc" This caused Chromium builds to fail with "Illegal Trunc" assertion. See https://crbug.com/890723 for repro. > This transform is requested for the backend in: > https://bugs.llvm.org/show_bug.cgi?id=39016 > ...but I figured it was worth doing in IR too, and it's probably > easier to implement here, so that's this patch. > > In the simplest case, we are just truncating a scalar value. If the > extract index doesn't correspond to the LSBs of the scalar, then we > have to shift-right before the truncate. Endian-ness makes this tricky, > but hopefully the ASCII-art helps visualize the transform. > > Differential Revision: https://reviews.llvm.org/D52439 llvm-svn: 343458	2018-10-01 12:07:45 +00:00
Sanjay Patel	1e0f1f645a	[InstCombine] try to convert vector insert+extract to trunc This transform is requested for the backend in: https://bugs.llvm.org/show_bug.cgi?id=39016 ...but I figured it was worth doing in IR too, and it's probably easier to implement here, so that's this patch. In the simplest case, we are just truncating a scalar value. If the extract index doesn't correspond to the LSBs of the scalar, then we have to shift-right before the truncate. Endian-ness makes this tricky, but hopefully the ASCII-art helps visualize the transform. Differential Revision: https://reviews.llvm.org/D52439 llvm-svn: 343407	2018-09-30 14:34:01 +00:00
Sanjay Patel	26c119a9c2	[InstCombine] allow lengthening of insertelement to eliminate shuffles As noted in post-commit comments for D52548, the limitation on increasing vector length can be applied by opcode. As a first step, this patch only allows insertelement to be widened because that has no logical downsides for IR and has little risk of pessimizing codegen. This may cause PR39132 to go into hiding during a full compile, but that bug is not fixed. llvm-svn: 343406	2018-09-30 13:50:42 +00:00
Sanjay Patel	54d31ef87e	[InstCombine] fix formatting in vector evaluators; NFC We need to alter the functionality as shown in D52548. llvm-svn: 343379	2018-09-29 15:05:24 +00:00
Sanjay Patel	242f90fe82	[InstCombine] don't propagate wider shufflevector arguments to predecessors InstCombine would propagate shufflevector insts that had wider output vectors onto predecessors, which would sometimes push undef's onto the divisor of a div/rem and result in bad codegen. I've fixed this by just banning propagating shufflevector back if the result of the shufflevector is wider than the input vectors. Patch by: @sheredom (Neil Henning) Differential Revision: https://reviews.llvm.org/D52548 llvm-svn: 343329	2018-09-28 15:24:41 +00:00
Sanjay Patel	c3f50ff92e	[InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0) When C is not zero and infinites are not allowed (C / X) > 0 is a sign test. Depending on the sign of C, the predicate must be swapped. E.g.: foo(double X) { if ((-2.0 / X) <= 0) ... } => foo(double X) { if (X >= 0) ... } Patch by: @marels (Martin Elshuber) Differential Revision: https://reviews.llvm.org/D51942 llvm-svn: 343228	2018-09-27 15:59:24 +00:00
Sanjay Patel	69ed4710b8	[InstCombine] narrow binops on concatenated vectors (PR33026) The motivating case from: https://bugs.llvm.org/show_bug.cgi?id=33026 ...has no shuffles now. This kind of pattern may occur during vectorization when targets have lumpy ISAs like SSE/AVX. llvm-svn: 342988	2018-09-25 15:57:37 +00:00
Sanjay Patel	4674c7765d	[InstCombine] add bitcast+extelt helper function; NFC We can handle patterns where the elements have different sizes, so refactoring ahead of trying to add another blob within these clauses. llvm-svn: 342918	2018-09-24 20:41:22 +00:00
Sanjay Patel	7a52626a08	[InstCombine] improve variable name and use 'match'; NFC 'width' of a vector usually refers to the bit-width. https://bugs.llvm.org/show_bug.cgi?id=39016 shows a case where we could extend this fold to handle a case where the number of elements in the bitcasted vector is not equal to the resulting value. llvm-svn: 342902	2018-09-24 16:39:03 +00:00
Sanjay Patel	09e02fbf51	[InstCombine][x86] try even harder to convert blendv intrinsic to generic IR (PR38814) Follow-up to rL342324 (D52059): Missing optimizations with blendv are shown in: https://bugs.llvm.org/show_bug.cgi?id=38814 This is an easier and more powerful solution than adding pattern matching for a few special cases in the backend. The potential danger with this transform in IR is that the condition value can get separated from the select, and the backend might not be able to make a blendv out of it again. llvm-svn: 342806	2018-09-22 14:43:55 +00:00
Craig Topper	2b3f5df73a	[InstCombine] Fold (min/max ~X, Y) -> ~(max/min X, ~Y) when Y is freely invertible Summary: This restores the combine that was reverted in r341883. The infinite loop from the failing test no longer occurs due to changes from r342163. Reviewers: spatel, dmgreen Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52070 llvm-svn: 342797	2018-09-22 05:53:27 +00:00
Jesper Antonsson	719fa055d0	[InstCombine] Handle vector compares in foldGEPIcmp() Summary: This is to fix PR38984 "InstCombine assertion at vector gep/icmp folding": https://bugs.llvm.org/show_bug.cgi?id=38984 Reviewers: majnemer, spatel, lattner, lebedev.ri Reviewed By: lebedev.ri Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D52263 llvm-svn: 342647	2018-09-20 13:37:28 +00:00
Roman Lebedev	f50023d37c	[InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((-1 << y) >> y) mask Summary: The last low-bit-mask-pattern-producing-pattern i can think of. https://rise4fun.com/Alive/UGzE <- non-canonical But we can not canonicalize it because of extra uses. https://bugs.llvm.org/show_bug.cgi?id=38123 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52148 llvm-svn: 342548	2018-09-19 13:35:46 +00:00
Roman Lebedev	ca2bdb03d6	[InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((1 << y)+(-1)) mask Summary: Same as to D52146. `((1 << y)+(-1))` is simply non-canoniacal version of `~(-1 << y)`: https://rise4fun.com/Alive/0vl We can not canonicalize it due to the extra uses. But we can handle it here. Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52147 llvm-svn: 342547	2018-09-19 13:35:40 +00:00
Roman Lebedev	183a465dc6	[InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask Summary: Two folds are happening here: 1. https://rise4fun.com/Alive/oaFX 2. And then `foldICmpWithHighBitMask()` (D52001): https://rise4fun.com/Alive/wsP4 This change doesn't just add the handling for eq/ne predicates, it actually builds upon the previous `foldICmpWithLowBitMaskedVal()` work, so all the 16 fold variants* are immediately supported. I'm indeed only testing these two predicates. I do not feel like re-proving all 16 folds, because they were already proven for the general case of constant with all-ones in low bits. So as long as the mask produces all-ones in low bits, i'm pretty sure the fold is valid. But required, i can re-prove, let me know. eq/ne are commutative - 4 folds; ult/ule/ugt/uge - are not commutative (the commuted variant is InstSimplified), 4 folds; slt/sle/sgt/sge are not commutative - 4 folds. 12 folds in total. https://bugs.llvm.org/show_bug.cgi?id=38123 https://bugs.llvm.org/show_bug.cgi?id=38708 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52146 llvm-svn: 342546	2018-09-19 13:35:27 +00:00
Craig Topper	2da7381678	[InstCombine] Support (sub (sext x), (sext y)) --> (sext (sub x, y)) and (sub (zext x), (zext y)) --> (zext (sub x, y)) Summary: If the sub doesn't overflow in the original type we can move it above the sext/zext. This is similar to what we do for add. The overflow checking for sub is currently weaker than add, so the test cases are constructed for what is supported. Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52075 llvm-svn: 342335	2018-09-15 18:54:10 +00:00
Sanjay Patel	296d35a5e9	[InstCombine][x86] try harder to convert blendv intrinsic to generic IR (PR38814) Missing optimizations with blendv are shown in: https://bugs.llvm.org/show_bug.cgi?id=38814 If this works, it's an easier and more powerful solution than adding pattern matching for a few special cases in the backend. The potential danger with this transform in IR is that the condition value can get separated from the select, and the backend might not be able to make a blendv out of it again. I don't think that's too likely, but I've kept this patch minimal with a 'TODO', so we can test that theory in the wild before expanding the transform. Differential Revision: https://reviews.llvm.org/D52059 llvm-svn: 342324	2018-09-15 14:25:44 +00:00
Roman Lebedev	1b7fc87020	[InstCombine] Inefficient pattern for high-bits checking 3 (PR38708) Summary: It is sometimes important to check that some newly-computed value is non-negative and only n bits wide (where n is a variable.) There are many ways to check that: https://godbolt.org/z/o4RB8D The last variant seems best? (I'm sure there are some other variations i haven't thought of..) The last (as far i know?) pattern, non-canonical due to the extra use. https://godbolt.org/z/aCMsPk https://rise4fun.com/Alive/I6f https://bugs.llvm.org/show_bug.cgi?id=38708 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52062 llvm-svn: 342321	2018-09-15 12:04:13 +00:00
Sanjay Patel	90a36346bc	[InstCombine] refactor mul narrowing folds; NFCI Similar to rL342278: The test diffs are all cosmetic due to the change in value naming, but I'm including that to show that the new code does perform these folds rather than something else in instcombine. D52075 should be able to use this code too rather than duplicating all of the logic. llvm-svn: 342292	2018-09-14 22:23:35 +00:00
Sanjay Patel	46945b9e9d	[InstCombine] add/use overflowing math helper functions; NFC The mul case can already be refactored to use this similar to rL342278. The sub case is proposed in D52075. llvm-svn: 342289	2018-09-14 21:30:07 +00:00
Sanjay Patel	2426eb46dd	[InstCombine] refactor add narrowing folds; NFCI The test diffs are all cosmetic due to the change in value naming, but I'm including that to show that the new code does perform these folds rather than something else in instcombine. llvm-svn: 342278	2018-09-14 20:40:46 +00:00
Roman Lebedev	6dc87004fa	[InstCombine] Inefficient pattern for high-bits checking 2 (PR38708) Summary: It is sometimes important to check that some newly-computed value is non-negative and only n bits wide (where n is a variable.) There are many ways to check that: https://godbolt.org/z/o4RB8D The last variant seems best? (I'm sure there are some other variations i haven't thought of..) More complicated, canonical pattern: https://rise4fun.com/Alive/uhA We do need to have two `switch()`'es like this, to not mismatch the swappable predicates. https://bugs.llvm.org/show_bug.cgi?id=38708 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52001 llvm-svn: 342173	2018-09-13 20:33:12 +00:00
Craig Topper	8fc05ce340	[InstCombine] Fold (xor (min/max X, Y), -1) -> (max/min ~X, ~Y) when X and Y are freely invertible. This allows the xor to be removed completely. This might help with recomitting r341674, but seems good regardless. Coincidentally fixes PR38915. Differential Revision: https://reviews.llvm.org/D51964 llvm-svn: 342163	2018-09-13 18:52:58 +00:00
Sanjay Patel	37e464876b	[InstCombine] remove checks for IsFreeToInvert() I accidentally committed this diff with rL342147 because I had applied D51964. We probably do need those checks, but D51964 has tests and more discussion/motivation, so they should be re-added with that patch. llvm-svn: 342149	2018-09-13 16:18:12 +00:00
Sanjay Patel	6f00fc3317	[InstCombine] reorder folds to reduce chance of infinite loops I don't have a test case for this, but it's motivated by the discussion in D51964, and I've added TODO comments for the better fix - move simplifications into instsimplify because that's more efficient and reduces risk of infinite loops in instcombine caused by transforms trying to do the opposite folds. In this case, we know that the transform that tries to move 'not' through min/max can be fooled by the multiple uses of a value in another min/max, so try to squash the foldSPFofSPF() patterns first. llvm-svn: 342147	2018-09-13 16:04:06 +00:00
Roman Lebedev	75404fb9f8	[InstCombine] Inefficient pattern for high-bits checking (PR38708) Summary: It is sometimes important to check that some newly-computed value is non-negative and only `n` bits wide (where `n` is a variable.) There are many ways to check that: https://godbolt.org/z/o4RB8D The last variant seems best? (I'm sure there are some other variations i haven't thought of..) Let's handle the second variant first, since it is much simpler. https://rise4fun.com/Alive/LYjY https://bugs.llvm.org/show_bug.cgi?id=38708 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51985 llvm-svn: 342067	2018-09-12 18:19:43 +00:00
Sanjay Patel	1cf0734b2f	[InstCombine] add folds for unsigned-overflow compares Name: op_ugt_sum %a = add i8 %x, %y %r = icmp ugt i8 %x, %a => %notx = xor i8 %x, -1 %r = icmp ugt i8 %y, %notx Name: sum_ult_op %a = add i8 %x, %y %r = icmp ult i8 %a, %x => %notx = xor i8 %x, -1 %r = icmp ugt i8 %y, %notx https://rise4fun.com/Alive/ZRxI AFAICT, this doesn't interfere with any add-saturation patterns because those have >1 use for the 'add'. But this should be better for IR analysis and codegen in the basic cases. This is another fold inspired by PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 llvm-svn: 342004	2018-09-11 22:40:20 +00:00
Sanjay Patel	26725bdc50	[InstCombine] add folds for icmp with xor mask constant These are the folds in Alive; Name: xor_ult Pre: isPowerOf2(-C1) %xor = xor i8 %x, C1 %r = icmp ult i8 %xor, C1 => %r = icmp ugt i8 %x, ~C1 Name: xor_ugt Pre: isPowerOf2(C1+1) %xor = xor i8 %x, C1 %r = icmp ugt i8 %xor, C1 => %r = icmp ugt i8 %x, C1 https://rise4fun.com/Alive/Vty The ugt case in its simplest form was already handled by DemandedBits, but that's not ideal as shown in the multi-use test. I'm not sure if these are all of the symmetrical folds, but I adjusted the existing code for one of the folds to try to show the similarities. There's no obvious connection, but this is another preliminary step for PR14613... https://bugs.llvm.org/show_bug.cgi?id=14613 llvm-svn: 341997	2018-09-11 22:00:15 +00:00
Sanjay Patel	342c3bcf11	[InstCombine] enhance vector demanded elements to look at a vector select condition operand I noticed that we were not back-propagating undef lanes to shuffle masks when we have a shuffle that reduces the vector width. This is part of investigating/solving PR38691: https://bugs.llvm.org/show_bug.cgi?id=38691 The DAG equivalent was proposed with: D51696 Differential Revision: https://reviews.llvm.org/D51433 llvm-svn: 341981	2018-09-11 18:49:00 +00:00
Craig Topper	4e63db8387	[InstCombine] Fix incorrect usage of getPrimitiveSizeInBits when we should be using the element size for vectors For vectors, getPrimitiveSizeInBits returns the full vector width. This code should using the element size for vectors. This could be fixed by calling getScalarSizeInBits, but its even easier to just get it from the APInt we're checking. Differential Revision: https://reviews.llvm.org/D51938 llvm-svn: 341971	2018-09-11 17:57:20 +00:00
Craig Topper	12fd6bd4ad	[InstCombine] Use dyn_cast instead of match(m_Constant). NFC llvm-svn: 341962	2018-09-11 16:51:26 +00:00
Craig Topper	a57bb61a3e	[InstCombine] Support (mul (sext x), cst) --> (sext (mul x, cst')) and (mul (zext x), cst) --> (zext (mul x, cst')) for vectors constants. Similar to D51236, but for mul instead of add. Differential Revision: https://reviews.llvm.org/D51900 llvm-svn: 341961	2018-09-11 16:51:24 +00:00
Alina Sbirlea	116caa2920	[InstCombine] Partially revert rL341674 due to PR38897. Summary: Revert min/max changes in rL341674 dues to high compile times causing timeouts (PR38897). Checking in to unblock failing builds. Patch available for post-commit review and re-revert once resolved. Working on a smaller reproducer for PR38897. Reviewers: craig.topper, spatel Subscribers: sanjoy, jlebar, llvm-commits Differential Revision: https://reviews.llvm.org/D51897 llvm-svn: 341883	2018-09-10 23:47:21 +00:00
Sanjay Patel	691d1a40e2	[InstCombine] use SelectInst operand names to make code clearer; NFC Cleanup step for D51433. llvm-svn: 341850	2018-09-10 18:37:59 +00:00
Tim Northover	12c1f7675f	InstCombine: move hasOneUse check to the top of foldICmpAddConstant There were two combines not covered by the check before now, neither of which actually differed from normal in the benefit analysis. The most recent seems to be because it was just added at the top of the function (naturally). The older is from way back in 2008 (r46687) when we just didn't put those checks in so routinely, and has been diligently maintained since. llvm-svn: 341831	2018-09-10 14:26:44 +00:00
Sanjay Patel	c1416b60f2	[InstCombine] narrow vector select with padded condition and extracted result (PR38691) shuf (sel (shuf NarrowCond, undef, WideMask), X, Y), undef, NarrowMask) --> sel NarrowCond, (shuf X, undef, NarrowMask), (shuf Y, undef, NarrowMask) The motivating case from: https://bugs.llvm.org/show_bug.cgi?id=38691 ...is the last regression test. In that case, we're just left with the narrow select. Note that if we do create new shuffles, they use the existing extraction identity mask, so there's no danger that this transform creates arbitrary shuffles. Differential Revision: https://reviews.llvm.org/D51496 llvm-svn: 341708	2018-09-07 21:03:34 +00:00
Craig Topper	040c2b0acf	[InstCombine] Fold (min/max ~X, Y) -> ~(max/min X, ~Y) when Y is freely invertible If the ~X wasn't able to simplify above the max/min, we might be able to simplify it by moving it below the max/min. I had to modify the ~(min/max ~X, Y) transform to prevent getting stuck in a loop when we saw the new ~(max/min X, ~Y) before the ~Y had been folded away to remove the new not. Differential Revision: https://reviews.llvm.org/D51398 llvm-svn: 341674	2018-09-07 16:19:50 +00:00
Florian Hahn	e32ff4b28a	[InstCombine] Do not fold scalar ops over select with vector condition. If OtherOpT or OtherOpF have scalar types and the condition is a vector, we would create an invalid select. Reviewers: spatel, john.brawn, mssimpso, craig.topper Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D51781 llvm-svn: 341666	2018-09-07 14:40:06 +00:00
Sanjay Patel	93bd15a005	[InstCombine] add xor+not folds This fold is needed to avoid a regression when we try to recommit rL300977. We can't see the most basic win currently because demanded bits changes the patterns: https://rise4fun.com/Alive/plpp llvm-svn: 341559	2018-09-06 16:23:40 +00:00
Sanjay Patel	1a00ffd656	[InstCombine] fix formatting in SimplifyDemandedVectorElts->Select; NFCI I'm preparing to add the same functionality both here and to the DAG version of this code in D51696 / D51433, so try to make those cases as similar as possible to avoid bugs. llvm-svn: 341545	2018-09-06 13:19:22 +00:00
Sanjay Patel	63cf26cf01	[InstCombine] fix xor-or-xor fold to check uses and handle commutes I'm probably missing some way to use m_Deferred to remove the code duplication, but that can be a follow-up. The improvement in demand_shrink_nsw.ll is an example of missing the fold because the pattern matching was deficient. I didn't try to follow the bits in that test, but Alive says it's correct: https://rise4fun.com/Alive/ugc llvm-svn: 341426	2018-09-04 23:22:13 +00:00
Sanjay Patel	0f70f86ce0	[InstCombine] make ((X & C) ^ C) form consistent for vectors It would be better to create a 'not' here, but that's not possible yet. llvm-svn: 341410	2018-09-04 21:17:14 +00:00
Sanjay Patel	a89f183253	[InstCombine] simplify code for xor folds; NFCI This is just a cleanup step. The TODO comments show what is wrong with the 'and' version of the fold. Fixing this should be part of recommitting: rL300977 llvm-svn: 341405	2018-09-04 21:00:13 +00:00
Nicola Zaghen	9588ad9611	[InstCombine] Fold icmp ugt/ult (add nuw X, C2), C --> icmp ugt/ult X, (C - C2) Support for sgt/slt was added in rL294898, this adds the same cases also for unsigned compares. This is the Alive proof: https://rise4fun.com/Alive/nyY Differential Revision: https://reviews.llvm.org/D50972 llvm-svn: 341353	2018-09-04 10:29:48 +00:00
Sanjay Patel	2fe1f62c88	[InstCombine] simplify xor/not folds; NFCI llvm-svn: 341336	2018-09-03 18:40:56 +00:00

1 2 3 4 5 ...

3136 Commits