llvm-project

Commit Graph

Author	SHA1	Message	Date
JF Bastien	8fc9eea43a	Test that volatile load type isn't changed Summary: As discussed in D75505, it's not particularly useful to change the type of a load to/from floating-point/integer because it's followed by a bitcast, and it might lead to surprising code generation. Check that this doesn't generally happen. Reviewers: lebedev.ri Subscribers: jkorous, dexonsmith, ributzka, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75644	2020-03-09 11:19:23 -07:00
Nikita Popov	45555c3819	[InstSimplify] Simplify calls with "returned" attribute If a call argument has the "returned" attribute, we can simplify the call to the value of that argument. The "-inst-simplify" pass already handled this for the constant integer argument case via known bits, which is invoked in SimplifyInstruction. However, non-constant (or non-int) arguments are not handled at all right now. This addresses one of the regressions from D75801. Differential Revision: https://reviews.llvm.org/D75815	2020-03-09 18:53:47 +01:00
Nikita Popov	c3ca6876ed	[InstCombine] Don't simplify calls without uses When simplifying a call without uses, replaceInstUsesWith() is going to do nothing, but we'll skip all following folds. We can only run into this problem with calls that both simplify and are not trivially dead if unused, which currently seems to happen only with calls to undef, as the test diff shows. When extending SimplifyCall() to handle "returned" attributes, this becomes a much bigger problem, so I'm fixing this first. Differential Revision: https://reviews.llvm.org/D75814	2020-03-09 18:47:46 +01:00
Nikita Popov	51a466a61f	[InstCombine] Fix known bits handling in SimplifyDemandedUseBits Fixes a regression from D75801. SimplifyDemandedUseBits() is also supposed to compute the known bits (of the demanded subset) of the instruction. For unknown instructions it does so by directly calling computeKnownBits(). For known instructions it will compute known bits itself. However, for instructions where only some cases are handled directly (e.g. a constant shift amount) the known bits invocation for the unhandled case is sometimes missing. This patch adds the missing calls and thus removes the main discrepancy with ExpensiveCombines mode. Differential Revision: https://reviews.llvm.org/D75804	2020-03-07 18:16:41 +01:00
Nikita Popov	f2419adc48	[InstCombine] Regenerate test checks; NFC	2020-03-07 17:58:33 +01:00
Nikita Popov	2904a332fe	[InstCombine] Add additional known bits folding tests; NFC	2020-03-07 17:17:03 +01:00
Nikita Popov	4cfb4afb70	[InstCombine] Highlight tests using expensive combines; NFC	2020-03-07 17:16:47 +01:00
Sanjay Patel	89fdee87f7	[InstCombine] regenerate complete test checks; NFC	2020-03-07 10:20:38 -05:00
Sanjay Patel	564f5eed1a	[InstCombine] add test for gep (select),... (PR45084); NFC	2020-03-07 10:00:31 -05:00
Roman Lebedev	1badf7c33a	[InstComine] Forego of one-use check in `(X - (X & Y)) --> (X & ~Y)` if Y is a constant Summary: This is potentially more friendly for further optimizations, analysies, e.g.: https://godbolt.org/z/G24anE This resolves phase-ordering bug that was introduced in D75145 for https://godbolt.org/z/2gBwF2 https://godbolt.org/z/XvgSua Reviewers: spatel, nikic, dmgreen, xbolva00 Reviewed By: nikic, xbolva00 Subscribers: hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75757	2020-03-06 21:39:07 +03:00
Roman Lebedev	69ec84f8e7	[NFC][InstCombine] Add 'x - (x & y)' tests with multi-use 'and' If %y is constant, we could still perform the fold	2020-03-06 19:41:19 +03:00
Nikita Popov	9b5de84e27	[InstCombine] Use IRBuilder to create bitcast This makes sure that the constant expression bitcast goes through target-dependent constant folding, and thus avoids an additional iteration of InstCombine.	2020-03-04 18:28:38 +01:00
Roman Lebedev	9e1443e6f6	[NFC][InstCombine] Add test with non-CSE'd casts of load in @t0 we can still change type of load and get rid of casts.	2020-03-03 11:27:27 +03:00
Reid Kleckner	1adbe86d87	[WinEH] Fix inttoptr+phi optimization in presence of catchswitch getFirstInsertionPt's return value must be checked for validity before casting it to Instruction*. Don't attempt to insert casts after a phi in a catchswitch block. Fixes PR45033, introduced in D37832. Reviewed By: davidxl, hfinkel Differential Revision: https://reviews.llvm.org/D75381	2020-03-01 07:49:28 -08:00
Simon Pilgrim	7e9747b50b	[X86][F16C] Remove cvtph2ps intrinsics and use generic half2float conversion (PR37554) This removes everything but int_x86_avx512_mask_vcvtph2ps_512 which provides the SAE variant, but even this can use the fpext generic if the rounding control is the default. Differential Revision: https://reviews.llvm.org/D75162	2020-02-29 18:57:35 +00:00
Nikita Popov	4ef272ec9c	[InstCombine] DCE instructions earlier When InstCombine initially populates the worklist, it already performs constant folding and DCE. However, as the instructions are initially visited in program order, this DCE can pick up only the last instruction of a dead chain, the rest would only get picked up in the main InstCombine run. To avoid this, we instead perform the DCE in separate pass over the collected instructions in reverse order, which will allow us to pick up full dead instruction chains. We already need to do this reverse iteration anyway to populate the worklist, so this shouldn't add extra cost. This by itself only fixes a small part of the problem though: The same basic issue also applies during the main InstCombine loop. We generally always want DCE to occur as early as possible, because it will allow one-use folds to happen. Address this by also performing DCE while adding deferred instructions to the main worklist. This drops the number of tests that perform more than 2 InstCombine iterations from ~80 to ~40. There's some spurious test changes due to operand order / icmp toggling. Differential Revision: https://reviews.llvm.org/D75008	2020-02-27 18:45:59 +01:00
Simon Moll	ddd11273d9	Remove BinaryOperator::CreateFNeg Use UnaryOperator::CreateFNeg instead. Summary: With the introduction of the native fneg instruction, the fsub -0.0, %x idiom is obsolete. This patch makes LLVM emit fneg instead of the idiom in all places. Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D75130	2020-02-27 09:06:03 -08:00
Simon Pilgrim	080890a9f3	[InstCombine] Add PR14365 test cases + vector equivalents.	2020-02-27 15:54:14 +00:00
Nikita Popov	56f7de5baa	[InstCombine] Remove trivially empty ranges from end InstCombine removes pairs of start+end intrinsics that don't have anything in between them. Currently this is done by starting at the start intrinsic and scanning forwards. This patch changes it to start at the end intrinsic and scan backwards. The motivation here is as follows: When we process the start intrinsic, we have not yet looked at the following instructions, which may still get folded/removed. If they do, we will only be able to remove the start/end pair on the next iteration. When we process the end intrinsic, all the instructions before it have already been visited, and we don't run into this problem. Differential Revision: https://reviews.llvm.org/D75011	2020-02-26 20:04:11 +01:00
Roman Lebedev	2855c8fed9	[InstCombine] foldShiftIntoShiftInAnotherHandOfAndInICmp(): fix miscompile (PR44802) Much like with reassociateShiftAmtsOfTwoSameDirectionShifts(), as input, we have the following pattern: icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0 We want to rewrite that as: icmp eq/ne (and (x shift (Q+K)), y), 0 iff (Q+K) u< bitwidth(x) While we know that originally (Q+K) would not overflow (because 2 * (N-1) u<= iN -1), we may have looked past extensions of shift amounts. so it may now overflow in smaller bitwidth. To ensure that does not happen, we need to ensure that the total maximal shift amount is still representable in that smaller bitwidth. If the overflow would happen, (Q+K) u< bitwidth(x) check would be bogus. https://bugs.llvm.org/show_bug.cgi?id=44802	2020-02-25 18:23:58 +03:00
Roman Lebedev	6f807ca00d	[NFC][InstCombine] Add shift amount reassociation in bittest miscompile example from PR44802 https://bugs.llvm.org/show_bug.cgi?id=44802	2020-02-25 18:23:58 +03:00
Roman Lebedev	781d077afb	[InstCombine] reassociateShiftAmtsOfTwoSameDirectionShifts(): fix miscompile (PR44802) As input, we have the following pattern: Sh0 (Sh1 X, Q), K We want to rewrite that as: Sh x, (Q+K) iff (Q+K) u< bitwidth(x) While we know that originally (Q+K) would not overflow (because 2 * (N-1) u<= iN -1), we may have looked past extensions of shift amounts. so it may now overflow in smaller bitwidth. To ensure that does not happen, we need to ensure that the total maximal shift amount is still representable in that smaller bitwidth. If the overflow would happen, (Q+K) u< bitwidth(x) check would be bogus. https://bugs.llvm.org/show_bug.cgi?id=44802	2020-02-25 18:23:51 +03:00
Roman Lebedev	425ef99938	[NFC][InstCombine] Add shift amount reassociation miscompile example from PR44802 https://bugs.llvm.org/show_bug.cgi?id=44802	2020-02-25 18:23:16 +03:00
Nuno Lopes	98ac6e7696	[NFC] fix test nan value	2020-02-23 12:42:47 +00:00
Krzysztof Parzyszek	d2b7c09e79	[Hexagon] Simplify intrinsic (vandvrt (vandqrt q b) m) -> q if possible When each byte in b&m is non-zero, this conversion Q->V->Q is a no-op.	2020-02-21 13:56:04 -06:00
Nikita Popov	b178555318	[InstCombine] Improve simplify demanded bits worklist management This fixes a small mistake from D72944: The worklist add should happen before assigning the new operand, not after. In case an actual replacement happens, the old operand needs to be added for DCE. If no actual replacement happens, then old/new are the same, so it doesn't matter. This drops one iteration from the annotated test case.	2020-02-21 18:51:41 +01:00
Nikita Popov	a8db806d52	[SimplifyLibCalls][IRBuilder] Accept any IRBuilder in SimplifyLibCalls This changes the SimplifyLibCalls utility to accept an IRBuilderBase, which allows us to pass through the IRBuilder used by InstCombine. This will ensure that new instructions get added to the worklist. The annotated test-case drops from 4 to 2 InstCombine iterations thanks to this. To achieve this, I'm adding an IRBuilderBase::OperandBundlesGuard, which is basically the same as the existing InsertPointGuard and FastMathFlagsGuard, but for operand bundles. Also add a setDefaultOperandBundles() method so these can be set outside the constructor. Differential Revision: https://reviews.llvm.org/D74792	2020-02-21 18:26:05 +01:00
Nikita Popov	f6875c434e	Reapply [IRBuilder] Always respect inserter/folder Some IRBuilder methods that were originally defined on IRBuilderBase do not respect custom IRBuilder inserters/folders, because those were not accessible prior to D73835. Fix this by making use of existing (and now accessible) IRBuilder methods, which will handle inserters/folders correctly. There are some changes in OpenMP and Instrumentation tests, where bitcasts now get constant folded. I've also highlighted one InstCombine test which now finishes in two rather than three iterations, thanks to new instructions being inserted into the worklist. Differential Revision: https://reviews.llvm.org/D74787	2020-02-19 20:51:38 +01:00
Nikita Popov	b92b1701cd	Revert "[IRBuilder] Always respect inserter/folder" This reverts commit `f12fb2d99b`. I missed some changes in instrumentation test cases.	2020-02-19 17:51:55 +01:00
Nikita Popov	f12fb2d99b	[IRBuilder] Always respect inserter/folder Some IRBuilder methods that were originally defined on IRBuilderBase do not respect custom IRBuilder inserters/folders, because those were not accessible prior to D73835. Fix this by making use of existing (and now accessible) IRBuilder methods, which will handle inserters/folders correctly. There are some changes in OpenMP tests, where bitcasts now get constant folded. I've also highlighted one InstCombine test which now finishes in two rather than three iterations, thanks to new instructions being inserted into the worklist. Differential Revision: https://reviews.llvm.org/D74787	2020-02-19 17:44:43 +01:00
Nikita Popov	1ab37fad61	[InstCombine] Fix worklist management when simplifying demanded bits When simplifying demanded bits, we currently only report the instruction on which SimplifyDemandedBits was called as changed. However, this is a recursive call, and the actually modified instruction will usually be further up the chain. Additionally, all the intermediate instructions should also be revisited, as additional combines may be possible after the demanded bits simplification. We fix this by explicitly adding them back to the worklist. Differential Revision: https://reviews.llvm.org/D72944	2020-02-18 17:55:40 +01:00
Nikita Popov	c9540fe59b	[InstCombine] Fix multi-use handling in cttz transform The select-of-cttz transform can currently duplicate cttz intrinsics and zext/trunc ops. The cause is that it unnecessarily duplicates the intrinsic and the zext/trunc when setting the "undef_on_zero" flag to false. However, it's always legal to set the flag from true to false, so we can make this replacement even if there are extra users. Differential Revision: https://reviews.llvm.org/D74685	2020-02-18 17:55:00 +01:00
Nikita Popov	9adedd146d	[InstCombine] Relax preconditions for ashr+and+icmp fold (PR44754) Fix for https://bugs.llvm.org/show_bug.cgi?id=44754. We already have a fold that converts icmp (and (ashr X, C3), C2), C1 into icmp (and C2'), C1', but it imposed overly strict requirements on the transform. Relax this by checking that both C2 and C1 don't shift out bits (in a signed sense) when forming the new constants. Alive proofs (https://rise4fun.com/Alive/PTz0): Name: ashr_legal Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) == C1 %a = ashr i16 %x, C3 %b = and i16 %a, C2 %c = icmp i16 %b, C1 => %d = and i16 %x, C2 << C3 %c = icmp i16 %d, C1 << C3 Name: ashr_shiftout_eq Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) != C1 %a = ashr i16 %x, C3 %b = and i16 %a, C2 %c = icmp eq i16 %b, C1 => %c = false Note that >> corresponds to ashr here. The case of an equality comparison has some special handling in this transform, because it will form to a true/false result if the condition on the comparison constant it violated. Differential Revision: https://reviews.llvm.org/D74294	2020-02-18 17:49:46 +01:00
Nikita Popov	9bc6bc2d8c	[InstCombine] Add more tests for icmp+and+ashr; NFC	2020-02-18 17:47:48 +01:00
Florian Hahn	6c85e92bcf	[InstCombine] Simplify a umul overflow check to a != 0 && b != 0. This patch adds a simplification if an OR weakens the overflow condition for umul.with.overflow by treating any non-zero result as overflow. In that case, we overflow if both umul.with.overflow operands are != 0, as in that case the result can only be 0, iff the multiplication overflows. Code like this is generated by code using __builtin_mul_overflow with negative integer constants, e.g. bool test(unsigned long long v, unsigned long long *res) { return __builtin_mul_overflow(v, -4775807LL, res); } ``` ---------------------------------------- Name: D74141 %res = umul_overflow {i8, i1} %a, %b %mul = extractvalue {i8, i1} %res, 0 %overflow = extractvalue {i8, i1} %res, 1 %cmp = icmp ne %mul, 0 %ret = or i1 %overflow, %cmp ret i1 %ret => %t0 = icmp ne i8 %a, 0 %t1 = icmp ne i8 %b, 0 %ret = and i1 %t0, %t1 ret i1 %ret %res = umul_overflow {i8, i1} %a, %b %mul = extractvalue {i8, i1} %res, 0 %cmp = icmp ne %mul, 0 %overflow = extractvalue {i8, i1} %res, 1 Done: 1 Optimization is correct! ``` Reviewers: nikic, lebedev.ri, spatel, Bigcheese, dexonsmith, aemerson Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D74141	2020-02-18 09:11:55 +01:00
Florian Hahn	b0866f61c1	[InstCombine] Precommit umul.with.overflow sign check test. Precommit tests for D74141.	2020-02-18 08:46:50 +01:00
Nikita Popov	6cdc36afb2	[InstCombine] Add multiuse tests for cttz transform; NFC These show incorrect duplication of instructions.	2020-02-16 15:52:09 +01:00
Yuanfang Chen	4ad7685258	Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`""" This reverts commit `80a34ae311` with fixes. Previously, since bots turning on EXPENSIVE_CHECKS are essentially turning on MachineVerifierPass by default on X86 and the fact that inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll are not expected to generate functioning machine code, this would go down to `report_fatal_error` in MachineVerifierPass. Here passing `-verify-machineinstrs=0` to make the intent explicit.	2020-02-13 10:16:06 -08:00
Yuanfang Chen	17122ec10a	Revert "Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"""" This reverts commit `bb51d24330`.	2020-02-13 10:08:05 -08:00
Yuanfang Chen	bb51d24330	Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`""" This reverts commit `80a34ae311` with fixes. On bots llvm-clang-x86_64-expensive-checks-ubuntu and llvm-clang-x86_64-expensive-checks-debian only, llc returns 0 for these two tests unexpectedly. I tweaked the RUN line a little bit in the hope that LIT is the culprit since this change is not in the codepath these tests are testing. llvm\test\CodeGen\X86\inline-asm-avx-v-constraint-32bit.ll llvm\test\CodeGen\X86\inline-asm-avx512vl-v-constraint-32bit.ll	2020-02-13 10:02:53 -08:00
Yuanfang Chen	80a34ae311	Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"" This reverts commit rGcd5b308b828e, rGcd5b308b828e, rG8cedf0e2994c. There are issues to be investigated for polly bots and bots turning on EXPENSIVE_CHECKS.	2020-02-11 20:41:53 -08:00
Yuanfang Chen	8cedf0e299	Reland "[Support] make report_fatal_error `abort` instead of `exit`" Summary: Reland D67847 after D73742 is committed. Replace `sys::Process::Exit(1)` with `abort` in `report_fatal_error`. After this patch, for tools turning on `CrashRecoveryContext`, crash handler installed by `CrashRecoveryContext` is called unless they installed a non-returning handler using `llvm::install_fatal_error_handler` like `cc1_main` currently does. Reviewers: rnk, MaskRay, aganea, hans, espindola, jhenderson Subscribers: jholewinski, qcolombet, dschuff, jyknight, emaste, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, rupprecht, jocewei, jsji, Jim, dmgreen, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74456	2020-02-11 18:20:40 -08:00
Sanjay Patel	62ce7e650a	[InstCombine] fix use check when canonicalizing abs/nabs We were checking for extra uses of the negated operand even if we were not going to create it as part of this canonicalization. This was showing up as a regression when we limit EarlyCSE as proposed in D74285.	2020-02-10 14:57:37 -05:00
Sanjay Patel	93073e52b1	[InstCombine] add tests for abs with extra use of operand; NFC	2020-02-10 14:57:37 -05:00
David Stenberg	982944525c	Revert "[InstCombine][DebugInfo] Fold constants wrapped in metadata" This reverts commit `b54a8ec1bc`. The commit triggered debug invariance (different output with/without -g). The patch seems to have exposed a pre-existing invariance problem in GlobalOpt, which I'll write a bug report for.	2020-02-10 17:58:33 +01:00
George Burgess IV	f8c9ceb1ce	[SimplifyLibCalls] Add __strlen_chk. Bionic has had `__strlen_chk` for a while. Optimizing that into a constant is quite profitable, when possible. Differential Revision: https://reviews.llvm.org/D74079	2020-02-08 11:51:00 -08:00
Nikita Popov	a148b9e990	[InstCombine] Fix infinite min/max canonicalization loop (PR44541) While D72944 also fixes https://bugs.llvm.org/show_bug.cgi?id=44541, it does so in a more roundabout manner and there might be other loopholes to trigger the same issue. This is a more direct fix, that prevents the transform if the min/max is based on a non-canonical sub X, 0 instruction. Differential Revision: https://reviews.llvm.org/D73849	2020-02-08 20:42:17 +01:00
Nikita Popov	d4627b90a0	[InstCombine] Avoid modifying instructions in-place As discussed on D73919, this replaces a few cases where we were modifying multiple operands of instructions in-place with the creation of a new instruction, which we generally prefer nowadays. This tends to be more readable and less prone to worklist management bugs. Test changes are only superficial (instruction naming and order).	2020-02-08 17:05:56 +01:00
Nikita Popov	23db9724d0	[InstCombine] Fix infinite loop in min/max load/store bitcast combine (PR44835) Fixes https://bugs.llvm.org/show_bug.cgi?id=44835. Skip the transform if it wouldn't actually do anything (apart from removing and reinserting the same instructions). Note that the test case doesn't loop on current master anymore, only on the LLVM 10 release branch. The issue is already mitigated on master due to worklist order fixes, but we should fix the root cause there as well. As a side note, we should probably assert in combineLoadToNewType() that it does not combine to the same type. Not doing this here, because this assertion would also be triggered in another place right now. Differential Revision: https://reviews.llvm.org/D74278	2020-02-08 16:55:22 +01:00
Florian Hahn	14ef87bda6	[ValueTracking] usub(a, b) cannot overflow if a >= b. If we know that a >= b (unsigned), usub.with.overflow(a, b) cannot overflow. Similarly, if b > a, the same expression overflows. Reviewers: nikic, RKSimon, lebedev.ri, spatel Reviewed By: nikic, Gerolf Differential Revision: https://reviews.llvm.org/D74066	2020-02-07 10:41:18 +00:00
Florian Hahn	89ca4b9ef2	[InstCombine] Precommit usub.with.overflow test for D74066.	2020-02-07 10:30:28 +00:00
Sanjay Patel	2a191cf850	[InstCombine] add more splat tests with undef elements; NFC	2020-02-04 09:13:08 -05:00
Sanjay Patel	5d04e008f7	[InstCombine] add splat tests with undef elements; NFC	2020-02-04 07:59:12 -05:00
Sanjay Patel	0cf0be993c	[InstCombine] fix operands of shouldChangeType() for casted phi transform This is a bug noted in the recent D72733 and seen in the similar transform just above the changed source code. I added tests with illegal types and zexts to show the bug - we could transform legal phi ops to illegal, etc. I did not add tests with trunc because we won't see any diffs on those patterns. That is because InstCombiner::SliceUpIllegalIntegerPHI() appears to do those transforms independently of datalayout. It can also create more casts than are present in existing code. There are some existing regression tests that do not include a datalayout that would be altered by this fix. I assumed that the lack of a datalayout in those regression files is an oversight, so I added the minimal layout (make i32 legal) necessary to preserve behavior on those tests. Differential Revision: https://reviews.llvm.org/D73907	2020-02-04 07:45:48 -05:00
Sanjay Patel	b2e884bee7	[InstCombine] add tests for casted phi; NFC	2020-02-03 11:54:47 -05:00
Sanjay Patel	5c2e6207b7	[InstCombine] regenerate complete test checks; NFC	2020-02-03 10:30:26 -05:00
Sanjay Patel	e78fb556c5	[InstCombine] reassociate splatted vector ops bo (splat X), (bo Y, OtherOp) --> bo (splat (bo X, Y)), OtherOp This patch depends on the splat analysis enhancement in D73549. See the test with comment: ; Negative test - mismatched splat elements ...as the motivation for that first patch. The motivating case for reassociating splatted ops is shown in PR42174: https://bugs.llvm.org/show_bug.cgi?id=42174 In that example, a slight change in order-of-associative math results in a big difference in IR and codegen. This patch gets all of the unnecessary shuffles out of the way, but doesn't address the potential scalarization (see D50992 or D73480 for that). Differential Revision: https://reviews.llvm.org/D73703	2020-02-03 09:08:36 -05:00
Simon Pilgrim	105e5c940c	[ValueTracking] Add DemandedElts support to computeKnownBits/ComputeNumSignBits (PR36319) This patch adds initial support for a DemandedElts mask to the internal computeKnownBits/ComputeNumSignBits methods, matching the SelectionDAG and GlobalISel equivalents. So far only a couple of instructions have been setup to handle the DemandedElts, the remainder still using the existing 'all elements' default. The plan is to extend support as we have test coverage. Differential Revision: https://reviews.llvm.org/D73435	2020-02-01 12:45:46 +00:00
Nikita Popov	ff17da3f75	[InstCombine] Push negation through multiply (PR44234) Fixes https://bugs.llvm.org/show_bug.cgi?id=44234 by adding multiply support to freelyNegateValue(). Only one of the operands needs to be negatible, so this still fits within the framework. Differential Revision: https://reviews.llvm.org/D73410	2020-01-31 20:58:55 +01:00
David Stenberg	b54a8ec1bc	[InstCombine][DebugInfo] Fold constants wrapped in metadata Summary: When constant folding, constants that are wrapped in metadata were not folded. This could lead to dbg.values being the only user of a constant expression, due to the non-dbg uses having been rewritten, resulting in the constant later on being removed by some other pass. This occurred with the attached test case, in which the non-rewritten GEP in the dbg.value intrinsic was later on removed by globalopt. This patch makes the code look through metadata and fold such constants. I guess that we in the future may want to allow dbg.values using GEPs and other constant expressions to be emittable even if there are no non-dbg uses, but for example SelectionDAG does not support that. Reviewers: jmorse, aprantl, vsk, davide Reviewed By: aprantl, vsk, davide Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D73630	2020-01-30 15:50:16 +01:00
Piotr Sobczak	dd7148822b	[InstCombine][AMDGPU] Trim components of s_buffer_load Summary: Add trimming of unused components of s_buffer_load. For s_buffer_load and unformatted buffer_load also trim unused components at the beginning of vector and update offset accordingly. Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71785	2020-01-30 10:48:25 +01:00
Nikita Popov	8058196677	[InstCombine] Process newly inserted instructions in the correct order InstCombine operates on the basic premise that the operands of the currently processed instruction have already been simplified. It achieves this by pushing instructions to the worklist in reverse program order, so that instructions are popped off in program order. The worklist management in the main combining loop also makes sure to uphold this invariant. However, the same is not true for all the code that is performing manual worklist management. The largest problem (addressed in this patch) are instructions inserted by InstCombine's IRBuilder. These will be pushed onto the worklist in order of insertion (generally matching program order), which means that a) the users of the original instruction will be visited first, as they are pushed later in the main loop and b) the newly inserted instructions will be visited in reverse program order. This causes a number of problems: First, folds operate on instructions that have not had their operands simplified, which may result in optimizations being missed (ran into this in https://reviews.llvm.org/D72048#1800424, which was the original motivation for this patch). Additionally, this increases the amount of folds InstCombine has to perform, both within one iteration, and by increasing the number of total iterations. This patch addresses the issue by adding a Worklist.AddDeferred() method, which is used for instructions inserted by IRBuilder. These will only be added to the real worklist after the combine finished, and in reverse order, so they will end up processed in program order. I should note that the same should also be done to nearly all other uses of Worklist.Add(), but I'm starting with just this occurrence, which has by far the largest test fallout. Most of the test changes are due to https://bugs.llvm.org/show_bug.cgi?id=44521 or other cases where we don't canonicalize something. These are neutral. One regression has been addressed in D73575 and D73647. The remaining regression in an shl+sdiv fold can't really be fixed without dropping another transform, but does not seem particularly problematic in the first place. Differential Revision: https://reviews.llvm.org/D73411	2020-01-30 09:40:10 +01:00
Sanjay Patel	89195638bf	[InstCombine] add splat binop tests; NFC	2020-01-29 15:38:03 -05:00
Nikita Popov	e086e23024	[InstCombine] Support non-splat vectors in icmp eq + add/sub fold For the icmp eq (add X, C1), C2 => icmp eq X, C2-C1 icmp eq (sub C1, X), C2 => icmp eq X, C1-C2 folds, this allows C1 to be non-splat and contain undefs. C2 is still splat, due to the structure of the code. This is to address the remaining part of the regression in D73411, where demanded element analysis replaces some elements with undef. Differential Revision: https://reviews.llvm.org/D73647	2020-01-29 20:56:58 +01:00
Nikita Popov	5171587a5f	[InstCombine] Add undef/non-splat tests for add/sub + icmp eq; NFC	2020-01-29 20:56:58 +01:00
Nikita Popov	6a74641e72	[InstCombine] Regenerate test checks; NFC	2020-01-29 18:22:07 +01:00
Sanjay Patel	87f6314f8c	[InstCombine] canonicalize splat shuffle after cmp cmp (splat V1, M), SplatC --> splat (cmp V1, SplatC'), M As discussed in PR44588: https://bugs.llvm.org/show_bug.cgi?id=44588 ...we try harder to push shuffles after binops than after compares. This patch handles the special (but presumably most common case) of splat shuffles. If both operands are splats, then we can do the comparison on the non-splat inputs followed by splat of the compare. That should take care of the regression noted in D73411. There's another potential fold requested in PR37463 to scalarize the compare, but that's another patch (and it's not clear if we can do that without the ability to undo it later): https://bugs.llvm.org/show_bug.cgi?id=37463 Differential Revision: https://reviews.llvm.org/D73575	2020-01-29 08:34:29 -05:00
Sanjay Patel	276a6b8889	[InstCombine] add tests for cmp with splat operand and splat constant; NFC See PR44588: https://bugs.llvm.org/show_bug.cgi?id=44588	2020-01-28 13:42:20 -05:00
Sanjay Patel	747242af8d	[InstCombine] allow more narrowing of casted select D47163 created a rule that we should not change the casted type of a select when we have matching types in its compare condition. That was intended to help vector codegen, but it also could create situations where we miss subsequent folds as shown in PR44545: https://bugs.llvm.org/show_bug.cgi?id=44545 By using shouldChangeType(), we can continue to get the vector folds (because we always return false for vector types). But we also solve the motivating bug because it's ok to narrow the scalar select in that example. Our canonicalization rules around select are a mess, but AFAICT, this will not induce any infinite looping from the reverse transform (but we'll need to watch for that possibility if committed). Side note: there's a similar use of shouldChangeType() for phi ops just below this diff, and the source and destination types appear to be reversed. Differential Revision: https://reviews.llvm.org/D72733	2020-01-27 16:35:50 -05:00
Sanjay Patel	242fed9d7f	[InstCombine] convert fsub nsz with fneg operand to -(X + Y) This was noted in D72521 - we need to match fneg specifically to consistently handle that pattern along with (-0.0 - X).	2020-01-27 14:49:15 -05:00
Nikita Popov	bcfa0f592f	[InstCombine] Move negation handling into freelyNegateValue() Followup to D72978. This moves existing negation handling in InstCombine into freelyNegateValue(), which make it composable. In particular, root negations of div/zext/sext/ashr/lshr/sub can now always be performed through a shl/trunc as well. Differential Revision: https://reviews.llvm.org/D73288	2020-01-27 20:46:23 +01:00
Nikita Popov	0957748cb7	[InstCombine] Add more negation tests; NFC Additional test cases for pushing negations through various instructions.	2020-01-27 20:46:23 +01:00
Simon Pilgrim	f99ef5455a	[InstCombine] Add extra shift(c1,add(c2,y)) tests for PR15141	2020-01-26 19:04:12 +00:00
Guillaume Chatelet	cc034a5883	[IR] masked gather/scatter alignment should be set Summary: masked_load and masked_store instructions require the alignment to be specified and a power of two. It seems to me that this requirement applies to masked_gather and masked_scatter as well. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73179	2020-01-26 18:51:36 +01:00
Nikita Popov	0b83c5a78f	[InstCombine] Combine neg of shl of sub (PR44529) Fixes https://bugs.llvm.org/show_bug.cgi?id=44529. We already have a combine to sink a negation through a left-shift, but it currently only works if the shift operand is negatable without creating any instructions. This patch introduces freelyNegateValue() as a more powerful extension of dyn_castNegVal(), which allows negating a value as long as this doesn't end up increasing instruction count. Specifically, this patch adds support for negating A-B to B-A. This mechanism could in the future be extended to handle general negation chains that a) start at a proper 0-X negation and b) only require one operand to be freely negatable. This would end up as a weaker form of D68408 aimed at the most obviously profitable subset that eliminates a negation entirely. Differential Revision: https://reviews.llvm.org/D72978	2020-01-22 23:03:58 +01:00
Nikita Popov	80c34f94ac	[InstCombine] Add test for PR44529; NFC	2020-01-22 23:03:58 +01:00
Sanjay Patel	0ade2abdb0	[InstCombine] fneg(X + C) --> -C - X This is 1 of the potential folds uncovered by extending D72521. We don't seem to do this in the backend either (unless I'm not seeing some target-specific transform). icc and gcc (appears to be target-specific) do this transform. Differential Revision: https://reviews.llvm.org/D73057	2020-01-22 09:48:43 -05:00
Sanjay Patel	c0f53ed806	[InstCombine] add tests for fneg+fadd; NFC	2020-01-22 08:59:28 -05:00
Sanjay Patel	7bee94410c	[InstCombine] form copysign from select of FP constants (PR44153) This should be the last step needed to solve the problem in the description of PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153 If we're casting an FP value to int, testing its signbit, and then choosing between a value and its negated value, that's a complicated way of saying "copysign": (bitcast X) < 0 ? -TC : TC --> copysign(TC, X) Differential Revision: https://reviews.llvm.org/D72643	2020-01-20 10:51:14 -05:00
Matt Arsenault	a4451d88ee	Consolidate internal denormal flushing controls Currently there are 4 different mechanisms for controlling denormal flushing behavior, and about as many equivalent frontend controls. - AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features - NVPTX uses the nvptx-f32ftz attribute - ARM directly uses the denormal-fp-math attribute - Other targets indirectly use denormal-fp-math in one DAGCombine - cl-denorms-are-zero has a corresponding denorms-are-zero attribute AMDGPU wants a distinct control for f32 flushing from f16/f64, and as far as I can tell the same is true for NVPTX (based on the attribute name). Work on consolidating these into the denormal-fp-math attribute, and a new type specific denormal-fp-math-f32 variant. Only ARM seems to support the two different flush modes, so this is overkill for the other use cases. Ideally we would error on the unsupported positive-zero mode on other targets from somewhere. Move the logic for selecting the flush mode into the compiler driver, instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32 are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as a user flag. -cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and -fno-cuda-flush-denormals-to-zero will be mapped to -fp-denormal-math-f32=ieee or preserve-sign rather than the old attributes. Stop emitting the denorms-are-zero attribute for the OpenCL flag. It has no in-tree users. The meaning would also be target dependent, such as the AMDGPU choice to treat this as only meaning allow flushing of f32 and not f16 or f64. The naming is also potentially confusing, since DAZ in other contexts refers to instructions implicitly treating input denormals as zero, not necessarily flushing output denormals to zero. This also does not attempt to change the behavior for the current attribute. The LangRef now states that the default is ieee behavior, but this is inaccurate for the current implementation. The clang handling is slightly hacky to avoid touching the existing denormal-fp-math uses. Fixing this will be left for a future patch. AMDGPU is still using the subtarget feature to control the denormal mode, but the new attribute are now emitted. A future change will switch this and remove the subtarget features.	2020-01-17 20:09:53 -05:00
Nikita Popov	522c030aa9	[InstCombine] Fix worklist management in DSE (PR44552) Fixes https://bugs.llvm.org/show_bug.cgi?id=44552. We need to make sure that the store is reprocessed, because performing DSE may expose more DSE opportunities. There is a slight caveat here though: We need to make sure that we add back the store the worklist first, because that means it will be processed after the operands of the removed store have been processed. This is a general bug in InstCombine worklist management that I hope to address at some point, but for now it means we need to do this manually rather than just returning the instruction as changed. Differential Revision: https://reviews.llvm.org/D72807	2020-01-17 18:10:56 +01:00
Nikita Popov	77befe54f7	[InstCombine] Fix worklist management in return combine There are two related bugs here: First, we don't add the operand we're replacing to the worklist, which means it may not get DCEd (see test change). Second, usually this would just get picked up in the next iteration, but we also do not report the instruction as changed. This means that we do not get that extra instcombine iteration, and more importantly, may break the pass pipeline, as the function is not marked as changed. Differential Revision: https://reviews.llvm.org/D72864	2020-01-17 17:59:23 +01:00
Nikita Popov	10d0e2882b	[InstCombine] Split assume test in expensive and not; NFC The IR difference in @icmp1 serves as a test for D72864.	2020-01-17 17:57:59 +01:00
Nikita Popov	2ca092f320	[InstCombine] Support disabling expensive combines in opt Currently, there is no way to disable ExpensiveCombines when doing a standalone opt -instcombine run, as that's the default, and the opt option can currently only be used to force enable, not to force disable. The only way to disable expensive combines is via -O1 or -O2, but that of course also runs the rest of the kitchen sink... This patch allows using opt -instcombine -expensive-combines=0 to run InstCombine without ExpensiveCombines. Differential Revision: https://reviews.llvm.org/D72861	2020-01-17 17:56:20 +01:00
Nikita Popov	2d0d4235a2	[InstCombine] Add test for -expensive-combines option; NFC This shows that -expensive-combines=0 is ignored.	2020-01-17 17:56:20 +01:00
Matt Arsenault	3ef8cdf666	AMDGPU: Do permlane16 vdst_in discard optimization in InstCombine There's more potential value to discarding the source value earlier, since we always know the value of the fi/bc bits.	2020-01-16 17:27:53 -05:00
Florian Hahn	0b21d55262	[IR] Mark memset.* intrinsics as IntrWriteMem. llvm.memset intrinsics do only write memory, but are missing IntrWriteMem, so they doesNotReadMemory() returns false for them. The test change is due to the test checking the fn attribute ids at the call sites, which got bumped up due to a new combination with writeonly appearing in the test file. Reviewers: jdoerfert, reames, efriedma, nlopes, lebedev.ri Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D72789	2020-01-16 10:35:46 +00:00
Yuanfang Chen	6e24c6037f	Revert "[Support] make report_fatal_error `abort` instead of `exit`" This reverts commit `647c3f4e47`. Got bots failure from sanitizer-windows and maybe others.	2020-01-15 17:52:25 -08:00
Yuanfang Chen	647c3f4e47	[Support] make report_fatal_error `abort` instead of `exit` Summary: This patch could be treated as a rebase of D33960. It also fixes PR35547. A fix for `llvm/test/Other/close-stderr.ll` is proposed in D68164. Seems the consensus is that the test is passing by chance and I'm not sure how important it is for us. So it is removed like in D33960 for now. The rest of the test fixes are just adding `--crash` flag to `not` tool. ** The reason it fixes PR35547 is `exit` does cleanup including calling class destructor whereas `abort` does not do any cleanup. In multithreading environment such as ThinLTO or JIT, threads may share states which mostly are ManagedStatic<>. If faulting thread tearing down a class when another thread is using it, there are chances of memory corruption. This is bad 1. It will stop error reporting like pretty stack printer; 2. The memory corruption is distracting and nondeterministic in terms of error message, and corruption type (depending one the timing, it could be double free, heap free after use, etc.). Reviewers: rnk, chandlerc, zturner, sepavloff, MaskRay, espindola Reviewed By: rnk, MaskRay Subscribers: wuzish, jholewinski, qcolombet, dschuff, jyknight, emaste, sdardis, nemanjai, jvesely, nhaehnle, sbc100, arichardson, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, lenary, s.egerton, pzheng, cfe-commits, MaskRay, filcab, davide, MatzeB, mehdi_amini, hiraditya, steven_wu, dexonsmith, rupprecht, seiya, llvm-commits Tags: #llvm, #clang Differential Revision: https://reviews.llvm.org/D67847	2020-01-15 17:05:13 -08:00
Sanjay Patel	3180af4362	[InstCombine] reassociate fsub+fsub into fsub+fadd As discussed in the motivating PR44509: https://bugs.llvm.org/show_bug.cgi?id=44509 ...we can end up with worse code using fast-math than without. This is because the reassociate pass greedily transforms fsub into fneg/fadd and apparently (based on the regression tests seen here) expects instcombine to clean that up if it wasn't profitable. But we were missing this fold: (X - Y) - Z --> X - (Y + Z) There's another, more specific case that I think we should handle as shown in the "fake" fneg test (but missed with a real fneg), but that's another patch. That may be tricky to get right without conflicting with existing transforms for fneg. Differential Revision: https://reviews.llvm.org/D72521	2020-01-15 11:14:13 -05:00
Nikita Popov	04e586151e	[InstCombine] Fix worklist management when removing guard intrinsic When multiple guard intrinsics are merged into one, currently the result of eraseInstFromFunction() is returned -- however, this should only be done if the current instruction is being removed. In this case we're removing a different instruction and should instead report that the current one has been modified by returning it. For this test case, this reduces the number of instcombine iterations from 5 to 2 (the minimum possible). Differential Revision: https://reviews.llvm.org/D72558	2020-01-14 21:47:48 +01:00
Nikita Popov	65c0805be5	[InstCombine] Fix infinite loop due to bitcast <-> phi transforms Fix for https://bugs.llvm.org/show_bug.cgi?id=44245. The optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up fighting against each other, because optimizeBitCastFromPhi() assumes that bitcasts of loads will get folded. This doesn't happen here, because a dangling phi node prevents the one-use fold in https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering. This patch fixes the issue by explicitly performing the load combine as part of the bitcast of phi transform. Other attempts to force the load to be combined first were ultimately too unreliable. Differential Revision: https://reviews.llvm.org/D71164	2020-01-14 20:45:13 +01:00
Nikita Popov	652cd7c100	[InstCombine] Fix user iterator invalidation in bitcast of phi transform This fixes the issue encountered in D71164. Instead of using a range-based for, manually iterate over the users and advance the iterator beforehand, so we do not skip any users due to iterator invalidation. Differential Revision: https://reviews.llvm.org/D72657	2020-01-14 20:38:10 +01:00
Nikita Popov	fa63234093	[InstCombine] Add test for iterator invalidation bug; NFC	2020-01-14 20:38:10 +01:00
Sanjay Patel	57cb468514	[InstCombine] add test for possible cast-of-select transform; NFC	2020-01-14 14:23:14 -05:00
Juneyoung Lee	3e32b7e127	[InstCombine] Let combineLoadToNewType preserve ABI alignment of the load (PR44543) Summary: If aligment on `LoadInst` isn't specified, load is assumed to be ABI-aligned. And said aligment may be different for different types. So if we change load type, but don't pay extra attention to the aligment (i.e. keep it unspecified), we may either overpromise (if the default aligment of the new type is higher), or underpromise (if the default aligment of the new type is smaller). Thus, if no alignment is specified, we need to manually preserve the implied ABI alignment. This addresses https://bugs.llvm.org/show_bug.cgi?id=44543 by making combineLoadToNewType preserve ABI alignment of the load. Reviewers: spatel, lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72710	2020-01-15 03:20:53 +09:00
Juneyoung Lee	0877843dda	[test] Make data layout of load-bitcast64.ll explicit, use update_test_checks.py	2020-01-15 02:49:44 +09:00
Sanjay Patel	cfe2fab708	[InstSimplify] add tests for vector select; NFC	2020-01-14 08:41:06 -05:00
Sanjay Patel	80a094e134	[InstCombine] add FMF to tests for more coverage; NFC	2020-01-13 16:29:20 -05:00
Sanjay Patel	69f4cea413	[InstCombine] add tests for select --> copysign; NFC This is testing for another (possibly final) transform suggested in: https://bugs.llvm.org/show_bug.cgi?id=44153	2020-01-13 15:39:24 -05:00
Sanjay Patel	26d2ace9e2	[InstSimplify] move tests for select from InstCombine; NFC InstCombine has transforms that would enable these simplifications in an indirect way, but those transforms are unsafe and likely to be removed.	2020-01-13 09:13:21 -05:00
Nikita Popov	0e322c8a1f	[InstCombine] Preserve nuw on sub of geps (PR44419) Fix https://bugs.llvm.org/show_bug.cgi?id=44419 by preserving the nuw on sub of geps. We only do this if the offset has a multiplication as the final operation, as we can't be sure the operations is nuw in the other cases without more thorough analysis. Differential Revision: https://reviews.llvm.org/D72048	2020-01-11 11:01:12 +01:00
Sanjay Patel	26cdaeb1f0	[InstCombine] add tests for fsub; NFC Conflicting/missing canonicalizations are visible in PR44509: https://bugs.llvm.org/show_bug.cgi?id=44509	2020-01-10 12:02:43 -05:00
@raghesh (Raghesh Aloor)	6c04ef472a	[InstCombine] Z / (1.0 / Y) => (Y * Z) This is a special case of Z / (X / Y) => (Y * Z) / X, with X = 1.0. The m_OneUse check is avoided because even in the case of the multiple uses for 1.0/Y, the number of instructions remain the same and a division is replaced by a multiplication. Differential Revision: https://reviews.llvm.org/D72319	2020-01-09 10:52:39 -05:00
Sanjay Patel	032a9393a7	[InstCombine] Use minimal FMF in testcase for Z / (1.0 / Y) => (Y * Z); NFC Patch by: @raghesh (Raghesh Aloor) Differential Revision: https://reviews.llvm.org/D72431	2020-01-09 08:21:38 -05:00
Sanjay Patel	5dfd52398f	[InstCombine] Adding testcase for Z / (1.0 / Y) => (Y * Z); NFC The added testcase shows the current transformation for the operation Z / (1.0 / Y), which remains unchanged. This will be updated to align with the transformed code (Y * Z) with D72319. The existing transformation Z / (X / Y) => (Y * Z) / X is not handling this case as there are multiple uses for (1.0 / Y) in this testcase. Patch by: @raghesh (Raghesh Aloor) Differential Revision: https://reviews.llvm.org/D72388	2020-01-08 10:33:44 -05:00
Kadir Cetinkaya	b212eb7159	Revert "[InstCombine] fold zext of masked bit set/clear" This reverts commit `a041c4ec6f`. This looks like a non-trivial change and there has been no code reviews (at least there were no phabricator revisions attached to the commit description). It is also causing a regression in one of our downstream integration tests, we haven't been able to come up with a minimal reproducer yet.	2020-01-08 11:21:21 +01:00
Sanjay Patel	f8962571f7	[InstCombine] try to pull 'not' of select into compare operands not (select ?, (cmp TPred, ?, ?), (cmp FPred, ?, ?) --> select ?, (cmp TPred', ?, ?), (cmp FPred', ?, ?) If both sides of the select are cmps, we can remove an instruction. The case where only side is a cmp is deferred to a possible follow-on patch. We have a more general 'isFreeToInvert' analysis, but I'm not seeing a way to use that more widely without inducing infinite looping (opposing transforms). Here, we flip the compare predicates directly, so we should not have any danger by creating extra intermediate 'not' ops. Alive proofs: https://rise4fun.com/Alive/jKa Name: both select values are compares - invert predicates %tcmp = icmp sle i32 %x, %y %fcmp = icmp ugt i32 %z, %w %sel = select i1 %cond, i1 %tcmp, i1 %fcmp %not = xor i1 %sel, true => %tcmp_not = icmp sgt i32 %x, %y %fcmp_not = icmp ule i32 %z, %w %not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not Name: false val is compare - invert/not %fcmp = icmp ugt i32 %z, %w %sel = select i1 %cond, i1 %tcmp, i1 %fcmp %not = xor i1 %sel, true => %tcmp_not = xor i1 %tcmp, -1 %fcmp_not = icmp ule i32 %z, %w %not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not Differential Revision: https://reviews.llvm.org/D72007	2020-01-07 10:44:23 -05:00
Roman Lebedev	772ede3d5d	[InstCombine] Sink sub into hands of select if one hand becomes zero. Part 2 (PR44426) This decreases use count of %Op0, makes one hand of select to be 0, and possibly exposes further folding potential. Name: sub %Op0, (select %Cond, %Op0, %FalseVal) -> select %Cond, 0, (sub %Op0, %FalseVal) %Op0 = %TrueVal %o = select i1 %Cond, i8 %Op0, i8 %FalseVal %r = sub i8 %Op0, %o => %n = sub i8 %Op0, %FalseVal %r = select i1 %Cond, i8 0, i8 %n Name: sub %Op0, (select %Cond, %TrueVal, %Op0) -> select %Cond, (sub %Op0, %TrueVal), 0 %Op0 = %FalseVal %o = select i1 %Cond, i8 %TrueVal, i8 %Op0 %r = sub i8 %Op0, %o => %n = sub i8 %Op0, %TrueVal %r = select i1 %Cond, i8 %n, i8 0 https://rise4fun.com/Alive/aHRt https://bugs.llvm.org/show_bug.cgi?id=44426	2020-01-04 17:30:51 +03:00
Roman Lebedev	d2b79c76be	[NFC][InstCombine] 'subtract from one hands of select' pattern tests (PR44426) https://bugs.llvm.org/show_bug.cgi?id=44426	2020-01-04 17:30:51 +03:00
Roman Lebedev	4d8e47ca18	[InstCombine] Sink sub into hands of select if one hand becomes zero (PR44426) This decreases use count of %Op1, makes one hand of select to be 0, and possibly exposes further folding potential. Name: sub (select %Cond, %Op1, %FalseVal), %Op1 -> select %Cond, 0, (sub %FalseVal, %Op1) %Op1 = %TrueVal %o = select i1 %Cond, i8 %Op1, i8 %FalseVal %r = sub i8 %o, %Op1 => %n = sub i8 %FalseVal, %Op1 %r = select i1 %Cond, i8 0, i8 %n Name: sub (select %Cond, %TrueVal, %Op1), %Op1 -> select %Cond, (sub %TrueVal, %Op1), 0 %Op1 = %FalseVal %o = select i1 %Cond, i8 %TrueVal, i8 %Op1 %r = sub i8 %o, %Op1 => %n = sub i8 %TrueVal, %Op1 %r = select i1 %Cond, i8 %n, i8 0 https://rise4fun.com/Alive/avL https://bugs.llvm.org/show_bug.cgi?id=44426	2020-01-04 17:30:51 +03:00
Roman Lebedev	83aa0b6734	[NFC][InstCombine] 'subtract of one hands of select' pattern tests (PR44426) https://bugs.llvm.org/show_bug.cgi?id=44426	2020-01-04 17:30:51 +03:00
Roman Lebedev	7973aa05f6	[NFC][InstCombine] '(Op1 & С) - Op1' -> '-(Op1 & ~C)' fold (PR44427) This decreases use count of Op1, potentially allows us to further hoist said 'neg' later on, and results in marginally better X86 codegen. Name: (Op1 & С) - Op1 -> -(Op1 & ~C) %o = and i64 %Op1, C1 %r = sub i64 %o, %Op1 => %n = and i64 %Op1, ~C1 %r = sub i64 0, %n https://rise4fun.com/Alive/rwgA https://godbolt.org/z/R_RMfM https://bugs.llvm.org/show_bug.cgi?id=44427	2020-01-03 21:25:48 +03:00
Roman Lebedev	6f922dbbea	[NFC][InstCombine] '(Op1 & С) - Op1' pattern tests (PR44427)	2020-01-03 21:25:48 +03:00
Roman Lebedev	9b750cc6ba	[NFC][InstCombine] Autogenerate and2.ll checklines	2020-01-03 21:25:48 +03:00
Roman Lebedev	cc0216bedb	[NFC][InstCombine] '(X & (- Y)) - X' -> '- (X & (Y - 1))' fold (PR44448) Name: (X & (- Y)) - X -> - (X & (Y - 1)) (PR44448) %negy = sub i8 0, %y %unbiasedx = and i8 %negy, %x %r = sub i8 %unbiasedx, %x => %ymask = add i8 %y, -1 %xmasked = and i8 %ymask, %x %r = sub i8 0, %xmasked https://rise4fun.com/Alive/OIpla This decreases use count of %x, may allow us to later hoist said negation even further, and results in marginally nicer X86 codegen. See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499	2020-01-03 20:27:29 +03:00
Roman Lebedev	b87a351182	[NFC][InstCombine] '(X & (- Y)) - X' pattern tests (PR44448) As discussed in https://bugs.llvm.org/show_bug.cgi?id=44448, we can hoist negation out of the pattern.	2020-01-03 20:27:17 +03:00
Sanjay Patel	1640582743	[InstCombine] replace undef elements in vector constant when doing icmp folds (PR44383) As shown in P44383: https://bugs.llvm.org/show_bug.cgi?id=44383 ...we can't safely propagate a vector constant through this icmp fold if that vector constant contains undefined elements. We know that each defined element of the constant is safe though, so find the first of those and replicate it into the formerly undef lanes. Differential Revision: https://reviews.llvm.org/D72101	2020-01-03 09:16:57 -05:00
Sanjay Patel	4bb4f5b1d9	[InstCombine] add tests for vector icmp with undef constant elements; NFC	2020-01-02 15:39:45 -05:00
Sanjay Patel	88fc5fdef6	[InstCombine] remove uses before deleting instructions (PR43723) This is a less ambitious alternative to previous attempts to fix this bug with: rG56b2aee1875a rGef02831f0a4e rG56b2aee1875a ...because those all failed bot testing with use-after-free or other problems. The original crashing/assert problem is still showing up on various fuzzers, so I've added a new minimal test based on another one of those failures. Instead of trying to manage and coordinate the logic in isAllocSiteRemovable() with the deletion loops, just loosen the existing code that handles casts and GEP by replacing with undef to allow other opcodes. That means that no instructions with uses should assert on deletion, and there are hopefully no non-obvious sanitizer bugs induced.	2020-01-02 09:47:36 -05:00
Nikita Popov	8dd9a13619	[InstCombine] Preserve inbounds when merging with zero-index GEP (PR44423) This addresses https://bugs.llvm.org/show_bug.cgi?id=44423. If one of the GEPs is inbounds and the other is zero-index, we can also preserve inbounds. Differential Revision: https://reviews.llvm.org/D72060	2020-01-01 23:04:28 +01:00
Nikita Popov	6ba5f8c4ac	[InstCombine] Fix incorrect inbounds on GEP of GEP (PR44425) This fixes https://bugs.llvm.org/show_bug.cgi?id=44425. We need to drop inbounds if one of the GEPs is not inbounds. This was already done when creating a new GEP, but not when modifying in place. Differential Revision: https://reviews.llvm.org/D72059	2020-01-01 22:10:55 +01:00
Nikita Popov	11552433eb	[InstCombine] Add tests for PR44423 and PR44425; NFC	2020-01-01 20:27:57 +01:00
Nikita Popov	7f48171d2f	[InstCombine] Regenerate test checks; NFC	2020-01-01 20:27:57 +01:00
Nikita Popov	8756cd0963	[InstCombine] Add tests for sub nuw of geps; NFC Tests for PR44419.	2020-01-01 20:27:57 +01:00
Craig Topper	374e0299cf	[X86][InstCombine] Add constant folding and simplification support for pdep and pext The instructions use a mask to either pack disjoint bits together(pext) or spread bits to disjoint locations(pdep). If the mask is all 0s then no bits are extracted or deposited. If the mask is all ones, then the source value is written to the result since no compression or expansion happens. Otherwise if both the source and mask are constant we can walk the bits in the source/mask and calculate the result. There other crazier things we could do like computeKnownBits or turning pext into shift/and if only a single contiguous range of bits is extracted. Fixes PR44389 Differential Revision: https://reviews.llvm.org/D71952	2019-12-31 15:06:47 -08:00
Sanjay Patel	a041c4ec6f	[InstCombine] fold zext of masked bit set/clear This does not solve PR17101, but it is one of the underlying diffs noted here: https://bugs.llvm.org/show_bug.cgi?id=17101#c8 We could ease the one-use checks for the 'clear' (no 'not' op) half of the transform, but I do not know if that asymmetry would make things better or worse. Proofs: https://rise4fun.com/Alive/uVB Name: masked bit set %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp ne i32 %and, 0 %r = zext i1 %cmp to i32 => %s = lshr i32 %x, %y %r = and i32 %s, 1 Name: masked bit clear %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp eq i32 %and, 0 %r = zext i1 %cmp to i32 => %xn = xor i32 %x, -1 %s = lshr i32 %xn, %y %r = and i32 %s, 1	2019-12-31 12:35:10 -05:00
Sanjay Patel	eb5c026ef0	[InstCombine] add/adjust tests for masked bit; NFC	2019-12-31 12:35:10 -05:00
Nikita Popov	7adb5c2aca	Revert "[InstCombine] Fix infinite loop due to bitcast <-> phi transforms" This reverts commit `27a0795943`. Seems to break test-suite.	2019-12-31 17:42:57 +01:00
Sanjay Patel	108645cd0a	[InstCombine] add tests for masked bit set/clear; NFC	2019-12-31 10:20:45 -05:00
Nikita Popov	27a0795943	[InstCombine] Fix infinite loop due to bitcast <-> phi transforms Fix for https://bugs.llvm.org/show_bug.cgi?id=44245. The optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up fighting against each other, because optimizeBitCastFromPhi() assumes that bitcasts of loads will get folded. This doesn't happen here, because a dangling phi node prevents the one-use fold in https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering. This patch fixes the issue by adding manually removing the old phis. Differential Revision: https://reviews.llvm.org/D71164	2019-12-31 16:17:14 +01:00
Connor Abbott	fb114694e9	[InstCombine] Don't rewrite phi-of-bitcast when the phi has other users Judging by the existing comments, this was the intention, but the transform never actually checked if the existing phi's would be removed. See https://bugs.llvm.org/show_bug.cgi?id=44242 for an example where this causes much worse code generation on AMDGPU. Differential Revision: https://reviews.llvm.org/D71209	2019-12-31 12:15:02 +01:00
Connor Abbott	d04e64a25a	[InstCombine] Add tests for PR44242 Differential Revision: https://reviews.llvm.org/D71260	2019-12-31 12:15:02 +01:00
Sanjay Patel	ee3eebba0d	[InstCombine] remove stale comment on test; NFC	2019-12-30 12:39:10 -05:00
Sanjay Patel	987eb8e26c	[InstCombine] propagate sign argument through nested copysigns This is another optimization suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153	2019-12-30 11:06:02 -05:00
Qiu Chaofan	65661908cb	[NFC] Add test for load-insert-store pattern This patch adds necessary test cases for load-update-store pattern which only updates single element of vector. Differential Revision: https://reviews.llvm.org/D71886	2019-12-30 16:14:37 +08:00
Fangrui Song	502a77f125	Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351	2019-12-24 15:57:33 -08:00
Sanjay Patel	25cf5d97ac	[InstCombine] add test for copysign; NFC	2019-12-23 17:54:31 -05:00
Sanjay Patel	9a77c20954	[InstCombine] add tests for not(select ...); NFC	2019-12-23 17:14:32 -05:00
Sanjay Patel	9cdcd81d3f	[InstCombine] enhance fold for copysign with known sign arg This is another optimization suggested in PRPR44153: https://bugs.llvm.org/show_bug.cgi?id=44153	2019-12-22 10:07:01 -05:00
Sanjay Patel	79c7fa31f3	[InstCombine] check alloc size in bitcast of geps fold (PR44321) We missed a constraint in D44833 when folding a bitcast into a GEP with vector/array types. If the alloc sizes specified by the datalayout don't match, this could miscompile as shown in: https://bugs.llvm.org/show_bug.cgi?id=44321 Differential Revision: https://reviews.llvm.org/D71771	2019-12-21 10:31:21 -05:00
Sanjay Patel	19f9f374d9	[SimplifyLibCalls] require fast-math-flags for pow(X, -0.5) transforms As discussed in PR44330: https://bugs.llvm.org/show_bug.cgi?id=44330 ...the transform from pow(X, -0.5) libcall/intrinsic to reciprocal square root can result in small deviations from the expected result due to differences in the pow() implementation and/or the extra rounding step from the division. This patch proposes to allow that difference with either the 'approximate functions' or 'reassociate' FMF: http://llvm.org/docs/LangRef.html#fast-math-flags In practice, this likely means that the code is compiled with all of 'fast' (-ffast-math), but I have preserved the existing specializations for -0.0/-INF that enable generating safe code if those special values are allowed simultaneously with allowing approximation/reassociation. The question about whether a similar restriction is needed for the non-reciprocal case -- pow(X, 0.5) -- is deferred. That transform is allowed without FMF currently, and this patch does not change that behavior. Differential Revision: https://reviews.llvm.org/D71706	2019-12-21 10:00:53 -05:00
Jakub Kuderski	c431c407eb	[InstCombine] Improve infinite loop detection Summary: This patch limits the default number of iterations performed by InstCombine. It also exposes a new option that allows to specify how many iterations is considered getting stuck in an infinite loop. Based on experiments performed on real-world C++ programs, InstCombine seems to perform at most ~8-20 iterations, so treating 1000 iterations as an infinite loop seems like a safe choice. See D71145 for details. The two limits can be specified via command line options. Reviewers: spatel, lebedev.ri, nikic, xbolva00, grosser Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71673	2019-12-20 16:15:04 -05:00
Sanjay Patel	0b421d842d	[InstCombine] add tests for cast+gep; NFC PR44321: https://bugs.llvm.org/show_bug.cgi?id=44321	2019-12-20 10:52:23 -05:00
Roman Lebedev	047186cc98	[ValueTracking] isKnownNonZero() should take non-null-ness assumptions into consideration (PR43267) Summary: It is pretty common to assume that something is not zero. Even optimizer itself sometimes emits such assumptions (e.g. `addAssumeNonNull()` in `PromoteMemoryToRegister.cpp`). But we currently don't deal with such assumptions :) The only way `isKnownNonZero()` handles assumptions is by calling `computeKnownBits()` which calls `computeKnownBitsFromAssume()`. But `x != 0` does not tell us anything about set bits, it only says that there are some set bits. So naturally, `KnownBits` does not get populated, and we fail to make use of this assumption. I propose to deal with this special case by special-casing it via adding a `isKnownNonZeroFromAssume()` that returns boolean when there is an applicable assumption. While there, we also deal with other predicates, mainly if the comparison is with constant. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43267 \| PR43267 ]]. Differential Revision: https://reviews.llvm.org/D71660	2019-12-20 01:47:57 +03:00
Roman Lebedev	92083a295a	[ValueTracking] isValidAssumeForContext(): CxtI itself also must transfer execution to successor This is a pretty rare case, when CxtI and assume are in the same basic block, with assume being located later. We were already checking that assumption was guaranteed to be executed, but we omitted CxtI itself from consideration, and as the test (miscompile) shows, that is incorrect. As noted in D71660 review by @nikic.	2019-12-20 01:47:57 +03:00
Roman Lebedev	ffcae008d7	[NFC][InstCombine] Add a test for assume-induced miscompile @escape() may throw here, we don't know that assumption, which is located afterwards in the same block, is executed, therefore %load arg of call to @escape() can not be marked as non-null. As noted in D71660 review by @nikic.	2019-12-20 01:47:56 +03:00
Sanjay Patel	5889e7823d	[InstCombine] add/adjust tests for pow->sqrt; NFC There's at least 1 bug here as discussed in PR44330.	2019-12-19 09:25:19 -05:00
David Green	a59cc5e128	[InstCombine] Canonicalize select immediates In certain situations after inlining and simplification we end up with code that is _almost_ a min/max pattern, but contains constants that have been demand-bit optimised to the wrong values, ending up with code like: %1 = icmp slt i32 %shr, -128 %2 = select i1 %1, i32 128, i32 %shr %.inv = icmp sgt i32 %shr, 127 %spec.select.i = select i1 %.inv, i32 127, i32 %2 %conv7 = trunc i32 %spec.select.i to i8 This should be turned into a min/max pattern, but the -128 in the first select was instead transformed into 128, as only the bottom byte was ever demanded. To fix this, I've put in further canonicalisation for the immediates of selects, preferring to use the same value as the icmp if available. Differential Revision: https://reviews.llvm.org/D71516	2019-12-19 12:36:46 +00:00
David Green	d38153325f	[Instcombine] Add select canonicalization tests. NFC	2019-12-19 12:36:46 +00:00
Piotr Sobczak	40b5a0f7c8	Revert "[InstCombine][AMDGPU] Trim more components of *buffer_load" Revert D70315, as it breaks gfx8 for some reason. This reverts commit `65f94b3380`.	2019-12-18 22:04:44 +01:00
Jakub Kuderski	3d29c41ad5	[InstCombine] Insert instructions before adding them to worklist Summary: This patch adds instructions to the InstCombine worklist after they are properly inserted. This way we don't get `<badref>`s printed when logging added instructions. It also adds a check in `Worklist::Add` that ensures that all added instructions have parents. Simple test case that illustrates the difference when run with `--debug-only=instcombine`: ``` define i32 @test35(i32 %a, i32 %b) { %1 = or i32 %a, 1135 %2 = or i32 %1, %b ret i32 %2 } ``` Before this patch: ``` INSTCOMBINE ITERATION #1 on test35 IC: ADDING: 3 instrs to worklist IC: Visiting: %1 = or i32 %a, 1135 IC: Visiting: %2 = or i32 %1, %b IC: ADD: %2 = or i32 %a, %b IC: Old = %3 = or i32 %1, %b New = <badref> = or i32 %2, 1135 IC: ADD: <badref> = or i32 %2, 1135 ... ``` With this patch: ``` INSTCOMBINE ITERATION #1 on test35 IC: ADDING: 3 instrs to worklist IC: Visiting: %1 = or i32 %a, 1135 IC: Visiting: %2 = or i32 %1, %b IC: ADD: %2 = or i32 %a, %b IC: Old = %3 = or i32 %1, %b New = <badref> = or i32 %2, 1135 IC: ADD: %3 = or i32 %2, 1135 ... ``` Reviewers: fhahn, davide, spatel, foad, grosser, nikic Reviewed By: nikic Subscribers: nikic, lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71093	2019-12-18 14:55:41 -05:00
Jakub Kuderski	406b6019cd	[InstCombine] Allow to limit the max number of iterations Summary: This patch teaches InstCombine to accept a new parameter: maximum number of iterations over functions. InstCombine tries to simplify instructions by iterating over the whole function until the function stops changing. As a consequence, the last iteration before reaching a fixpoint visits all instructions in the worklist and never performs any rewrites. Bounding the number of iterations can have 2 benefits: * In case the users of the pass can make a good guess about the number of required iterations, we can save the time normally spent on the last iteration that doesn't change anything. * When the wants to use InstCombine as a cleanup pass, it may be enough to run just a few iterations and stop even before reaching a fixpoint. This can be also useful for implementing a lightweight pass pipeline (think `-O1`). This patch does not change the behavior of opt or Clang -- limiting the number of iterations is entirely opt-in. Reviewers: fhahn, davide, spatel, foad, nlopes, grosser, lebedev.ri, nikic, xbolva00 Reviewed By: spatel Subscribers: craig.topper, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71145	2019-12-18 13:48:54 -05:00
stozer	89d19d60ad	Reapply: [DebugInfo] Correctly handle salvaged casts and split fragments at ISel This reverts commit `1f3dd83cc1`, reapplying commit `bb1b0bc4e5`. The original commit failed on some builds seemingly due to the use of a bracketed constructor with an std::array, i.e. `std::array<> arr({...})`.	2019-12-18 16:26:42 +00:00
Roman Lebedev	c6a56c9a50	[NFC][InstCombine] Autogenerate assume.ll test	2019-12-18 17:16:19 +03:00
Sanjay Patel	c7492fbd4e	[InstCombine] add tests for copysign; NFC	2019-12-18 07:56:36 -05:00
stozer	1f3dd83cc1	Revert "[DebugInfo] Correctly handle salvaged casts and split fragments at ISel" Reverted due to build failure on windows bots. This reverts commit `bb1b0bc4e5`.	2019-12-18 11:46:10 +00:00
stozer	bb1b0bc4e5	[DebugInfo] Correctly handle salvaged casts and split fragments at ISel Previously, LLVM had no functional way of performing casts inside of a DIExpression(), which made salvaging cast instructions other than Noop casts impossible. This patch enables the salvaging of casts by using the DW_OP_LLVM_convert operator for SExt and Trunc instructions. There is another issue which is exposed by this fix, in which fragment DIExpressions (which are preserved more readily by this patch) for values that must be split across registers in ISel trigger an assertion, as the 'split' fragments extend beyond the bounds of the fragment DIExpression causing an error. This patch also fixes this issue by checking the fragment status of DIExpressions which are to be split, and dropping fragments that are invalid.	2019-12-18 11:09:18 +00:00
Piotr Sobczak	65f94b3380	[InstCombine][AMDGPU] Trim more components of buffer_load Summary: Add trimming of unused components of s_buffer_load. Extend trimming of buffer_load to also include unused components at the beginning of vectors and update offset. Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70315	2019-12-17 17:50:07 +01:00
Craig Topper	02f644c59a	[InstCombine] Teach removeBitcastsFromLoadStoreOnMinMax not to change the size of a store. We can change the type as long as we don't change the size. Fixes PR44306 Differential Revision: https://reviews.llvm.org/D71532	2019-12-16 12:12:54 -08:00
Sanjay Patel	6080387f13	[InstSimplify] fold splat of inserted constant to vector constant shuf (inselt ?, C, IndexC), undef, <IndexC, IndexC...> --> <C, C...> This is another missing shuffle fold pattern uncovered by the shuffle correctness fix from D70246. The problem was visible in the post-commit thread example, but we managed to overcome the limitation for that particular case with D71220. This is something like the inverse of the previous fix - there we didn't demand the inserted scalar, and here we are only demanding an inserted scalar. Differential Revision: https://reviews.llvm.org/D71488	2019-12-15 09:32:03 -05:00
Nicola Zaghen	97572775d2	Reland [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered. This fixes the buildbot failures. Differential Revision: https://reviews.llvm.org/D68328 Patch by Joseph Faulls!	2019-12-13 14:30:21 +00:00
Nicola Zaghen	f798eb21ec	Temporarily Revert "[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same." This reverts commit `5f6208778f`. This caused failures in Transforms/PhaseOrdering/scev-custom-dl.ll const: Assertion `getBitWidth() == CR.getBitWidth() && "ConstantRange types don't agree!"' failed.	2019-12-12 10:29:54 +00:00
Nicola Zaghen	5f6208778f	[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered. Differential Revision: https://reviews.llvm.org/D68328 Patch by Joseph Faulls!	2019-12-12 10:07:01 +00:00
Nikita Popov	8db5143b1a	[InstCombine] Optimize overflow check base on uadd.with.overflow result Fix for https://bugs.llvm.org/show_bug.cgi?id=40846. This adds a combine for cases where a (a + b) < a style overflow check is performed, but with a + b being the result of uadd.with.overflow, so the overflow result is also already available and we can just use it. Subsequently GVN/CSE will deduplicate the extracts. We can run into this situation if you have both a uadd.with.overflow and a manual add + overflow check in the same function (on the same operands), in which case GVN will rewrite the add to the with.overflow result and leave you with this pattern. The implementation is a bit ugly because I'm handling the various canonicalization edge cases. This does not yet handle the negated version of this pattern. Differential Revision: https://reviews.llvm.org/D58644	2019-12-11 20:52:04 +01:00
Danila Kutenin	19e83a9b4c	[ValueTracking] Pointer is known nonnull after load/store If the pointer was loaded/stored before the null check, the check is redundant and can be removed. For now the optimizers do not remove the nullptr check, see https://gcc.godbolt.org/z/H2r5GG. The patch allows to use more nonnull constraints. Also, it found one more optimization in some PowerPC test. This is my first llvm review, I am free to any comments. Differential Revision: https://reviews.llvm.org/D71177	2019-12-11 20:32:29 +01:00
Sanjay Patel	252d3b9805	[InstSimplify] add tests for insert constant + splat; NFC	2019-12-10 17:16:58 -05:00
Sanjay Patel	396d18aeb6	[InstCombine] replace shuffle's insertelement operand if inserted scalar is not demanded This pattern is noted as a regression from: D70246 ...where we removed an over-aggressive shuffle simplification. SimplifyDemandedVectorElts fails to catch this case when the insert has multiple uses, so I'm proposing to pattern match the minimal sequence directly. This fold does not conflict with any of our current shuffle undef/poison semantics. Differential Revision: https://reviews.llvm.org/D71220	2019-12-10 10:10:05 -05:00
Johannes Doerfert	a7d992c0f2	[ValueTracking] Allow context-sensitive nullness check for non-pointers Summary: Same as D60846 and D69571 but with a fix for the problem encountered after them. Both times it was a missing context adjustment in the handling of PHI nodes. The reproducers created from the bugs that caused the old commits to be reverted are included. Reviewers: nikic, nlopes, mkazantsev, spatel, dlrobertson, uabelho, hakzsam, hans Subscribers: hiraditya, bollu, asbirlea, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71181	2019-12-09 15:15:52 -06:00
Sanjay Patel	92f94b762a	[InstCombine] add tests for shuffle with insertelement operand; NFC	2019-12-09 14:27:03 -05:00
Bob Haarman	055779a9ac	Revert "[InstCombine] keep assumption before sinking calls" Summary: This reverts commit `c3b06d0c39`. Reason for revert: Caused miscompiles when inserting assume for undef. Also adds a test to prevent similar breakage in future. Fixes PR44154. Reviewers: rnk, jdoerfert, efriedma, xbolva00 Reviewed By: rnk Subscribers: thakis, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70933	2019-12-05 10:39:34 -08:00
Roman Lebedev	796fa662f1	[InstCombine] Invert `add A, sext(B) --> sub A, zext(B)` canonicalization (to `sub A, zext B -> add A, sext B`) Summary: D68408 proposes to greatly improve our negation sinking abilities. But in current canonicalization, we produce `sub A, zext(B)`, which we will consider non-canonical and try to sink that negation, undoing the existing canonicalization. So unless we explicitly stop producing previous canonicalization, we will have two conflicting folds, and will end up endlessly looping. This inverts canonicalization, and adds back the obvious fold that we'd miss: * `sub [nsw] Op0, sext/zext (bool Y) -> add [nsw] Op0, zext/sext (bool Y)` https://rise4fun.com/Alive/xx4 * `sext(bool) + C -> bool ? C - 1 : C` https://rise4fun.com/Alive/fBl It is obvious that `@ossfuzz_9880()` / `@lshr_out_of_range()`/`@ashr_out_of_range()` (oss-fuzz 4871) are no longer folded as much, though those aren't really worrying. Reviewers: spatel, efriedma, t.p.northover, hfinkel Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71064	2019-12-05 21:21:30 +03:00
Sanjay Patel	3c6b5d3674	[InstCombine] narrow select with FP casts Select doesn't change values, so truncate of extended operand cancels out.	2019-12-05 11:12:44 -05:00
Sanjay Patel	403bb33a2e	[InstCombine] add tests for fpext+select+fptrunc; NFC	2019-12-05 10:49:29 -05:00
Roman Lebedev	09311459e3	[InstCombine] Extend `0 - (X sdiv C) -> (X sdiv -C)` fold to non-splat vectors Split off from https://reviews.llvm.org/D68408	2019-12-05 15:48:29 +03:00
Roman Lebedev	b89ba5f939	[NFC][InstCombine] Autogenerate check lines in a few tests These files are potentially affected by Negator (D68408) patch.	2019-12-05 01:14:03 +03:00
Roman Lebedev	cd04e8349b	[NFC][InstCombine] Update sub-of-negatible.ll test	2019-12-04 15:49:36 +03:00
Craig Topper	5ebbabc1af	[InstCombine] Revert `aafde063aa` and `6749dc3446` related to bitcast handling of x86_mmx This reverts these two commits [InstCombine] Turn (extractelement <1 x i64/double> (bitcast (x86_mmx))) into a single bitcast from x86_mmx to i64/double. [InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement We're seeing at least one internal test failure related to a bitcast that was previously before an inline assembly block containing emms being placed after it. This leads to the mmx state ending up not empty after the emms. IR has no way to make any specific guarantees about this. Reverting these patches to get back to previous behavior which at least worked for this test.	2019-12-03 14:02:22 -08:00
Sanjay Patel	af4e59949c	[InstCombine] fix undef propagation for vector urem transform (PR44186) As described here: https://bugs.llvm.org/show_bug.cgi?id=44186 The match() code safely allows undef values, but we can't safely propagate a vector constant that contains an undef to the new compare instruction.	2019-12-02 12:17:38 -05:00
Simon Tatham	01aefae4a1	[ARM,MVE] Add an InstCombine rule permitting VPNOT. Summary: If a user writing C code using the ACLE MVE intrinsics generates a predicate and then complements it, then the resulting IR will use the `pred_v2i` IR intrinsic to turn some `<n x i1>` vector into a 16-bit integer; complement that integer; and convert back. This will generate machine code that moves the predicate out of the `P0` register, complements it in an integer GPR, and moves it back in again. This InstCombine rule replaces `i2v(~v2i(x))` with a direct complement of the original predicate vector, which we can already instruction- select as the VPNOT instruction which complements P0 in place. Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70484	2019-12-02 16:20:30 +00:00
Roman Lebedev	0f22e783a0	[InstCombine] Revert rL341831: relax one-use check in foldICmpAddConstant() (PR44100) rL341831 moved one-use check higher up, restricting a few folds that produced a single instruction from two instructions to the case where the inner instruction would go away. Original commit message: > InstCombine: move hasOneUse check to the top of foldICmpAddConstant > > There were two combines not covered by the check before now, > neither of which actually differed from normal in the benefit analysis. > > The most recent seems to be because it was just added at the top of the > function (naturally). The older is from way back in 2008 (r46687) > when we just didn't put those checks in so routinely, and has been > diligently maintained since. From the commit message alone, there doesn't seem to be a deeper motivation, deeper problem that was trying to solve, other than 'fixing the wrong one-use check'. As i have briefly discusses in IRC with Tim, the original motivation can no longer be recovered, too much time has passed. However i believe that the original fold was doing the right thing, we should be performing such a transformation even if the inner `add` will not go away - that will still unchain the comparison from `add`, it will no longer need to wait for `add` to compute. Doing so doesn't seem to break any particular idioms, as least as far as i can see. References https://bugs.llvm.org/show_bug.cgi?id=44100	2019-12-02 18:06:15 +03:00
Sanjay Patel	af0babc90a	[InstCombine] fold copysign with constant sign argument to (fneg+)fabs If the sign of the sign argument is known (this could be extended to use ValueTracking), then we can use fneg+fabs to clear/set the sign bit of the magnitude argument. http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic This transform is already done in DAGCombiner, but we can do it sooner in IR as suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153 We have effectively no analysis for copysign in IR, so we are taking the unusual step of increasing the number of IR instructions for the negative constant case. Differential Revision: https://reviews.llvm.org/D70792	2019-12-02 09:23:12 -05:00
Bjorn Pettersson	a9d6b0e544	[InstCombine] Fix big-endian miscompile of (bitcast (zext/trunc (bitcast))) Summary: optimizeVectorResize is rewriting patterns like: %1 = bitcast vector %src to integer %2 = trunc/zext %1 %dst = bitcast %2 to vector Since bitcasting between integer an vector types gives different integer values depending on endianness, we need to take endianness into account. As it happens the old implementation only produced the correct result for little endian targets. Fixes: https://bugs.llvm.org/show_bug.cgi?id=44178 Reviewers: spatel, lattner, lebedev.ri Reviewed By: spatel, lebedev.ri Subscribers: lebedev.ri, hiraditya, uabelho, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70844	2019-12-02 11:05:25 +01:00
Craig Topper	67298d683c	[X86][InstCombine] Move non-X86 specific instcombine test from test/CodeGen/X86/ to test/Transforms/InstCombine/	2019-12-01 10:31:04 -08:00
Craig Topper	3dd93dc2a1	[X86][InstCombine] Move instcombine test from test/CodeGen/X86 to test/Transforms/InstCombine/ and replace grep with FileCheck	2019-12-01 10:31:04 -08:00
David Green	59b56e5c57	[InstCombine] Expand usub_sat patterns to handle constants The constants come through as add %x, -C, not a sub as would be expected. They need some extra matchers to canonicalise them towards usub_sat. Differential Revision: https://reviews.llvm.org/D69514	2019-11-30 16:58:01 +00:00
David Green	3a1bef5616	[InstCombine] Adjust usub_sat fold one use checks This adjusts the one use checks in the the usub_sat fold code to not increase instruction count, but otherwise do the fold. Reviewed as a part of D69514.	2019-11-30 16:58:00 +00:00
David Green	a46b959ebd	[InstCombine] More usub_sat tests. NFC.	2019-11-30 16:58:00 +00:00
Bjorn Pettersson	363cbcc590	[InstCombine] Run the cast.ll test a twice, now also testing little endian. NFC Some tests in test/Transforms/InstCombine/cast.ll depend on endianness. Added a second run line to run the tests with both big and little endian. In the past we only compiled for big endian, and then it was hard to see if any big endian bugfixes would impact the little endian result etc.	2019-11-29 13:24:13 +01:00
Sanjay Patel	5e6b728763	[InstCombine] add tests for copysign; NFC	2019-11-27 11:32:23 -05:00
Sanjay Patel	8d20dd0b06	[ConstFolding] move tests for copysign; NFC InstCombine doesn't have any transforms for copysign currently.	2019-11-26 16:54:46 -05:00
Dávid Bolvanský	bb7b8540f0	[InstCombine] Optimize some memccpy calls to memcpy/null Summary: return memccpy(d, "helloworld", 'r', 20) => return memcpy(d, "helloworld", 8 /* pos of 'r' in string */), d + 8 Reviewers: efriedma, jdoerfert Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68089	2019-11-26 10:54:47 +01:00
Sanjay Patel	f575f12c64	[InstCombine] remove identity shuffle simplification for mask with undefs And simultaneously enhance SimplifyDemandedVectorElts() to rcognize that pattern. That preserves some of the old optimizations in IR. Given a shuffle that includes undef elements in an otherwise identity mask like: define <4 x float> @shuffle(<4 x float> %arg) { %shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3> ret <4 x float> %shuf } We were simplifying that to the input operand. But as discussed in PR43958: https://bugs.llvm.org/show_bug.cgi?id=43958 ...that means that per-vector-element poison that would be stopped by the shuffle can now leak to the result. Also note that we still have (and there are tests for) the same transform with no undef elements in the mask (a fully-defined identity mask). I don't think there's any controversy about that case - it's a valid transform under any interpretation of shufflevector/undef/poison. Looking at a few of the diffs into codegen, I don't see any difference in final asm. So depending on your perspective, that's good (no real loss of optimization power) or bad (poison exists in the DAG, so we only partially fixed the bug). Differential Revision: https://reviews.llvm.org/D70246	2019-11-24 10:06:26 -05:00
Davide Italiano	c32f0ff92f	[InstCombine] Fix call guard difference with dbg Patch by Chris Ye! Differential Revision: https://reviews.llvm.org/D68004	2019-11-22 13:35:53 -08:00
Philip Reames	1f4395942f	Precommit tests for forthcoming widenable.condition transforms	2019-11-20 17:02:04 -08:00
Simon Tatham	f4f77aa53e	[ARM,MVE] Add InstCombine rules for pred_i2v / pred_v2i. If you're writing C code using the ACLE MVE intrinsics that passes the result of a vcmp as input to a predicated intrinsic, e.g. mve_pred16_t pred = vcmpeqq(v1, v2); v_out = vaddq_m(v_inactive, v3, v4, pred); then clang's codegen for the compare intrinsic will create calls to `@llvm.arm.mve.pred.v2i` to convert the output of `icmp` into an `mve_pred16_t` integer representation, and then the next intrinsic will call `@llvm.arm.mve.pred.i2v` to convert it straight back again. This will be visible in the generated code as a `vmrs`/`vmsr` pair that move the predicate value pointlessly out of `p0` and back into it again. To prevent that, I've added InstCombine rules to remove round trips of the form `v2i(i2v(x))` and `i2v(v2i(x))`. Also I've taught InstCombine about the known and demanded bits of those intrinsics. As a result, you now get just the generated code you wanted: vpt.u16 eq, q1, q2 vaddt.u16 q0, q3, q4 Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70313	2019-11-18 10:39:30 +00:00
Sanjay Patel	5d67d81f48	[InstCombine] prevent crashing/assert on shift constant expression (PR44028) The binary operator cast implies an instruction, but the matcher for shift does not: https://bugs.llvm.org/show_bug.cgi?id=44028	2019-11-17 17:31:09 -05:00
Florian Hahn	8eeabbaf5d	[ConstantFold] Handle identity folds at top of ConstantFoldBinaryInst Currently we miss folds with undef and identity values for binary ops that do not fold to undef in general. We can generalize the identity simplifications and do them before checking for undef in particular. Alive checks: * OR - https://rise4fun.com/Alive/8OsK * AND - https://rise4fun.com/Alive/e3tE This will also allow us to remove some now redundant cases throughout the function, but I would like to do this as follow-up. That should make tracking down potential issues easier. Reviewers: spatel, RKSimon, lebedev.ri Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D70169	2019-11-17 21:30:14 +00:00
David Green	08390c52a2	[InstCombine] Canonicalize ssub.with.overflow with clamp to ssub.sat Working on top of D69252, this adds canonicalisation patterns for ssub.with.overflow to ssub.sats. Differential Revision: https://reviews.llvm.org/D69753	2019-11-17 10:45:11 +00:00
David Green	03fce6b12e	[InstCombine] Canonicalize sadd.with.overflow with clamp to sadd.sat This adds to D69245, adding extra signed patterns for folding from a sadd_with_overflow to a sadd_sat. These are more complex than the unsigned patterns, as the overflow can occur in either direction. For the add case, the positive overflow can only occur if both of the values are positive (same for both the values being negative). So there is an extra select on whether to use the positive or negative overflow limit. Differential Revision: https://reviews.llvm.org/D69252	2019-11-17 10:42:39 +00:00

... 2 3 4 5 6 ...

4946 Commits