llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	b74c6d2c9d	[InlineFunction] Disable emission of alignment assumptions by default In D74183 clang started emitting alignment for sret parameters unconditionally. This caused a 1.5% compile-time regression on tramp3d-v4. The reason is that we now generate many instance of IR like %ptrint = ptrtoint %class.GuardLayers* %guards_m to i64 %maskedptr = and i64 %ptrint, 3 %maskcond = icmp eq i64 %maskedptr, 0 tail call void @llvm.assume(i1 %maskcond) to preserve the alignment information during inlining. Based on IR analysis, these assumptions also regress optimization. The attached phase ordering test case illustrates two issues: One are instruction count based optimization heuristics, which are affected by the four additional instructions of the assumption. The other is blocking of SROA due to ptrtoint casts (PR45763). We already encountered the same problem in Rust, where we (unlike Clang) generally prefer to emit alignment information absolutely everywhere it is available. We were only able to do this after hardcoding -preserve-alignment-assumptions-during-inlining=false, because we were seeing significant optimization and compile-time regressions otherwise. This patch disables -preserve-alignment-assumptions-during-inlining by default, because we should not be punishing people for adding more alignment annotations. Once the assume bundle work shakes out and we can represent (and use) alignment assumptions using assume bundles, it should be possible to re-enable this with reduced overhead. Differential Revision: https://reviews.llvm.org/D76886	2020-04-30 23:12:54 +02:00
Sanjay Patel	bef6e67e95	[VectorCombine] transform bitcasted shuffle to wider elements bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC' This is the widen shuffle elements enhancement to D76727. It builds on the analysis and simplifications in D77881 and rG6a7e958a423e. The phase ordering tests show that we can simplify inverse shuffles across a binop in both directions (widen/narrow or narrow/widen) now. There's another potential transform visible in some of the remaining TODOs - move a bitcasted operand of a shuffle after the shuffle. Differential Revision: https://reviews.llvm.org/D78371	2020-04-19 08:24:38 -04:00
Sanjay Patel	9d9a088e51	[PhaseOrdering] remove blank lines in tests; NFC	2020-04-17 10:30:38 -04:00
Sanjay Patel	538a8f0227	[InstCombine] convert bitcast-shuffle to vector trunc As discussed in D76983, that patch can turn a chain of insert/extract with scalar trunc ops into bitcast+extract and existing instcombine vector transforms end up creating a shuffle out of that (see the PhaseOrdering test for an example). Currently, that process requires at least this sequence: -instcombine -early-cse -instcombine. Before D76983, the sequence of insert/extract would reach the SLP vectorizer and become a vector trunc there. Based on a small sampling of public targets/types, converting the shuffle to a trunc is better for codegen in most cases (and a regression of that form is the reason this was noticed). The trunc is clearly better for IR-level analysis as well. This means that we can induce "spontaneous vectorization" without invoking any explicit vectorizer passes (at least a vector cast op may be created out of scalar casts), but that seems to be the right choice given that we started with a chain of insert/extract, and the backend would expand back to that chain if a target does not support the op. Differential Revision: https://reviews.llvm.org/D77299	2020-04-05 09:48:02 -04:00
Sanjay Patel	389704cc60	[PhaseOrdering] add shuffle tests based on D40633; NFC We got some of the potential optimizations with D76727 and D76844. There are 2 likely enhancements that we could add to -vector-combine to get most of the remaining cases: 1. Allow bitcasted shuffle mask narrowing (widen the elements). 2. Combine shuffle-of-shuffle into a single shuffle. This is already partly handled by the x86 backend, but the tests here show that we still miss some of the potential combines.	2020-04-03 12:44:49 -04:00
Sanjay Patel	12fcbcecff	[InstCombine] add tests for cmyk benchmark; NFC These are versions of a function that regressed with: rGf2fbdf76d8d0 That particular problem occurs with an instcombine-simplifycfg-instcombine sequence, but we can show that it exists within instcombine only with other variations of the pattern.	2020-04-02 13:00:46 -04:00
Sanjay Patel	a19b27b90e	[PhaseOrdering] add test for vector trunc; NFC See discussion in D76983.	2020-04-02 08:13:19 -04:00
Roman Lebedev	1badf7c33a	[InstComine] Forego of one-use check in `(X - (X & Y)) --> (X & ~Y)` if Y is a constant Summary: This is potentially more friendly for further optimizations, analyses, e.g.: https://godbolt.org/z/G24anE This resolves a phase-ordering bug that was introduced in D75145 for https://godbolt.org/z/2gBwF2 https://godbolt.org/z/XvgSua Reviewers: spatel, nikic, dmgreen, xbolva00 Reviewed By: nikic, xbolva00 Subscribers: hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75757	2020-03-06 21:39:07 +03:00
Sanjay Patel	71a316883d	[PassManager] adjust VectorCombine placement The initial placement of vector-combine in the opt pipeline revealed phase ordering bugs: https://bugs.llvm.org/show_bug.cgi?id=45015 https://bugs.llvm.org/show_bug.cgi?id=42022 This patch contains a few independent changes: 1. Move the pass up in the pipeline, so it happens just after loop-vectorization. This is only to keep vectorization passes together in the pipeline at the moment. I don't have evidence of interaction between these yet. 2. Add an -early-cse pass after -vector-combine to clean up redundant ops. This was partly proposed as far back as rL219644 (which is why it's effectively being moved in the old PM code). This is important because the subsequent -instcombine doesn't work as well without EarlyCSE. With the CSE, -instcombine is able to squash shuffles together in 1 of the tests (because those are simple "select" shuffles). 3. Remove the -vector-combine pass that was running after SLP. We may want to do that eventually, but I don't have a test case to support it yet. Differential Revision: https://reviews.llvm.org/D75145	2020-03-04 11:10:49 -05:00
Sanjay Patel	922558be9e	[PhaseOrdering] add tests for missed CSE; NFC Also add a RUN line for the new pass manager.	2020-02-25 14:30:59 -05:00
Sanjay Patel	f452f7b95a	[PhaseOrdering] add test for missing vector/CSE transforms (PR45015); NFC	2020-02-25 09:13:49 -05:00
Sanjay Patel	b8ebc11f03	[EarlyCSE] avoid crashing when detecting min/max/abs patterns (PR41083) As discussed in PR41083: https://bugs.llvm.org/show_bug.cgi?id=41083 ...we can assert/crash in EarlyCSE using the current hashing scheme and instructions with flags. ValueTracking's matchSelectPattern() may rely on overflow (nsw, etc) or other flags when detecting patterns such as min/max/abs composed of compare+select. But the value numbering / hashing mechanism used by EarlyCSE intersects those flags to allow more CSE. Several alternatives to solve this are discussed in the bug report. This patch avoids the issue by doing simple matching of min/max/abs patterns that never requires instruction flags. We give up some CSE power because of that, but that is not expected to result in much actual performance difference because InstCombine will canonicalize these patterns when possible. It even has this comment for abs/nabs: /// Canonicalize all these variants to 1 pattern. /// This makes CSE more likely. (And this patch adds PhaseOrdering tests to verify that the expected transforms are still happening in the standard optimization pipelines. I left this code to use ValueTracking's "flavor" enum values, so we don't have to change the callers' code. If we decide to go back to using the ValueTracking call (by changing the hashing algorithm instead), it should be obvious how to replace this chunk. Differential Revision: https://reviews.llvm.org/D74285	2020-02-10 17:25:34 -05:00
Sanjay Patel	0ad6e726ec	[Transforms] add phase ordering tests for min/max/abs; NFC Test that instcombine and early-cse can cooperate to reduce sequences of select patterns that are not composed of the same underlying instructions. There's a bug in EarlyCSE (PR41083), and we can test how much a possible fix (D74285) may affect optimization.	2020-02-10 15:14:46 -05:00
Nicola Zaghen	97572775d2	Reland [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered. This fixes the buildbot failures. Differential Revision: https://reviews.llvm.org/D68328 Patch by Joseph Faulls!	2019-12-13 14:30:21 +00:00
Nicola Zaghen	f798eb21ec	Temporarily Revert "[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same." This reverts commit `5f6208778f`. This caused failures in Transforms/PhaseOrdering/scev-custom-dl.ll const: Assertion `getBitWidth() == CR.getBitWidth() && "ConstantRange types don't agree!"' failed.	2019-12-12 10:29:54 +00:00
Nicola Zaghen	5f6208778f	[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered. Differential Revision: https://reviews.llvm.org/D68328 Patch by Joseph Faulls!	2019-12-12 10:07:01 +00:00
Dávid Bolvanský	40963b2bf0	Revert "[Attributor] Move pass after InstCombine to futher eliminate null pointer checks" This reverts commit `7ca7d62c6e`. Committed accidentally.	2019-11-27 22:45:47 +01:00
Dávid Bolvanský	7ca7d62c6e	[Attributor] Move pass after InstCombine to futher eliminate null pointer checks Summary: PR44149 Reviewers: jdoerfert Subscribers: mehdi_amini, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70737	2019-11-27 22:36:51 +01:00
Eric Christopher	fd39b1bb20	Revert "Revert "As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there."" This reapplies: `8ff85ed905` Original commit message: As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there. This change doesn't include any change to move from selection dag to fast isel and that will come with other numbers that should help inform that decision. There also haven't been any real debuggability studies with this pipeline yet, this is just the initial start done so that people could see it and we could start tweaking after. Test updates: Outside of the newpm tests most of the updates are coming from either optimization passes not run anymore (and without a compelling argument at the moment) that were largely used for canonicalization in clang. Original post: http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410 This reverts commit `c9ddb02659`.	2019-11-26 20:28:52 -08:00
Muhammad Omair Javaid	c9ddb02659	Revert "As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there." This reverts commit `8ff85ed905`. This commit introduced 9 new failures on lldb buildbot host at http://lab.llvm.org:8014/builders/lldb-aarch64-ubuntu Following tests were failing: lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq1/TestAmbiguousTailCallSeq1.py lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq2/TestAmbiguousTailCallSeq2.py lldb-api :: functionalities/tail_call_frames/disambiguate_call_site/TestDisambiguateCallSite.py lldb-api :: functionalities/tail_call_frames/disambiguate_paths_to_common_sink/TestDisambiguatePathsToCommonSink.py lldb-api :: functionalities/tail_call_frames/disambiguate_tail_call_seq/TestDisambiguateTailCallSeq.py lldb-api :: functionalities/tail_call_frames/inlining_and_tail_calls/TestInliningAndTailCalls.py lldb-api :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py lldb-api :: functionalities/tail_call_frames/thread_step_out_message/TestArtificialFrameStepOutMessage.py lldb-api :: functionalities/tail_call_frames/thread_step_out_or_return/TestSteppingOutWithArtificialFrames.py lldb-api :: functionalities/tail_call_frames/unambiguous_sequence/TestUnambiguousTailCalls.py Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410	2019-11-26 09:32:13 +05:00
Eric Christopher	8ff85ed905	As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there. This change doesn't include any change to move from selection dag to fast isel and that will come with other numbers that should help inform that decision. There also haven't been any real debuggability studies with this pipeline yet, this is just the initial start done so that people could see it and we could start tweaking after. Test updates: Outside of the newpm tests most of the updates are coming from either optimization passes not run anymore (and without a compelling argument at the moment) that were largely used for canonicalization in clang. Original post: http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410	2019-11-25 17:16:46 -08:00
Roman Lebedev	9c604a0dd6	[NFC][PhaseOrdering] Add end-to-end tests for the 'two shifts by sext' problem We start with two separate sext's, but EarlyCSE runs before InstCombine, so when we get them, they are a single sext, and we just ignore that. Likewise, if we had a single sext, we don't do anything there. llvm-svn: 373115	2019-09-27 19:32:43 +00:00
Dmitri Gribenko	2bf8d77453	Revert "Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline."" This reverts commit r371502, it broke tests (clang/test/CodeGenCXX/auto-var-init.cpp). llvm-svn: 371507	2019-09-10 10:39:09 +00:00
Clement Courbet	612c260ec3	Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline." With a fix for sanitizer breakage (see explanation in D60318). llvm-svn: 371502	2019-09-10 09:18:00 +00:00
Roman Lebedev	c584786854	[InstSimplify] Drop leftover "division-by-zero guard" around `@llvm.umul.with.overflow` inverted overflow bit Summary: Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`, and with D65147 we've flattened the CFG, we now can see that the guard may have been there to prevent division by zero is redundant. We can simply drop it: ``` ---------------------------------------- Name: no overflow or zero %iszero = icmp eq i4 %y, 0 %umul = smul_overflow i4 %x, %y %umul.ov = extractvalue {i4, i1} %umul, 1 %umul.ov.not = xor %umul.ov, -1 %retval.0 = or i1 %iszero, %umul.ov.not ret i1 %retval.0 => %iszero = icmp eq i4 %y, 0 %umul = smul_overflow i4 %x, %y %umul.ov = extractvalue {i4, i1} %umul, 1 %umul.ov.not = xor %umul.ov, -1 %retval.0 = or i1 %iszero, %umul.ov.not ret i1 %umul.ov.not Done: 1 Optimization is correct! ``` Note that this is inverted from what we have in a previous patch, here we are looking for the inverted overflow bit. And that inversion is kinda problematic - given this particular pattern we neither hoist that `not` closer to `ret` (then the pattern would have been identical to the one without inversion, and would have been handled by the previous patch), neither do the opposite transform. But regardless, we should handle this too. I've filled [[ https://bugs.llvm.org/show_bug.cgi?id=42720 \| PR42720 ]]. Reviewers: nikic, spatel, xbolva00, RKSimon Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65151 llvm-svn: 370351	2019-08-29 12:48:04 +00:00
Roman Lebedev	aaf6ab4410	[InstSimplify] Drop leftover "division-by-zero guard" around `@llvm.umul.with.overflow` overflow bit Summary: Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`, and with D65147 we've flattened the CFG, we now can see that the guard may have been there to prevent division by zero is redundant. We can simply drop it: ``` ---------------------------------------- Name: no overflow and not zero %iszero = icmp ne i4 %y, 0 %umul = umul_overflow i4 %x, %y %umul.ov = extractvalue {i4, i1} %umul, 1 %retval.0 = and i1 %iszero, %umul.ov ret i1 %retval.0 => %iszero = icmp ne i4 %y, 0 %umul = umul_overflow i4 %x, %y %umul.ov = extractvalue {i4, i1} %umul, 1 %retval.0 = and i1 %iszero, %umul.ov ret %umul.ov Done: 1 Optimization is correct! ``` Reviewers: nikic, spatel, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65150 llvm-svn: 370350	2019-08-29 12:47:50 +00:00
Roman Lebedev	9f35d2b564	[SimplifyCFG] FoldTwoEntryPHINode(): don't bailout on i1 PHI's if we can hoist a 'not' from incoming values Summary: As it can be seen in the tests in D65143/D65144, even though we have formed an '@llvm.umul.with.overflow' and got rid of potential for division-by-zero, the control flow remains, we still have that branch. We have this condition: ``` // Don't fold i1 branches on PHIs which contain binary operators // These can often be turned into switches and other things. if (PN->getType()->isIntegerTy(1) && (isa<BinaryOperator>(PN->getIncomingValue(0)) \|\| isa<BinaryOperator>(PN->getIncomingValue(1)) \|\| isa<BinaryOperator>(IfCond))) return false; ``` which was added back in rL121764 to help with `select` formation i think? That check prevents us to flatten the CFG here, even though we know we no longer need that guard and will be able to drop everything but the '@llvm.umul.with.overflow' + `not`. As it can be seen from tests, we end here because the `not` is being sinked into the PHI's incoming values by InstCombine, so we can't workaround this by hoisting it to after PHI. Thus i suggest that we relax that check to not bailout if we'd get to hoist the `not`. Reviewers: craig.topper, spatel, fhahn, nikic Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65147 llvm-svn: 370349	2019-08-29 12:47:34 +00:00
Roman Lebedev	fb38b7aab3	[InstCombine] Fold '(-1 u/ %x) u< %y' to '@llvm.umul.with.overflow' + overflow bit extraction Summary: `(-1 u/ %x) u< %y` is one of (3?) common ways to check that some unsigned multiplication (will not) overflow. Currently, we don't catch it. We could: ``` ---------------------------------------- Name: no overflow %o0 = udiv i4 -1, %x %r = icmp ult i4 %o0, %y => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %r = extractvalue {i4, i1} %n0, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: no overflow, swapped %o0 = udiv i4 -1, %x %r = icmp ugt i4 %y, %o0 => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %r = extractvalue {i4, i1} %n0, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: overflow %o0 = udiv i4 -1, %x %r = icmp uge i4 %o0, %y => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 Done: 1 Optimization is correct! ---------------------------------------- Name: overflow %o0 = udiv i4 -1, %x %r = icmp ule i4 %y, %o0 => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 Done: 1 Optimization is correct! ``` As it can be observed from tests, while simply forming the `@llvm.umul.with.overflow` is easy, if we were looking for the inverted answer, then more work needs to be done to cleanup the now-pointless control-flow that was guarding against division-by-zero. This is being addressed in follow-up patches. Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon Reviewed By: nikic, xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65143 llvm-svn: 370347	2019-08-29 12:47:08 +00:00
Vitaly Buka	e19f3ff4c8	Add PhaseOrdering/lifetime-sanitizer.ll tests Reviewers: lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66761 llvm-svn: 369996	2019-08-27 00:18:28 +00:00
Roman Lebedev	87fdcb8749	[NFC][PhaseOredering][SimplifyCFG] Add more runlines to umul.with.overflow tests This way it will be more obvious that the problem is both in cost threshold and in hardcoded benefit check, plus will show how the instsimplify cleans this all in the end. llvm-svn: 366800	2019-07-23 12:42:41 +00:00
Roman Lebedev	6b248fca33	[NFC][PhaseOrdering] Add tests showcasing the problems of unsigned multiply overflow check While we can form the @llvm.mul.with.overflow easily, we are still left with that check that was guarding against div-by-0. And in the second case we won't even flatten the CFG. llvm-svn: 366747	2019-07-22 22:08:35 +00:00
Clement Courbet	2851248fa1	Revert "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline." Breaks sanitizers: libFuzzer :: cxxstring.test libFuzzer :: memcmp.test libFuzzer :: recommended-dictionary.test libFuzzer :: strcmp.test libFuzzer :: value-profile-mem.test libFuzzer :: value-profile-strcmp.test llvm-svn: 364416	2019-06-26 12:13:13 +00:00
Clement Courbet	7b3a5f0e6d	[ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline. This allows later passes (in particular InstCombine) to optimize more cases. One that's important to us is `memcmp(p, q, constant) < 0` and memcmp(p, q, constant) > 0. llvm-svn: 364412	2019-06-26 11:50:18 +00:00
Nemanja Ivanovic	1d662316cb	[Pass Pipeline][NFC] Add a test prior to committing D61726 This patch just adds a test case to show the differences in code emitted by opt before and after https://reviews.llvm.org/D61726. Previous attempt to commit this did not include the registered target requirement so it caused buildbot breaks. llvm-svn: 360620	2019-05-13 21:14:36 +00:00
Nemanja Ivanovic	34dc3aca40	Pull r360426 as it is breaking the build bots. llvm-svn: 360437	2019-05-10 16:03:22 +00:00
Nemanja Ivanovic	7a41cd5b88	Another attempt to fix the build bot breaks after r360426 The test case checks were produced by the update_test_checks.py scripts and I assumed that is sufficient. However, the behaviour is different with different default target triples. Specify the triple explicitly in the test case. If this doesn't clean up the build bot breaks, I'll remove the test case until I can get to the bottom of why the behaviour on build bots is different from my machine. llvm-svn: 360434	2019-05-10 15:44:56 +00:00
Nemanja Ivanovic	0f991c65f2	Fix build break after r360426 llvm-svn: 360433	2019-05-10 15:11:40 +00:00
Nemanja Ivanovic	cfc89896e0	[Pass Pipeline][NFC] Add a test prior to committing D61726 This patch just adds a test case to show the differences in code emitted by opt before and after https://reviews.llvm.org/D61726. llvm-svn: 360426	2019-05-10 13:47:00 +00:00
Eric Christopher	cee313d288	Revert "Temporarily Revert "Add basic loop fusion pass."" The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552	2019-04-17 04:52:47 +00:00
Eric Christopher	a863435128	Temporarily Revert "Add basic loop fusion pass." As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546	2019-04-17 02:12:23 +00:00
Sanjay Patel	654e6aabb9	[InstCombine] canonicalize raw IR rotate patterns to funnel shift The final piece of IR-level analysis to allow this was committed with: rL350188 Using the intrinsics should improve transforms based on cost models like vectorization and inlining. The backend should be prepared too, so we can now canonicalize more sequences of shift/logic to the intrinsics and know that the end result should be equal or better to the original code even if the target does not have an actual rotate instruction. llvm-svn: 350199	2019-01-01 21:51:39 +00:00
Sanjay Patel	200885e654	[AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924) Now, that we have funnel shift intrinsics, it should be safe to convert this form of rotate to it. In the worst case (a target that doesn't have rotate instructions), we will expand this into a branch-less sequence of ALU ops (neg/and/and/lshr/shl/or) in the backend, so it's still very likely to be a perf improvement over the original code. The motivating source code pattern for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=34924 Background: I looked at several different options before deciding where to try this - instcombine, simplifycfg, CGP - because it doesn't fit cleanly anywhere AFAIK. The backend (CGP, SDAG, GlobalIsel?) is too late for what we're trying to accomplish. We want to have the IR converted before we reach things like vectorization because the reduced code can make a loop much simpler to transform. Technically, this could be included in instcombine, but it's a large pattern match that includes control-flow, so it just felt wrong to stuff into there (although I have a draft of that patch). Similarly, this could be part of simplifycfg, but all of this pattern matching is a stretch. So we're left with our relatively new dumping ground for homeless transforms: aggressive-instcombine. This only runs at -O3, but that seems like a reasonable limitation given that source code has many options to avoid this pattern (including the recently added clang intrinsics for rotates). I'm including a PhaseOrdering test because we require the teamwork of 3 passes (aggressive-instcombine, instcombine, simplifycfg) to get this into the minimal IR form that we want. That test shows a bug with the new pass manager that's independent of this change (but it will be masked if we canonicalize harder to funnel shift intrinsics in instcombine). Differential Revision: https://reviews.llvm.org/D55604 llvm-svn: 349396	2018-12-17 21:14:51 +00:00
Sanjay Patel	eb741c29c1	[PhaseOrdering] add test for funnel shift (rotate); NFC As mentioned in D55604, there are 2 bugs here: 1. The new pass manager is speculating wildly by default. 2. The old pass manager is not converting this to funnel shift. llvm-svn: 348980	2018-12-12 22:11:05 +00:00
Sanjay Patel	6f2cf73b37	[PhaseOrdering] remove stale comments; NFC Forgot to update this with rL331937. llvm-svn: 331939	2018-05-09 23:10:46 +00:00
Sanjay Patel	ac3951a735	[AggressiveInstCombine] convert a chain of 'and-shift' bits into masked compare This is a follow-up to D45986. As suggested there, we should match the "all-bits-set" pattern in addition to "any-bits-set". This was a little more complicated than I thought it would be initially because the "and 1" instruction can be anywhere in the chain. Hopefully, the code comments make that logic understandable, but if you see a way to simplify or improve that, it's most appreciated. This transforms patterns that emerge from bitfield tests as seen in PR37098: https://bugs.llvm.org/show_bug.cgi?id=37098 I think it would also help reduce the large test from: D46336 D46595 but we need something to reassociate that case to the forms we're expecting here first. Differential Revision: https://reviews.llvm.org/D46649 llvm-svn: 331937	2018-05-09 23:08:15 +00:00
Sanjay Patel	d2025a2e31	[AggressiveInstCombine] convert a chain of 'or-shift' bits into masked compare and (or (lshr X, C), ...), 1 --> (X & C') != 0 I initially thought about implementing the minimal pattern in instcombine as mentioned here: https://bugs.llvm.org/show_bug.cgi?id=37098#c6 ...but we need to do better to catch the more general sequence from the motivating test (more than 2 bits in the compare). And a test-suite run with statistics showed that this pattern only happened 2 times currently. It would potentially happen more often if reassociation worked better (D45842), but it's probably still not too frequent? This is small enough that I didn't see a need to create a whole new class/file within AggressiveInstCombine. There are likely other relatively small matchers like what was discussed in D44266 that would slide under foldUnusualPatterns() (name suggestions welcome). We could potentially also consolidate matchers for ctpop, bswap, etc under here. Differential Revision: https://reviews.llvm.org/D45986 llvm-svn: 331311	2018-05-01 21:02:09 +00:00
Sanjay Patel	2b36e95d45	[PhaseOrdering] add tests for bittest patterns from bitfields; NFC As mentioned in D45986, there's a potential ordering dependency between instcombine and aggressive-instcombine for detecting these, so I'm adding a few tests to confirm that the expected folds occur using -O3 (because aggressive-instcombine only runs at -O3 currently). llvm-svn: 331308	2018-05-01 20:53:44 +00:00
Elena Demikhovsky	945b7e5aa6	Adding a width of the GEP index to the Data Layout. Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout. p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits. The index size parameter is optional, if not specified, it is equal to the pointer size. Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width. It works fine if you can convert pointer to integer for address calculation and all registered targets do this. But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout. http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account. This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size. Differential Revision: https://reviews.llvm.org/D42123 llvm-svn: 325102	2018-02-14 06:58:08 +00:00
Sanjay Patel	0ab0c1a201	[SimplifyCFG] don't sink common insts too soon (PR34603) This should solve: https://bugs.llvm.org/show_bug.cgi?id=34603 ...by preventing SimplifyCFG from altering redundant instructions before early-cse has a chance to run. It changes the default (canonical-forming) behavior of SimplifyCFG, so we're only doing the sinking transform later in the optimization pipeline. Differential Revision: https://reviews.llvm.org/D38566 llvm-svn: 320749	2017-12-14 22:05:20 +00:00
Sanjay Patel	956dec63fb	[PassManager, SimplifyCFG] add test for PR34603 / D38566; NFC This is a recommit of r316908 which was reverted by r317444. llvm-svn: 318300	2017-11-15 16:37:30 +00:00
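
Several of the commits above rest on small integer identities: the `(-1 u/ %x) u< %y` overflow check (fb38b7aab3), the redundant division-by-zero guard (aaf6ab4410), `(X - (X & Y)) --> (X & ~Y)` (1badf7c33a), and the 'or-shift' bit test folded into a masked compare (d2025a2e31). The sketch below is not LLVM code; it is a brute-force check of those identities over 8-bit unsigned values. All function names, the 8-bit width, and the shift amounts `[1, 3, 5]` are assumptions chosen for illustration only.

```python
# Illustrative only: hypothetical helpers modelling 8-bit unsigned arithmetic.
BITS = 8
MASK = (1 << BITS) - 1  # all-ones value, i.e. "-1" as an unsigned integer


def umul_overflows(x, y):
    """True when the BITS-wide unsigned product x * y does not fit."""
    return x * y > MASK


def div_based_check(x, y):
    """The idiom `(-1 u/ x) u< y`; only meaningful for x != 0."""
    return (MASK // x) < y


def sub_of_and(x, y):
    """X - (X & Y), wrapped to BITS bits."""
    return (x - (x & y)) & MASK


def and_of_not(x, y):
    """X & ~Y, with ~Y taken in BITS bits."""
    return x & (~y & MASK)


def or_shift_bittest(x, shifts):
    """and (or (lshr X, C), ...), 1"""
    acc = 0
    for c in shifts:
        acc |= x >> c
    return acc & 1


def masked_compare(x, shifts):
    """(X & C') != 0, where C' has one bit set per shift amount."""
    mask = 0
    for c in shifts:
        mask |= 1 << c
    return int((x & mask) != 0)


def check_all():
    """Exhaustively compare each pair of forms; raises on any mismatch."""
    shifts = [1, 3, 5]  # arbitrary example shift amounts
    for x in range(MASK + 1):
        for y in range(MASK + 1):
            ov = umul_overflows(x, y)
            if x != 0:
                assert div_based_check(x, y) == ov
            # The zero guard is redundant: y == 0 never overflows.
            assert ((y != 0) and ov) == ov
            assert sub_of_and(x, y) == and_of_not(x, y)
        assert or_shift_bittest(x, shifts) == masked_compare(x, shifts)
    return True
```

Calling `check_all()` walks every pair of 8-bit inputs and raises an `AssertionError` on the first mismatch. This is only a sanity check at one width, not a proof in the style of the Alive output quoted in the commit messages.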


78 Commits