llvm-project

Author	SHA1	Message	Date
Zequan Wu	cb22ab7403	Add nomerge function attribute to suppress tail merge optimization in simplifyCFG We want to add a way to avoid merging identical calls so as to keep the separate debug-information for those calls. There is also an asan usecase where having this attribute would be beneficial to avoid alternative work-arounds. Here is the link to the feature request: https://bugs.llvm.org/show_bug.cgi?id=42783. `nomerge` is different from `noinline`. `noinline` prevents a function from being inlined at call sites, but `nomerge` prevents multiple identical calls from being merged into one. This patch adds `nomerge` to disable the optimization at the IR level. A followup patch will be needed to let the backend understand `nomerge` and avoid tail merge in the backend. Reviewed By: asbirlea, rnk Differential Revision: https://reviews.llvm.org/D78659	2020-05-12 16:49:20 -07:00
Zequan Wu	eccfa35d53	Fix lifetime call in landingpad blocking Simplifycfg pass Fix lifetime call in landingpad blocks simplifycfg from removing the landingpad. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D77188	2020-04-09 13:07:32 -07:00
Jonathan Roelofs	7c5d2bec76	[llvm] Fix missing FileCheck directive colons https://reviews.llvm.org/D77352	2020-04-06 09:59:08 -06:00
Matt Arsenault	43d98a0ecf	Allow replacing intrinsic operands with variables Since intrinsics can now specify when an argument is required to be constant, it is now OK to replace arguments with variables if they aren't. This means intrinsics must now be accurately marked with immarg.	2020-03-23 15:51:57 -04:00
Chen Zheng	3f85134d71	[PowerPC] implement target hook isProfitableToHoist On Powerpc fma is faster than fadd + fmul for some types, (PPCTargetLowering::isFMAFasterThanFMulAndFAdd). we should implement target hook isProfitableToHoist to prevent simplifyCFGpass from breaking fma pattern by hoisting fmul to predecessor block. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D76207	2020-03-19 00:17:25 -04:00
Chen Zheng	fa72b29bec	[PowerPC] add test cases for target hook isProfitableToHoist - NFC	2020-03-16 23:07:30 -04:00
Sanjay Patel	89b19e8959	[SimplifyCFG] add test for chain of empty block conditional branches; NFC	2020-03-13 14:39:31 -04:00
Sanjay Patel	afc4dcee83	[SimplifyCFG] regenerate complete test checks; NFC	2020-03-13 14:12:28 -04:00
Sanjay Patel	7fe0e70ecc	[SimplifyCFG] regenerate test checks; NFC	2020-03-13 14:12:28 -04:00
Jonas Paulsson	c2dafe12dc	[SimplifyCFG] Skip merging return blocks if it would break a CallBr. SimplifyCFG should not merge empty return blocks and leave a CallBr behind with a duplicated destination since the verifier will then trigger an assert. This patch checks for this case and avoids the transformation. CodeGenPrepare has a similar check which also has a FIXME comment about why this is needed. It seems perhaps better if these two passes would eventually instead update the CallBr instruction instead of just checking and avoiding. This fixes https://bugs.llvm.org/show_bug.cgi?id=45062. Review: Craig Topper Differential Revision: https://reviews.llvm.org/D75620	2020-03-10 14:59:13 +01:00
Nikita Popov	4ef272ec9c	[InstCombine] DCE instructions earlier When InstCombine initially populates the worklist, it already performs constant folding and DCE. However, as the instructions are initially visited in program order, this DCE can pick up only the last instruction of a dead chain, the rest would only get picked up in the main InstCombine run. To avoid this, we instead perform the DCE in separate pass over the collected instructions in reverse order, which will allow us to pick up full dead instruction chains. We already need to do this reverse iteration anyway to populate the worklist, so this shouldn't add extra cost. This by itself only fixes a small part of the problem though: The same basic issue also applies during the main InstCombine loop. We generally always want DCE to occur as early as possible, because it will allow one-use folds to happen. Address this by also performing DCE while adding deferred instructions to the main worklist. This drops the number of tests that perform more than 2 InstCombine iterations from ~80 to ~40. There's some spurious test changes due to operand order / icmp toggling. Differential Revision: https://reviews.llvm.org/D75008	2020-02-27 18:45:59 +01:00
stozer	9bda7ab835	Re-revert: Recover debug intrinsics when killing duplicated/empty blocks This reverts commit `61b35e4111`. This commit causes a timeout in chromium builds; likely to have a similar cause to the previous timeout issue caused by this commit (see `6ded69f294` for more details). It is possible that there is no way to fix this bug that will not cause this issue; further investigations as to the efficiency of handling large amounts of debug info will be necessary.	2020-02-13 11:48:19 +00:00
stozer	61b35e4111	Re-reapply: Recover debug intrinsics when killing duplicated/empty blocks This reverts commit `636c93ed11`. The original patch caused build failures on TSan buildbots. Commit `6ded69f294` fixes this issue by reducing the rate at which empty debug intrinsics propagate, reducing the memory footprint and preventing a fatal spike.	2020-02-12 14:36:30 +00:00
Nikita Popov	ef052a7527	[InstCombine] Update SimplifyCFG test This test also runs -instcombine. Here the operands in an or chain have been reassociated.	2020-01-30 10:11:42 +01:00
Andy Kaylor	c467faf23c	[WinEH] Ignore lifetime.end PHI nodes in empty cleanuppads This fixes a bug where a PHI node that is only referenced by a lifetime.end intrinsic in an otherwise empty cleanuppad can cause SimplifyCFG to create an SSA violation while removing the empty cleanuppad. Theoretically the same problem can occur with debug intrinsics. Differential Revision: https://reviews.llvm.org/D72540	2020-01-23 18:18:50 -08:00
Fangrui Song	a36ddf0aa9	Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351	2019-12-24 16:27:51 -08:00
Fangrui Song	eb16435b5e	Migrate function attribute "no-frame-pointer-elim-non-leaf" to "frame-pointer"="non-leaf" as cleanups after D56351	2019-12-24 16:05:15 -08:00
Fangrui Song	502a77f125	Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351	2019-12-24 15:57:33 -08:00
Bjorn Pettersson	e5f07080b8	[BasicBlockUtils] Fix dbg.value elimination problem in MergeBlockIntoPredecessor Summary: In commit `d60f34c20a` (llvm-svn 317128, PR35113) MergeBlockIntoPredecessor was changed into discarding some dbg.value intrinsics referring to PHI values, post-splice due to loop rotation. That elimination of dbg.value intrinsics did not consider which dbg.value to keep depending on the context (e.g. if the variable is changing its value several times inside the basic block). In the past that hasn't been such a big problem since CodeGenPrepare::placeDbgValues has moved the dbg.value to be next to the PHI node anyway. But after commit `00e238896c` CodeGenPrepare isn't doing that any longer, so we need to be more careful when avoiding duplicate dbg.value intrinsics in MergeBlockIntoPredecessor. This patch replaces the code that tried to avoid duplicate dbg.values by using the RemoveRedundantDbgInstrs helper. Reviewers: aprantl, jmorse, vsk Reviewed By: aprantl, vsk Subscribers: jholewinski, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71480	2019-12-16 11:41:21 +01:00
Nicola Zaghen	97572775d2	Reland [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with Instcombine's visitPtrToInt; but the unit test was incorrect, so this remained undiscovered. This fixes the buildbot failures. Differential Revision: https://reviews.llvm.org/D68328 Patch by Joseph Faulls!	2019-12-13 14:30:21 +00:00
Nicola Zaghen	f798eb21ec	Temporarily Revert "[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same." This reverts commit `5f6208778f`. This caused failures in Transforms/PhaseOrdering/scev-custom-dl.ll const: Assertion `getBitWidth() == CR.getBitWidth() && "ConstantRange types don't agree!"' failed.	2019-12-12 10:29:54 +00:00
Nicola Zaghen	5f6208778f	[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with Instcombine's visitPtrToInt; but the unit test was incorrect, so this remained undiscovered. Differential Revision: https://reviews.llvm.org/D68328 Patch by Joseph Faulls!	2019-12-12 10:07:01 +00:00
Vlad Tsyrklevich	636c93ed11	Revert "Reapply: [DebugInfo] Recover debug intrinsics when killing duplicated/empty..." This reverts commit `f2ba93971c`, it was causing build timeouts on sanitizer-x86_64-linux-autoconf such as http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/44917	2019-12-10 16:03:17 -08:00
stozer	f2ba93971c	Reapply: [DebugInfo] Recover debug intrinsics when killing duplicated/empty... basic blocks Originally applied in `72ce759928`. Fixed a build failure caused by incorrect use of cast instead of dyn_cast. This reverts commit `8b0780f795`.	2019-12-10 13:33:32 +00:00
Tozer	8b0780f795	Revert "[DebugInfo] Recover debug intrinsics when killing duplicated/empty basic blocks" This reverts commit `72ce759928`. Reverted due to build failure.	2019-12-04 18:47:08 +00:00
stozer	72ce759928	[DebugInfo] Recover debug intrinsics when killing duplicated/empty basic blocks When basic blocks are killed, either due to being empty or to being an if.then or if.else block whose complement contains identical instructions, some of the debug intrinsics in that block are lost. This patch sinks those intrinsics into the single successor block, setting them Undef if necessary to prevent debug info from falling out-of-date. Differential Revision: https://reviews.llvm.org/D70318	2019-12-04 16:01:49 +00:00
Philip Reames	aaea24802b	Broaden the definition of a "widenable branch" As a reminder, a "widenable branch" is the pattern "br i1 (and i1 X, WC()), label %taken, label %untaken" where "WC" is the widenable condition intrinsic. The semantics of such a branch (derived from the semantics of WC) is that a new condition can be added into the condition arbitrarily without violating legality. Broaden the definition in two ways: (1) allow swapped operands to the br (and X, WC()) form; (2) allow a widenable branch w/trivial condition (i.e. true), which takes the form of br i1 WC(). The former is just general robustness (e.g. for X = non-instruction this is what instcombine produces). The latter is specifically important as partial unswitching of a widenable range check produces exactly this form above the loop. Differential Revision: https://reviews.llvm.org/D70502	2019-11-21 10:46:16 -08:00
Sanjay Patel	ebf9bf2cbc	[SimplifyCFG] propagate fast-math-flags (FMF) from phi to select Similar to/extension of D70208 (rGee0882bdf866), but this one may finally allow closing motivating bugs. This is another step towards having FMF apply only to FP values rather than those + fcmp. See PR38086 for one of the original discussions/motivations: https://bugs.llvm.org/show_bug.cgi?id=38086 And the test here is derived from PR39535: https://bugs.llvm.org/show_bug.cgi?id=39535 Currently, we lose FMF when converting any phi to select in SimplifyCFG. There are a small number of similar changes needed to correct within SimplifyCFG, so it should be quick to patch this pass up. FMF was extended to select and phi with: D61917 D67564	2019-11-17 11:23:44 -05:00
Sanjay Patel	23f736059c	[SimplifyCFG] add fast-math-flags to tests for better coverage; NFC The conversion to select fails to propagate FMF.	2019-11-17 10:37:42 -05:00
Sanjay Patel	f5870b0f36	[SimplifyCFG] add tests for possible FP speculative select; NFC It doesn't seem that there are any perf/param knobs that can be turned to create selects for the FP variants of the tests, but that may not always be true in the future. If it changes, we should propagate FMF.	2019-11-17 10:27:47 -05:00
Sanjay Patel	ee0882bdf8	[SimplifyCFG] propagate fast-math-flags (FMF) from phi to select This is another step towards having FMF apply only to FP values rather than those + fcmp. See PR38086 for one of the original discussions/motivations: https://bugs.llvm.org/show_bug.cgi?id=38086 And the test here is derived from PR39535: https://bugs.llvm.org/show_bug.cgi?id=39535 Currently, we lose FMF when converting any phi to select in SimplifyCFG. There are a small number of similar changes needed to correct within SimplifyCFG, so it should be quick to patch this pass up. FMF was extended to select and phi with: D61917 D67564 Differential Revision: https://reviews.llvm.org/D70208	2019-11-15 16:14:35 -05:00
Sanjay Patel	be08af8816	[SimplifyCFG] add test for select with FMF; NFC	2019-11-13 16:45:42 -05:00
Philip Reames	686f449e3d	[WC] Fix a subtle bug in our definition of widenable branch We had a subtle, but nasty bug in our definition of a widenable branch, and thus in the transforms which used that utility. Specifically, we returned true for any branch which included a widenable condition within its condition, regardless of whether that widenable condition also had other uses. The problem is that the result of the WC() call is defined to be one particular value. As such, all users must agree as to what that value is. If we widen a branch without also updating all other users of the WC in the same way, we have broken the required semantics. Most of the textual diff is updating existing transforms not to leave dead uses hanging around. They're largely NFC as the dead instructions would be immediately deleted by other passes. The reason to make these changes is so that the transforms preserve the widenable branch form. In practice, we don't get bitten by this only because it isn't profitable to CSE WC() calls and the lowering pass from guards uses distinct WC calls per branch. Differential Revision: https://reviews.llvm.org/D69916	2019-11-06 14:16:34 -08:00
Philip Reames	6ff439b57f	[SimplifyCFG] Use a (trivially) dominating widenable branch to remove later slow path blocks This transformation is a variation on the GuardWidening transformation we have checked in as its own pass. Instead of focusing on merging (i.e. hoisting and simplifying) two widenable branches, this transform makes the observation that simply removing a second slowpath block (by reusing an existing one) is often a very useful canonicalization. This may lead to later merging, or may not. This is a useful generalization when the intermediate block has loads whose dereferenceability is hard to establish. As noted in the patch, this can be generalized further, and will be. Differential Revision: https://reviews.llvm.org/D69689	2019-11-04 11:03:28 -08:00
Roman Lebedev	dd0170ab24	[SimplifyCFG] mergeConditionalStoreToAddress(): consider cost, not instruction count Summary: As can be seen in the changed test, while `div` is really costly, we were speculating it. This does not seem correct. Also, the old code would run for every single instruction in BB, instead of eagerly bailing out as soon as there are too many instructions. This function still has a problem that `PHINodeFoldingThreshold` is per-basic-block, while it should be for all the basic blocks. Reviewers: efriedma, craig.topper, dmgreen, jmolloy Reviewed By: jmolloy Subscribers: xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67315 llvm-svn: 372255	2019-09-18 19:46:57 +00:00
Roman Lebedev	10151f6618	[SimplifyCFG] FoldTwoEntryPHINode(): consider total speculation cost, not per-BB cost Summary: Previously, if the threshold was 2, we were willing to speculatively execute 2 cheap instructions in both basic blocks (thus we were willing to speculatively execute cost = 4), but weren't willing to speculate when one BB had 3 instructions and the other one had no instructions, even though that would have a total cost of 3. This looks inconsistent to me. I don't think `cmov`-like instructions will start executing until both of its inputs are available: https://godbolt.org/z/zgHePf So I don't see why the existing behavior is the correct one. Also, let's add its own `cl::opt` for this threshold, with default=4, so it is not stricter than the previous threshold: will allow to fold when there are 2 BB's each with cost=2. And since the logic has changed, it will also allow to fold when one BB has cost=3 and other cost=1, or there is only one BB with cost=4. This is an alternative solution to D65148: This fix is mainly motivated by `signbit-like-value-extension.ll` test. That pattern comes up in JPEG decoding, see e.g. `Figure F.12 – Extending the sign bit of a decoded value in V` of `ITU T.81` (JPEG specification). That branch is not predictable, and it is within the innermost loop, so the fact that that pattern ends up being stuck with a branch instead of `select` (i.e. `CMOV` for x86) is unlikely to be beneficial.
This has great results on the final assembly (vanilla test-suite + RawSpeed): (metric pass - D67240) \| metric \| old \| new \| delta \| % \| \| x86-mi-counting.NumMachineFunctions \| 37720 \| 37721 \| 1 \| 0.00% \| \| x86-mi-counting.NumMachineBasicBlocks \| 773545 \| 771181 \| -2364 \| -0.31% \| \| x86-mi-counting.NumMachineInstructions \| 7488843 \| 7486442 \| -2401 \| -0.03% \| \| x86-mi-counting.NumUncondBR \| 135770 \| 135543 \| -227 \| -0.17% \| \| x86-mi-counting.NumCondBR \| 423753 \| 422187 \| -1566 \| -0.37% \| \| x86-mi-counting.NumCMOV \| 24815 \| 25731 \| 916 \| 3.69% \| \| x86-mi-counting.NumVecBlend \| 17 \| 17 \| 0 \| 0.00% \| We significantly decrease basic block count, notably decrease instruction count, significantly decrease branch count and very significantly increase `cmov` count. Performance-wise, unsurprisingly, this has great effect on target RawSpeed benchmark. I'm seeing 5 major improvements: ``` Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_mean -0.3064 -0.3064 226.9913 157.4452 226.9800 157.4384 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_median -0.3057 -0.3057 226.8407 157.4926 226.8282 157.4828 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_stddev -0.4985 -0.4954 0.3051 0.1530 0.3040 0.1534 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_mean -0.1747 -0.1747 80.4787 66.4227 80.4771 66.4146 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_median -0.1742 -0.1743 80.4686 66.4542 80.4690 66.4436 
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_stddev +0.6089 +0.5797 0.0670 0.1078 0.0673 0.1062 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_mean -0.1598 -0.1598 171.6996 144.2575 171.6915 144.2538 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_median -0.1598 -0.1597 171.7109 144.2755 171.7018 144.2766 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_stddev +0.4024 +0.3850 0.0847 0.1187 0.0848 0.1175 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_mean -0.0550 -0.0551 280.3046 264.8800 280.3017 264.8559 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_median -0.0554 -0.0554 280.2628 264.7360 280.2574 264.7297 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_stddev +0.7005 +0.7041 0.2779 0.4725 0.2775 0.4729 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_mean -0.0354 -0.0355 316.7396 305.5208 316.7342 305.4890 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_median -0.0354 -0.0356 316.6969 305.4798 316.6917 305.4324 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_stddev +0.0493 +0.0330 0.3562 0.3737 0.3563 0.3681 ``` That being said, it's always best-effort, so there will likely be cases where this worsens things. Reviewers: efriedma, craig.topper, dmgreen, jmolloy, fhahn, Carrot, hfinkel, chandlerc Reviewed By: jmolloy Subscribers: xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67318 llvm-svn: 372009	2019-09-16 16:18:24 +00:00
Roman Lebedev	4e76f88072	[SimplifyCFG][NFC] Autogenerate PhiEliminate3.ll llvm-svn: 371311	2019-09-07 13:53:14 +00:00
Roman Lebedev	88bab08a88	[SimplifyCFG][NFC] Autogenerate two tests llvm-svn: 371310	2019-09-07 13:35:54 +00:00
Roman Lebedev	395f254bf0	[SimplifyCFG][NFC] Make merge-cond-stores-cost.ll X86-specific, and rewrite it We clearly perform store-merging, even though div is really costly. llvm-svn: 371300	2019-09-07 10:55:04 +00:00
Roman Lebedev	0ff6d7f305	[SimplifyCFG][NFC] Show that we don't consider the cost when merging cond stores We count instruction count in each BB's separately, not their cost. llvm-svn: 371297	2019-09-07 09:25:26 +00:00
Roman Lebedev	8d3e4d3a4d	[SimplifyCFG][NFC] Regenerate merge-cond-stores* tests llvm-svn: 371296	2019-09-07 09:25:18 +00:00
Vitaly Buka	9020f11377	[SimplifyCFG] Don't SimplifyBranchOnICmpChain with ExtraCase Summary: Here we try to avoid issues with "explicit branch" with SimplifyBranchOnICmpChain which can check on undef. Msan by design reports branches on uninitialized memory and undefs, so we have a false report here. In general msan does not like when we convert ``` // If at least one of them is true, MSAN is OK even if the other is undef if (a \|\| b) return; ``` into ``` // If 'a' is undef MSAN will complain even if 'b' is true if (a) return; if (b) return; ``` Example Before optimization we had something like this: ``` while (true) { bool maybe_undef = doStuff(); while (true) { char c = getChar(); if (c != 10 && c != 13) continue; break; } // we know that c == 10 \|\| c == 13 if we get here, // so msan knows that the branch is not affected by maybe_undef if (maybe_undef \|\| c == 10 \|\| c == 13) continue; return; } ``` SimplifyBranchOnICmpChain will convert that into ``` while (true) { bool maybe_undef = doStuff(); while (true) { char c = getChar(); if (c != 10 && c != 13) continue; break; } // however msan will complain here: if (maybe_undef) continue; // we know that c == 10 \|\| c == 13, so either way we will get continue switch(c) { case 10: continue; case 13: continue; } return; } ``` Reviewers: eugenis, efriedma Reviewed By: eugenis, efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67205 llvm-svn: 371138	2019-09-05 22:49:34 +00:00
Michael Liao	001871dee8	[SimplifyCFG] Skip sinking common lifetime markers of `alloca`. Summary: - Similar to the workaround in fix of PR30188, skip sinking common lifetime markers of `alloca`. They are mostly left there after inlining functions in branches. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66950 llvm-svn: 370376	2019-08-29 16:12:05 +00:00
Roman Lebedev	05ef49515e	[NFC][SimplifyCFG] 'Safely extract low bits' pattern will also benefit from -phi-node-folding-threshold=3 This is the naive implementation of x86 BZHI/BEXTR instruction: it takes input and bit count, and extracts low nbits up to bit width. I.e. unlike shift it does not have any UB when nbits >= bitwidth. Which means we don't need a while PHI here, simple select will do. And if it's a select, it should then be trivial to fix codegen to select it to BEXTR/BZHI. See https://bugs.llvm.org/show_bug.cgi?id=34704 llvm-svn: 370369	2019-08-29 14:46:49 +00:00
Roman Lebedev	9f35d2b564	[SimplifyCFG] FoldTwoEntryPHINode(): don't bailout on i1 PHI's if we can hoist a 'not' from incoming values Summary: As can be seen in the tests in D65143/D65144, even though we have formed an '@llvm.umul.with.overflow' and got rid of the potential for division-by-zero, the control flow remains; we still have that branch. We have this condition: ``` // Don't fold i1 branches on PHIs which contain binary operators // These can often be turned into switches and other things. if (PN->getType()->isIntegerTy(1) && (isa<BinaryOperator>(PN->getIncomingValue(0)) \|\| isa<BinaryOperator>(PN->getIncomingValue(1)) \|\| isa<BinaryOperator>(IfCond))) return false; ``` which was added back in rL121764 to help with `select` formation, I think? That check prevents us from flattening the CFG here, even though we know we no longer need that guard and will be able to drop everything but the '@llvm.umul.with.overflow' + `not`. As can be seen from the tests, we end up here because the `not` is being sunk into the PHI's incoming values by InstCombine, so we can't work around this by hoisting it to after the PHI. Thus I suggest that we relax that check to not bail out if we'd get to hoist the `not`. Reviewers: craig.topper, spatel, fhahn, nikic Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65147 llvm-svn: 370349	2019-08-29 12:47:34 +00:00
Roman Lebedev	4894eeecc9	[SimplifyCFG] Add "safe abs" test from CMSIS DSP 'abs_with_clamp()' With -phi-node-folding-threshold=3 this branch would get flattened into select. See https://reviews.llvm.org/D65148#1629010 llvm-svn: 368847	2019-08-14 13:10:59 +00:00
Alina Sbirlea	3af2a69575	[SimplifyCFG] Mark missed Changed to true. Summary: DominatorTree is invalid after SimplifyCFG because of a missed `Changed = true` when simplifying a branch condition and removing an edge. Resolves PR42272. Reviewers: zhizhouy, manojgupta Subscribers: jlebar, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65490 llvm-svn: 367596	2019-08-01 18:37:34 +00:00
Sanjay Patel	b456310902	[SimplifyCFG] avoid crashing after simplifying a switch (PR42737) Later code in TryToSimplifyUncondBranchFromEmptyBlock() assumes that we have cleaned up unreachable blocks, but that was not happening with this switch transform. llvm-svn: 367037	2019-07-25 17:01:12 +00:00
Roman Lebedev	87fdcb8749	[NFC][PhaseOrdering][SimplifyCFG] Add more runlines to umul.with.overflow tests This way it will be more obvious that the problem is both in cost threshold and in hardcoded benefit check, plus will show how the instsimplify cleans this all in the end. llvm-svn: 366800	2019-07-23 12:42:41 +00:00
Roman Lebedev	1693b80bd5	[SimplifyCFG][NFC] Test that we fail to flatten CFG in JPEG "sign" value extend pattern This comes up in JPEG decoding, see e.g. Figure F.12 – Extending the sign bit of a decoded value in V of ITU T.81 (JPEG specification). llvm-svn: 366750	2019-07-22 22:09:02 +00:00
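
Note on commit `10151f6618` above: the move from a per-block threshold to a total speculation cost can be sketched in C++ as follows. This only illustrates the arithmetic in the commit message and is not the SimplifyCFG source. The names `foldsPerBlock`, `foldsTotal` and `checkExamples` are hypothetical, and the inclusive `<=` comparison is an assumption inferred from the message's examples (cost 2 per block passes a threshold of 2, total cost 4 passes a threshold of 4).

```cpp
#include <cassert>

// Hypothetical helpers: they model the rule described in the commit
// message, not the actual FoldTwoEntryPHINode() implementation.

// Old rule as described: each block is checked against the threshold
// separately, so two blocks may together cost up to 2 * threshold.
constexpr bool foldsPerBlock(int costA, int costB, int threshold) {
  return costA <= threshold && costB <= threshold;
}

// New rule as described: the summed cost of both blocks is checked once.
constexpr bool foldsTotal(int costA, int costB, int threshold) {
  return costA + costB <= threshold;
}

// Replays the cases spelled out in the commit message.
inline void checkExamples() {
  // Old threshold of 2: 2 + 2 was accepted, 3 + 0 was rejected.
  assert(foldsPerBlock(2, 2, 2));
  assert(!foldsPerBlock(3, 0, 2));
  // New default threshold of 4: 2 + 2, 3 + 1 and 4 + 0 are all accepted.
  assert(foldsTotal(2, 2, 4));
  assert(foldsTotal(3, 1, 4));
  assert(foldsTotal(4, 0, 4));
  // Anything whose total exceeds 4 is rejected.
  assert(!foldsTotal(3, 2, 4));
}
```

Under this model the new default of 4 accepts every split the old threshold of 2 accepted, and additionally accepts uneven splits such as 3 + 0, which is the inconsistency the commit message points out.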

670 Commits