llvm-project

Commit Graph

Author	SHA1	Message	Date
Max Kazantsev	c9dca6df78	[NFC] Add some tests on mustexec llvm-svn: 339219	2018-08-08 04:40:47 +00:00
Sanjay Patel	979423c996	[InstCombine] add tests for fneg fold including FMF; NFC llvm-svn: 339203	2018-08-07 23:24:25 +00:00
Sanjay Patel	bac052ef52	[InstCombine] fix FP constant in test; NFC Too many digits... llvm-svn: 339200	2018-08-07 23:03:29 +00:00
Michael Berg	2e60ad2e58	[NFC] adding tests for Y - (X + Y) --> -X llvm-svn: 339197	2018-08-07 22:52:57 +00:00
Sanjay Patel	25887da162	[InstCombine] add tests for fneg of fmul/fdiv with constant; NFC llvm-svn: 339195	2018-08-07 22:30:43 +00:00
Sanjay Patel	9b07347033	[InstSimplify] fold fsub+fadd with common operand llvm-svn: 339176	2018-08-07 20:32:55 +00:00
Sanjay Patel	4364d604c2	[InstSimplify] fold fadd+fsub with common operand llvm-svn: 339174	2018-08-07 20:23:49 +00:00
Sanjay Patel	f7a8fb2dee	[InstSimplify] fold fsub+fsub with common operand llvm-svn: 339171	2018-08-07 20:14:27 +00:00
Sanjay Patel	50976393ed	[InstSimplify] add tests for fadd/fsub; NFC Instcombine gets some, but not all, of these cases via it's internal reassociation transforms. It fails in all cases with vector types. llvm-svn: 339168	2018-08-07 19:49:13 +00:00
Alexey Bataev	0edcd0278d	[SLP] Fix insert point for reused extract instructions. Summary: Reworked the previously committed patch to insert shuffles for reused extract element instructions in the correct position. Previous logic was incorrect, and might lead to the crash with PHIs and EH instructions. Reviewers: efriedma, javed.absar Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50143 llvm-svn: 339166	2018-08-07 19:21:05 +00:00
Wei Mi	b1ef2cc53d	[SampleFDO] Fix a bug in getOrCompHotCountThreshold/getOrCompColdCountThreshold getOrCompHotCountThreshold/getOrCompColdCountThreshold introduced in https://reviews.llvm.org/D45377 contain a bad mistake and will only return 1 or 0 instead of the true hot/cold cutoff value. The patch fixes the mistake. But the mistake seems not causing big performance difference according to internal server benchmarks testing. Differential Revision: https://reviews.llvm.org/D50370 llvm-svn: 339162	2018-08-07 18:13:10 +00:00
Philip Reames	c792e197b4	[LICM] Strengthen assume hoisting tests [NFC] As requested in review of https://reviews.llvm.org/D50364 llvm-svn: 339159	2018-08-07 17:54:36 +00:00
Florian Hahn	950576bdf8	[GVN,NewGVN] Keep nonnull if K does not move. In combineMetadata, we should be able to preserve K's nonnull metadata, if K does not move. This condition should hold for all replacements by NewGVN/GVN, but I added a bunch of assertions to verify that. Fixes PR35038. There probably are additional kinds of metadata that could be preserved using similar reasoning. This is follow-up work. Reviewers: dberlin, davide, efriedma, nlopes Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D47339 llvm-svn: 339149	2018-08-07 15:36:11 +00:00
Sanjay Patel	948ff87d7d	[InstSimplify] move minnum/maxnum with common op fold from instcombine llvm-svn: 339144	2018-08-07 14:36:27 +00:00
Sanjay Patel	b06d283909	[InstSimplify] add tests for minnum/maxnum with shared op; NFC llvm-svn: 339142	2018-08-07 14:13:40 +00:00
Sanjay Patel	b802d18df7	[InstSimplify] move misplaced minnum/maxnum tests; NFC llvm-svn: 339141	2018-08-07 14:12:08 +00:00
Philip Reames	94b29601ef	[LICM] Further strengthen tests for hoisting guards and invariant.starts [NFC] llvm-svn: 339062	2018-08-06 21:39:43 +00:00
Philip Reames	9d7bb2f700	[LICM] Strengthen invariant.start hoisting tests [NFC] llvm-svn: 339057	2018-08-06 21:18:34 +00:00
Philip Reames	81c7dc93d2	[LICM] Add tests highlighting missing hoists for intrinsics [NFC] llvm-svn: 339054	2018-08-06 21:06:15 +00:00
Evandro Menezes	6e137cb9f0	[SLC] Fix shrinking of pow() Properly shrink `pow()` to `powf()` as a binary function and, when no other simplification applies, do not discard it. Differential revision: https://reviews.llvm.org/D50113 llvm-svn: 339046	2018-08-06 19:40:17 +00:00
Matt Arsenault	56b31d8d75	ValueTracking: Handle canonicalize in CannotBeNegativeZero Also fix apparently missing test coverage for any of the handling here. llvm-svn: 339023	2018-08-06 15:16:26 +00:00
Max Kazantsev	2dbbd64cb7	Re-enable "[ValueTracking] Teach isKnownNonNullFromDominatingCondition about AND" The patch was reverted because of bug detected by sanitizer. The bug is fixed, respective tests added. Differential Revision: https://reviews.llvm.org/D50172 llvm-svn: 339005	2018-08-06 11:14:18 +00:00
Max Kazantsev	3271f379a9	Revert rL338990 to see if it causes sanitizer failures Multiple failues reported by sanitizer-x86_64-linux, seem to be caused by this patch. Reverting to see if they sustain without it. Differential Revision: https://reviews.llvm.org/D50172 llvm-svn: 338994	2018-08-06 08:10:28 +00:00
Max Kazantsev	34b0666be9	[ValueTracking] Teach isKnownNonNullFromDominatingCondition about AND `isKnownNonNullFromDominatingCondition` is able to prove non-null basing on `br` or `guard` by `%p != null` condition, but is unable to do so basing on `(%p != null) && %other_cond`. This patch allows it to do so. Differential Revision: https://reviews.llvm.org/D50172 Reviewed By: reames llvm-svn: 338990	2018-08-06 06:11:36 +00:00
Max Kazantsev	eded4abef8	[GuardWidening] Widen guards with conditions of frequently taken dominated branches If there is a frequently taken branch dominated by a guard, and its condition is available at the point of the guard, we can widen guard with condition of this branch and convert the branch into unconditional: guard(cond1) if (cond2) { // taken in 99.9% cases // do something } else { // do something else } Converts to guard(cond1 && cond2) // do something Differential Revision: https://reviews.llvm.org/D49974 Reviewed By: reames llvm-svn: 338988	2018-08-06 05:49:19 +00:00
David Bolvansky	c0aa4b75a4	Enrich inline messages Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338969	2018-08-05 14:53:08 +00:00
Roman Lebedev	365fa96055	[NFC][InstCombine] Add tests for sinking 'not' into 'xor' (PR38446) https://rise4fun.com/Alive/IT3 Comes up in the [most ugliest] signed int -> signed char case of -fsanitize=implicit-conversion (https://reviews.llvm.org/D50250) Not sure if we want to do it always, or only when it is free to invert. llvm-svn: 338967	2018-08-05 10:15:04 +00:00
Roman Lebedev	656a478e98	[NFC][InstCombine] Regenerate set.ll test llvm-svn: 338965	2018-08-05 08:53:40 +00:00
David Bolvansky	b82a5ec1b6	[InstCombine] [NFC] Tests for strcmp to memcmp transformation llvm-svn: 338963	2018-08-05 05:46:56 +00:00
Chijun Sima	8b5de48d62	[TailCallElim] Preserve DT and PDT Summary: Previously, in the NewPM pipeline, TailCallElim recalculates the DomTree when it modifies any instruction in the Function. For example, ``` CallInst *CI = dyn_cast<CallInst>(&I); ... CI->setTailCall(); Modified = true; ... if (!Modified \|\| ...) return PreservedAnalyses::all(); ``` After applying this patch, the DomTree only recalculates if needed (plus an extra insertEdge() + an extra deleteEdge() call). When optimizing SQLite with `-passes="default<O3>"` pipeline of the newPM, the number of DomTree recalculation decreases by 6.2%, the number of nodes visited by DFS decreases by 2.9%. The time used by DomTree will decrease approximately 1%~2.5% after applying the patch. Statistics: ``` Before the patch: 23010 dom-tree-stats - Number of DomTree recalculations 489264 dom-tree-stats - Number of nodes visited by DFS -- DomTree After the patch: 21581 dom-tree-stats - Number of DomTree recalculations 475088 dom-tree-stats - Number of nodes visited by DFS -- DomTree ``` Reviewers: kuhar, dmgreen, brzycki, grosser, davide Reviewed By: kuhar, brzycki Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49982 llvm-svn: 338954	2018-08-04 08:13:47 +00:00
Chijun Sima	eacad79777	[ADCE] Remove the need of DomTree Summary: ADCE doesn't need to query domtree. Reviewers: kuhar, brzycki, dmgreen, davide, grosser Reviewed By: kuhar Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49988 llvm-svn: 338950	2018-08-04 02:50:12 +00:00
Anastasis Grammenos	4dfe279e00	[TRE][DebugInfo] Preserve Debug Location in new branch instruction There are two branch instructions created so the new test covers them both. Differential Revision: https://reviews.llvm.org/D50263 llvm-svn: 338917	2018-08-03 20:27:13 +00:00
Hiroshi Inoue	73f8b255b6	[InstSimplify] fold extracting from std::pair (2/2) This is the second patch of the series which intends to enable jump threading for an inlined method whose return type is std::pair<int, bool> or std::pair<bool, int>. The first patch is https://reviews.llvm.org/rL338485. This patch handles code sequences that merges two values using `shl` and `or`, then extracts one value using `and`. Differential Revision: https://reviews.llvm.org/D49981 llvm-svn: 338817	2018-08-03 05:39:48 +00:00
Philip Reames	5937368d4f	[LICM] Remove unneccessary safety check to increase sinking effectiveness This one requires a bit of explaination. It's not every day you simply delete code to implement an optimization. :) The transform in question is sinking an instruction from a loop to the uses in loop exiting blocks. We know (from LCSSA) that all of the uses outside the loop must be phi nodes, and after predecessor splitting, we know all phi users must have a single operand. Since the use must be strictly dominated by the def, we know from the definition of dominance/ssa that the exit block must execute along a (non-strict) subset of paths which reach the def. As a result, duplicating a potentially faulting instruction can not introduce a fault that didn't previously exist in the program. The full story is that this patch builds on "rL338671: [LICM] Factor out fault legality from canHoistOrSinkInst [NFC]" which pulled this logic out of a common helper routine. As best I can tell, this check was originally added to the helper function for hoisting legality, later an incorrect fastpath for loads/calls was added, and then the bug was fixed by duplicating the fault safety check in the hoist path. This left the redundant check in the common code to pessimize sinking for no reason. I split it out in an NFC, and am not removing the unneccessary check. I wanted there to be something easy to revert in case I missed something. Reviewed by: Anna Thomas (in person) llvm-svn: 338794	2018-08-03 00:21:56 +00:00
Eli Friedman	1ba5e9ac24	[GlobalMerge] Allow merging globals with explicit section markings. At least on ELF, it's impossible to tell from the object file whether two globals with the same section marking were merged: the merged global uses "private" linkage to hide its symbol, and the aliases look like regular symbols. I can't think of any other reason to disallow it. (Of course, we can only merge globals in the same section.) The weird alignment handling matches AsmPrinter; our alignment handling for global variables should probably be refactored. Differential Revision: https://reviews.llvm.org/D49822 llvm-svn: 338791	2018-08-02 23:54:16 +00:00
David Bolvansky	67647bcfbe	[InstCombine] [NFC] Tests for select with binop fold llvm-svn: 338727	2018-08-02 14:59:23 +00:00
Sanjay Patel	3f6e9a71f7	[InstSimplify] move minnum/maxnum with undef fold from instcombine llvm-svn: 338719	2018-08-02 14:33:40 +00:00
Sanjay Patel	f9a0d593e9	[ValueTracking] fix maxnum miscompile for cannotBeOrderedLessThanZero (PR37776) This adds the NAN checks suggested in PR37776: https://bugs.llvm.org/show_bug.cgi?id=37776 If both operands to maxnum are NAN, that should get constant folded, so we don't have to handle that case. This is the same assumption as other FP ops in this function. Returning 'false' is always conservatively correct. Copying from the bug report: Currently, we have this for "when is cannotBeOrderedLessThanZero (mustBePositiveOrNaN) true for maxnum": L ------------------- \| Pos \| Neg \| NaN \| ------------------------ \|Pos \| x \| x \| x \| ------------------------ R \|Neg \| x \| \| x \| ------------------------ \|NaN \| x \| x \| x \| ------------------------ The cases with (Neg & NaN) are wrong. We should have: L ------------------- \| Pos \| Neg \| NaN \| ------------------------ \|Pos \| x \| x \| x \| ------------------------ R \|Neg \| x \| \| \| ------------------------ \|NaN \| x \| \| x \| ------------------------ Differential Revision: https://reviews.llvm.org/D50081 llvm-svn: 338716	2018-08-02 13:46:20 +00:00
Philip Reames	24b13cb06d	[LICM] Expand tests to highlight an oddity in sinking implementation llvm-svn: 338670	2018-08-02 03:54:29 +00:00
Sanjay Patel	28c7e41c09	[InstSimplify] move minnum/maxnum with same arg fold from instcombine llvm-svn: 338652	2018-08-01 23:05:55 +00:00
David Bolvansky	fbbb83c782	Revert "Enrich inline messages", tests fail llvm-svn: 338496	2018-08-01 08:02:40 +00:00
David Bolvansky	7f36cd9d96	Enrich inline messages Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338494	2018-08-01 07:37:16 +00:00
Hiroshi Inoue	02f79eae06	[InstSimplify] fold extracting from std::pair (1/2) This patch intends to enable jump threading when a method whose return type is std::pair<int, bool> or std::pair<bool, int> is inlined. For example, jump threading does not happen for the if statement in func. std::pair<int, bool> callee(int v) { int a = dummy(v); if (a) return std::make_pair(dummy(v), true); else return std::make_pair(v, v < 0); } int func(int v) { std::pair<int, bool> rc = callee(v); if (rc.second) { // do something } SROA executed before the method inlining replaces std::pair by i64 without splitting in both callee and func since at this point no access to the individual fields is seen to SROA. After inlining, jump threading fails to identify that the incoming value is a constant due to additional instructions (like or, and, trunc). This series of patch add patterns in InstructionSimplify to fold extraction of members of std::pair. To help jump threading, actually we need to optimize the code sequence spanning multiple BBs. These patches does not handle phi by itself, but these additional patterns help NewGVN pass, which calls instsimplify to check opportunities for simplifying instructions over phi, apply phi-of-ops optimization to result in successful jump threading. SimplifyDemandedBits in InstCombine, can do more general optimization but this patch aims to provide opportunities for other optimizers by supporting a simple but common case in InstSimplify. This first patch in the series handles code sequences that merges two values using shl and or and then extracts one value using lshr. Differential Revision: https://reviews.llvm.org/D48828 llvm-svn: 338485	2018-08-01 04:40:32 +00:00
Evandro Menezes	eb159e56a6	[PATCH] [SLC] Test simplification of pow() for vector types (NFC) Add test case for the simplification of `pow()` for vector types that D50035 enables. llvm-svn: 338463	2018-08-01 00:30:43 +00:00
Ewan Crawford	d83beb804c	Fix InstCombine address space assert Workaround bug where the InstCombine pass was asserting on the IR added in lit test, where we have a bitcast instruction after a GEP from an addrspace cast. The second bitcast in the test was getting combined into `bitcast <16 x i32>* %0 to <16 x i32> addrspace(3)`, which looks like it should be an addrspace cast instruction instead. Otherwise if control flow is allowed to continue as it is now we create a GEP instruction `<badref> = getelementptr inbounds <16 x i32>, <16 x i32> %0, i32 0`. However because the type of this instruction doesn't match the address space we hit an assert when replacing the bitcast with that GEP. ``` void llvm::Value::doRAUW(llvm::Value*, bool): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed. ``` Differential Revision: https://reviews.llvm.org/D50058 llvm-svn: 338395	2018-07-31 15:53:03 +00:00
Sanjay Patel	a35781fdf9	[InstCombine] regenerate checks and add tests for D50035; NFC llvm-svn: 338392	2018-07-31 15:07:32 +00:00
Anastasis Grammenos	ac3f8028da	[DebugInfo][LCSSA] Preserve debug location in lcssa phis Summary: When inserting lcssa Phi Nodes in the exit block mak sure to preserve the original instructions DL. Reviewers: vsk Subscribers: JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D50009 llvm-svn: 338391	2018-07-31 14:54:52 +00:00
David Bolvansky	ab79414f7b	Revert Enrich inline messages llvm-svn: 338389	2018-07-31 14:47:22 +00:00
Sanjay Patel	57d617d676	[InstCombine] auto-generate checks; NFC llvm-svn: 338388	2018-07-31 14:27:30 +00:00
David Bolvansky	b562dbabda	Enrich inline messages Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338387	2018-07-31 14:25:24 +00:00
John Brawn	cd5f37f3f1	[MemDep] Use PhiValuesAnalysis to improve alias analysis results This is being done in order to make GVN able to better optimize certain inputs. MemDep doesn't use PhiValues directly, but does need to notifiy it when things get invalidated. Differential Revision: https://reviews.llvm.org/D48489 llvm-svn: 338384	2018-07-31 14:19:29 +00:00
David Bolvansky	16d8a69b90	[InstSimplify] Fold another Select with And/Or pattern Summary: Proof: https://rise4fun.com/Alive/L5J Reviewers: lebedev.ri, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49975 llvm-svn: 338383	2018-07-31 14:17:15 +00:00
Alexey Bataev	c0c3a6ed5e	[SLP] Fix PR38339: Instruction does not dominate all uses! Summary: If the ExtractElement instructions can be optimized out during the vectorization and we need to reshuffle the parent vector, this ShuffleInstruction may be inserted in the wrong place causing compiler to produce incorrect code. Reviewers: spatel, RKSimon, mkuper, hfinkel, javed.absar Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49928 llvm-svn: 338380	2018-07-31 14:02:43 +00:00
Sanjay Patel	9a801cb598	[InstCombine] simplify code for A & (A ^ B) --> A & ~B This fold was written in an odd way and tried to avoid an endless loop by bailing out on all constants instead of the supposedly problematic case of -1. But (X & -1) should always be simplified before we reach here, so I'm not sure how that is a problem. There were no tests for the commuted patterns, so I added those at rL338364. llvm-svn: 338367	2018-07-31 13:00:03 +00:00
Sanjay Patel	995138ce60	[InstCombine] move/add tests for xor+add fold; NFC llvm-svn: 338364	2018-07-31 12:31:00 +00:00
Hiroshi Inoue	2f6769be60	[InstSimplify] tests for D48828, D49981: fold extraction from std::pair Minor touch up in the previous comment. llvm-svn: 338351	2018-07-31 05:29:20 +00:00
Hiroshi Inoue	5427d3efc2	[InstSimplify] tests for D48828, D49981: fold extraction from std::pair Updated unit tests for D48828 and D49981. llvm-svn: 338350	2018-07-31 05:10:36 +00:00
David Bolvansky	6737b3a6a1	[InstCombine] Fold Select with binary op Summary: Fold %A = icmp eq i8 %x, 0 %B = xor i8 %x, %z %C = select i1 %A, i8 %B, i8 %y To %C = select i1 %A, i8 %z, i8 %y Fixes https://bugs.llvm.org/show_bug.cgi?id=38345 Proof: https://rise4fun.com/Alive/43J Reviewers: lebedev.ri, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49954 llvm-svn: 338300	2018-07-30 20:38:53 +00:00
Manoj Gupta	9d83ce9043	[Inline] Copy "null-pointer-is-valid" attribute in caller. Summary: Normally, inling does not happen if caller does not have "null-pointer-is-valid"="true" attibute but callee has it. However, alwaysinline may force callee to be inlined. In this case, if the caller has the "null-pointer-is-valid"="true" attribute, copy the attribute to caller. Reviewers: efriedma, a.elovikov, lebedev.ri, jyknight Reviewed By: efriedma Subscribers: eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D50000 llvm-svn: 338292	2018-07-30 19:33:53 +00:00
David Bolvansky	76e95865f3	[InstSimplify] [NFC] Tests for Select with AND/OR fold llvm-svn: 338285	2018-07-30 18:22:18 +00:00
Evandro Menezes	a7d48286fb	[SLC] Refactor the simplication of pow() (NFC) Use more meaningful variable names. Mostly NFC. llvm-svn: 338266	2018-07-30 16:20:04 +00:00
David Bolvansky	403826ee0f	[InstCombine] [NFC] Added tests for Select with binop fold llvm-svn: 338257	2018-07-30 15:38:42 +00:00
John Brawn	cd73fe8989	[BasicAA] Use PhiValuesAnalysis if available when handling phi alias By using PhiValuesAnalysis we can get all the values reachable from a phi, so we can be more precise instead of giving up when a phi has phi operands. We can't make BaseicAA directly use PhiValuesAnalysis though, as the user of BasicAA may modify the function in ways that PhiValuesAnalysis can't cope with. For this optional usage to work correctly BasicAAWrapperPass now needs to be not marked as CFG-only (i.e. it is now invalidated even when CFG is preserved) due to how the legacy pass manager handles dependent passes being invalidated, namely the depending pass still has a pointer to the now-dead dependent pass. Differential Revision: https://reviews.llvm.org/D44564 llvm-svn: 338242	2018-07-30 11:52:08 +00:00
Sanjay Patel	577c705752	[InstCombine] try to fold 'add+sub' to 'not+add' These are reassociated versions of the same pattern and similar transforms as in rL338200 and rL338118. The motivation is identical to those commits: Patterns with add/sub combos can be improved using 'not' ops. This is better for analysis and may lead to follow-on transforms because 'xor' and 'add' are commutative/associative. It can also help codegen. llvm-svn: 338221	2018-07-29 18:13:16 +00:00
Sanjay Patel	2daf28f9ce	[InstCombine] add tests for another sub-not variant; NFC llvm-svn: 338220	2018-07-29 18:07:28 +00:00
Sanjay Patel	54421ce918	[InstSimplify] fold funnel shifts with 0-shift amount llvm-svn: 338218	2018-07-29 16:36:38 +00:00
Sanjay Patel	46af5835af	[InstSimplify] add tests for funnel shift intrinsics; NFC llvm-svn: 338217	2018-07-29 16:27:17 +00:00
David Bolvansky	f801a58c0d	[InstCombine] Tests for fold Select with binary op Differential Revision: https://reviews.llvm.org/D49961 llvm-svn: 338201	2018-07-28 17:13:33 +00:00
Sanjay Patel	818b253d3a	[InstCombine] try to fold 'sub' to 'not' https://rise4fun.com/Alive/jDd Patterns with add/sub combos can be improved using 'not' ops. This is better for analysis and may lead to follow-on transforms because 'xor' and 'add' are commutative/associative. It can also help codegen. llvm-svn: 338200	2018-07-28 16:48:44 +00:00
David Bolvansky	9800b710c2	[InstSimplify] Moved Select + AND/OR tests from InstCombine Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49957 llvm-svn: 338195	2018-07-28 13:52:45 +00:00
David Green	fc4b0fe0a2	[GlobalOpt] Test array indices inside structs for out-of-bounds accesses We now, from clang, can turn arrays of static short g_data[] = {16, 16, 16, 16, 16, 16, 16, 16, 0, 0, 0, 0, 0, 0, 0, 0}; into structs of the form @g_data = internal global <{ [8 x i16], [8 x i16] }> ... GlobalOpt will incorrectly SROA it, not realising that the access to the first element may overflow into the second. This fixes it by checking geps more thoroughly. I believe this makes the globalsra-partial.ll test case invalid as the %i value could be out of bounds. I've re-purposed it as a negative test for this case. Differential Revision: https://reviews.llvm.org/D49816 llvm-svn: 338192	2018-07-28 08:20:10 +00:00
David Bolvansky	f947608ddf	[InstCombine] Fold Select with AND/OR condition Summary: Fold ``` %A = icmp ne i8 %X, %V1 %B = icmp ne i8 %X, %V2 %C = or i1 %A, %B %D = select i1 %C, i8 %X, i8 %V1 ret i8 %D => ret i8 %X Fixes https://bugs.llvm.org/show_bug.cgi?id=38334 Proof: https://rise4fun.com/Alive/plI8 Reviewers: spatel, lebedev.ri Reviewed By: lebedev.ri Subscribers: craig.topper, llvm-commits Differential Revision: https://reviews.llvm.org/D49919 llvm-svn: 338191	2018-07-28 06:55:51 +00:00
David Bolvansky	173484d78c	[InstCombine] [NFC] [Tests] Fold Select with AND/OR condition - fixed Differential Revision: https://reviews.llvm.org/D49933 llvm-svn: 338161	2018-07-27 20:29:32 +00:00
David Bolvansky	1b82617473	[InstCombine] [NFC] [Tests] Fold Select with AND/OR condition Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49932 llvm-svn: 338159	2018-07-27 20:18:12 +00:00
Evandro Menezes	f611ce82c0	[SLC] Test simplification of pow(x, 0.333...) to cbrt(x) (NFC) Add test case for simplifying `pow(x, 0.333...)` into `cbrt(x)`, which D49040 enables. llvm-svn: 338152	2018-07-27 18:56:47 +00:00
Sanjay Patel	78e4b4d3c4	[InstCombine] not(sub X, Y) --> add (not X), Y The tests with constants show a missing optimization. Analysis for adds is better than subs, so this can also help with other transforms. And codegen is better with adds for targets like x86 (destructive ops, no sub-from). https://rise4fun.com/Alive/llK llvm-svn: 338118	2018-07-27 10:54:48 +00:00
Sanjay Patel	eee52b5090	[InstCombine] add tests for not+sub; NFC llvm-svn: 338117	2018-07-27 10:45:04 +00:00
Max Kazantsev	4d980515d2	[SimplifyIndVar] Canonicalize comparisons to unsigned while eliminating truncs This is a follow-up for the patch rL335020. When we replace compares against trunc with compares against wide IV, we can also replace signed predicates with unsigned where it is legal. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D48763 llvm-svn: 338115	2018-07-27 09:43:39 +00:00
Anastasis Grammenos	f6e143e67f	Revert "[LV][DebugInfo] Set DL to the middle block Icmp instruction" This reverts commit r338106. llvm-svn: 338109	2018-07-27 08:22:54 +00:00
Hiroshi Inoue	eeab694cea	[InstSimplify] tests for D48828: fold extraction from std::pair This commit includes unit tests for D48828, which enhances InstSimplify to enable jump threading with a method whose return type is std::pair<int, bool> or std::pair<bool, int>. I am going to commit the actual transformation later. llvm-svn: 338107	2018-07-27 07:21:02 +00:00
Anastasis Grammenos	03948d0e0f	[LV][DebugInfo] Set DL to the middle block Icmp instruction Reviewers: hsaito Differential Revision: https://reviews.llvm.org/D49746 llvm-svn: 338106	2018-07-27 07:12:44 +00:00
Chen Zheng	567485a72f	[InstCombine] canonicalize abs pattern Differential Revision: https://reviews.llvm.org/D48754 llvm-svn: 338092	2018-07-27 01:49:51 +00:00
Reid Kleckner	c6cf918f44	[InstrProf] Use comdats on COFF for available_externally functions Summary: r262157 added ELF-specific logic to put a comdat on the __profc_* globals created for available_externally functions. We should be able to generalize that logic to all object file formats that support comdats, i.e. everything other than MachO. This fixes duplicate symbol errors, since on COFF, linkonce_odr doesn't make the symbol weak. Fixes PR38251. Reviewers: davidxl, xur Subscribers: hiraditya, dmajor, llvm-commits, aheejin Differential Revision: https://reviews.llvm.org/D49882 llvm-svn: 338082	2018-07-26 22:59:17 +00:00
Vedant Kumar	b572f64212	[DebugInfo] LowerDbgDeclare: Add derefs when handling CallInst users LowerDbgDeclare inserts a dbg.value before each use of an address described by a dbg.declare. When inserting a dbg.value before a CallInst use, however, it fails to append DW_OP_deref to the DIExpression. The DW_OP_deref is needed to reflect the fact that a dbg.value describes a source variable directly (as opposed to a dbg.declare, which relies on pointer indirection). This patch adds in the DW_OP_deref where needed. This results in the correct values being shown during a debug session for a program compiled with ASan and optimizations (see https://reviews.llvm.org/D49520). Note that ConvertDebugDeclareToDebugValue is already correct -- no changes there were needed. One complication is that SelectionDAG is unable to distinguish between direct and indirect frame-index (FRAMEIX) SDDbgValues. This patch also fixes this long-standing issue in order to not regress integration tests relying on the incorrect assumption that all frame-index SDDbgValues are indirect. This is a necessary fix: the newly-added DW_OP_derefs cannot be lowered properly otherwise. Basically the fix prevents a direct SDDbgValue with DIExpression(DW_OP_deref) from being dereferenced twice by a debugger. There were a handful of tests relying on this incorrect "FRAMEIX => indirect" assumption which actually had incorrect DW_AT_locations: these are all fixed up in this patch. Testing: - check-llvm, and an end-to-end test using lldb to debug an optimized program. - Existing unit tests for DIExpression::appendToStack fully cover the new DIExpression::append utility. - check-debuginfo (the debug info integration tests) Differential Revision: https://reviews.llvm.org/D49454 llvm-svn: 338069	2018-07-26 20:56:53 +00:00
Sanjay Patel	6d6eab66e0	[InstCombine] fold udiv with common factor from muls with nuw Unfortunately, sdiv isn't as simple because of UB due to overflow. This fold is mentioned in PR38239: https://bugs.llvm.org/show_bug.cgi?id=38239 llvm-svn: 338059	2018-07-26 19:22:41 +00:00
Sanjay Patel	b381e7e59a	[InstCombine] add tests for udiv with common factor; NFC This fold is mentioned in PR38239: https://bugs.llvm.org/show_bug.cgi?id=38239 The general case probably belongs in -reassociate, but given that we do basic reassociation optimizations similar to this in instcombine already, we might as well be consistent within instcombine and handle this pattern? llvm-svn: 338038	2018-07-26 16:14:53 +00:00
Fangrui Song	f2822e2d9d	[ConstProp] Fix calls-math-finite.ll on FreeBSD FreeBSD's log(3.0) is less precise than glibc and musl. Let's forgive its rounding error of more than half an ulp. llvm-svn: 338009	2018-07-26 06:24:11 +00:00
Eli Friedman	d6baff65f7	[GlobalMerge] Handle llvm.compiler.used correctly. Reuse the handling for llvm.used, and don't transform such globals. Fixes a failure on the asan buildbot caused by my previous commit. llvm-svn: 337973	2018-07-25 22:03:35 +00:00
Roman Tereshin	4f10a9d3a3	[LSV] Look through selects for consecutive addresses In some cases LSV sees (load/store _ (select _ <pointer expression> <pointer expression>)) patterns in input IR, often due to sinking and other forms of CFG simplification, sometimes interspersed with bitcasts and all-constant-indices GEPs. With this patch`areConsecutivePointers` method would attempt to handle select instructions. This leads to an increased number of successful vectorizations. Technically, select instructions could appear in index arithmetic as well, however, we don't see those in our test suites / benchmarks. Also, there is a lot more freedom in IR shapes computing integral indices in general than in what's common in pointer computations, and it appears that it's quite unreliable to do anything short of making select instructions first class citizens of Scalar Evolution, which for the purposes of this patch is most definitely an overkill. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D49428 llvm-svn: 337965	2018-07-25 21:33:00 +00:00
Eli Friedman	0887cf9cab	[GlobalMerge] Allow merging globals with arbitrary alignment. Instead of depending on implicit padding from the structure layout code, use a packed struct and emit the padding explicitly. Differential Revision: https://reviews.llvm.org/D49710 llvm-svn: 337961	2018-07-25 20:58:01 +00:00
Florian Hahn	b6613ac665	Revert r337904: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions. I suspect it is causing the clang-stage2-Rthinlto failures. llvm-svn: 337956	2018-07-25 19:44:19 +00:00
Roman Tereshin	ed047b0184	[SCEV] Add [zs]ext{C,+,x} -> (D + [zs]ext{C-D,+,x})<nuw><nsw> transform as well as sext(C + x + ...) -> (D + sext(C-D + x + ...))<nuw><nsw> similar to the equivalent transformation for zext's if the top level addition in (D + (C-D + x * n)) could be proven to not wrap, where the choice of D also maximizes the number of trailing zeroes of (C-D + x * n), ensuring homogeneous behaviour of the transformation and better canonicalization of such AddRec's (indeed, there are 2^(2w) different expressions in `B1 + ext(B2 + Y)` form for the same Y, but only 2^(2w - k) different expressions in the resulting `B3 + ext((B4 * 2^k) + Y)` form, where w is the bit width of the integral type) This patch generalizes sext(C1 + C2X) --> sext(C1) + sext(C2X) and sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformations added in r209568 relaxing the requirements the following way: 1. C2 doesn't have to be a power of 2, it's enough if it's divisible by 2 a sufficient number of times; 2. C1 doesn't have to be less than C2, instead of extracting the entire C1 we can split it into 2 terms: (00...0XXX + YY...Y000), keep the second one that may cause wrapping within the extension operator, and move the first one that doesn't affect wrapping out of the extension operator, enabling further simplifications; 3. C1 and C2 don't have to be positive, splitting C1 like shown above produces a sum that is guaranteed to not wrap, signed or unsigned; 4. in AddExpr case there could be more than 2 terms, and in case of AddExpr the 2nd and following terms and in case of AddRecExpr the Step component don't have to be in the C2X form or constant (respectively), they just need to have enough trailing zeros, which in turn could be guaranteed by means other than arithmetics, e.g. by a pointer alignment; 5. the extension operator doesn't have to be a sext, the same transformation works and profitable for zext's as well. Apparently, optimizations like SLPVectorizer currently fail to vectorize even rather trivial cases like the following: double bar(double a, unsigned n) { double x = 0.0; double y = 0.0; for (unsigned i = 0; i < n; i += 2) { x += a[i]; y += a[i + 1]; } return x * y; } If compiled with `clang -std=c11 -Wpedantic -Wall -O3 main.c -S -o - -emit-llvm` (!{!"clang version 7.0.0 (trunk 337339) (llvm/trunk 337344)"}) it produces scalar code with the loop not unrolled with the unsigned `n` and `i` (like shown above), but vectorized and unrolled loop with signed `n` and `i`. With the changes made in this commit the unsigned version will be vectorized (though not unrolled for unclear reasons). How it all works: Let say we have an AddExpr that looks like (C + x + y + ...), where C is a constant and x, y, ... are arbitrary SCEVs. Let's compute the minimum number of trailing zeroes guaranteed of that sum w/o the constant term: (x + y + ...). If, for example, those terms look like follows: i XXXX...X000 YYYY...YY00 ... ZZZZ...0000 then the rightmost non-guaranteed-zero bit (a potential one at i-th position above) can change the bits of the sum to the left (and at i-th position itself), but it can not possibly change the bits to the right. So we can compute the number of trailing zeroes by taking a minimum between the numbers of trailing zeroes of the terms. Now let's say that our original sum with the constant is effectively just C + X, where X = x + y + .... Let's also say that we've got 2 guaranteed trailing zeros for X: j CCCC...CCCC XXXX...XX00 // this is X = (x + y + ...) Any bit of C to the left of j may in the end cause the C + X sum to wrap, but the rightmost 2 bits of C (at positions j and j - 1) do not affect wrapping in any way. If the upper bits cause a wrap, it will be a wrap regardless of the values of the 2 least significant bits of C. If the upper bits do not cause a wrap, it won't be a wrap regardless of the values of the 2 bits on the right (again). So let's split C to 2 constants like follows: 0000...00CC = D CCCC...CC00 = (C - D) and represent the whole sum as D + (C - D + X). The second term of this new sum looks like this: CCCC...CC00 XXXX...XX00 ----------- // let's add them up YYYY...YY00 The sum above (let's call it Y)) may or may not wrap, we don't know, so we need to keep it under a sext/zext. Adding D to that sum though will never wrap, signed or unsigned, if performed on the original bit width or the extended one, because all that that final add does is setting the 2 least significant bits of Y to the bits of D: YYYY...YY00 = Y 0000...00CC = D ----------- <nuw><nsw> YYYY...YYCC Which means we can safely move that D out of the sext or zext and claim that the top-level sum neither sign wraps nor unsigned wraps. Let's run an example, let's say we're working in i8's and the original expression (zext's or sext's operand) is 21 + 12x + 8y. So it goes like this: 0001 0101 // 21 XXXX XX00 // 12x YYYY Y000 // 8y 0001 0101 // 21 ZZZZ ZZ00 // 12x + 8y 0000 0001 // D 0001 0100 // 21 - D = 20 ZZZZ ZZ00 // 12x + 8y 0000 0001 // D WWWW WW00 // 21 - D + 12x + 8y = 20 + 12x + 8y therefore zext(21 + 12x + 8y) = (1 + zext(20 + 12x + 8y))<nuw><nsw> This approach could be improved if we move away from using trailing zeroes and use KnownBits instead. For instance, with KnownBits we could have the following picture: i 10 1110...0011 // this is C XX X1XX...XX00 // this is X = (x + y + ...) Notice that some of the bits of X are known ones, also notice that known bits of X are interspersed with unknown bits and not grouped on the rigth or left. We can see at the position i that C(i) and X(i) are both known ones, therefore the (i + 1)th carry bit is guaranteed to be 1 regardless of the bits of C to the right of i. For instance, the C(i - 1) bit only affects the bits of the sum at positions i - 1 and i, and does not influence if the sum is going to wrap or not. Therefore we could split the constant C the following way: i 00 0010...0011 = D 10 1100...0000 = (C - D) Let's compute the KnownBits of (C - D) + X: XX1 1 = carry bit, blanks stand for known zeroes 10 1100...0000 = (C - D) XX X1XX...XX00 = X --- ----------- XX X0XX...XX00 Will this add wrap or not essentially depends on bits of X. Adding D to this sum, however, is guaranteed to not to wrap: 0 X 00 0010...0011 = D sX X0XX...XX00 = (C - D) + X --- ----------- sX XXXX XX11 As could be seen above, adding D preserves the sign bit of (C - D) + X, if any, and has a guaranteed 0 carry out, as expected. The more bits of (C - D) we constrain, the better the transformations introduced here canonicalize expressions as it leaves less freedom to what values the constant part of ((C - D) + x + y + ...) can take. Reviewed By: mzolotukhin, efriedma Differential Revision: https://reviews.llvm.org/D48853 llvm-svn: 337943	2018-07-25 18:01:41 +00:00
Florian Hahn	6f5c6adbcd	Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions. r337828 resolves a PredicateInfo issue with unnamed types. Original message: This patch updates IPSCCP to use PredicateInfo to propagate facts to true branches predicated by EQ and to false branches predicated by NE. As a follow up, we should be able to extend it to also propagate additional facts about nonnull. Reviewers: davide, mssimpso, dberlin, efriedma Reviewed By: davide, dberlin llvm-svn: 337904	2018-07-25 11:13:40 +00:00
Hideki Saito	ef380b0fc5	[LV] Fix for PR38110, LV encountered llvm_unreachable() Summary: truncateToMinimalBitWidths() doesn't handle all Instructions and the worst case is compiler crash via llvm_unreachable(). Fix is to add a case to handle PHINode and changed the worst case to NO-OP (from compiler crash). Reviewers: sbaranga, mssimpso, hsaito Reviewed By: hsaito Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49461 llvm-svn: 337861	2018-07-24 22:30:31 +00:00
Roman Tereshin	1ba1f9310c	[SCEV] Add zext(C + x + ...) -> D + zext(C-D + x + ...)<nuw><nsw> transform if the top level addition in (D + (C-D + x + ...)) could be proven to not wrap, where the choice of D also maximizes the number of trailing zeroes of (C-D + x + ...), ensuring homogeneous behaviour of the transformation and better canonicalization of such expressions. This enables better canonicalization of expressions like 1 + zext(5 + 20 * %x + 24 * %y) and zext(6 + 20 * %x + 24 * %y) which get both transformed to 2 + zext(4 + 20 * %x + 24 * %y) This pattern is common in address arithmetics and the transformation makes it easier for passes like LoadStoreVectorizer to prove that 2 or more memory accesses are consecutive and optimize (vectorize) them. Reviewed By: mzolotukhin Differential Revision: https://reviews.llvm.org/D48853 llvm-svn: 337859	2018-07-24 21:48:56 +00:00
Florian Hahn	36d2e25d5a	[PredicateInfo] Use custom mangling to support ssa_copy with unnamed types. This is a workaround and it would be better to fix this generally, but doing it generally is quite tricky. See D48541 and PR38117. Doing it in PredicateInfo directly allows us to use the type address to differentiate different unnamed types, because neither the created declarations nor the ssa_copy calls should be visible after PredicateInfo got destroyed. Reviewers: efriedma, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D49126 llvm-svn: 337828	2018-07-24 14:49:52 +00:00
Manoj Gupta	f9f50f634d	ConstantFolding: Avoid a crash. Summary: Check if the parent basic block and caller exists before calling CS.getCaller when constant folding strip.invariant.group instrinsic. This avoids a crash when the function containing the intrinsic is being inlined. The instruction is checked for any simplifiction but has not yet been added to a basic block. Reviewers: Prazek, rsmith, efriedma Reviewed By: efriedma Subscribers: eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D49690 llvm-svn: 337742	2018-07-23 21:20:00 +00:00
John Brawn	fc18a6ad7d	[GVN] Don't use the eliminated load as an available value in phi construction In ConstructSSAForLoadSet if an available value is actually the load that we're doing SSA construction to eliminate, then we can omit it as SSAUpdate will add in the value for the phi that will be replacing it anyway. This can result in simpler IR which can allow further optimisation. Differential Revision: https://reviews.llvm.org/D44160 llvm-svn: 337686	2018-07-23 12:14:45 +00:00
Alexandros Lamprineas	bf6009c234	[MemorySSAUpdater] Update Phi operands after trivial Phi elimination Bug fix for PR37445. The underlying problem and its fix are similar to PR37808. The bug lies in MemorySSAUpdater::getPreviousDefRecursive(), where PhiOps is computed before the call to tryRemoveTrivialPhi() and it ends up being out of date, pointing to stale data. We have now turned each of the PhiOps into a TrackingVH<MemoryAccess>. Differential Revision: https://reviews.llvm.org/D49425 llvm-svn: 337680	2018-07-23 10:56:30 +00:00
Alexandros Lamprineas	592cc78dd8	[GVNHoist] safeToHoistLdSt allows illegal hoisting Bug fix for PR36787. When reasoning if it's safe to hoist a load we want to make sure that the defining memory access dominates the new insertion point of the hoisted instruction. safeToHoistLdSt calls firstInBB(InsertionPoint,DefiningAccess) which returns false if InsertionPoint == DefiningAccess, and therefore it falsely thinks it's safe to hoist. Differential Revision: https://reviews.llvm.org/D49555 llvm-svn: 337674	2018-07-23 09:42:35 +00:00
Chen Zheng	69bb064539	[InstrSimplify] fold sdiv if two operands are negated and non-overflow Differential Revision: https://reviews.llvm.org/D49382 llvm-svn: 337642	2018-07-21 12:27:54 +00:00
Roman Tereshin	31d52847ef	Reapply "[LSV] Refactoring + supporting bitcasts to a type of different size" This reapplies commit r337489 reverted by r337541 Additionally, this commit contains a speculative fix to the issue reported in r337541 (the report does not contain an actionable reproducer, just a stack trace) llvm-svn: 337606	2018-07-20 20:10:04 +00:00
Chen Zheng	0f609db61a	[NFC][testcases] fold sdiv if two operands are negated and non-overflow llvm-svn: 337549	2018-07-20 13:38:59 +00:00
Florian Hahn	0a560d5d9c	Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters. This version contains a fix to add values for which the state in ParamState change to the worklist if the state in ValueState did not change. To avoid adding the same value multiple times, mergeInValue returns true, if it added the value to the worklist. The value is added to the worklist depending on its state in ValueState. Original message: For comparisons with parameters, we can use the ParamState lattice elements which also provide constant range information. This improves the code for PR33253 further and gets us closer to use ValueLatticeElement for all values. Also, as we are using the range information in the solver directly, we do not need tryToReplaceWithConstantRange afterwards anymore. Reviewers: dberlin, mssimpso, davide, efriedma Reviewed By: mssimpso Differential Revision: https://reviews.llvm.org/D43762 llvm-svn: 337548	2018-07-20 13:29:12 +00:00
Chen Zheng	f801d0fea9	[InstSimplify] fold srem instruction if its two operands are negated. Differential Revision: https://reviews.llvm.org/D49423 llvm-svn: 337545	2018-07-20 13:00:47 +00:00
Chen Zheng	35395a6773	[NFC][testcases] more testcases for folding srem if its two operands are negatived. llvm-svn: 337543	2018-07-20 12:53:45 +00:00
Sam McCall	57743883f1	Revert "[LSV] Refactoring + supporting bitcasts to a type of different size" This reverts commit r337489. It causes asserts to fire in some TensorFlow tests, e.g. tensorflow/compiler/tests/gather_test.py on GPU. Example stack trace: Start test case: GatherTest.testHigherRank assertion failed at third_party/llvm/llvm/lib/Support/APInt.cpp:819 in llvm::APInt llvm::APInt::trunc(unsigned int) const: width && "Can't truncate to 0 bits" @ 0x5559446ebe10 __assert_fail @ 0x55593ef32f5e llvm::APInt::trunc() @ 0x55593d78f86e (anonymous namespace)::Vectorizer::lookThroughComplexAddresses() @ 0x55593d78f2bc (anonymous namespace)::Vectorizer::areConsecutivePointers() @ 0x55593d78d128 (anonymous namespace)::Vectorizer::isConsecutiveAccess() @ 0x55593d78c926 (anonymous namespace)::Vectorizer::vectorizeInstructions() @ 0x55593d78c221 (anonymous namespace)::Vectorizer::vectorizeChains() @ 0x55593d78b948 (anonymous namespace)::Vectorizer::run() @ 0x55593d78b725 (anonymous namespace)::LoadStoreVectorizer::runOnFunction() @ 0x55593edf4b17 llvm::FPPassManager::runOnFunction() @ 0x55593edf4e55 llvm::FPPassManager::runOnModule() @ 0x55593edf563c (anonymous namespace)::MPPassManager::runOnModule() @ 0x55593edf5137 llvm::legacy::PassManagerImpl::run() @ 0x55593edf5b71 llvm::legacy::PassManager::run() @ 0x55593ced250d xla::gpu::IrDumpingPassManager::run() @ 0x55593ced5033 xla::gpu::(anonymous namespace)::EmitModuleToPTX() @ 0x55593ced40ba xla::gpu::(anonymous namespace)::CompileModuleToPtx() @ 0x55593ced33d0 xla::gpu::CompileToPtx() @ 0x55593b26b2a2 xla::gpu::NVPTXCompiler::RunBackend() @ 0x55593b21f973 xla::Service::BuildExecutable() @ 0x555938f44e64 xla::LocalService::CompileExecutable() @ 0x555938f30a85 xla::LocalClient::Compile() @ 0x555938de3c29 tensorflow::XlaCompilationCache::BuildExecutable() @ 0x555938de4e9e tensorflow::XlaCompilationCache::CompileImpl() @ 0x555938de3da5 tensorflow::XlaCompilationCache::Compile() @ 0x555938c5d962 tensorflow::XlaLocalLaunchBase::Compute() @ 0x555938c68151 tensorflow::XlaDevice::Compute() @ 0x55593f389e1f tensorflow::(anonymous namespace)::ExecutorState::Process() @ 0x55593f38a625 tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady()::$_1::operator()() * SIGABRT received by PID 7798 (TID 7837) from PID 7798; * llvm-svn: 337541	2018-07-20 12:03:00 +00:00
Eli Friedman	a3c78f5981	[SCCP] Don't use markForcedConstant on branch conditions. It's more aggressive than we need to be, and leads to strange workarounds in other places like call return value inference. Instead, just directly mark an edge viable. Tests by Florian Hahn. Differential Revision: https://reviews.llvm.org/D49408 llvm-svn: 337507	2018-07-19 23:02:07 +00:00
Roman Tereshin	b49b2a601f	[LSV] Refactoring + supporting bitcasts to a type of different size This is mostly a preparation work for adding a limited support for select instructions. It proved to be difficult to do due to size and irregularity of Vectorizer::isConsecutiveAccess, this is fixed here I believe. It also turned out that these changes make it simpler to finish one of the TODOs and fix a number of other small issues, namely: 1. Looking through bitcasts to a type of a different size (requires careful tracking of the original load/store size and some math converting sizes in bytes to expected differences in indices of GEPs). 2. Reusing partial analysis of pointers done by first attempt in proving them consecutive instead of starting from scratch. This added limited support for nested GEPs co-existing with difficult sext/zext instructions. This also required a careful handling of negative differences between constant parts of offsets. 3. Handing a case where the first pointer index is not an add, but something else (a function parameter for instance). I observe an increased number of successful vectorizations on a large set of shader programs. Only few shaders are affected, but those that are affected sport >5% less loads and stores than before the patch. Reviewed By: rampitec Differential-Revision: https://reviews.llvm.org/D49342 llvm-svn: 337489	2018-07-19 19:42:43 +00:00
Farhana Aleen	8c7a30baea	[LoadStoreVectorizer] Use getMinusScev() to compute the distance between two pointers. Summary: Currently, isConsecutiveAccess() detects two pointers(PtrA and PtrB) as consecutive by comparing PtrB with BaseDelta+PtrA. This works when both pointers are factorized or both of them are not factorized. But isConsecutiveAccess() fails if one of the pointers is factorized but the other one is not. Here is an example: PtrA = 4 * (A + B) PtrB = 4 + 4A + 4B This patch uses getMinusSCEV() to compute the distance between two pointers. getMinusSCEV() allows combining the expressions and computing the simplified distance. Author: FarhanaAleen Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D49516 llvm-svn: 337471	2018-07-19 16:50:27 +00:00
Roman Lebedev	3cb87e905c	[InstCombine] Re-commit: Fold 'check for [no] signed truncation' pattern Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 \| PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for 'check for [no] signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. The DAGCombine will reverse this transform, see https://reviews.llvm.org/D49266 This transform is surprisingly frustrating. This does not deal with non-splat shift amounts, or with undef shift amounts. I've outlined what i think the solution should be: ``` // Potential handling of non-splats: for each element: // * if both are undef, replace with constant 0. // Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0. // * if both are not undef, and are different, bailout. // * else, only one is undef, then pick the non-undef one. ``` This is a re-commit, as the original patch, committed in rL337190 was reverted in rL337344 as it broke chromium build: https://bugs.llvm.org/show_bug.cgi?id=38204 and https://crbug.com/864832 Proofs that the fixed folds are ok: https://rise4fun.com/Alive/VYM Differential Revision: https://reviews.llvm.org/D49320 llvm-svn: 337376	2018-07-18 10:55:17 +00:00
Roman Lebedev	3404d4dd41	[NFC][InstCombine] i65 tests for 'check for [no] signed truncation' pattern Those initially broke chromium build: https://bugs.llvm.org/show_bug.cgi?id=38204 and https://crbug.com/864832 llvm-svn: 337364	2018-07-18 08:49:51 +00:00
Roman Lebedev	ad50ae82ad	Revert test changes part of "Revert "[InstCombine] Fold 'check for [no] signed truncation' pattern"" We want the test to remain good anyway. I think the fix is incoming. This reverts part of commit rL337344. llvm-svn: 337359	2018-07-18 08:15:13 +00:00
Bob Haarman	4ebe5d59b6	Revert "[InstCombine] Fold 'check for [no] signed truncation' pattern" This reverts r337190 (and a few follow-up commits), which caused the Chromium build to fail. See https://bugs.llvm.org/show_bug.cgi?id=38204 and https://crbug.com/864832 llvm-svn: 337344	2018-07-18 02:18:28 +00:00
Vedant Kumar	9ece818291	[InstCombine] Preserve debug value when simplifying cast-of-select InstCombine has a cast transform that matches a cast-of-select: Orig = cast (Src = select Cond TV FV) And tries to replace it with a select which has the cast folded in: NewSel = select Cond (cast TV) (cast FV) The combiner does RAUW(Orig, NewSel), so any debug values for Orig would survive the transform. But debug values for Src would be lost. This patch teaches InstCombine to replace all debug uses of Src with NewSel (taking care of doing any necessary DIExpression rewriting). Differential Revision: https://reviews.llvm.org/D49270 llvm-svn: 337310	2018-07-17 18:08:36 +00:00
Vedant Kumar	ced3fd4736	Remove an errant piece of !dbg metadata from a test, NFC llvm-svn: 337309	2018-07-17 18:08:34 +00:00
Florian Hahn	d95761d9d0	[IPSCCP] Run Solve each time we resolved an undef in a function. Once we resolved an undef in a function we can run Solve, which could lead to finding a constant return value for the function, which in turn could turn undefs into constants in other functions that call it, before resolving undefs there. Computationally the amount of work we are doing stays the same, just the order we process things is slightly different and potentially there are a few less undefs to resolve. We are still relying on the order of functions in the IR, which means depending on the order, we are able to resolve the optimal undef first or not. For example, if @test1 comes before @testf, we find the constant return value of @testf too late and we cannot use it while solving @test1. This on its own does not lead to more constants removed in the test-suite, probably because currently we have to be very lucky to visit applicable functions in the right order. Maybe we manage to come up with a better way of resolving undefs in more 'profitable' functions first. Reviewers: efriedma, mssimpso, davide Reviewed By: efriedma, davide Differential Revision: https://reviews.llvm.org/D49385 llvm-svn: 337283	2018-07-17 14:04:59 +00:00
Simon Pilgrim	1a4f3c93fb	[SLPVectorizer] Don't attempt horizontal reduction on pointer types (PR38191) TTI::getMinMaxReductionCost typically can't handle pointer types - until this is changed its better to limit horizontal reduction to integer/float vector types only. llvm-svn: 337280	2018-07-17 13:43:33 +00:00
Chen Zheng	fcfcf07104	[NFC][testcases] add testcases for folding srem whose operands are negatived. Finish same optimization for add instruction in D49216 and sdiv instruction in D49382. This patch is for srem instruction. llvm-svn: 337270	2018-07-17 12:31:54 +00:00
Chen Zheng	c992cc4e97	[testcases] move testcases to right place - NFC Differential Revision: https://reviews.llvm.org/D49409 llvm-svn: 337230	2018-07-17 01:04:41 +00:00
Roman Lebedev	5da636fb90	[NFC][InstCombine] Fine-tune 'check for [no] signed truncation' tests We are using i8 for these tests, and shifting by 4, which is exactly the half of i8. But as it is seen from the proofs https://rise4fun.com/Alive/mgu KeptBits = bitwidth(%x) - MaskedBits, so with using shifts by 4, we are not really testing that we actually properly handle the other cases with shifts not by half... llvm-svn: 337208	2018-07-16 20:10:46 +00:00
Roman Lebedev	b79b4f539b	[InstCombine] Fold 'check for [no] signed truncation' pattern Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 \| PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for 'check for [no] signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. Proofs for this transform: https://rise4fun.com/Alive/mgu This transform is surprisingly frustrating. This does not deal with non-splat shift amounts, or with undef shift amounts. I've outlined what i think the solution should be: ``` // Potential handling of non-splats: for each element: // * if both are undef, replace with constant 0. // Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0. // * if both are not undef, and are different, bailout. // * else, only one is undef, then pick the non-undef one. ``` The DAGCombine will reverse this transform, see https://reviews.llvm.org/D49266 Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: JDevlieghere, rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D49320 llvm-svn: 337190	2018-07-16 16:45:42 +00:00
Teresa Johnson	d68935c5ac	Restore "[ThinLTO] Ensure we always select the same function copy to import" This reverts commit r337081, therefore restoring r337050 (and fix in r337059), with test fix for bot failure described after the original description below. In order to always import the same copy of a linkonce function, even when encountering it with different thresholds (a higher one then a lower one), keep track of the summary we decided to import. This ensures that the backend only gets a single definition to import for each GUID, so that it doesn't need to choose one. Move the largest threshold the GUID was considered for import into the current module out of the ImportMap (which is part of a larger map maintained across the whole index), and into a new map just maintained for the current module we are computing imports for. This saves some memory since we no longer have the thresholds maintained across the whole index (and throughout the in-process backends when doing a normal non-distributed ThinLTO build), at the cost of some additional information being maintained for each invocation of ComputeImportForModule (the selected summary pointer for each import). There is an additional map lookup for each callee being considered for importing, however, this was able to subsume a map lookup in the Worklist iteration that invokes computeImportForFunction. We also are able to avoid calling selectCallee if we already failed to import at the same or higher threshold. I compared the run time and peak memory for the SPEC2006 471.omnetpp benchmark (running in-process ThinLTO backends), as well as for a large internal benchmark with a distributed ThinLTO build (so just looking at the thin link time/memory). Across a number of runs with and without this change there was no significant change in the time and memory. (I tried a few other variations of the change but they also didn't improve time or peak memory). The new commit removes a test that no longer makes sense (Transforms/FunctionImport/hotness_based_import2.ll), as exposed by the reverse-iteration bot. The test depends on the order of processing the summary call edges, and actually depended on the old problematic behavior of selecting more than one summary for a given GUID when encountered with different thresholds. There was no guarantee even before that we would eventually pick the linkonce copy with the hottest call edges, it just happened to work with the test and the old code, and there was no guarantee that we would end up importing the selected version of the copy that had the hottest call edges (since the backend would effectively import only one of the selected copies). Reviewers: davidxl Subscribers: mehdi_amini, inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D48670 llvm-svn: 337184	2018-07-16 15:30:27 +00:00
Chen Zheng	1ff49315ab	[InstrSimplify] add testcases for fold sdiv if two operands are negatived and non-overflow Differential Revision: https://reviews.llvm.org/D49365 llvm-svn: 337179	2018-07-16 15:06:42 +00:00
Alexandros Lamprineas	f854ce84c4	[MemorySSAUpdater] Remove deleted trivial Phis from active workset Bug fix for PR37808. The regression test is a reduced version of the original reproducer attached to the bug report. As stated in the report, the problem was that InsertedPHIs was keeping dangling pointers to deleted Memory-Phis. MemoryPhis are created eagerly and sometimes get zapped shortly afterwards. I've used WeakVH instead of an expensive removal operation from the active workset. Differential Revision: https://reviews.llvm.org/D48372 llvm-svn: 337149	2018-07-16 07:51:27 +00:00
Chen Zheng	ccc8422464	[InstCombine] add more SPFofSPF folding Differential Revision: https://reviews.llvm.org/D49238 llvm-svn: 337143	2018-07-16 02:23:00 +00:00
Chen Zheng	b972273f98	[InstCombine] fold icmp pred (sub 0, X) C for vector type Differential Revision: https://reviews.llvm.org/D49283 llvm-svn: 337141	2018-07-16 00:51:40 +00:00
Sanjay Patel	fae8ed0104	[InstSimplify] add fixme comment for PR37776; NFC llvm-svn: 337129	2018-07-15 16:13:58 +00:00
Sanjay Patel	92d0c1c129	[InstSimplify] fold minnum/maxnum with NaN arg This fold is repeated/misplaced in instcombine, but I'm not sure if it's safe to remove that yet because some other folds appear to be asserting that the transform has occurred within instcombine itself. This isn't the best fix for PR37776, but it probably hides the bug with the given code example: https://bugs.llvm.org/show_bug.cgi?id=37776 We have another test to demonstrate the more general bug. llvm-svn: 337127	2018-07-15 14:52:16 +00:00
Sanjay Patel	ef71b704c2	[InstSimplify] add tests for minnum/maxnum; NFC This isn't the best fix for PR37776, but it probably hides the bug with the given code example: https://bugs.llvm.org/show_bug.cgi?id=37776 We have another test to demonstrate the more general bug. llvm-svn: 337126	2018-07-15 14:46:48 +00:00
Roman Lebedev	b972fc3e8a	[InstCombine] Fold x & (-1 >> y) s< x to x s> (-1 >> y) https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/I3O This pattern is not commutative! We must make sure not to fold the commuted version! llvm-svn: 337111	2018-07-14 20:08:47 +00:00
Roman Lebedev	edba515baa	[NFC][InstCombine] Tests for x & (-1 >> y) s< x to x s> (-1 >> y) fold. https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/I3O This pattern is not commutative! We must make sure not to fold the commuted version! llvm-svn: 337110	2018-07-14 20:08:42 +00:00
Roman Lebedev	f14426101e	[InstCombine] Fold x & (-1 >> y) s>= x to x s<= (-1 >> y) https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/I3O This pattern is not commutative! We must make sure not to fold the commuted version! llvm-svn: 337109	2018-07-14 20:08:37 +00:00
Roman Lebedev	53a014e61d	[NFC][InstCombine] Tests for x & (-1 >> y) s>= x to x s<= (-1 >> y) fold. https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/I3O This pattern is not commutative! We must make sure not to fold the commuted version! llvm-svn: 337108	2018-07-14 20:08:31 +00:00
Roman Lebedev	1e61e358a4	[InstCombine] Fold x s<= x & (-1 >> y) to x s<= (-1 >> y) https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/I3O This pattern is not commutative! We must make sure not to fold the commuted version! llvm-svn: 337107	2018-07-14 20:08:26 +00:00
Roman Lebedev	f1a351cf3a	[NFC][InstCombine] Tests for x s<= x & (-1 >> y) to x s<= (-1 >> y) fold. https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/I3O This pattern is not commutative! We must make sure not to fold the commuted version! llvm-svn: 337106	2018-07-14 20:08:21 +00:00
Roman Lebedev	859e14aeaa	[InstCombine] Fold x s> x & (-1 >> y) to x s> (-1 >> y) https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/I3O This pattern is not commutative! We must make sure not to fold the commuted version! llvm-svn: 337105	2018-07-14 20:08:16 +00:00
Roman Lebedev	62d050d2c4	[NFC][InstCombine] Tests for x s> x & (-1 >> y) to x s> (-1 >> y) fold. https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/I3O This pattern is not commutative! We must make sure not to fold the commuted version! llvm-svn: 337104	2018-07-14 20:08:09 +00:00
Roman Lebedev	0f5ec8921b	[InstCombine] Fold x u<= x & C to x u<= C https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/Fqp This pattern is not commutative. But InstSimplify will already have taken care of the 'commutative' variant. llvm-svn: 337102	2018-07-14 16:44:54 +00:00
Roman Lebedev	79ee3211ba	[NFC][InstCombine] Tests for x u<= x & C to x u<= C fold. https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/Fqp This pattern is not commutative. But InstSimplify will already have taken care of the 'commutative' variant. llvm-svn: 337101	2018-07-14 16:44:48 +00:00
Roman Lebedev	74f611a1f5	[InstCombine] Fold x u> x & C to x u> C https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/JvS This pattern is not commutative. But InstSimplify will already have taken care of the 'commutative' variant. llvm-svn: 337100	2018-07-14 16:44:43 +00:00
Roman Lebedev	60ea5df7c2	[NFC][InstCombine] Tests for x u> x & C to x u> C fold. https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/JvS This pattern is not commutative. But InstSimplify will already have taken care of the 'commutative' variant. llvm-svn: 337099	2018-07-14 16:44:37 +00:00
Roman Lebedev	e3dc587ae0	[InstCombine] Fold x & (-1 >> y) u< x to x u> (-1 >> y) https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/ocb This pattern is not commutative. But InstSimplify will already have taken care of the 'commutative' variant. llvm-svn: 337098	2018-07-14 12:20:16 +00:00
Roman Lebedev	cad4e3d741	[NFC][InstCombine] Tests for x & (-1 >> y) u< x to x u> (-1 >> y) https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/ocb This pattern is not commutative. But InstSimplify will already have taken care of the 'commutative' variant. llvm-svn: 337097	2018-07-14 12:20:11 +00:00
Roman Lebedev	fac48474ce	[InstCombine] Fold x & (-1 >> y) u>= x to x u<= (-1 >> y) https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/azI This pattern is not commutative. But InstSimplify will already have taken care of the 'commutative' variant. llvm-svn: 337096	2018-07-14 12:20:06 +00:00
Roman Lebedev	3591eb9296	[NFC][InstCombine] Tests for x & (-1 >> y) u>= x to x u<= (-1 >> y) https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/azI This pattern is not commutative. But InstSimplify will already have taken care of the 'commutative' variant. llvm-svn: 337095	2018-07-14 12:20:01 +00:00
Roman Lebedev	ea2ffc3fcf	[NFC][InstCombine] Add forgotten variable tests for foldICmpWithLowBitMaskedVal() llvm-svn: 337094	2018-07-14 12:19:56 +00:00
Teresa Johnson	b78c5d0602	Revert "[ThinLTO] Ensure we always select the same function copy to import" This reverts commits r337050 and r337059. Caused failure in reverse-iteration bot that needs more investigation. llvm-svn: 337081	2018-07-14 01:45:49 +00:00
Teresa Johnson	8e739f98fe	Revert "[ThinLTO] Add debug output to test" This reverts commit r337076. Not needed any more. llvm-svn: 337080	2018-07-14 01:34:06 +00:00
Teresa Johnson	05d28c8002	[ThinLTO] Add debug output to test Add -debug-only=function-import to get more information for debugging reverse-iteration bot failure from r337050. llvm-svn: 337076	2018-07-14 00:08:48 +00:00
Tim Shen	a064622bd3	Re-apply "[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428)." llvm-svn: 337075	2018-07-13 23:58:46 +00:00
Teresa Johnson	4f8d704cd0	[ThinLTO] Require x86 target for new test Should fix non-x86 bot failures for new test from r337050. llvm-svn: 337059	2018-07-13 22:36:22 +00:00
Teresa Johnson	d94c0594d9	[ThinLTO] Ensure we always select the same function copy to import In order to always import the same copy of a linkonce function, even when encountering it with different thresholds (a higher one then a lower one), keep track of the summary we decided to import. This ensures that the backend only gets a single definition to import for each GUID, so that it doesn't need to choose one. Move the largest threshold the GUID was considered for import into the current module out of the ImportMap (which is part of a larger map maintained across the whole index), and into a new map just maintained for the current module we are computing imports for. This saves some memory since we no longer have the thresholds maintained across the whole index (and throughout the in-process backends when doing a normal non-distributed ThinLTO build), at the cost of some additional information being maintained for each invocation of ComputeImportForModule (the selected summary pointer for each import). There is an additional map lookup for each callee being considered for importing, however, this was able to subsume a map lookup in the Worklist iteration that invokes computeImportForFunction. We also are able to avoid calling selectCallee if we already failed to import at the same or higher threshold. I compared the run time and peak memory for the SPEC2006 471.omnetpp benchmark (running in-process ThinLTO backends), as well as for a large internal benchmark with a distributed ThinLTO build (so just looking at the thin link time/memory). Across a number of runs with and without this change there was no significant change in the time and memory. (I tried a few other variations of the change but they also didn't improve time or peak memory). Reviewers: davidxl Subscribers: mehdi_amini, inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D48670 llvm-svn: 337050	2018-07-13 21:35:51 +00:00
Roman Lebedev	4d22f15a2f	[NFC][InstCombine] Tests for 'check for [no] signed truncation' pattern [[ https://bugs.llvm.org/show_bug.cgi?id=38149 \| PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for 'check for [no] signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. The DAGCombine will reverse this transform, see https://reviews.llvm.org/D49266 llvm-svn: 337042	2018-07-13 20:33:34 +00:00
Vlad Tsyrklevich	cd1559366d	[LowerTypeTests] Limit when icall jumptable entries are emitted Summary: Currently LowerTypeTests emits jumptable entries for all live external and address-taken functions; however, we could limit the number of functions that we emit entries for significantly. For Cross-DSO CFI, we continue to emit jumptable entries for all exported definitions. In the non-Cross-DSO CFI case, we only need to emit jumptable entries for live functions that are address-taken in live functions. This ignores exported functions and functions that are only address taken in dead functions. This change uses ThinLTO summary data (now emitted for all modules during ThinLTO builds) to determine address-taken and liveness info. The logic for emitting jumptable entries is more conservative in the regular LTO case because we don't have summary data in the case of monolithic LTO builds; however, once summaries are emitted for all LTO builds we can unify the Thin/monolithic LTO logic to only use summaries to determine the liveness of address taking functions. This change is a partial fix for PR37474. It reduces the build size for nacl_helper by ~2-3%, the reduction is due to nacl_helper compiling in lots of unused code and unused functions that are address taken in dead functions no longer being being considered live due to emitted jumptable references. The reduction for chromium is ~0.1-0.2%. Reviewers: pcc, eugenis, javed.absar Reviewed By: pcc Subscribers: aheejin, dexonsmith, dschuff, mehdi_amini, eraman, steven_wu, llvm-commits, kcc Differential Revision: https://reviews.llvm.org/D47652 llvm-svn: 337038	2018-07-13 19:57:39 +00:00
Evgeniy Stepanov	ca8c2f7638	Revert "CallGraphSCCPass: iterate over all functions." This reverts commit r336419: use-after-free on CallGraph::FunctionMap elements due to the use of a stale iterator in CGPassManager::runOnModule. The iterator may be invalidated if a pass removes a function, ex.: llvm::LegacyInlinerBase::inlineCalls inlineCallsImpl llvm::CallGraph::removeFunctionFromModule llvm-svn: 337018	2018-07-13 16:32:31 +00:00
Simon Pilgrim	36b944e778	[SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED-2) We currently only support binary instructions in the alternate opcode shuffles. This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism: 1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly. 2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this. 3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc. 4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements. Reapplied with fix to only accept 2 different casts if they come from the same source type (PR38154). Differential Revision: https://reviews.llvm.org/D49135 llvm-svn: 336989	2018-07-13 11:09:52 +00:00
Sanjay Patel	70043b7e9a	[InstCombine] return when SimplifyAssociativeOrCommutative makes a change This bug was created by rL335258 because we used to always call instsimplify after trying the associative folds. After that change it became possible for subsequent folds to encounter unsimplified code (and potentially assert because of it). Instead of carrying changed state through instcombine, we can just return immediately. This allows instsimplify to run, so we can continue assuming that easy folds have already occurred. llvm-svn: 336965	2018-07-13 01:18:07 +00:00
Piotr Padlewski	c63b492bcd	Simplify recursive launder.invariant.group and strip Summary: This patch is crucial for proving equality laundered/stripped pointers. eg: bool foo(A a) { return a == std::launder(a); } Clang with -fstrict-vtable-pointers will emit something like: define dso_local zeroext i1 @_Z3fooP1A(%struct.A %a) { entry: %c = bitcast %struct.A* %a to i8* %call = tail call i8* @llvm.launder.invariant.group.p0i8(i8* %c) %0 = bitcast %struct.A* %a to i8* %1 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %0) %2 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %call) %cmp = icmp eq i8* %1, %2 ret i1 %cmp } and because %2 can be replaced with @llvm.strip.invariant.group(%0) and that %2 and %1 will produce the same value (because strip is readnone) we can replace compare with true. Reviewers: rsmith, hfinkel, majnemer, amharc, kuhar Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D47423 llvm-svn: 336963	2018-07-12 23:55:20 +00:00
Martin Storsjo	86b95489fd	Revert "[SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED)" This reverts commit r336812, which broke compilation of a number of projects, see PR38154. llvm-svn: 336949	2018-07-12 21:33:42 +00:00
Roman Lebedev	733a8d6886	[InstCombine] icmp-logical.ll: restore the original intention of the test. While that fold is clearly not happening [anymore], we do now have separate test cases for these cases, so we should be ok to slightly adjust these tests to not potentially loose test coverage. As suggested by Hiroshi Yamauchi in https://reviews.llvm.org/D49179#1159345 llvm-svn: 336912	2018-07-12 14:56:17 +00:00
Roman Lebedev	74f899f0f4	[InstCombine] Fold x & (-1 >> y) != x to x u> (-1 >> y) Summary: A complementary fold to D49179. https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/Rny Caveat: one more thing in `test/Transforms/InstCombine/icmp-logical.ll` breaks. Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49205 llvm-svn: 336911	2018-07-12 14:56:12 +00:00
Chen Zheng	9b00a8e9d7	[InstCombine]add testcases for folding more SPFofSPF pattern Differential Revision: https://reviews.llvm.org/D49222 llvm-svn: 336902	2018-07-12 13:28:20 +00:00
Chen Zheng	fdf13ef342	[InstSimplify] simplify add instruction if two operands are negative Differential Revision: https://reviews.llvm.org/D49216 llvm-svn: 336881	2018-07-12 03:06:04 +00:00
Eric Christopher	7289ac8a19	Temporarily revert "Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters." as it's causing miscompiles. A testcase was provided in the original review thread. This reverts commit r336098. llvm-svn: 336877	2018-07-12 01:53:21 +00:00
Craig Topper	034adf2683	[X86] Remove and autoupgrade the scalar fma intrinsics with masking. This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches. llvm-svn: 336871	2018-07-12 00:29:56 +00:00
Craig Topper	ed6acde8cf	[LoopIdiomRecognize] Don't convert a do while loop to ctlz. This commit suppresses turning loops like this into "(bitwidth - ctlz(input))". unsigned foo(unsigned input) { unsigned num = 0; do { ++num; input >>= 1; } while (input != 0); return num; } The loop version returns a value of 1 for both an input of 0 and an input of 1. Converting to a naive ctlz does not preserve that. Theoretically we could do better if we checked isKnownNonZero or we could insert a select to handle the divergence. But until we have motivating cases for that, this is the easiest solution. llvm-svn: 336864	2018-07-11 22:35:28 +00:00
Craig Topper	ef08aec935	[LoopIdiomRecognize] Add a test case showing a loop we turn into ctlz that we shouldn't. This loop executes one iteration without checking the input value. This produces a count of 1 for an input of 0 and 1. We are turning this into 32 - ctlz(n), but that returns 0 if n is 0. llvm-svn: 336862	2018-07-11 22:17:26 +00:00
Roman Lebedev	2c19fe52e9	[NFC][InstCombine] Tests for x & (-1 >> y) != x -> x u> (-1 >> y) fold https://bugs.llvm.org/show_bug.cgi?id=38123 https://rise4fun.com/Alive/Rny llvm-svn: 336857	2018-07-11 21:28:42 +00:00
Roman Lebedev	68d54cf5b3	[InstCombine] Fold x & (-1 >> y) == x to x u<= (-1 >> y) Summary: https://bugs.llvm.org/show_bug.cgi?id=38123 This pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in unsigned case, therefore it is probably a good idea to improve it. https://rise4fun.com/Alive/Rny ^ there are more opportunities for folds, i will follow up with them afterwards. Caveat: this somehow exposes a missing opportunities in `test/Transforms/InstCombine/icmp-logical.ll` It seems, the problem is in `foldLogOpOfMaskedICmps()` in `InstCombineAndOrXor.cpp`. But i'm not quite sure what is wrong, because it calls `getMaskedTypeForICmpPair()`, which calls `decomposeBitTestICmp()` which should already work for these cases... As @spatel notes in https://reviews.llvm.org/D49179#1158760, that code is a rather complex mess, so we'll let it slide. Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: yamauchi, majnemer, t.p.northover, llvm-commits Differential Revision: https://reviews.llvm.org/D49179 llvm-svn: 336834	2018-07-11 19:05:04 +00:00
Sanjay Patel	14eeb5a5c0	[InstSimplify] add/move tests for add folds; NFC isKnownNegation() is currently proposed as part of D48754, but it could be used to make InstSimplify stronger independently of any abs() improvements. llvm-svn: 336822	2018-07-11 16:52:18 +00:00
Simon Pilgrim	876e99bf2c	[SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED) We currently only support binary instructions in the alternate opcode shuffles. This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism: 1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly. 2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this. 3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc. 4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements. Reapplied with fix to only accept 2 different casts if they come from the same source type. Differential Revision: https://reviews.llvm.org/D49135 llvm-svn: 336812	2018-07-11 15:05:10 +00:00
Simon Pilgrim	09832e33b4	[SLPVectorizer] Ensure alternate/passthrough doesn't vectorize sdiv with undef elts llvm-svn: 336809	2018-07-11 14:34:43 +00:00
Simon Pilgrim	0c7bea0dc0	[SLPVectorizer] Add some additional alternate cast tests Initial attempt at D49135 failed as we weren't correctly handling casts with different source types. llvm-svn: 336808	2018-07-11 14:29:13 +00:00
Simon Pilgrim	1edde95abd	Revert rL336804: [SLPVectorizer] Add initial alternate opcode support for cast instructions. Reverting due to buildbot failures llvm-svn: 336806	2018-07-11 14:08:16 +00:00
Simon Pilgrim	2f963a7e83	[SLPVectorizer] Add initial alternate opcode support for cast instructions. We currently only support binary instructions in the alternate opcode shuffles. This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism: 1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly. 2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this. 3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc. 4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements. Differential Revision: https://reviews.llvm.org/D49135 llvm-svn: 336804	2018-07-11 13:34:09 +00:00
Roman Lebedev	76f195cdbd	[NFC][InstCombine] Tests for x & (-1 >> y) == x -> x u<= (-1 >> y) fold https://bugs.llvm.org/show_bug.cgi?id=38123 This pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in unsigned case, therefore it is probably a good idea to improve it. https://rise4fun.com/Alive/Rny llvm-svn: 336796	2018-07-11 12:37:12 +00:00
Roman Lebedev	a042fae6e0	[NFC][InstCombine] icmp-logical.ll: add a few more tests. The @masked_and_notA_slightly_optimized and @masked_or_A will break when PR38123 will be fixed: https://rise4fun.com/Alive/Rny Clearly, they aren't optimized currently. https://rise4fun.com/Alive/ERo llvm-svn: 336784	2018-07-11 10:31:12 +00:00
Roman Lebedev	5260c9efc8	[NFC][InstCombine] Fix extra space padding in icmp-mul-zext.ll test update_test_checks will drop it anyway, creating noise.. llvm-svn: 336781	2018-07-11 09:57:53 +00:00
Roman Lebedev	c5e437e5eb	[NFC][InstCombine] Add variable names and regenerate icmp-logical.ll test. llvm-svn: 336780	2018-07-11 09:57:46 +00:00
Chen Zheng	fb361d2525	[test cases] add test cases for find more abs pattern Differential Revision: https://reviews.llvm.org/D49123 llvm-svn: 336752	2018-07-11 01:07:21 +00:00
Eugene Leviant	6a572b8e79	[Evaluator] Examine alias when evaluating function call This fixes PR38120 llvm-svn: 336702	2018-07-10 16:34:23 +00:00
Sanjay Patel	c8d9d812ec	[InstCombine] allow flag propagation when using safe constant This corresponds with the code for the single binop pattern added in rL336684. llvm-svn: 336696	2018-07-10 16:09:49 +00:00
Sanjay Patel	509a1e7a9b	[InstCombine] safely allow non-commutative binop identity constant folds This was originally intended with D48893, but as discussed there, we have to make the folds safe from producing extra poison. This should give the single binop folds the same capabilities as the existing folds for 2-binops+shuffle. LLVM binary opcode review: there are a total of 18 binops. There are 7 commutative binops (add, mul, and, or, xor, fadd, fmul) which we already fold. We're able to fold 6 more opcodes with this patch (shl, lshr, ashr, fdiv, udiv, sdiv). There are no folds for srem/urem/frem AFAIK. We don't bother with sub/fsub with constant operand 1 because those are canonicalized to add/fadd. 7 + 6 + 3 + 2 = 18. llvm-svn: 336684	2018-07-10 15:12:31 +00:00
Sanjay Patel	3333106a62	[InstCombine] drop poison flags when shuffle mask undef propagates to constant llvm-svn: 336679	2018-07-10 14:27:55 +00:00
Sanjay Patel	06ea4206ad	[InstCombine] allow more shuffle-binop folds with safe constants The case with 2 variables is more complicated than the case where we eliminate the shuffle entirely because a shuffle with an undef mask element creates an undef result. I'm not aware of any current analysis/transform that recognizes that undef propagating to a div/rem/shift, but we have to guard against the possibility. llvm-svn: 336668	2018-07-10 13:33:26 +00:00
Anastasis Grammenos	612bf7cac5	[DebugInfo][LoopVectorize] Preserve DL in induction PHI and Add Differential Revision: https://reviews.llvm.org/D48968 llvm-svn: 336667	2018-07-10 13:29:50 +00:00
Karl-Johan Karlsson	1ffeb5d7f0	[LowerSwitch] Fixed faulty PHI nodes Summary: Fixed two cases of where PHI nodes need to be updated by lowerswitch. When lowerswitch find out that the switch default branch is not reachable it remove the old default and replace it with the most popular block from the cases, but it forget to update the PHI nodes in the default block. The PHI nodes also need to be updated when the switch is replaced with a single branch. Reviewers: hans, reames, arsenm Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D47203 llvm-svn: 336659	2018-07-10 12:06:16 +00:00
Chandler Carruth	47dc3a346e	[PM/Unswitch] Fix a collection of closely related issues with trivial switch unswitching. The core problem was that the way we handled unswitching trivial exit edges through the default successor of a switch. For some reason I thought the right way to do this was to add a block containing unreachable and point the default successor at this block. In retrospect, this has an amazing number of problems. The first issue is the one that this pass has always worked around -- we have to detect such edges and avoid unswitching them again. This seemed pretty easy really. You juts look for an edge to a block containing unreachable. However, this pattern is woefully unsound. So many things can break it. The amazing thing is that I found a test case where simple-loop-unswitch itself breaks this! When we do a non-trivial unswitch of a switch we will end up splitting this exit edge. The result will be a default successor that is an exit and terminates in ... a perfectly normal branch. So the first test case that I started trying to fix is added to the nontrivial test cases. This is a ridiculous example that did just amazing things previously. With just unswitch, it would create 10+ copies of this stuff stamped out. But if you combine it just right with a bunch of other passes (like simplify-cfg, loop rotate, and some LICM) you can get it to do this infinitely. Or at least, I never got it to finish. =[ This, in turn, uncovered another related issue. When we are manipulating these switches after doing a trivial unswitch we never correctly updated PHI nodes to reflect our edits. As soon as I started changing how these edges were managed, it became obvious there were more issues that I couldn't realistically leave unaddressed, so I wrote more test cases around PHI updates here and ensured all of that works now. And this, in turn, required some adjustment to how we collect and manage the exit successor when it is the default successor. That showed a clear bug where we failed to include it in our search for the outer-most loop reached by an unswitched exit edge. This was actually already tested and the test case didn't work. I (wrongly) thought that was due to SCEV failing to analyze the switch. In fact, it was just a simple bug in the code that skipped the default successor. While changing this, I handled it correctly and have updated the test to reflect that we now get precise SCEV analysis of trip counts for the outer loop in one of these cases. llvm-svn: 336646	2018-07-10 08:36:05 +00:00
Sanjay Patel	69faf464ed	[InstCombine] allow more shuffle folds using safe constants getSafeVectorConstantForBinop() was calling getBinOpIdentity() assuming that the constant we wanted was operand 1 (RHS). That's wrong, but I don't think we could expose a bug or even a suboptimal fold from that because the callers have other guards for any binop that would have been affected. llvm-svn: 336617	2018-07-09 23:22:47 +00:00
Manoj Gupta	77eeac3d9e	llvm: Add support for "-fno-delete-null-pointer-checks" Summary: Support for this option is needed for building Linux kernel. This is a very frequently requested feature by kernel developers. More details : https://lkml.org/lkml/2018/4/4/601 GCC option description for -fdelete-null-pointer-checks: This Assume that programs cannot safely dereference null pointers, and that no code or data element resides at address zero. -fno-delete-null-pointer-checks is the inverse of this implying that null pointer dereferencing is not undefined. This feature is implemented in LLVM IR in this CL as the function attribute "null-pointer-is-valid"="true" in IR (Under review at D47894). The CL updates several passes that assumed null pointer dereferencing is undefined to not optimize when the "null-pointer-is-valid"="true" attribute is present. Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv Reviewed By: efriedma, george.burgess.iv Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D47895 llvm-svn: 336613	2018-07-09 22:27:23 +00:00
George Burgess IV	3fbfa9c403	Make llvm.objectsize more conservative with null In non-zero address spaces, we were reporting that an object at `null` always occupies zero bytes. This is incorrect in many cases, so just return `unknown` in those cases for now. Differential Revision: https://reviews.llvm.org/D48860 llvm-svn: 336611	2018-07-09 22:21:16 +00:00
Sanjay Patel	651438c2f6	[InstCombine] correct test comments; NFC llvm-svn: 336570	2018-07-09 17:48:08 +00:00
Sanjay Patel	7cd32419ab	[InstCombine] avoid extra poison when moving shift above shuffle As discussed in D49047 / D48987, shift-by-undef produces poison, so we can't use undef vector elements in that case.. Note that we need to extend this for poison-generating flags, and there's a proposal to create poison from FMF in D47963, llvm-svn: 336562	2018-07-09 17:20:20 +00:00
Sanjay Patel	5bd36644c8	[InstCombine] fix shuffle-of-binops transform to avoid poison/undef As noted in D48987, there are many different ways for this transform to go wrong. In particular, the poison potential for shifts means we have to more careful with those ops. I added tests to make that behavior visible for all of the different cases that I could find. This is a partial fix. To make this review easier, I did not make changes for the single binop pattern (handled in foldSelectShuffleWith1Binop()). I also left out some potential optimizations noted with TODO comments. I'll follow-up once we're confident that things are correct here. The goal is to correct all marked FIXME tests to either avoid the shuffle transform or do it safely. Note that distinguishing when the shuffle mask contains undefs and using getBinOpIdentity() allows for some improvements to div/rem patterns, so there are wins along with the missed opportunities and fixes. Differential Revision: https://reviews.llvm.org/D49047 llvm-svn: 336546	2018-07-09 13:21:46 +00:00
Chandler Carruth	ed2965438e	[PM/Unswitch] Fix a nasty bug in the new PM's unswitch introduced in r335553 with the non-trivial unswitching of switches. The code correctly updated most aspects of the CFG and analyses, but missed some crucial aspects: 1) When multiple cases have the same successor, we unswitch that a single time and replace the switch with a direct branch. The CFG here is correct, but the target of this direct branch may have had a PHI node with multiple entries in it. 2) When we still have to clone a successor of the switch into an unswitched copy of the loop, we'll delete potentially multiple edges entering this successor, not just one. 3) We also have to delete multiple edges entering the successors in the original loop when they have to be retained. 4) When the "retained successor" also occurs as a case successor, we just assert failed everywhere. This doesn't happen very easily because its always valid to simply drop the case -- the retained successor for switches is always the default successor. However, it is likely possible through some contrivance of different loop passes, unrolling, and simplifying for this to occur in practice and certainly there is nothing "invalid" about the IR so this pass needs to handle it. 5) In the case of #4, we also will replace these multiple edges with a direct branch much like in #1 and need to collapse the entries in any PHI nodes to a single enrty. All of this stems from the delightful fact that the same successor can show up in multiple parts of the switch terminator, and each of these are considered a distinct edge for the purpose of PHI nodes (and iterating the successors and predecessors) but not for unswitching itself, the dominator tree, or many other things. For the record, I intensely dislike this "feature" of the IR in large part because of the complexity it causes in passes like this. We already have a ton of logic building sets and handling duplicates, and we just had to add a bunch more. I've added a complex test case that covers all five of the above failure modes. I've also added a variation on it where #4 and #5 occur in loop exit, adding fun where we have an LCSSA PHI node with "multiple entries" despite have dedicated exits. There were no additional issues found by this, but it seems a useful corner case to cover with testing. One thing that working on all of this code has made painfully clear for me as well is how amazingly inefficient our PHI node representation is (in terms of the in-memory data structures and the APIs used to update them). This code has truly marvelous complexity bounds because every time we remove an entry from a PHI node we do a linear scan to find it and then a linear update to the data structure to remove it. We could in theory batch all of the PHI node updates into a single linear walk of the operands making this much more efficient, but the APIs fight hard against this and the fact that we have to handle duplicates in the peculiar manner we do (removing all but one in some cases) makes even implementing that very tedious and annoying. Anyways, none of this is new here or specific to loop unswitching. All code in LLVM that updates PHI node operands suffers from these problems. llvm-svn: 336536	2018-07-09 10:30:48 +00:00
Chijun Sima	9e1e0c7b2a	[PGOMemOPSize] Preserve the DominatorTree Summary: PGOMemOPSize only modifies CFG in a couple of places; thus we can preserve the DominatorTree with little effort. When optimizing SQLite with -O3, this patch can decrease 3.8% of the numbers of nodes traversed by DFS and 5.7% of the times DominatorTreeBase::recalculation is called. Reviewers: kuhar, davide, dmgreen Reviewed By: dmgreen Subscribers: mzolotukhin, vsk, llvm-commits Differential Revision: https://reviews.llvm.org/D48914 llvm-svn: 336522	2018-07-09 08:07:21 +00:00
Craig Topper	2835278ee0	[LoopIdiomRecognize] Support for converting loops that use LSHR to CTLZ. In the 'detectCTLZIdiom' function support for loops that use LSHR instruction instead of ASHR has been added. This supports creating ctlz from the following code. int lzcnt(int x) { int count = 0; while (x > 0) { count++; x = x >> 1; } return count; } Patch by Olga Moldovanova Differential Revision: https://reviews.llvm.org/D48354 llvm-svn: 336509	2018-07-08 01:45:47 +00:00
Chandler Carruth	d8b0c8ce1b	[PM/LoopUnswitch] Fix PR37889, producing the correct loop nest structure after trivial unswitching. This PR illustrates that a fundamental analysis update was not performed with the new loop unswitch. This update is also somewhat fundamental to the core idea of the new loop unswitch -- we actually update the CFG based on the unswitching. In order to do that, we need to update the loop nest in addition to the domtree. For some reason, when writing trivial unswitching, I thought that the loop nest structure cannot be changed by the transformation. But the PR helps illustrate that it clearly can. I've expanded this to a number of different test cases that try to cover the different cases of this. When we unswitch, we move an exit edge of a loop out of the loop. If this exit edge changes which loop reached by an exit is the innermost loop, it changes the parent of the loop. Essentially, this transformation may hoist the inner loop up the nest. I've added the simple logic to handle this reliably in the trivial unswitching case. This just requires updating LoopInfo and rebuilding LCSSA on the impacted loops. In the trivial case, we don't even need to handle dedicated exits because we're only hoisting the one loop and we just split its preheader. I've also ported all of these tests to non-trivial unswitching and verified that the logic already there correctly handles the loop nest updates necessary. Differential Revision: https://reviews.llvm.org/D48851 llvm-svn: 336477	2018-07-07 01:12:56 +00:00
Tim Shen	2ed501d656	Revert "[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428)." This reverts commit r336140. Our tests shows that LSR assert fails with it. llvm-svn: 336473	2018-07-06 23:20:35 +00:00
Sanjay Patel	5739587735	[InstCombine] add more tests for potentially poisonous shifts; NFC llvm-svn: 336454	2018-07-06 17:44:57 +00:00
Vedant Kumar	6379a62250	[Local] replaceAllDbgUsesWith: Update debug values before RAUW The replaceAllDbgUsesWith utility helps passes preserve debug info when replacing one value with another. This improves upon the existing insertReplacementDbgValues API by: - Updating debug intrinsics in-place, while preventing use-before-def of the replacement value. - Falling back to salvageDebugInfo when a replacement can't be made. - Moving the responsibiliy for rewriting llvm.dbg.* DIExpressions into common utility code. Along with the API change, this teaches replaceAllDbgUsesWith how to create DIExpressions for three basic integer and pointer conversions: - The no-op conversion. Applies when the values have the same width, or have bit-for-bit compatible pointer representations. - Truncation. Applies when the new value is wider than the old one. - Zero/sign extension. Applies when the new value is narrower than the old one. Testing: - check-llvm, check-clang, a stage2 `-g -O3` build of clang, regression/unit testing. - This resolves a number of mis-sized dbg.value diagnostics from Debugify. Differential Revision: https://reviews.llvm.org/D48676 llvm-svn: 336451	2018-07-06 17:32:39 +00:00
Sanjay Patel	a212b0bc18	[InstCombine] add more tests with poison and undef; NFC As discussed in D48987 and D48893, there are many different ways to go wrong depending on the binop (and as shown here we already do go wrong in some cases). llvm-svn: 336450	2018-07-06 17:24:32 +00:00
Tim Northover	7ee46ed992	CallGraphSCCPass: iterate over all functions. Previously we only iterated over functions reachable from the set of external functions in the module. But since some of the passes under this (notably the always-inliner and coroutine lowerer) are required for correctness, they need to run over everything. This just adds an extra layer of iteration over the CallGraph to keep track of which functions we've already visited and get the next batch of SCCs. Should fix PR38029. llvm-svn: 336419	2018-07-06 08:04:47 +00:00
Max Kazantsev	20da7e467a	Revert "[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done" llvm-svn: 336410	2018-07-06 04:04:13 +00:00
Matt Arsenault	24ce89b717	Fix asserts in AMDGCN fmed3 folding by handling more cases of NaN Better NaN handling for AMDGCN fmed3. All operands are checked for NaN now. The checks were moved before the canonicalization to provide a better mapping from fclamp. Changed the behaviour of fmed3(x,y,NaN) to return max(x,y) instead of min(x,y) in light of this. Updated tests as a result and added some new cases to cover the fix. Patch by Alan Baker llvm-svn: 336375	2018-07-05 17:05:36 +00:00
Craig Topper	350c5f1881	[X86] Remove X86 specific scalar FMA intrinsics and upgrade to tart independent FMA and extractelement/insertelement. llvm-svn: 336315	2018-07-05 06:52:55 +00:00
Sanjay Patel	9c2e7ceb1a	[InstCombine] allow narrowing of min/max/abs We have bailout hacks based on min/max in various places in instcombine that shouldn't be necessary. The affected test was added for: D48930 ...which is a consequence of the improvement in: D48584 (https://reviews.llvm.org/rL336172) I'm assuming the visitTrunc bailout in this patch was added specifically to avoid a change from SimplifyDemandedBits, so I'm just moving that below the EvaluateInDifferentType optimization. A narrow min/max is still a min/max. llvm-svn: 336293	2018-07-04 17:44:04 +00:00
Sanjay Patel	ad32d22e13	[InstCombine] add value names to test; NFC That makes it easier to mix and match lines into other tests. llvm-svn: 336289	2018-07-04 16:56:35 +00:00
Gabor Buella	da4a966e1c	NFC - Various typo fixes in tests llvm-svn: 336268	2018-07-04 13:28:39 +00:00
Max Kazantsev	bee9ca6568	[NFC] Add test that shows that InstCombine can do better llvm-svn: 336258	2018-07-04 10:32:02 +00:00
Anastasis Grammenos	204726b345	[DebugInfo][LoopVectorize] Preserve DL in generated phi instruction When creating `phi` instructions to resume at the scalar part of the loop, copy the DebugLoc from the original phi over to the new one. Differential Revision: https://reviews.llvm.org/D48769 llvm-svn: 336256	2018-07-04 10:16:55 +00:00
Anastasis Grammenos	509d79789f	[DebugInfo][InstCombine] Preserve DI after combining zext When zext is EvaluatedInDifferentType, InstCombine drops the dbg.value intrinsic. This patch tries to preserve said DI, by inserting the zext's old DI in the resulting instruction. (Only for integer type for now) Differential Revision: https://reviews.llvm.org/D48331 llvm-svn: 336254	2018-07-04 09:55:46 +00:00
Sanjay Patel	181aa26eb8	[InstCombine] add tests for shuffle+binop with constant op1; NFC This adds coverage for a planned enhancement for ConstantExpr::getBinOpIdentity() noted in D48830. llvm-svn: 336220	2018-07-03 18:43:46 +00:00
Sanjay Patel	8307bc407b	[Constants] add identity constants for fadd/fmul As the test diffs show, the current users of getBinOpIdentity() are InstCombine and Reassociate. SLP vectorizer is a candidate for using this functionality too (D28907). The InstCombine shuffle improvements are part of the planned enhancements noted in D48830. InstCombine actually has several other uses of getBinOpIdentity() via SimplifyUsingDistributiveLaws(), but we don't call that for any FP ops. Fixing that might be another part of removing the custom reassociation in InstCombine that is only done for fadd+fmul. llvm-svn: 336215	2018-07-03 17:12:59 +00:00
Sanjay Patel	2c38b7fd8b	[Reassociate] add tests for binop with identity constant; NFC llvm-svn: 336214	2018-07-03 16:44:18 +00:00
Sanjay Patel	5b4a003088	[Reassociate] regenerate checks; NFC llvm-svn: 336211	2018-07-03 16:01:41 +00:00
Sanjay Patel	5a6ba018d7	[Reassociate] add test for missing FP constant analysis; NFC llvm-svn: 336208	2018-07-03 15:56:04 +00:00
Sanjay Patel	3074b9e53f	[InstCombine] fold shuffle-with-binop and common value This is the last significant change suggested in PR37806: https://bugs.llvm.org/show_bug.cgi?id=37806#c5 ...though there are several follow-ups noted in the code comments in this patch to complete this transform. It's possible that a binop feeding a select-shuffle has been eliminated by earlier transforms (or the code was just written like this in the 1st place), so we'll fail to match the patterns that have 2 binops from: D48401, D48678, D48662, D48485. In that case, we can try to materialize identity constants for the remaining binop to fill in the "ghost" lanes of the vector (where we just want to pass through the original values of the source operand). I added comments to ConstantExpr::getBinOpIdentity() to show planned follow-ups. For now, we only handle the 5 commutative integer binops (add/mul/and/or/xor). Differential Revision: https://reviews.llvm.org/D48830 llvm-svn: 336196	2018-07-03 13:44:22 +00:00
Bjorn Pettersson	8dd6cf711f	[DebugInfo] Corrections for salvageDebugInfo Summary: When salvaging a dbg.declare/dbg.addr we should not add DW_OP_stack_value to the DIExpression (see test/Transforms/InstCombine/salvage-dbg-declare.ll). Consider this example %vla = alloca i32, i64 2 call void @llvm.dbg.declare(metadata i32* %vla, metadata !1, metadata !DIExpression()) Instcombine will turn it into %vla1 = alloca [2 x i32] %vla1.sub = getelementptr inbounds [2 x i32], [2 x i32]* %vla, i64 0, i64 0 call void @llvm.dbg.declare(metadata [2 x i32]* %vla1.sub, metadata !19, metadata !DIExpression()) If the GEP can be eliminated, then the dbg.declare will be salvaged and we should get %vla1 = alloca [2 x i32] call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression()) The problem was that salvageDebugInfo did not recognize dbg.declare as being indirect (%vla1 points to the value, it does not hold the value), so we incorrectly got call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression(DW_OP_stack_value)) I also made sure that llvm::salvageDebugInfo and DIExpression::prependOpcodes do not add DW_OP_stack_value to the DIExpression in case no new operands are added to the DIExpression. That way we avoid to, unneccessarily, turn a register location expression into an implicit location expression in some situations (see test11 in test/Transforms/LICM/sinking.ll). Reviewers: aprantl, vsk Reviewed By: aprantl, vsk Subscribers: JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D48837 llvm-svn: 336191	2018-07-03 11:29:00 +00:00
Chandler Carruth	3897ded691	[PM/LoopUnswitch] Fix PR37651 by correctly invalidating SCEV when unswitching loops. Original patch trying to address this was sent in D47624, but that didn't quite handle things correctly. There are two key principles used to select whether and how to invalidate SCEV-cached information about loops: 1) We must invalidate any info SCEV has cached before unswitching as we may change (or destroy) the loop structure by the act of unswitching, and make it hard to recover everything we want to invalidate within SCEV. 2) We need to invalidate all of the loops whose CFGs are mutated by the unswitching. Notably, this isn't the entire loop nest, this is every loop contained by the outermost loop reached by an exit block relevant to the unswitch. And we need to do this even when doing trivial unswitching. I've added more focused tests that directly check that SCEV starts off with imprecise information and after unswitching (and simplifying instructions) re-querying SCEV will produce precise information. These tests also specifically work to check that an outer loop's information becomes precise. However, the testing here is still a bit imperfect. Crafting test cases that reliably fail to be analyzed by SCEV before unswitching and succeed afterward proved ... very, very hard. It took me several hours and careful work to build these, and I'm not optimistic about necessarily coming up with more to cover more elaborate possibilities. Fortunately, the code pattern we are testing here in the pass is really straightforward and reliable. Thanks to Max Kazantsev for the initial work on this as well as the review, and to Hal Finkel for helping me talk through approaches to test this stuff even if it didn't come to much. Differential Revision: https://reviews.llvm.org/D47624 llvm-svn: 336183	2018-07-03 09:13:27 +00:00
Max Kazantsev	3097b76e8c	[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done This patch changes order of transform in InstCombineCompares to avoid performing transforms based on ranges which produce complex bit arithmetics before more simple things (like folding with constants) are done. See PR37636 for the motivating example. Differential Revision: https://reviews.llvm.org/D48584 Reviewed By: spatel, lebedev.ri llvm-svn: 336172	2018-07-03 06:23:57 +00:00
Tim Shen	c7cef4bcc4	[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428). Summary: Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change SCEV is able to prove that the loop doesn't wrap-self (due to zext i16 to i64), disabling the entire loop versioning pass. Removed the zext and just use i64. Reviewers: sanjoy Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D48409 llvm-svn: 336140	2018-07-02 20:01:54 +00:00
Farhana Aleen	3b416db19b	[SLP] Recognize min/max pattern using instructions producing same values. Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization. %1 = extractelement <2 x i32> %a, i32 0 %2 = extractelement <2 x i32> %a, i32 1 %cond = icmp sgt i32 %1, %2 %3 = extractelement <2 x i32> %a, i32 0 %4 = extractelement <2 x i32> %a, i32 1 %select = select i1 %cond, i32 %3, i32 %4 Author: FarhanaAleen Reviewed By: ABataev, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D47608 llvm-svn: 336130	2018-07-02 17:55:31 +00:00
Sanjay Patel	b999d74132	[InstCombine] reverse canonicalization of add --> or to allow more shuffle folding This extends D48485 to allow another pair of binops (add/or) to be combined either with or without a leading shuffle: or X, C --> add X, C (when X and C have no common bits set) Here, we need value tracking to determine that the 'or' can be reversed into an 'add', and we've added general infrastructure to allow extending to other opcodes or moving to where other passes could use that functionality. Differential Revision: https://reviews.llvm.org/D48662 llvm-svn: 336128	2018-07-02 17:42:29 +00:00
Simon Pilgrim	35f196c179	[SLPVectorizer][X86] Begin adding alternate tests for call operators Alternate opcode handling only supports binary operators, these tests demonstrate a missed opportunity to vectorize ceil/floor calls llvm-svn: 336125	2018-07-02 17:23:45 +00:00
Sanjay Patel	284ba0c18f	[ValueTracking] allow undef elements when matching vector abs llvm-svn: 336111	2018-07-02 14:43:40 +00:00
Sanjay Patel	951f617e16	[InstCombine] adjust shuffle tests with IR flags; NFC Due to current limitations in constant analysis, we need flags on add or mul to show propagation for the potential transform suggested in these tests (no other binops currently report identity constants). llvm-svn: 336101	2018-07-02 13:40:54 +00:00
Florian Hahn	4ebba909a2	Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters. This version contains a fix to add values for which the state in ParamState change to the worklist if the state in ValueState did not change. To avoid adding the same value multiple times, mergeInValue returns true, if it added the value to the worklist. The value is added to the worklist depending on its state in ValueState. Original message: For comparisons with parameters, we can use the ParamState lattice elements which also provide constant range information. This improves the code for PR33253 further and gets us closer to use ValueLatticeElement for all values. Also, as we are using the range information in the solver directly, we do not need tryToReplaceWithConstantRange afterwards anymore. Reviewers: dberlin, mssimpso, davide, efriedma Reviewed By: mssimpso Differential Revision: https://reviews.llvm.org/D43762 llvm-svn: 336098	2018-07-02 12:44:04 +00:00
Sanjay Patel	d980084597	[InstCombine] add tests for shuffle-binop; NFC This is another pattern mentioned in PR37806. llvm-svn: 336096	2018-07-02 12:30:46 +00:00
Simon Pilgrim	265793d52a	[SLPVectorizer] Fix alternate opcode + shuffle cost function to correct handle SK_Select patterns. We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case. This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now... llvm-svn: 336095	2018-07-02 11:28:01 +00:00
Max Kazantsev	66da390506	[NFC] Test that shows unprofitability of instcombine with bit ranges llvm-svn: 336078	2018-07-02 06:55:00 +00:00
Piotr Padlewski	5b3db45e8f	Implement strip.invariant.group Summary: This patch introduce new intrinsic - strip.invariant.group that was described in the RFC: Devirtualization v2 Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D47103 Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com> llvm-svn: 336073	2018-07-02 04:49:30 +00:00
Sanjay Patel	279a1a39ad	[InstCombine] add abs tests with undef elts; NFC llvm-svn: 336065	2018-07-01 17:14:37 +00:00
Sanjay Patel	a9fdb9fd37	[PatternMatch] allow undef elements in vectors with m_Neg This is similar to the m_Not change from D44076. llvm-svn: 336064	2018-07-01 13:42:57 +00:00
David Green	963401d2be	[UnrollAndJam] New Unroll and Jam pass This is a simple implementation of the unroll-and-jam classical loop optimisation. The basic idea is that we take an outer loop of the form: for i.. ForeBlocks(i) for j.. SubLoopBlocks(i, j) AftBlocks(i) Instead of doing normal inner or outer unrolling, we unroll as follows: for i... i+=2 ForeBlocks(i) ForeBlocks(i+1) for j.. SubLoopBlocks(i, j) SubLoopBlocks(i+1, j) AftBlocks(i) AftBlocks(i+1) Remainder Loop So we have unrolled the outer loop, then jammed the two inner loops into one. This can lead to a simpler inner loop if memory accesses can be shared between the now jammed loops. To do this we have to prove that this is all safe, both for the memory accesses (using dependence analysis) and that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j). Differential Revision: https://reviews.llvm.org/D41953 llvm-svn: 336062	2018-07-01 12:47:30 +00:00
Simon Pilgrim	84f77ecba9	[SLPVectorizer][X86] Add some alternate tests for cast operators Alternate opcode handling only supports binary operators, these tests demonstrate missed opportunities to vectorize some sitofp/uitofp and fptosi/fptoui style casts as well as some (successful) float bits manipulations llvm-svn: 336060	2018-07-01 11:29:46 +00:00
Eugene Leviant	6e4134459b	[Evaluator] Improve evaluation of call instruction Recommit of r335324 after buildbot failure fix llvm-svn: 336059	2018-07-01 11:02:07 +00:00
Sanjay Patel	16a42ca274	[InstCombine] add tests for negate vector with undef elts; NFC llvm-svn: 336050	2018-06-30 14:11:46 +00:00
Sanjay Patel	4491c0dd45	[InstCombine] add more tests for shuffle-binop folds; NFC The mul+shl tests add coverage for the fold enabled with D48678. The and+or tests are not handled yet; that's D48662. llvm-svn: 335984	2018-06-29 15:28:11 +00:00
Sanjay Patel	da66753e01	[InstCombine] enhance shuffle-of-binops to allow different variable ops (PR37806) This was discussed in D48401 as another improvement for: https://bugs.llvm.org/show_bug.cgi?id=37806 If we have 2 different variable values, then we shuffle (select) those lanes, shuffle (select) the constants, and then perform the binop. This eliminates a binop. The new shuffle uses the same shuffle mask as the existing shuffle, so there's no danger of creating a difficult shuffle. All of the earlier constraints still apply, but we also check for extra uses to avoid creating more instructions than we'll remove. Additionally, we're disallowing the fold for div/rem because that could expose a UB hole. Differential Revision: https://reviews.llvm.org/D48678 llvm-svn: 335974	2018-06-29 13:44:06 +00:00
Roman Lebedev	8d081b78e4	SCEVExpander::expandAddRecExprLiterally(): check before casting as Instruction Summary: An alternative to D48597. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=37936 \| PR37936 ]]. The problem is as follows: 1. `indvars` marks `%dec` as `NUW`. 2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908) 3. `loop-reduce` tries to do some further modification, but crashes with an type assertion in cast, because `%dec` is no longer an `Instruction`, If the runline is split into two, i.e. you first run `-indvars -loop-instsimplify`, store that into a file, and then run `-loop-reduce`, there is no crash. So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV. But in this case we can just not crash if it's not an `Instruction`. This is just a local fix, unlike D48597, so there may very well be other problems. Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi Reviewed By: mkazantsev Subscribers: evstupac, javed.absar, spatel, llvm-commits Differential Revision: https://reviews.llvm.org/D48599 llvm-svn: 335950	2018-06-29 07:44:20 +00:00
Sanjay Patel	019d3cd3b4	[InstCombine] adjust shuffle tests; NFC Use xor for the extra uses test because div/rem have other problems. llvm-svn: 335924	2018-06-28 21:14:02 +00:00
Teresa Johnson	e87868b7e9	[ThinLTO] Port InlinerFunctionImportStats handling to new PM Summary: The InlinerFunctionImportStats will collect and dump stats regarding how many function inlined into the module were imported by ThinLTO. Reviewers: wmi, dexonsmith Subscribers: mehdi_amini, inglorion, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D48729 llvm-svn: 335914	2018-06-28 20:07:47 +00:00
Anastasis Grammenos	425df22ee3	[SROA] Preserve DebugLoc when rewriting alloca partitions When rewriting an alloca partition copy the DL from the old alloca over the the new one. Differential Revision: https://reviews.llvm.org/D48640 llvm-svn: 335904	2018-06-28 18:58:30 +00:00
Sanjay Patel	57bda365bf	[InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806) This is an enhancement to D48401 that was discussed in: https://bugs.llvm.org/show_bug.cgi?id=37806 We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other direction because that's generally better of course). This allows us to remove the shuffle as we do in the regular opcodes-are-the-same cases. This requires a small hack to make sure we don't introduce any extra poison: https://rise4fun.com/Alive/ZGv Other examples of opcodes where this would work are add+sub and fadd+fsub, but we already canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. There are planned enhancements for opcode transforms such or -> add. Note that there's a different fold needed if we've already managed to simplify away a binop as seen in the test based on PR37806, but we manage to get that one case here because this fold is positioned above the demanded elements fold currently. Differential Revision: https://reviews.llvm.org/D48485 llvm-svn: 335888	2018-06-28 17:48:04 +00:00
Florian Hahn	388af14f85	[SCCP] Mark CFG as preserved. SCCP does not change the CFG, so we can mark it as preserved. Reviewers: dberlin, efriedma, davide Reviewed By: davide Differential Revision: https://reviews.llvm.org/D47149 llvm-svn: 335820	2018-06-28 09:53:38 +00:00
Max Kazantsev	f5ba37182e	[IndVarSimplify] Ignore unreachable users of truncs If a trunc has a user in a block which is not reachable from entry, we can safely perform trunc elimination as if this user didn't exist. llvm-svn: 335816	2018-06-28 08:20:03 +00:00
Sanjay Patel	1ef49be8b6	[InstCombine] add tests for vector-select-of-binops with 2 variables; NFC llvm-svn: 335778	2018-06-27 20:23:47 +00:00
Sanjay Patel	7e45aebe55	[InstCombine] add more tests for shuffle with different binops; NFC llvm-svn: 335756	2018-06-27 17:21:57 +00:00
Craig Topper	31cbe75b3b	[X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name. I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask". llvm-svn: 335744	2018-06-27 15:57:53 +00:00
Adhemerval Zanella	cadcfed7aa	[AArch64] Add custom lowering for v4i8 trunc store This patch adds a custom trunc store lowering for v4i8 vector types. Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h) and default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extended the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectores. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjust the memory cost for autovectorization, so the C code: void foo (const int src, int width, unsigned char dst) { for (int i = 0; i < width; i++) dst++ = src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735	2018-06-27 13:58:46 +00:00
Vedant Kumar	f6c0b41fb7	[InstCombine] Avoid creating mis-sized dbg.values in commonCastTransforms() This prevents InstCombine from creating mis-sized dbg.values when replacing a sequence of casts with a simpler cast. For example, in: (fptrunc (floor (fpext X))) -> (floorf X) We no longer emit dbg.value(X) (with a 32-bit float operand) to describe (fpext X) (which is a 64-bit float). This was diagnosed by the debugify check added in r335682. llvm-svn: 335696	2018-06-27 00:47:53 +00:00
Vedant Kumar	d13536e9f3	[Debugify] Handle failure to get fragment size when checking dbg.values It's not possible to get the fragment size of some dbg.values. Teach the mis-sized dbg.value diagnostic to detect this scenario and bail out. Tested with: $ find test/Transforms -print -exec opt -debugify-each -instcombine {} \; llvm-svn: 335695	2018-06-27 00:47:52 +00:00
Michael Zolotukhin	d3b8bdef01	[JumpThreading] Don't try to rewrite a use if it's already valid. Summary: When recording uses we need to rewrite after cloning a loop we need to check if the use is not dominated by the original def. The initial assumption was that the cloned basic block will introduce a new path and thus the original def will only dominate the use if they are in the same BB, but as the reproducer from PR37745 shows it's not always the case. This fixes PR37745. Reviewers: haicheng, Ka-Ka Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D48111 llvm-svn: 335675	2018-06-26 22:19:48 +00:00
Matt Arsenault	7e991d30c0	ConstantFold: Don't fold global address vs. null for addrspace != 0 Not sure why this logic seems to be repeated in 2 different places, one called by the other. On AMDGPU addrspace(3) globals start allocating at 0, so these checks will be incorrect (not that real code actually tries to compare these addresses) llvm-svn: 335649	2018-06-26 18:55:43 +00:00
Matt Arsenault	2c1a570aab	LoopUnroll: Allow analyzing intrinsic call costs I'm not sure why the code here is skipping calls since TTI does try to do something for general calls, but it at least should allow intrinsics. Skip intrinsics that should not be omitted as calls, which is by far the most common case on AMDGPU. llvm-svn: 335645	2018-06-26 18:51:17 +00:00
Sanjay Patel	ad0bfb844d	[InstSimplify] fold shifts by sext bool https://rise4fun.com/Alive/c3Y llvm-svn: 335633	2018-06-26 17:31:38 +00:00
Sanjay Patel	3d1e4d6fa6	[InstSimplify] add tests for shifts by sext bool; NFC llvm-svn: 335631	2018-06-26 17:15:07 +00:00
Sanjay Patel	3575f0c0b3	[InstCombine] fold urem with sext bool divisor Similar to other patches in this series: https://reviews.llvm.org/rL335512 https://reviews.llvm.org/rL335527 https://reviews.llvm.org/rL335597 https://reviews.llvm.org/rL335616 ...this is filling a gap in analysis that is exposed by an unrelated select-of-constants transform. I didn't see a way to unify the sext cases because each div/rem opcode results in a different fold. Note that in this case, the backend might want to convert the select into math: Name: sext urem %e = sext i1 %x to i32 %r = urem i32 %y, %e => %c = icmp eq i32 %y, -1 %z = zext i1 %c to i32 %r = add i32 %z, %y llvm-svn: 335622	2018-06-26 16:30:00 +00:00
Simon Pilgrim	bbfc18b5b5	[SLPVectorizer] Recognise non uniform power of 2 constants Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them. As SLP works with arrays of values I don't think we can easily use the pattern match helpers here. Differential Revision: https://reviews.llvm.org/D48214 llvm-svn: 335621	2018-06-26 16:20:16 +00:00
Sanjay Patel	0f44759b0d	[InstCombine] add tests for urem with sext bool divisor; NFC llvm-svn: 335619	2018-06-26 16:01:24 +00:00
Sanjay Patel	2b7e31095d	[InstSimplify] fold srem with sext bool divisor llvm-svn: 335616	2018-06-26 15:32:54 +00:00
Sanjay Patel	0e0dbebeed	[InstSimplify] add tests for srem with sext bool divisor; NFC llvm-svn: 335609	2018-06-26 14:47:31 +00:00
Sanjay Patel	7c45debaea	[InstCombine] fold udiv with sext bool divisor Note: I didn't add a hasOneUse() check because the existing, related fold doesn't have that check. I suspect that the improved analysis and codegen make these some of the rare canonicalization cases where we allow an increase in instructions. llvm-svn: 335597	2018-06-26 12:41:15 +00:00
Florian Hahn	4a69b0bb36	[IPSCCP] Change dead blocks to unreachable after visiting all executable blocks. changeToUnreachable may remove PHI nodes from executable blocks we found values for and we would fail to replace them. By changing dead blocks to unreachable after we replaced constants in all executable blocks, we ensure such PHI nodes are replaced by their known value before. Fixes PR37780. Reviewers: efriedma, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D48421 llvm-svn: 335588	2018-06-26 10:15:02 +00:00
Bjorn Pettersson	550517bcab	Improve ConvertDebugDeclareToDebugValue Summary: This is a follow-up to r334830 and r335031. In the valueCoversEntireFragment check we now also handle the situation when there is a variable length array (VLA) involved, and the length of the array has been reduced to a constant. The ConvertDebugDeclareToDebugValue functions that are related to PHI nodes and load instructions now avoid inserting dbg.value intrinsics when the value does not, for certain, cover the variable/fragment that should be described. In r334830 we assumed that the value always covered the entire var/fragment and we had assertions in the code to show that assumption. However, those asserts failed when compiling code with VLAs, so we removed the asserts in r335031. Now when we know that the valueCoversEntireFragment check can fail also for PHI/Load instructions we avoid to insert the faulty dbg.value intrinsic in such situations. Compared to the Store instruction scenario we simply drop the dbg.value here (as the variable does not change its value due to PHI/Load, so an earlier dbg.value describing the variable should still be valid). Reviewers: aprantl, vsk, efriedma Reviewed By: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48547 llvm-svn: 335580	2018-06-26 06:17:00 +00:00
Gil Rapaport	da2e2caa6c	[InstCombine] (A + 1) + (B ^ -1) --> A - B Turn canonicalized subtraction back into (-1 - B) and combine it with (A + 1) into (A - B). This is similar to the folding already done for (B ^ -1) + Const into (-1 + Const) - B. Differential Revision: https://reviews.llvm.org/D48535 llvm-svn: 335579	2018-06-26 05:31:18 +00:00
Chandler Carruth	1652996fd6	[PM/LoopUnswitch] Teach the new unswitch to handle nontrivial unswitching of switches. This works much like trivial unswitching of switches in that it reliably moves the switch out of the loop. Here we potentially clone the entire loop into each successor of the switch and re-point the cases at these clones. Due to the complexity of actually doing nontrivial unswitching, this patch doesn't create a dedicated routine for handling switches -- it would duplicate far too much code. Instead, it generalizes the existing routine to handle both branches and switches as it largely reduces to looping in a few places instead of doing something once. This actually improves the results in some cases with branches due to being much more careful about how dead regions of code are managed. With branches, because exactly one clone is created and there are exactly two edges considered, somewhat sloppy handling of the dead regions of code was sufficient in most cases. But with switches, there are much more complicated patterns of dead code and so I've had to move to a more robust model generally. We still do as much pruning of the dead code early as possible because that allows us to avoid even cloning the code. This also surfaced another problem with nontrivial unswitching before which is that we weren't as precise in reconstructing loops as we could have been. This seems to have been mostly harmless, but resulted in pointless LCSSA PHI nodes and other unnecessary cruft. With switches, we have to get this right, and everything benefits from it. While the testing may seem a bit light here because we only have two real cases with actual switches, they do a surprisingly good job of exercising numerous edge cases. Also, because we share the logic with branches, most of the changes in this patch are reasonably well covered by existing tests. The new unswitch now has all of the same fundamental power as the old one with the exception of the single unsound case of partial switch unswitching -- that really is just loop specialization and not unswitching at all. It doesn't fit into the canonicalization model in any way. We can add a loop specialization pass that runs late based on profile data if important test cases ever come up here. Differential Revision: https://reviews.llvm.org/D47683 llvm-svn: 335553	2018-06-25 23:32:54 +00:00
Sanjay Patel	0c90400bf2	[InstCombine] add/move tests for udiv; NFC llvm-svn: 335544	2018-06-25 22:27:36 +00:00
Sanjay Patel	6a96d90acd	[InstCombine] fold sdiv with sext bool divisor llvm-svn: 335527	2018-06-25 21:39:41 +00:00
Sanjay Patel	46f9b8c333	[InstCombine] add tests for sdiv with sext bool divisor; NFC llvm-svn: 335526	2018-06-25 21:36:09 +00:00
Florian Hahn	b10b141a79	Revert r335513: [SCEVExp] Advance found insertion point llvm-svn: 335522	2018-06-25 20:55:26 +00:00
Florian Hahn	0b3ed5742a	Force vector width for scev-expander-debug.ll test llvm-svn: 335520	2018-06-25 20:40:50 +00:00
Florian Hahn	5947c17fd4	[SCEVExp] Advance found insertion point until we find a non-dbg instruction. This avoids creating unnecessary casts if the IP used to be a dbg info intrinsic. Fixes PR37727. Reviewers: vsk, aprantl, sanjoy, efriedma Reviewed By: vsk, efriedma Differential Revision: https://reviews.llvm.org/D47874 llvm-svn: 335513	2018-06-25 19:17:29 +00:00
Sanjay Patel	1e911fa746	[InstSimplify] fold div/rem of zexted bool I was looking at an unrelated fold and noticed that we don't have this simplification (because the other fold would break existing tests). Name: zext udiv %z = zext i1 %x to i32 %r = udiv i32 %y, %z => %r = %y Name: zext urem %z = zext i1 %x to i32 %r = urem i32 %y, %z => %r = 0 Name: zext sdiv %z = zext i1 %x to i32 %r = sdiv i32 %y, %z => %r = %y Name: zext srem %z = zext i1 %x to i32 %r = srem i32 %y, %z => %r = 0 https://rise4fun.com/Alive/LZ9 llvm-svn: 335512	2018-06-25 18:51:21 +00:00
Sanjay Patel	a46bcbec58	[InstSimplify] add tests for div/rem with bool divisor; NFC llvm-svn: 335509	2018-06-25 18:27:14 +00:00
Sanjay Patel	2e8babb4fa	[InstCombine] add tests for add-of-sext-bool; NFC We canonicalize to select with a zext-add and either zext-sub or sext-sub, so this shows a pattern that's not conforming to the general trend. llvm-svn: 335506	2018-06-25 17:52:10 +00:00
Stanislav Mekhanoshin	d8c9374797	Fix invariant fdiv hoisting in LICM FDiv is replaced with multiplication by reciprocal and invariant reciprocal is hoisted out of the loop, while multiplication remains even if invariant. Switch checks for all invariant operands and only invariant denominator to fix the issue. Differential Revision: https://reviews.llvm.org/D48447 llvm-svn: 335411	2018-06-23 04:01:28 +00:00
Eli Friedman	203eaaf5ba	[LoopReroll] Rewrite induction variable rewriting. This gets rid of a bunch of weird special cases; instead, just use SCEV rewriting for everything. In addition to being simpler, this fixes a bug where we would use the wrong stride in certain edge cases. The one bit I'm not quite sure about is the trip count handling, specifically the FIXME about overflow. In general, I think we need to widen the exit condition, but that's probably not profitable if the new type isn't legal, so we probably need a check somewhere. That said, I don't think I'm making the existing problem any worse. As a followup to this, a bunch of IV-related code in root-finding could be cleaned up; with SCEV-based rewriting, there isn't any reason to assume a loop will have exactly one or two PHI nodes. Differential Revision: https://reviews.llvm.org/D45191 llvm-svn: 335400	2018-06-22 22:58:55 +00:00
Simon Pilgrim	9d3ef8ee2b	[SLPVectorizer] Support alternate opcodes in tryToVectorizeList Enable tryToVectorizeList to support InstructionsState alternate opcode patterns at a root (build vector etc.) as well as further down the vectorization tree. NOTE: This patch reduces some of the debug reporting if there are opcode mismatches - I can try to add it back if it proves a problem. But it could get rather messy trying to provide equivalent verbose debug strings via getSameOpcode etc. Differential Revision: https://reviews.llvm.org/D48488 llvm-svn: 335364	2018-06-22 16:37:34 +00:00
Simon Pilgrim	1e564504bb	[SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pair SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle. This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle. Differential Revision: https://reviews.llvm.org/D48477 llvm-svn: 335349	2018-06-22 14:04:06 +00:00
Simon Pilgrim	229a781214	[SLPVectorizer][X86] Add alternate opcode tests for simple build vector cases llvm-svn: 335348	2018-06-22 13:53:58 +00:00
Sanjay Patel	c2b37f73f5	[InstCombine] add shuffle+binops test from PR37806; NFC This one shows another pattern that we'll need to match in some cases, but the current ordering of folds allows us to match this as 2 binops before simplification takes place. llvm-svn: 335347	2018-06-22 13:44:42 +00:00
Sanjay Patel	cfd9da038c	[InstCombine] add tests for shuffle-with-different-binops; NFC llvm-svn: 335345	2018-06-22 13:19:25 +00:00
Simon Pilgrim	9c8f9374b5	[CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion. This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174. I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more. Differential Revision: https://reviews.llvm.org/D48172 llvm-svn: 335329	2018-06-22 09:45:31 +00:00
Eugene Leviant	6d711ca168	Revert r335324 due to a builtbot failure llvm-svn: 335327	2018-06-22 08:57:01 +00:00
Eugene Leviant	ea19c9473c	[Evaluator] Improve evaluation of call instruction Differential revision: https://reviews.llvm.org/D46584 llvm-svn: 335324	2018-06-22 08:29:36 +00:00
Chandler Carruth	fe70b29cf7	[LegacyPM] Fix PR37888 by teaching the legacy loop pass manager how to clear out deleted loops from the current queue beyond just the current loop. This is important because SimpleLoopUnswitch will now enqueue the same loop to be re-processed. When it does this with the legacy PM, we don't have a way of canceling the rest of the pipeline and so we can end up deleting the loop before we reprocess it. =/ This change also makes it easy to support deleting other loops in the queue to process, although I don't have any use cases for that. Differential Revision: https://reviews.llvm.org/D48470 llvm-svn: 335317	2018-06-22 02:43:41 +00:00
Sanjay Patel	4784e1506e	[InstCombine] fix shuffle-of-binops bug With non-commutative binops, we could be using the same variable value as operand 0 in 1 binop and operand 1 in the other, so we have to check for that possibility and bail out. llvm-svn: 335312	2018-06-21 23:56:59 +00:00
Sanjay Patel	23408c25f3	[InstCombine] add test for shuffle-of-binops; NFC This shows a miscompile that was missed in rL335283. llvm-svn: 335311	2018-06-21 23:53:01 +00:00
Matthew Voss	30648ab233	[GVN] Avoid casting a vector of size less than 8 bits to i8 Summary: A reprise of D25849. This crash was found through fuzzing some time ago and was documented in PR28879. No check for load size has been added due to the following tests: - Transforms/GVN/invariant.group.ll - Transforms/GVN/pr10820.ll These tests expect load sizes that are not a multiple of eight. Thanks to @davide for the original patch. Reviewers: nlopes, davide, RKSimon, reames, efriedma Reviewed By: efriedma Subscribers: davide, llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D48330 llvm-svn: 335294	2018-06-21 21:43:20 +00:00
Sanjay Patel	a76b70069d	[InstCombine] fold vector select of binops with constant ops to 1 binop (PR37806) This is the simplest case from PR37806: https://bugs.llvm.org/show_bug.cgi?id=37806 If we have a common variable operand used in a pair of binops with vector constants that are vector selected together, then we can constant shuffle the constant vectors to eliminate the shuffle instruction. This has some tricky parts that are hopefully addressed in the tests and their respective comments: 1. If the shuffle mask contains an undef element, then that lane of the result is undef: http://llvm.org/docs/LangRef.html#shufflevector-instruction Therefore, we can replace the constant in that lane with an undef value except for div/rem. With div/rem, an undef in the divisor would cause the whole op to be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'. 2. Intersect the wrapping and FMF of the original binops for the new binop. There should be no extra poison or fast-math potential in the new binop that wasn't possible in the original code. 3. Disregard other uses. Given that we're eliminating uses (shortening the dependency chain), I think that's always the right IR canonicalization. But I purposely chose the udiv test to demonstrate the scenario where both intermediate values have other uses because that seems likely worse for codegen with an expensive math op. This seems like a very rare possibility to me, so I don't think it requires a backend patch first. Differential Revision: https://reviews.llvm.org/D48401 llvm-svn: 335283	2018-06-21 20:15:09 +00:00
Francis Visoiu Mistrih	ac599b6951	Revert r335206 "Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions." This reverts commit r335206. As discussed here: https://reviews.llvm.org/rL333740, a fix will come tomorrow. In the meanwhile, revert this to fix some bots. llvm-svn: 335272	2018-06-21 19:18:36 +00:00
Sanjay Patel	3382dc644e	[InstCombine] add tests for shuffled cmps; NFC llvm-svn: 335266	2018-06-21 18:07:38 +00:00
Sanjay Patel	3244537a3c	[InstCombine] use constant pattern matchers with icmp+sext The previous code worked with vectors, but it failed when the vector constants contained undef elements. The matchers handle those cases. llvm-svn: 335262	2018-06-21 17:51:44 +00:00
Sanjay Patel	5522e968ad	[InstCombine] add vector icmp tests with undefs; NFC llvm-svn: 335261	2018-06-21 17:37:14 +00:00
Sanjay Patel	447e8ece4d	[LoopVectorize] regenerate full checks; NFC llvm-svn: 335257	2018-06-21 16:54:32 +00:00
Nicolai Haehnle	b29ee70122	InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemanded Summary: Use the expanded features of the TableGen generic tables to avoid manually adding the combinatorially exploded set of intrinsics. The getAMDGPUImageDimIntrinsic lookup function is early-out, i.e. non-AMDGPU intrinsics will never look at the underlying table. Use a generic approach for getting the new intrinsic overload to keep the code simple, and make the image dmask handling more generic: - handle non-sampler image loads - handle the case where the set of demanded elements is not a prefix There is some overlap between this code and an optimization that happens in the backend during code generation. They currently complement each other: - only the codegen optimization can generate vec3 loads - only the InstCombine optimization can handle D16 The InstCombine optimization also likely covers more cases since the codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization in codegen once the infrastructure for vec3 is in place (which will probably take a long time). Modify the test cases to use dimension-aware intrinsics. This makes it easier to see that the test coverage for the new intrinsics is equivalent, and the old style intrinsics will be removed in a follow-up commit anyway. Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad Reviewers: arsenm, rampitec, majnemer Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48165 llvm-svn: 335230	2018-06-21 13:37:31 +00:00
David Green	d143c65de3	[DA] Enable -da-delinearize by default This enables da-delinearize in Dependence Analysis for delinearizing array accesses into multiple dimensions. This can help to increase the power of Dependence analysis on multi-dimensional arrays and prevent having to fall back to the slower and less accurate MIV tests. It adds static checks on the bounds of the arrays to ensure that one dimension doesn't overflow into another, and brings our code in line with our tests. Differential Revision: https://reviews.llvm.org/D45872 llvm-svn: 335217	2018-06-21 11:53:16 +00:00
Simon Pilgrim	2a9cde026c	[X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882) These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. llvm-svn: 335216	2018-06-21 11:37:13 +00:00
Simon Pilgrim	d08fbf6486	[SLPVectorizer][X86] Add horizontal add/sub tests Shows PR37882 perf regression llvm-svn: 335215	2018-06-21 11:16:10 +00:00
Florian Hahn	d36aa1f763	Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions. r335150 should resolve the issues with the clang-with-thin-lto-ubuntu and clang-with-lto-ubuntu builders. Original message: This patch updates IPSCCP to use PredicateInfo to propagate facts to true branches predicated by EQ and to false branches predicated by NE. As a follow up, we should be able to extend it to also propagate additional facts about nonnull. Reviewers: davide, mssimpso, dberlin, efriedma Reviewed By: davide, dberlin llvm-svn: 335206	2018-06-21 07:15:08 +00:00
Chandler Carruth	d1dab0c3c0	[PM/LoopUnswitch] Add partial non-trivial unswitching for invariant conditions feeding a chain of `and`s or `or`s for a branch. Much like with full non-trivial unswitching, we rely on the pass manager to handle iterating until all of the profitable unswitches have been done. This is to allow other more profitable unswitches to fire on any of the cloned, simpler versions of the loop if viable. Threading the partial unswiching through the non-trivial unswitching logic motivated some minor refactorings. If those are too disruptive to make it reasonable to review this patch, I can separate them out, but it'll be somewhat timeconsuming so I wanted to send it for initial review as-is. Feel free to tell me whether it warrants pulling apart. I've tried to re-use (and factor out) logic form the partial trivial unswitching, but not as much could be shared as I had haped. Still, this wasn't as bad as I naively expected. Some basic testing is added, but I probably need more. Suggestions for things you'd like to see tested more than welcome. One thing I'd like to do is add some testing that when we schedule this with loop-instsimplify it effectively cleans up the cruft created. Last but not least, this uncovered a bug that has been in loop cloning the entire time for non-trivial unswitching. Specifically, we didn't correctly add the outer-most cloned loop to the list of cloned loops. This meant that LCSSA wouldn't be updated for it hypothetically, and more significantly that we would never visit it in the loop pass manager. I noticed this while checking loop-instsimplify by hand. I'll try to separate this bugfix out into its own patch with a more focused test. But it is just one line, so shouldn't significantly confuse the review here. After this patch, the only missing "feature" in this unswitch I'm aware of us non-trivial unswitching of switches. I'll try implementing full non-trivial unswitching of switches (which is at least a sound thing to implement), but partial non-trivial unswitching of switches is something I don't see any sound and principled way to implement. I also have no interesting test cases for the latter, so I'm not really worried. The rest of the things that need to be ported are bug-fixes and more narrow / targeted support for specific issues. Differential Revision: https://reviews.llvm.org/D47522 llvm-svn: 335203	2018-06-21 06:14:03 +00:00
Alina Sbirlea	dfd14adeb0	Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred. Summary: Two utils methods have essentially the same functionality. This is an attempt to merge them into one. 1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred 2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor Prior to the patch: 1. MergeBasicBlockIntoOnlyPred Updates either DomTree or DeferredDominance Moves all instructions from Pred to BB, deletes Pred Asserts BB has single predecessor If address was taken, replace the block address with constant 1 (?) 2. MergeBlockIntoPredecessor Updates DomTree, LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken After the patch: Method 2. MergeBlockIntoPredecessor is attempting to become the new default: Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken Uses of MergeBasicBlockIntoOnlyPred that need to be replaced: 1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp Updated in this patch. No challenges. 2. lib/CodeGen/CodeGenPrepare.cpp Updated in this patch. i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation. ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks Some interesting aspects: - Since Pred is not deleted (BB is), the entry block does not need updating. - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred. - isMergingEmptyBlockProfitable assumes BB is the one to be deleted. - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead. - adding some test owner as subscribers for the interesting tests modified: test/CodeGen/X86/avx-cmp.ll test/CodeGen/AMDGPU/nested-loop-conditions.ll test/CodeGen/AMDGPU/si-annotate-cf.ll test/CodeGen/X86/hoist-spill.ll test/CodeGen/X86/2006-11-17-IllegalMove.ll 3. lib/Transforms/Scalar/JumpThreading.cpp Not covered in this patch. It is the only use case using the DeferredDominance. I would defer to Brian Rzycki to make this replacement. Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits Differential Revision: https://reviews.llvm.org/D48202 llvm-svn: 335183	2018-06-20 22:01:04 +00:00
Sanjay Patel	88de778b9b	[InstCombine] fix typo in test comment; NFC llvm-svn: 335165	2018-06-20 20:16:45 +00:00
Chandler Carruth	4da3331d3d	[PM/LoopUnswitch] Support partial trivial unswitching. The idea of partial unswitching is to take a part of a branch's condition that is loop invariant and just unswitching that part. This primarily makes sense with i1 conditions of branches as opposed to switches. When dealing with i1 conditions, we can easily extract loop invariant inputs to a a branch and unswitch them to test them entirely outside the loop. As part of this, we now create much more significant cruft in the loop body, so this relies on adding cleanup passes to the loop pipeline and revisiting unswitched loops to do that cleanup before continuing to process them. This already appears to be more powerful at unswitching than the old loop unswitch pass, and so I'd appreciate pretty careful review in case I'm just missing some correctness checks. The `LIV-loop-condition` test case is not unswitched by the old unswitch pass, but is with this pass. Thanks to Sanjoy and Fedor for the review! Differential Revision: https://reviews.llvm.org/D46706 llvm-svn: 335156	2018-06-20 18:57:07 +00:00
Sanjay Patel	74442061f2	[InstCombine] add vector select of binops tests (PR37806) These represent the most basic requested transform - a matching operand and 2 constant operands. llvm-svn: 335151	2018-06-20 17:48:43 +00:00
Florian Hahn	5ac2629823	[PredicateInfo] Order instructions in different BBs by DFSNumIn. Using OrderedInstructions::dominates as comparator for instructions in BBs without dominance relation can cause a non-deterministic order between such instructions. That in turn can cause us to materialize copies in a non-deterministic order. While this does not effect correctness, it causes some minor non-determinism in the final generated code, because values have slightly different labels. Without this patch, running -print-predicateinfo on a reasonably large module produces slightly different output on each run. This patch uses the dominator trees DFSInNum to order instruction from different BBs, which should enforce a deterministic ordering and guarantee that dominated instructions come after the instructions that dominate them. Reviewers: dberlin, efriedma, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D48230 llvm-svn: 335150	2018-06-20 17:42:01 +00:00
Simon Pilgrim	2e2f20a949	[SLPVectorizer] Relax "alternate" opcode vectorisation to work with any SK_Select shuffle pattern D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand. This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation. The AArch64 regression will be fixed by D48172. Differential Revision: https://reviews.llvm.org/D48174 llvm-svn: 335130	2018-06-20 14:26:28 +00:00
Sanjay Patel	0c57de4c21	[InstSimplify] Fix missed optimization in simplifyUnsignedRangeCheck() For both operands are unsigned, the following optimizations are valid, and missing: 1. X > Y && X != 0 --> X > Y 2. X > Y \|\| X != 0 --> X != 0 3. X <= Y \|\| X != 0 --> true 4. X <= Y \|\| X == 0 --> X <= Y 5. X > Y && X == 0 --> false unsigned foo(unsigned x, unsigned y) { return x > y && x != 0; } should fold to x > y, but I found we haven't done it right now. besides, unsigned foo(unsigned x, unsigned y) { return x < y && y != 0; } Has been folded to x < y, so there may be a bug. Patch by: Li Jia He! Differential Revision: https://reviews.llvm.org/D47922 llvm-svn: 335129	2018-06-20 14:22:49 +00:00
Sanjay Patel	c1d2177f9d	[InstSimplify] Add tests for missed optimizations in simplifyUnsignedRangeCheck (NFC) These are the baseline tests for the functional change in D47922. Patch by Li Jia He! Differential Revision: https://reviews.llvm.org/D48000 llvm-svn: 335128	2018-06-20 14:03:13 +00:00
Sanjay Patel	825a4faa8d	[InstCombine] ignore debuginfo when removing redundant assumes (PR37726) This is similar to: rL335083 Fixes:: https://bugs.llvm.org/show_bug.cgi?id=37726 llvm-svn: 335121	2018-06-20 13:22:26 +00:00
Mikhail Dvoretckii	8393f90717	[InstCombine] Replacing X86-specific rounding intrinsics with generic floor-ceil This patch replaces calls to X86-specific intrinsics with floor-ceil semantics with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This doesn't affect the resulting machine code, as those intrinsics are lowered to the same instructions, but exposes these specific rounding cases to generic optimizations. Differential Revision: https://reviews.llvm.org/D48067 llvm-svn: 335039	2018-06-19 10:49:12 +00:00
David Green	e6a9c24878	[LoopSimplifyCFG] Invalidate SCEV in LoopSimplifyCFG LoopSimplifyCFG, being a loop pass, needs to preserve scalar evolution. This invalidates SE for the loops altered during block merging. Differential Revision: https://reviews.llvm.org/D48258 llvm-svn: 335036	2018-06-19 09:43:36 +00:00
Max Kazantsev	37da4333a8	[SimplifyIndVars] Eliminate redundant truncs This patch adds logic to deal with the following constructions: %iv = phi i64 ... %trunc = trunc i64 %iv to i32 %cmp = icmp <pred> i32 %trunc, %invariant Replacing it with %iv = phi i64 ... %cmp = icmp <pred> i64 %iv, sext/zext(%invariant) In case if it is legal. Specifically, if `%iv` has signed comparison users, it is required that `sext(trunc(%iv)) == %iv`, and if it has unsigned comparison uses then we require `zext(trunc(%iv)) == %iv`. The current implementation bails if `%trunc` has other uses than `icmp`, but in theory we can handle more cases here (e.g. if the user of trunc is bitcast). Differential Revision: https://reviews.llvm.org/D47928 Reviewed By: reames llvm-svn: 335020	2018-06-19 04:48:34 +00:00
Sanjoy Das	6e9b355cc9	Revert "[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags" This reverts r334428. It incorrectly marks some multiplications as nuw. Tim Shen is working on a proper fix. Original commit message: [SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe. Summary: Previously we would add them for adds, but not multiplies. llvm-svn: 335016	2018-06-19 04:09:44 +00:00
Xin Tong	54b4227f32	Revert "Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor" This reverts commit f976cf4cca0794267f28b54e468007fd476d37d9. I am reverting this because it causes break in a few bots and its going to take me sometime to look at this. llvm-svn: 334993	2018-06-18 23:20:08 +00:00
Xin Tong	bfd8cfcb8d	Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor Summary: Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor This is a missing small optimization in MergeBlockIntoPredecessor. This helps with one simplifycfg test which expects this case to be handled. Reviewers: davide, spatel, brzycki, asbirlea Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48284 llvm-svn: 334992	2018-06-18 22:59:13 +00:00
Diego Caballero	72aed5e5dc	Move redundant-vf2-cost.ll test to X86 directory redundant-vf2-cost.ll is X86 specific. Moved from test/Transforms/LoopVectorize/redundant-vf2-cost.ll to test/Transforms/LoopVectorize/X86/redundant-vf2-cost.ll llvm-svn: 334854	2018-06-15 18:46:03 +00:00
Tomasz Krupa	bcaab53d47	[X86] Lowering sqrt intrinsics to native IR Summary: Complementary patch to lowering sqrt intrinsics in Clang. Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k Reviewed By: craig.topper Subscribers: tkrupa, mike.dvoretsky, llvm-commits Differential Revision: https://reviews.llvm.org/D41599 llvm-svn: 334849	2018-06-15 18:05:24 +00:00
Joseph Tremoulet	6f406d4f02	[InstCombine] Avoid iteration/mutation conflict Summary: When iterating users of a multiply in processUMulZExtIdiom, the call to setOperand in the truncation case may replace the use being visited; make sure the iterator has been advanced before doing that replacement. Reviewers: majnemer, davide Reviewed By: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48192 llvm-svn: 334844	2018-06-15 16:52:40 +00:00
Diego Caballero	68795245cf	[LV] Prevent LV to run cost model twice for VF=2 This is a minor fix for LV cost model, where the cost for VF=2 was computed twice when the vectorization of the loop was forced without specifying a VF. Reviewers: xusx595, hsaito, fhahn, mkuper Reviewed By: hsaito, xusx595 Differential Revision: https://reviews.llvm.org/D48048 llvm-svn: 334840	2018-06-15 16:21:35 +00:00
Bjorn Pettersson	428caf988b	Re-apply "[DebugInfo] Check size of variable in ConvertDebugDeclareToDebugValue" This is r334704 (which was reverted in r334732) with a fix for types like x86_fp80. We need to use getTypeAllocSizeInBits and not getTypeStoreSizeInBits to avoid dropping debug info for such types. Original commit msg: > Summary: > Do not convert a DbgDeclare to DbgValue if the store > instruction only refer to a fragment of the variable > described by the DbgDeclare. > > Problem was seen when for example having an alloca for an > array or struct, and there were stores to individual elements. > In the past we inserted a DbgValue intrinsics for each store, > just as if the store wrote the whole variable. > > When handling store instructions we insert a DbgValue that > indicates that the variable is "undefined", as we do not know > which part of the variable that is updated by the store. > > When ConvertDebugDeclareToDebugValue is used with a load/phi > instruction we assert that the referenced value is large enough > to cover the whole variable. Afaict this should be true for all > scenarios where those methods are used on trunk. If the assert > blows in the future I guess we could simply skip to insert a > dbg.value instruction. > > In the future I think we should examine which part of the variable > that is accessed, and add a DbgValue instrinsic with an appropriate > DW_OP_LLVM_fragment expression. > > Reviewers: dblaikie, aprantl, rnk > > Reviewed By: aprantl > > Subscribers: JDevlieghere, llvm-commits > > Tags: #debug-info > > Differential Revision: https://reviews.llvm.org/D48024 llvm-svn: 334830	2018-06-15 13:48:55 +00:00
Simon Pilgrim	180497ea11	[SLP][X86] Add AVX2 run to POW2 SDIV Tests Non-uniform pow2 tests are only make sense on targets with fast (low cost) non-uniform shifts llvm-svn: 334821	2018-06-15 10:29:37 +00:00
Simon Pilgrim	ca6215f8c8	[SLP][X86] Regenerate POW2 SDIV Tests Added non-uniform pow2 test as well llvm-svn: 334819	2018-06-15 10:07:03 +00:00
Roman Lebedev	84c11aed10	[InstCombine] Recommit: Fold (x << y) >> y -> x & (-1 >> y) Summary: We already do it for splat constants, but not just values. Also, undef cases are mostly non-functional. The original commit was reverted because it broke tests for amdgpu backend, which i didn't check. Now, the backed was updated to recognize these new patterns, so we are good. https://bugs.llvm.org/show_bug.cgi?id=37603 https://rise4fun.com/Alive/cplX Reviewers: spatel, craig.topper, mareko, bogner, rampitec, nhaehnle, arsenm Reviewed By: spatel, rampitec, nhaehnle Subscribers: wdng, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D47980 llvm-svn: 334818	2018-06-15 09:56:52 +00:00
Mikhail Dvoretckii	0531ec654a	NFC: Regenerating x86-sse41.ll test for InstCombine Test regenerated to reduce noise in further patches. llvm-svn: 334806	2018-06-15 07:59:29 +00:00
Eli Friedman	3f1ce093ea	Make uitofp and sitofp defined on overflow. IEEE 754 defines the expected result on overflow. As far as I know, hardware implementations (of f16), and compiler-rt (__floatuntisf) correctly return +-Inf on overflow. And I can't think of any useful transform that would take advantage of overflow being undefined here. Differential Revision: https://reviews.llvm.org/D47807 llvm-svn: 334777	2018-06-14 22:58:48 +00:00
Bjorn Pettersson	972fd1c9e7	Revert rL334704: "[DebugInfo] Check size of variable in ConvertDebugDeclareToDebugValue" This reverts commit r334704. Buildbots detected an assertion in "test tsan in debug compiler-rt build". llvm-svn: 334732	2018-06-14 16:08:22 +00:00
Max Kazantsev	ff6d1c9188	[EarlyCSE] Propagate conditions of AND and OR instructions This patches teaches EarlyCSE to figure out that if `and i1 %x, %y` is true then both `%x` and `%y` are true in the taken branch, and if `or i1 %x, %y` is false then both `%x` and `%y` are false in non-taken branch. Fix for PR37635. Differential Revision: https://reviews.llvm.org/D47574 Reviewed By: reames llvm-svn: 334707	2018-06-14 13:02:13 +00:00
Bjorn Pettersson	e406b29c22	[DebugInfo] Check size of variable in ConvertDebugDeclareToDebugValue Summary: Do not convert a DbgDeclare to DbgValue if the store instruction only refer to a fragment of the variable described by the DbgDeclare. Problem was seen when for example having an alloca for an array or struct, and there were stores to individual elements. In the past we inserted a DbgValue intrinsics for each store, just as if the store wrote the whole variable. When handling store instructions we insert a DbgValue that indicates that the variable is "undefined", as we do not know which part of the variable that is updated by the store. When ConvertDebugDeclareToDebugValue is used with a load/phi instruction we assert that the referenced value is large enough to cover the whole variable. Afaict this should be true for all scenarios where those methods are used on trunk. If the assert blows in the future I guess we could simply skip to insert a dbg.value instruction. In the future I think we should examine which part of the variable that is accessed, and add a DbgValue instrinsic with an appropriate DW_OP_LLVM_fragment expression. Reviewers: dblaikie, aprantl, rnk Reviewed By: aprantl Subscribers: JDevlieghere, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D48024 llvm-svn: 334704	2018-06-14 11:23:42 +00:00
Max Kazantsev	0ed79620c6	[SimplifyIndVars] Ignore dead users IndVarSimplify sometimes makes transforms basing on users that are trivially dead. In particular, if DCE wasn't run before it, there may be a dead `sext/zext` in loop that will trigger widening transforms, however it makes no sense to do it. This patch teaches IndVarsSimplify ignore the mist trivial cases of that. Differential Revision: https://reviews.llvm.org/D47974 Reviewed By: sanjoy llvm-svn: 334567	2018-06-13 02:25:32 +00:00
Wei Mi	a0c0857e7a	[SampleFDO] Add a new compact binary format for sample profile. Name table occupies a big chunk of size in current binary format sample profile. In order to reduce its size, the patch changes the sample writer/reader to save/restore MD5Hash of names in the name table. Sample annotation phase will also use MD5Hash of name to query samples accordingly. Experiment shows compact binary format can reduce the size of sample profile by 2/3 compared with binary format generally. Differential Revision: https://reviews.llvm.org/D47955 llvm-svn: 334447	2018-06-11 22:40:43 +00:00
Farhana Aleen	078cd48a39	[SLP] Add testcases of min/max reduction pattern for AMDGPU. Author: FarhanaAleen llvm-svn: 334435	2018-06-11 20:29:31 +00:00
Justin Lebar	aa4fec94d8	[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe. Summary: Previously we would add them for adds, but not multiplies. Reviewers: sanjoy Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D48038 llvm-svn: 334428	2018-06-11 18:57:42 +00:00
Roman Lebedev	ebb3252f00	Revert rL334371 / D47980: "[InstCombine] Fold (x << y) >> y -> x & (-1 >> y)" test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll broke, and i did not notice because i did not build that backend. llvm-svn: 334373	2018-06-10 20:32:03 +00:00
Roman Lebedev	eb795a0661	[InstCombine] Fold (x >> y) << y -> x & (-1 << y) Summary: We already do it for matching splat constants, but not just values. Further improvements for non-matching splat constants, as noted in https://reviews.llvm.org/D46760#1123713 will be needed, but i'd prefer to do that as a follow-up. https://bugs.llvm.org/show_bug.cgi?id=37603 https://rise4fun.com/Alive/cplX https://rise4fun.com/Alive/0HF Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47981 llvm-svn: 334372	2018-06-10 20:10:13 +00:00
Roman Lebedev	4cdc59ecf2	[InstCombine] Fold (x << y) >> y -> x & (-1 >> y) Summary: We already do it for splat constants, but not just values. Also, undef cases are mostly non-functional. https://bugs.llvm.org/show_bug.cgi?id=37603 https://rise4fun.com/Alive/cplX Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47980 llvm-svn: 334371	2018-06-10 20:10:06 +00:00
Roman Lebedev	1e9457ee76	[NFC][InstCombine] Revisit tests for D47980 / D47981 once more. llvm-svn: 334370	2018-06-10 20:10:00 +00:00
Craig Topper	98a79934af	[X86] Remove masking from the 512-bit masked floating point add/sub/mul/div intrinsics. Use a select in IR instead. llvm-svn: 334358	2018-06-10 06:01:36 +00:00
Roman Lebedev	ff7652e7ab	[NFC][InstCombine] More tests for (x >> y) << y -> x & (-1 << y) fold. Followup for rL334347. The fold is also valid for ashr. https://rise4fun.com/Alive/0HF https://bugs.llvm.org/show_bug.cgi?id=37603 https://reviews.llvm.org/D46760#1123713 https://rise4fun.com/Alive/cplX llvm-svn: 334349	2018-06-09 14:01:46 +00:00
Roman Lebedev	8aada65f81	[NFC][InstCombine] Tests for (x >> y) << y -> x & (-1 << y) fold. We already do it for splat constants, but not just values. Also, undef cases are mostly non-functional. https://bugs.llvm.org/show_bug.cgi?id=37603 https://reviews.llvm.org/D46760#1123713 https://rise4fun.com/Alive/cplX llvm-svn: 334347	2018-06-09 09:27:43 +00:00
Roman Lebedev	794e29f964	[NFC][InstCombine] Tests for (x << y) >> y -> x & (-1 >> y) fold. We already do it for splat constants, but not just values. Also, undef cases are mostly non-functional. https://bugs.llvm.org/show_bug.cgi?id=37603 https://rise4fun.com/Alive/cplX llvm-svn: 334346	2018-06-09 09:27:39 +00:00
Davide Italiano	189c2cf114	[InstCombine] Skip dbg.value(s) when looking at stack{save,restore}. Fixes PR37713. llvm-svn: 334317	2018-06-08 20:42:36 +00:00
Sanjay Patel	afcf39e1f9	[InstCombine] add llvm.assume + debuginfo test (PR37726); NFC llvm-svn: 334314	2018-06-08 18:47:33 +00:00
Daniil Fukalov	37433dc2e1	reapply r334209 with fixes for harfbuzz in Chromium r334209 description: [LSR] Check yet more intrinsic pointer operands the patch fixes another assertion in isLegalUse() Differential Revision: https://reviews.llvm.org/D47794 llvm-svn: 334300	2018-06-08 16:22:52 +00:00
Roman Lebedev	b060ce45ca	[InstSimplify] add nuw %x, -1 -> -1 fold. Summary: `%ret = add nuw i8 %x, C` From [[ https://llvm.org/docs/LangRef.html#add-instruction \| langref ]]: nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the add is a poison value if unsigned and/or signed overflow, respectively, occurs. So if `C` is `-1`, `%x` can only be `0`, and the result is always `-1`. I'm not sure we want to use `KnownBits`/`LVI` here, because there is exactly one possible value (all bits set, `-1`), so some other pass should take care of replacing the known-all-ones with constant `-1`. The `test/Transforms/InstCombine/set-lowbits-mask-canonicalize.ll` change is confusing. What happening is, before this: (omitting `nuw` for simplicity) 1. First, InstCombine D47428/rL334127 folds `shl i32 1, %NBits`) to `shl nuw i32 -1, %NBits` 2. Then, InstSimplify D47883/rL334222 folds `shl nuw i32 -1, %NBits` to `-1`, 3. `-1` is inverted to `0`. But now: 1. This InstSimplify fold `%ret = add nuw i32 %setbit, -1` -> `-1` happens first, before InstCombine D47428/rL334127 fold could happen. Thus we now end up with the opposite constant, and it is all good: https://rise4fun.com/Alive/OA9 https://rise4fun.com/Alive/sldC Was mentioned in D47428 review. Follow-up for D47883. Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47908 llvm-svn: 334298	2018-06-08 15:44:47 +00:00
Roman Shirokiy	9ba0aa2da0	[LV] Fix PR36983. For a given recurrence, fix all phis in exit block There could be more than one PHIs in exit block using same loop recurrence. Don't assume there is only one and fix each user. Differential Revision: https://reviews.llvm.org/D47788 llvm-svn: 334271	2018-06-08 08:21:20 +00:00
Reid Kleckner	a3609f75b2	Revert r334209 "[LSR] Check yet more intrinsic pointer operands" This causes cast failures when compiling harfbuzz in Chromium. Reproducer on the way. llvm-svn: 334254	2018-06-08 00:43:27 +00:00

... 5 6 7 8 9 ...

11509 Commits