llvm-project

Commit Graph

Author	SHA1	Message	Date
Igor Laevsky	b40152d5d1	[DeadStoreElimination] Check function modref behavior before considering memory clobbered Differential Revision: https://reviews.llvm.org/D29996 llvm-svn: 296625	2017-03-01 14:38:29 +00:00
Alexey Bataev	4a45efa431	[SLP] Preserve IR flags when vectorizing horizontal reductions. Summary: The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math) when converting scalar horizontal reductions instructions into vectors, just like for other vectorized instructions. It doe not include IR propagation for extra arguments, we need to handle original scalar operations for extra args to propagate correct flags. Reviewers: mkuper, mzolotukhin, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30418 llvm-svn: 296614	2017-03-01 12:43:39 +00:00
Alexey Bataev	74e5a36856	[SLP] Preserve IR flags for extra args. Summary: We should preserve IR flags for extra args. These IR flags should be taken from original scalar operations, not from the reduction operations. Reviewers: mkuper, mzolotukhin, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30447 llvm-svn: 296613	2017-03-01 12:22:33 +00:00
Alexey Bataev	dfec81107f	[SLP] Fix for PR32038: extra add of PHI node when it is not required. Summary: If horizontal reduction tree starts from the binary operation that is used in PHI node, but this PHI is not used in horizontal reduction, we may end up with extra addition of this PHI node after vectorization. Here is an example: ``` %phi = phi i32 [ %tmp, %end], ... ... %tmp = add i32 %tmp1, %tmp2 end: ``` after vectorization we always have something like: ``` %phi = phi i32 [ %tmp, %end], ... ... %red = extractelement <8 x 32> %vec.red, 0 %tmp = add i32 %red, %phi end: ``` even if `%phi` is not used in reduction tree. Patch considers these PHI nodes as extra arguments and considers them in the final result iff they really used in reduction. Reviewers: mkuper, hfinkel, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30409 llvm-svn: 296606	2017-03-01 10:50:44 +00:00
Mikael Holmen	760dc9aba7	Remove sometimes faulty rewrite of memcpy in instcombine. Summary: Solves PR 31990. The bad rewrite could replace a memcpy of one word with store i4 -1 while it should actually be store i8 -1 Hopefully opt and llc has improved enough so the original optimization done by the code isn't needed anymore. One already existing testcase is affected. It originally tested that the memcpy was replaced with load double but since we now remove that rewrite it will be load i64 instead. Patch suggestion by Eli Friedman. Reviewers: eli.friedman, majnemer, efriedma Reviewed By: efriedma Subscribers: efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D30254 llvm-svn: 296585	2017-03-01 06:45:20 +00:00
Adam Nemet	15032a0455	[LV] These remark should have been missed remarks The practice in LV is that we emit analysis remarks and then finally report either a missed or applied remark on the final decision whether vectorization is taking place. On this code path, we were closing with an analysis remark. llvm-svn: 296578	2017-03-01 04:31:15 +00:00
Mohammad Shahid	175ffa8c35	[SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR. The fix is to compute the mask for out of order memory accesses while building the vectorizable tree instead of actual vectorization of vectorizable tree. Reviewers: mkuper Differential Revision: https://reviews.llvm.org/D30159 Change-Id: Id1e287f073fa4959713ba545fa4254db5da8b40d llvm-svn: 296575	2017-03-01 03:51:54 +00:00
Adam Nemet	abe46080d4	Revert "(HEAD, origin/master, origin/HEAD, master) [LV] These should missed remarks" This reverts commit r296544. This got committed by accident. llvm-svn: 296546	2017-02-28 23:54:27 +00:00
Adam Nemet	e193da191d	[LV] These should missed remarks llvm-svn: 296544	2017-02-28 23:48:58 +00:00
Daniel Berlin	03f6938edc	Fix PR 24415 (at least), by making our post-dominator tree behavior sane. Summary: Currently, our post-dom tree tries to ignore and remove the effects of infinite loops. It fails miserably at this, because it tries to do it ahead of time, and thus can only detect self-loops, and any other type of infinite loop, it pretends doesn't exist at all. This can, in a bunch of cases, lead to wrong answers and a completely empty post-dom tree. Wrong answer: ``` declare void foo() define internal void @f() { entry: br i1 undef, label %bb35, label %bb3.i bb3.i: call void @foo() br label %bb3.i bb35.loopexit3: br label %bb35 bb35: ret void } ``` We get: ``` Inorder PostDominator Tree: [1] <<exit node>> {0,7} [2] %bb35 {1,6} [3] %bb35.loopexit3 {2,3} [3] %entry {4,5} ``` This is a trivial modification of the testcase for PR 6047 Note that we pretend bb3.i doesn't exist. We also pretend that bb35 post-dominates entry. While it's true that it does not exit in a theoretical sense, it's not really helpful to try to ignore the effect and pretend that bb35 post-dominates entry. Worse, we pretend the infinite loop does nothing (it's usually considered a side-effect), and doesn't even exist, even when it calls a function. Sadly, this makes it impossible to use when you are trying to move code safely. All compilers also create virtual or real single exit nodes (including us), and connect infinite loops there (which this patch does). In fact, others have worked around our behavior here, to the point of building their own post-dom trees: https://zneak.github.io/fcd/2016/02/17/structuring.html and pointing out the region infrastructure is near-useless for them with postdom in this state :( Completely empty post-dom tree: ``` define void @spam() #0 { bb: br label %bb1 bb1: ; preds = %bb1, %bb br label %bb1 bb2: ; No predecessors! ret void } ``` Printing analysis 'Post-Dominator Tree Construction' for function 'foo': =============================-------------------------------- Inorder PostDominator Tree: [1] <<exit node>> {0,1} :( (note that even if you ignore the effects of infinite loops, bb2 should be present as an exit node that post-dominates nothing). This patch changes post-dom to properly handle infinite loops and does root finding during calculation to prevent empty tress in such cases. We match gcc's (and the canonical theoretical) behavior for infinite loops (find the backedge, connect it to the exit block). Testcases coming as soon as i finish running this on a ton of random graphs :) Reviewers: chandlerc, davide Subscribers: bryant, llvm-commits Differential Revision: https://reviews.llvm.org/D29705 llvm-svn: 296535	2017-02-28 22:57:50 +00:00
Dehao Chen	a60cdd3881	Add function importing info from samplepgo profile to the module summary. Summary: For SamplePGO, the profile may contain cross-module inline stacks. As we need to make sure the profile annotation happens when all the hot inline stacks are expanded, we need to pass this info to the module importer so that it can import proper functions if necessary. This patch implemented this feature by emitting cross-module targets as part of function entry metadata. In the module-summary phase, the metadata is used to build call edges that points to functions need to be imported. Reviewers: mehdi_amini, tejohnson Reviewed By: tejohnson Subscribers: davidxl, llvm-commits Differential Revision: https://reviews.llvm.org/D30053 llvm-svn: 296498	2017-02-28 18:09:44 +00:00
Adrian Prantl	80d0c93436	Strip debug info when inlining into a nodebug function. The LLVM backend cannot produce any debug info for an llvm::Function without a DISubprogram attachment. When inlining a debug-info-carrying function into a nodebug function, there is therefore no reason to keep any debug info intrinsic calls or debug locations on the instructions. This fixes a problem discovered in PR32042. rdar://problem/30679307 llvm-svn: 296488	2017-02-28 16:58:13 +00:00
Xin Tong	c647dcfba0	Empty line. NFCI llvm-svn: 296438	2017-02-28 05:30:48 +00:00
Xin Tong	01feb29ad2	[LoopUnswitch] Common pushing LIC's user to worklist. llvm-svn: 296432	2017-02-28 03:32:41 +00:00
Matt Arsenault	cdb468c0f9	AMDGPU: Basic folds for fmed3 intrinsic Constant fold, canonicalize constants to RHS, reduce to minnum/maxnum when inputs are nan/undef. llvm-svn: 296409	2017-02-27 23:08:49 +00:00
Petr Hosek	6f16857167	[AddressSanitizer] Put shadow at 0 for Fuchsia The Fuchsia ASan runtime reserves the low part of the address space. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D30426 llvm-svn: 296405	2017-02-27 22:49:37 +00:00
Hans Wennborg	2d5841fa73	Revert r296366 "[InlineFunction] add nonnull assumptions based on argument attributes" It causes miscompiles e.g. during self-host of Clang (PR32082). llvm-svn: 296398	2017-02-27 22:33:02 +00:00
Xin Tong	fe422f7462	Empty line. NFCI llvm-svn: 296392	2017-02-27 21:51:48 +00:00
Sanjay Patel	40975e05eb	[InlineFunction] add nonnull assumptions based on argument attributes This was suggested in D27855: have the inliner add assumptions, so we don't lose nonnull info provided by argument attributes. This still doesn't solve PR28430 (dyn_cast), but this gets us closer. https://reviews.llvm.org/D29999 llvm-svn: 296366	2017-02-27 18:13:48 +00:00
Xin Tong	16b85a6601	Fix a bug when unswitching on partial LIV for SwitchInst Summary: Fix a bug when unswitching on partial LIV for SwitchInst. Reviewers: hfinkel, efriedma, sanjoy Reviewed By: sanjoy Subscribers: david2050, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D29107 llvm-svn: 296363	2017-02-27 18:00:13 +00:00
Artur Pilipenko	0860bfc676	Loop predication expand both sides of the widened condition This is a fix for a loop predication bug which resulted in malformed IR generation. Loop invariant side of the widened condition is not guaranteed to be available in the preheader as is, so we need to expand it as well. See added unsigned_loop_0_to_n_hoist_length test for example. Reviewed By: sanjoy, mkazantsev Differential Revision: https://reviews.llvm.org/D30099 llvm-svn: 296345	2017-02-27 15:44:49 +00:00
Xin Tong	3ca169f3a9	Update comments. NFCI llvm-svn: 296298	2017-02-26 19:08:44 +00:00
Davide Italiano	49a0aac1aa	[LoopDeletion] Modernize and simplify a bit. NFCI. llvm-svn: 296294	2017-02-26 07:08:20 +00:00
Xin Tong	42ef2177af	[SCCP] Remove manual folding of terminator instructions. Summary: BranchInst, SwitchInst (with non-default case) with Undef as input is not possible at this point. As we always default-fold terminator to one target in ResolvedUndefsIn and set the input accordingly. So we should only have constantint/blockaddress here. If ConstantFoldTerminator fails, that could mean 2 things. 1. ConstantFoldTerminator is doing something unexpected, i.e. not folding on constantint or blockaddress and not making blocks that should be dead dead. 2. This is not a terminator on constantint or blockaddress. Its on a constant or overdefined, then this block should not be dead. In both cases, we should assert. Reviewers: davide, efriedma, sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30381 llvm-svn: 296281	2017-02-26 02:11:24 +00:00
Xin Tong	b529c66ef3	Empty line. NFCI llvm-svn: 296250	2017-02-25 08:10:28 +00:00
Akira Hatanaka	f003cd3344	Remove redundant code. NFC. llvm-svn: 296219	2017-02-25 00:59:49 +00:00
Akira Hatanaka	2b882050ce	Clean up ObjCARCOpts.cpp. NFC. I removed unused functions and variables and moved variables closer to their uses. llvm-svn: 296218	2017-02-25 00:53:38 +00:00
Yaxun Liu	e6d1ce59c0	[InstCombine] Fix bug in pointer replacement This optimisation was crashing when there was a chain of more than one bitcast instruction to replace, as a result of the changes in D27283. Patch by James Price. Differential Revision: https://reviews.llvm.org/D30347 llvm-svn: 296163	2017-02-24 20:27:25 +00:00
Matthew Simpson	bdc9c78880	[LV] Merge floating-point and integer induction widening code This patch merges the existing floating-point induction variable widening code into the integer induction variable widening code, creating a single set of functions for both kinds of inductions. The primary motivation for doing this is to enable vector phi node creation for floating-point induction variables. Differential Revision: https://reviews.llvm.org/D30211 llvm-svn: 296145	2017-02-24 18:20:12 +00:00
Sanjay Patel	ec9a8de0e6	[InstCombine] don't try SimplifyDemandedInstructionBits from zext/sext because it's slow and unnecessary This one seems more obvious than D30270 that it can't make improvements because an extension always needs all of the incoming bits. There's one specific transform in SimplifyDemandedInstructionBits of converting a sext to a zext when the sign-bit is known zero, but that is handled explicitly in visitSext() with ComputeSignBit(). Like D30270, there are no IR differences (other than instruction names) for the case in PR32037: https://bugs.llvm.org//show_bug.cgi?id=32037 ...and no regression test differences. Zext/sext are a smaller part of the profile, but this still appears to shave off another 0.5% or so from 'opt -O2'. Differential Revision: https://reviews.llvm.org/D30280 llvm-svn: 296129	2017-02-24 15:18:42 +00:00
Xin Tong	f51d804a9f	Fix an iterator invalidation bug when simplifying LIC user. LoopUnswitch/simplify-with-nonvalness.ll is the test case for this. The LIC has 2 users and deleting the 1st user when it can be simplified invalidated the iterator for the 2nd user. llvm-svn: 296069	2017-02-24 01:43:36 +00:00
Evgeniy Stepanov	d1daf631f4	[msan] Fix instrumentation of array allocas. Before this, MSan poisoned exactly one element of any array alloca, even if the number of elements was zero. llvm-svn: 296050	2017-02-24 00:13:17 +00:00
Xin Tong	df4dff3928	Delete outdated comment. NFC llvm-svn: 296043	2017-02-23 23:47:10 +00:00
Xin Tong	ec6f90bec9	LoopUnswitch - Simplify based on known not to a be constant. Summary: In case we do not know what the condition is in an unswitched loop, but we know its definitely NOT a known constant. We can perform simplifcations based on this information. Reviewers: sanjoy, hfinkel, chenli, efriedma Reviewed By: efriedma Subscribers: david2050, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D28968 llvm-svn: 296041	2017-02-23 23:42:19 +00:00
Adam Nemet	de53bfb94d	[OptDiag] Hide legacy remark ctors These are only used when emitting remarks without ORE directly using the free functions emitOptimizationRemark*. llvm-svn: 296037	2017-02-23 23:11:11 +00:00
Hans Wennborg	5cd9a9bf8c	Revert r282872 "CVP. Turn marking adds as no wrap on by default" While not CVP's fault, this caused miscompiles (PR31181). Reverting until those are resolved. (This also reverts the follow-ups r288154 and r288161 which removed the flag.) llvm-svn: 296030	2017-02-23 22:29:00 +00:00
Dehao Chen	cc75d2441d	Add call branch annotation for ICP promoted direct call in SamplePGO mode. Summary: SamplePGO uses branch_weight annotation to represent callsite hotness. When ICP promotes an indirect call to direct call, we need to make sure the direct call is annotated with branch_weight in SamplePGO mode, so that downstream function inliner can use hot callsite heuristic. Reviewers: davidxl, eraman, xur Reviewed By: davidxl, xur Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D30282 llvm-svn: 296028	2017-02-23 22:15:18 +00:00
Adam Nemet	41b019a39c	[LAA] Remove unused LoopAccessReport The need for this removed when I converted everything to use the opt-remark classes directly with the streaming interface. llvm-svn: 296017	2017-02-23 21:17:36 +00:00
Adam Nemet	bf0e69532c	[LV] Remove unused VectorizationReport The need for this removed when I converted everything to use the opt-remark classes directly with the streaming interface. llvm-svn: 296016	2017-02-23 21:17:31 +00:00
Chad Rosier	95abfa35d6	[Reassociate] Add negated value of negative constant to the Duplicates list. In OptimizeAdd, we scan the operand list to see if there are any common factors between operands that can be factored out to reduce the number of multiplies (e.g., 'AA+ABC+D' -> 'A(A+BC)+D'). For each operand of the operand list, we only consider unique factors (which is tracked by the Duplicate set). Now if we find a factor that is a negative constant, we add the negated value as a factor as well, because we can percolate the negate out. However, we mistakenly don't add this negated constant to the Duplicates set. Consider the expression A2-2 + B. Obviously, nothing to factor. For the added value A2*-2 we over count 2 as a factor without this change, which causes the assert reported in PR30256. The problem is that this code is assuming that all the multiply operands of the add are already reassociated. This change avoids the issue by making OptimizeAdd tolerate multiplies which haven't been completely optimized; this sort of works, but we're doing wasted work: we'll end up revisiting the add later anyway. Another possible approach would be to enforce RPO iteration order more strongly. If we have RedoInsts, we process them immediately in RPO order, rather than waiting until we've finished processing the whole function. Intuitively, it seems like the natural approach: reassociation works on expression trees, so the optimization only works in one direction. That said, I'm not sure how practical that is given the current Reassociate; the "optimal" form for an expression depends on its use list (see all the uses of "user_back()"), so Reassociate is really an iterative optimization of sorts, so any changes here would probably get messy. PR30256 Differential Revision: https://reviews.llvm.org/D30228 llvm-svn: 296003	2017-02-23 18:49:03 +00:00
Dehao Chen	533bc6ea8e	Use base discriminator in sample pgo profile matching. Summary: The discriminator has been encoded, and only the base discriminator should be used during profile matching. Reviewers: dblaikie, davidxl Reviewed By: dblaikie, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30218 llvm-svn: 295999	2017-02-23 18:27:45 +00:00
Filipe Cabecinhas	33dd486f1d	[AddressSanitizer] Add PS4 offset llvm-svn: 295994	2017-02-23 17:10:28 +00:00
Sanjay Patel	68e4cb3c86	[InstCombine] use loop instead of recursion to peek through FPExt; NFCI llvm-svn: 295992	2017-02-23 16:39:51 +00:00
Sanjay Patel	adf2ab16e4	[InstCombine] use 'match' to reduce code; NFCI llvm-svn: 295991	2017-02-23 16:26:03 +00:00
Alexey Bataev	f77d1656af	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295972	2017-02-23 13:37:09 +00:00
Alexey Bataev	14b370c1bf	Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong" This reverts commit 7c5141e577d9efd1c8e3087566a38ce6b3a41a84. llvm-svn: 295957	2017-02-23 11:09:35 +00:00
Alexey Bataev	7ae653285d	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295956	2017-02-23 10:57:15 +00:00
Alexey Bataev	7337212e83	Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong" This reverts commit d83c81ee6a8dea662808ac22b396d1bb0595c89d. llvm-svn: 295951	2017-02-23 09:59:29 +00:00
Alexey Bataev	68f2402c61	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295949	2017-02-23 09:40:38 +00:00
Matt Arsenault	f0a88dbaab	LoadStoreVectorizer: Split even sized illegal chains properly Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address spaces accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into store <2 x i32> store i32 store i32 rather than store <2 x i32> store <2 x i32> when legal. llvm-svn: 295933	2017-02-23 03:58:53 +00:00
Matt Arsenault	d4bca1e9ef	AMDGPU: Replace disabled exp inputs with undef llvm-svn: 295914	2017-02-23 00:44:03 +00:00
Michael Kuperstein	6181c62b95	Revert r295868 because it breaks a different SLP lit test. llvm-svn: 295906	2017-02-22 23:35:13 +00:00
Matt Arsenault	f5262256a1	AMDGPU: Add replacement bfe intrinsics llvm-svn: 295899	2017-02-22 23:04:58 +00:00
Sanjay Patel	4805ce0b17	[InstCombine] don't try SimplifyDemandedInstructionBits from add/sub because it's slow and unlikely to succeed Notably, no regression tests change when we remove these calls, and these are expensive calls. The motivation comes from the general acknowledgement that the compiler is getting slower: http://lists.llvm.org/pipermail/llvm-dev/2017-January/109188.html http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html And specifically the test case attached to PR32037: https://bugs.llvm.org//show_bug.cgi?id=32037 Profiling the middle-end (opt) part of the compile: $ ./opt -O2 row_common.bc -o /dev/null ...visitAdd and visitSub are near the top of the instcombine list, and the calls to SimplifyDemandedInstructionBits() are high within each of those. Those calls account for 1%+ of the opt time in either debug or release profiles. And that's the rough win I see from this patch when testing opt built release from r295864 on an iMac with Haswell 4GHz (model 4790K). It seems unlikely that we'd be able to eliminate add/sub or change their operands given that add/sub normally affect all bits, and the PR32037 example shows no IR difference after this change using -O2. Also worth noting - the code comment in visitAdd: // This handles stuff like (X & 254)+1 -> (X&254)\|1 ...isn't true. That transform is handled later with a call to haveNoCommonBitsSet(). Differential Revision: https://reviews.llvm.org/D30270 llvm-svn: 295898	2017-02-22 23:01:12 +00:00
Daniel Berlin	fccbda967a	PredicateInfo: Support switch statements Summary: Depends on D29606 and D29682 Makes us pass GVN's edge.ll (we also will pass a few other testcases they just need cleaning up). Thoughts on the Predicate* hiearchy of classes especially welcome :) (it's not clear to me how best to organize it, and currently, the getBlock* seems ... uglier than maybe wasting a field somewhere or something). Reviewers: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29747 llvm-svn: 295889	2017-02-22 22:20:58 +00:00
Daniel Berlin	17e8d0eae2	Move updating functions to MemorySSAUpdater. Add updater to passes that now need it. Move around code in MemorySSA to expose needed functions. Summary: Mostly cleanup Reviewers: george.burgess.iv Subscribers: llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D30221 llvm-svn: 295887	2017-02-22 22:19:55 +00:00
Wei Mi	74d5a90fa6	[LSR] Canonicalize formula and put recursive Reg related with current loop in ScaledReg. After rL294814, LSR formula can have multiple SCEVAddRecExprs inside of its BaseRegs. Previous canonicalization will swap the first SCEVAddRecExpr in BaseRegs with ScaledReg. But now we want to swap the SCEVAddRecExpr Reg related with current loop with ScaledReg. Otherwise, we may generate code like this: RegA + lsr.iv + RegB, where loop invariant parts RegA and RegB are not grouped together and cannot be promoted outside of loop. With this patch, it will ensure lsr.iv to be generated later in the expr: RegA + RegB + lsr.iv, so that RegA + RegB can be promoted outside of loop. Differential Revision: https://reviews.llvm.org/D26781 llvm-svn: 295884	2017-02-22 21:47:08 +00:00
Alexey Bataev	b551a81c28	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295868	2017-02-22 20:06:40 +00:00
Karl-Johan Karlsson	6eaed7aceb	[LoopVectorize] Added address space check when analysing interleaved accesses Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 This reverts r295042 (re-applies r295038) with an additional fix for the buildbot problem. llvm-svn: 295858	2017-02-22 18:37:36 +00:00
Alexey Bataev	2e0031b371	[SLP] Remove unused initial value from the variable, NFC. llvm-svn: 295826	2017-02-22 12:57:58 +00:00
Sean Silva	9011aca5f4	Use const-ref in range-loop for to avoid copying pairs of std::string No reason to create temporaries. Differential Revision: https://reviews.llvm.org/D29871 Patch by sergio.martins! llvm-svn: 295807	2017-02-22 06:34:04 +00:00
Matt Arsenault	1f17c66890	AMDGPU: Add cvt.pkrtz intrinsic Convert llvm.SI.packf16 test uses llvm-svn: 295797	2017-02-22 00:27:34 +00:00
Michael Kuperstein	c2af82b4b7	[LoopUnroll] Enable PGO-based loop peeling by default. This enables peeling of loops with low dynamic iteration count by default, when profile information is available. Differential Revision: https://reviews.llvm.org/D27734 llvm-svn: 295796	2017-02-22 00:27:34 +00:00
Xin Tong	ccee0e0c05	Make default value for disable-licm-promotion in licm explicit. llvm-svn: 295767	2017-02-21 20:53:48 +00:00
Sanjay Patel	cb731f1538	[InstCombine] canonicalize non-obivous forms of integer min/max This is part of trying to clean up our handling of min/max patterns in IR. By converting these to canonical form, we're more likely to recognize them because there are various places in InstCombine that don't use matchSelectPattern or m_SMax and friends. The backend fixups referenced in the now deleted TODO comment were added with: https://reviews.llvm.org/rL291392 https://reviews.llvm.org/rL289738 If there's any codegen fallout from this change, we should be able to address it in DAGCombiner or target-specific lowering. llvm-svn: 295758	2017-02-21 19:33:53 +00:00
Xin Tong	ebfe01c121	[LoopSimplify] Simplify how we compute UniqueExit Summary: Simplify how we compute UniqueExit. Reuse ExitBlockSet. Reviewers: sanjoy, efriedma, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30182 llvm-svn: 295751	2017-02-21 19:10:58 +00:00
Anna Thomas	ec36f3b79a	[InstCombine] Do not exercise nested max/min pattern on abs Summary: This is a fix for assertion failure in `getInverseMinMaxSelectPattern` when ABS is passed in as a select pattern. We should not be invoking the simplification rule for ABS(MIN(~ x,y))) or ABS(MAX(~x,y)) combinations. Added a test case which would cause an assertion failure without the patch. Reviewers: sanjoy, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30051 llvm-svn: 295719	2017-02-21 14:40:28 +00:00
Evgeny Stupachenko	9909872e30	The patch introduces new way of narrowing complex (>UINT16 variants) solutions. The new method introduced under "-lsr-exp-narrow" option (currenlty set to true). Summary: The method is based on registers number mathematical expectation and should be generally closer to optimal solution. Please see details in comments to "LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas()" function (in lib/Transforms/Scalar/LoopStrengthReduce.cpp). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D29862 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 295704	2017-02-21 07:34:40 +00:00
Sanjoy Das	90208720e3	Add a wrapper around copy_if in STLExtras; NFC I will add one more use for this in a later change. llvm-svn: 295685	2017-02-21 00:38:44 +00:00
Sanjoy Das	85cd132068	[IndVars] Add an assert We've already checked that the loop is in simplify form before, but a little paranoia never hurt anyone. llvm-svn: 295680	2017-02-20 23:37:11 +00:00
Daniel Berlin	78cbd28102	MemorySSA: Add support for renaming uses in the updater. Summary: This lets one add aliasing stores to the updater. (i'm next going to move the creation/etc functions to the updater) Reviewers: george.burgess.iv Subscribers: llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D30154 llvm-svn: 295677	2017-02-20 22:26:03 +00:00
Alexey Bataev	f96465b9b8	[SLP] nullptr'ize initial value in `findBuildAggregate()`, NFC. Initial value of V is sett nullptr, as it is not used. llvm-svn: 295642	2017-02-20 08:04:11 +00:00
Alexey Bataev	2f6b124e01	[SLP] Rework `findBuildAggregate()` from ercursive form to iterative, NFC. Reviewers: mkuper Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D30103 llvm-svn: 295641	2017-02-20 07:49:39 +00:00
Simon Pilgrim	5798180947	Removed extra ';' llvm-svn: 295603	2017-02-19 12:32:44 +00:00
Daniel Berlin	a4b5c01dd2	Add a DebugCounter for PredicateInfo renaming, and an associated test llvm-svn: 295594	2017-02-19 04:29:01 +00:00
Simon Pilgrim	dba9011942	Fix unused variable warning when assertions are disabled. llvm-svn: 295587	2017-02-19 00:33:37 +00:00
Daniel Berlin	f7d9580a08	NewGVN: Start making use of predicateinfo pass. Summary: This begins using the predicateinfo pass in NewGVN. Reviewers: davide Subscribers: llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D29682 llvm-svn: 295583	2017-02-18 23:06:50 +00:00
Daniel Berlin	b355c4ff5f	NewGVN: Make ranking prefer undef to constants. Fix direction of shouldSwapOperands to be correct. llvm-svn: 295582	2017-02-18 23:06:47 +00:00
Daniel Berlin	588e0be39d	PredicateInfo: Clean up predicate info a little, using insertion helpers, and fixing support for the renaming the comparison. llvm-svn: 295581	2017-02-18 23:06:38 +00:00
Sanjay Patel	53c5c3d65d	[InstCombine] add nsw/nuw X, signbit --> or X, signbit Changing to 'or' (rather than 'xor' when no wrapping flags are set) allows icmp simplifies to happen as expected. Differential Revision: https://reviews.llvm.org/D29729 llvm-svn: 295574	2017-02-18 22:20:09 +00:00
Piotr Padlewski	cc5868c186	[MemorySSA] NFC small fixes Summary: 2 small fixes extracted from https://reviews.llvm.org/D29064 Reviewers: kuhar, davide, dberlin, george.burgess.iv Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30109 llvm-svn: 295566	2017-02-18 20:34:36 +00:00
Dehao Chen	7d230325ef	Increases full-unroll threshold. Summary: The default threshold for fully unroll is too conservative. This patch doubles the full-unroll threshold This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge): Performance: 403 0.11% 433 0.51% 445 0.48% 447 3.50% 453 1.49% 464 0.75% Code size: 403 0.56% 433 0.96% 445 2.16% 447 2.96% 453 0.94% 464 8.02% The compiler time overhead is similar with code size. Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc Reviewed By: hfinkel, chandlerc Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D28368 llvm-svn: 295538	2017-02-18 03:46:51 +00:00
Justin Bogner	7bc978b543	OptDiag: Allow constructing DiagnosticLocation from DISubprograms This avoids creating a DILocation just to represent a line number, since creating Metadata is expensive. Creating a DiagnosticLocation directly is much cheaper. llvm-svn: 295531	2017-02-18 02:00:27 +00:00
Davide Italiano	bca05df38b	[NewGVN] isOnlyReachableViaThisEdge() is dead now. NFCI. llvm-svn: 295503	2017-02-17 22:12:30 +00:00
Davide Italiano	6db0ecafd7	[NewGVN] createVariableOrConstant is not required anymore. NFCI. llvm-svn: 295500	2017-02-17 21:55:47 +00:00
Peter Collingbourne	184773d81f	WholeProgramDevirt: For VCP use a 32-bit ConstantInt for the byte offset. A future change will cause this byte offset to be inttoptr'd and then exported via an absolute symbol. On the importing end we will expect the symbol to be in range [0,2^32) so that it will fit into a 32-bit relocation. The problem is that on 64-bit architectures if the offset is negative it will not be in the correct range once we inttoptr it. This change causes us to use a 32-bit integer so that it can be inttoptr'd (which zero extends) into the correct range. Differential Revision: https://reviews.llvm.org/D30016 llvm-svn: 295487	2017-02-17 19:43:45 +00:00
Peter Collingbourne	37317f1207	WholeProgramDevirt: Examine the function body when deciding whether functions are readnone. The goal is to get an analysis result even for de-refineable functions. Differential Revision: https://reviews.llvm.org/D29803 llvm-svn: 295472	2017-02-17 18:17:04 +00:00
Matthew Simpson	f68e183f91	[LV] Remove constant restriction for vector phi creation We previously only created a vector phi node for an induction variable if its step had a constant integer type. However, the step actually only needs to be loop-invariant. We only handle inductions having loop-invariant steps, so this patch should enable vector phi node creation for all integer induction variables that will be vectorized. Differential Revision: https://reviews.llvm.org/D29956 llvm-svn: 295456	2017-02-17 16:09:07 +00:00
Eugene Leviant	958fcd7502	InstCombine: fix extraction when performing vector/array punning Differential revision: https://reviews.llvm.org/D29491 llvm-svn: 295429	2017-02-17 07:36:03 +00:00
Sanjoy Das	8b859c26ec	[JumpThreading] Re-enable JumpThreading for guards Summary: JumpThreading for guards feature has been reverted at https://reviews.llvm.org/rL295200 due to the following problem: the feature used the following algorithm for detection of diamond patters: 1. Find a block with 2 predecessors; 2. Check that these blocks have a common single parent; 3. Check that the parent's terminator is a branch instruction. The problem is that these checks are insufficient. They may pass for a non-diamond construction in case if those two predecessors are actually the same block. This may happen if parent's terminator is a br (either conditional or unconditional) to a block that ends with "switch" instruction with exactly two branches going to one block. This patch re-enables the JumpThreading for guards and fixes this issue by adding the check that those found predecessors are actually different blocks. This guarantees that parent's terminator is a conditional branch with exactly 2 different successors, which is now ensured by assertions. It also adds two more tests for this situation (with parent's terminator being a conditional and an unconditional branch). Patch by Max Kazantsev! Reviewers: anna, sanjoy, reames Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30036 llvm-svn: 295410	2017-02-17 04:21:14 +00:00
Matt Arsenault	c18b67745b	Bug 31948: Fix assertion when bitcasting constantexpr pointers llvm-svn: 295387	2017-02-17 00:32:19 +00:00
Wei Mi	493fb266ed	[LSR] Prevent formula with SCEVAddRecExpr type of Reg from Sibling loops In rL294814, we allow formula with SCEVAddRecExpr type of Reg from loops other than current loop. This is good for the case when induction variable of outerloop being used in expr in innerloop. But it is very bad to allow such Reg from sibling loop because we may need to add lsr.iv in other sibling loops when scev expanding those SCEVAddRecExpr type exprs. For the testcase below, one loop can be inserted with a bunch of lsr.iv because of LSR for other loops. // The induction variable j from a loop in the middle will have initial // value generated from previous sibling loop and exit value used by its // next sibling loop. void goo(long i, long j); long cond; void foo(long N) { long i = 0; long j = 0; i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); } The fix is to only allow formula with SCEVAddRecExpr type of Reg from current loop or its parents. Differential Revision: https://reviews.llvm.org/D30021 llvm-svn: 295378	2017-02-16 21:27:31 +00:00
Matt Arsenault	920576042d	InstCombine: Canonicalize fast fmuladd to fmul + fadd llvm-svn: 295353	2017-02-16 18:46:24 +00:00
Craig Topper	3731f4d173	[AVX-512][InstCombine] Teach InstCombine to optimize 512-bit packss/packus intrinsics like it does 128/256-bit. llvm-svn: 295294	2017-02-16 07:35:23 +00:00
Peter Collingbourne	08eb081ac3	PMB: Add an importing WPD pass to the start of the ThinLTO backend pipeline. Differential Revision: https://reviews.llvm.org/D30008 llvm-svn: 295260	2017-02-15 23:48:38 +00:00
Peter Collingbourne	50cbd7cc90	Re-apply r295110 and r295144 with a fix for the ASan issue. llvm-svn: 295241	2017-02-15 21:56:51 +00:00
Sanjay Patel	845ea963aa	[InstCombine] improve formatting; NFC llvm-svn: 295237	2017-02-15 21:31:34 +00:00
Arnold Schwaighofer	8d61e0030a	AddressSanitizer: don't track swifterror memory addresses They are register promoted by ISel and so it makes no sense to treat them as memory. Inserting calls to the thread sanitizer would also generate invalid IR. You would hit: "swifterror value can only be loaded and stored from, or as a swifterror argument!" llvm-svn: 295230	2017-02-15 20:43:43 +00:00
Arnold Schwaighofer	8eb1a48540	ThreadSanitizer: don't track swifterror memory addresses They are register promoted by ISel and so it makes no sense to treat them as memory. Inserting calls to the thread sanitizer would also generate invalid IR. You would hit: "swifterror value can only be loaded and stored from, or as a swifterror argument!" llvm-svn: 295215	2017-02-15 18:57:06 +00:00
Anna Thomas	94c8d4976c	Revert "[JumpThreading] Thread through guards" This reverts commit r294617. We fail on an assert while trying to get a condition from an unconditional branch. llvm-svn: 295200	2017-02-15 17:08:29 +00:00
Sanjay Patel	288f075f8e	[InlineFunction] use getFunction(); NFC llvm-svn: 295185	2017-02-15 15:22:18 +00:00
Sanjay Patel	32d753cae3	[InlineFunction] use getCaller(); NFCI llvm-svn: 295181	2017-02-15 15:08:38 +00:00
Sanjay Patel	ada717e25b	[InlineFunction] use range-for loop; NFCI llvm-svn: 295179	2017-02-15 14:56:11 +00:00
Daniel Jasper	eef9b03395	Revert r295110 and r295144. This fails under ASAN: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/798/steps/check-llvm%20asan/logs/stdio llvm-svn: 295162	2017-02-15 09:56:08 +00:00
Peter Collingbourne	0609acc10d	SimplifyCFG: Register cloned assume intrinsics with assumption cache when creating critical edge. Differential Revision: https://reviews.llvm.org/D29976 llvm-svn: 295145	2017-02-15 03:01:11 +00:00
Peter Collingbourne	e2367415b6	WholeProgramDevirt: Separate the code that applies optzns from the code that decides whether to apply them. NFCI. The idea is that the apply* functions will also be called when importing devirt optimizations. Differential Revision: https://reviews.llvm.org/D29745 llvm-svn: 295144	2017-02-15 02:13:08 +00:00
Easwaran Raman	5a12f236c6	Fix a bug in caller's BFI update code after inlining. Multiple blocks in the callee can be mapped to a single cloned block since we prune the callee as we clone it. The existing code iterates over the value map and clones the block frequency (and eventually scales the frequencies of the cloned blocks). Value map's iteration is not deterministic and so the cloned block might get the frequency of any of the original blocks. The fix is to set the max of the original frequencies to the cloned block. The first block in the sequence must have this max frequency and, in the call context, subsequent blocks must have its frequency. Differential Revision: https://reviews.llvm.org/D29696 llvm-svn: 295115	2017-02-14 22:49:28 +00:00
Michael Kuperstein	569162fefe	[LV] Rename Induction to PrimaryInduction. NFC. llvm-svn: 295111	2017-02-14 22:14:01 +00:00
Peter Collingbourne	534c0175b6	WholeProgramDevirt: Change internal vcall data structures to match summary. Group calls into constant and non-constant arguments up front, and use uint64_t instead of ConstantInt to represent constant arguments. The goal is to allow the information from the summary to fit naturally into this data structure in a future change (specifically, it will be added to CallSiteInfo). This has two side effects: - We disallow VCP for constant integer arguments of width >64 bits. - We remove the restriction that the bitwidth of a vcall's argument and return types must match those of the vfunc definitions. I don't expect either of these to matter in practice. The first case is uncommon, and the second one will lead to UB (so we can do anything we like). Differential Revision: https://reviews.llvm.org/D29744 llvm-svn: 295110	2017-02-14 22:12:23 +00:00
Taewook Oh	2e945ebb13	[BasicBlockUtils] Use getFirstNonPHIOrDbg to set debugloc for instructions created in SplitBlockPredecessors Summary: When setting debugloc for instructions created in SplitBlockPredecessors, current implementation copies debugloc from the first-non-phi instruction of the original basic block. However, if the first-non-phi instruction is a call for @llvm.dbg.value, the debugloc of the instruction may point the location outside of the block itself. For the example code of ``` 1 typedef struct _node_t { 2 struct _node_t next; 3 } node_t; 4 5 extern node_t root; 6 7 int foo() { 8 node_t node, tmp; 9 int ret = 0; 10 11 node = tmp = root->next; 12 while (node != root) { 13 while (node) { 14 tmp = node; 15 node = node->next; 16 ret++; 17 } 18 } 19 20 return ret; 21 } ``` , below is the basicblock corresponding to line 12 after Reassociate expressions pass: ``` while.cond: ; preds = %while.cond2, %entry %node.0 = phi %struct._node_t* [ %1, %entry ], [ null, %while.cond2 ] %ret.0 = phi i32 [ 0, %entry ], [ %ret.1, %while.cond2 ] tail call void @llvm.dbg.value(metadata i32 %ret.0, i64 0, metadata !19, metadata !20), !dbg !21 tail call void @llvm.dbg.value(metadata %struct._node_t* %node.0, i64 0, metadata !11, metadata !20), !dbg !31 %cmp = icmp eq %struct._node_t* %node.0, %0, !dbg !33 br i1 %cmp, label %while.end5, label %while.cond2, !dbg !35 ``` As you can see, the first-non-phi instruction is a call for @llvm.dbg.value, and the debugloc is ``` !21 = !DILocation(line: 9, column: 7, scope: !6) ``` , which is a definition of 'ret' variable and outside of the scope of the basicblock itself. However, current implementation picks up this debugloc for the instructions created in SplitBlockPredecessors. This patch addresses this problem by picking up debugloc from the first-non-phi-non-dbg instruction. Reviewers: dblaikie, samsonov, eugenis Reviewed By: eugenis Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29867 llvm-svn: 295106	2017-02-14 21:10:40 +00:00
Vedant Kumar	55891fc71e	Re-apply "[profiling] Remove dead profile name vars after emitting name data" This reverts 295092 (re-applies 295084), with a fix for dangling references from the array of coverage names passed down from frontends. I missed this in my initial testing because I only checked test/Profile, and not test/CoverageMapping as well. Original commit message: The profile name variables passed to counter increment intrinsics are dead after we emit the finalized name data in __llvm_prf_nm. However, we neglect to erase these name variables. This causes huge size increases in the __TEXT,__const section as well as slowdowns when linker dead stripping is disabled. Some affected projects are so massive that they fail to link on Darwin, because only the small code model is supported. Fix the issue by throwing away the name constants as soon as we're done with them. Differential Revision: https://reviews.llvm.org/D29921 llvm-svn: 295099	2017-02-14 20:03:48 +00:00
Vedant Kumar	27ebdf4bcb	Revert "[profiling] Remove dead profile name vars after emitting name data" This reverts commit r295084. There is a test failure on: http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/2620/ llvm-svn: 295092	2017-02-14 19:08:39 +00:00
Vedant Kumar	bb10484662	[profiling] Remove dead profile name vars after emitting name data The profile name variables passed to counter increment intrinsics are dead after we emit the finalized name data in __llvm_prf_nm. However, we neglect to erase these name variables. This causes huge size increases in the __TEXT,__const section as well as slowdowns when linker dead stripping is disabled. Some affected projects are so massive that they fail to link on Darwin, because only the small code model is supported. Fix the issue by throwing away the name constants as soon as we're done with them. Differential Revision: https://reviews.llvm.org/D29921 llvm-svn: 295084	2017-02-14 18:48:48 +00:00
Taewook Oh	f22fa72e4a	Do not apply redundant LastCallToStaticBonus Summary: As written in the comments above, LastCallToStaticBonus is already applied to the cost if Caller has only one user, so it is redundant to reapply the bonus here. If the only user is not a caller, TotalSecondaryCost will not be adjusted anyway because callerWillBeRemoved is false. If there's no caller at all, we don't need to care about TotalSecondaryCost because inliningPreventsSomeOuterInline is false. Reviewers: chandlerc, eraman Reviewed By: eraman Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D29169 llvm-svn: 295075	2017-02-14 17:30:05 +00:00
Brian Cain	6dedf65cc9	Correct a typo, s/hosting/hoisting/ llvm-svn: 295066	2017-02-14 16:41:10 +00:00
Matthew Simpson	f09d13e5cc	Reapply "[LV] Extend trunc optimization to all IVs with constant integer steps" This reapplies commit r294967 with a fix for the execution time regressions caught by the clang-cmake-aarch64-quick bot. We now extend the truncate optimization to non-primary induction variables only if the truncate isn't already free. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 295063	2017-02-14 16:28:32 +00:00
Alexey Bataev	2a2f35d59c	[SLP] Fix for PR31879: vectorize repeated scalar ops that don't get put back into a vector Previously the cost of the existing ExtractElement/ExtractValue instructions was considered as a dead cost only if it was detected that they have only one use. But these instructions may be considered dead also if users of the instructions are also going to be vectorized, like: ``` %x0 = extractelement <2 x float> %x, i32 0 %x1 = extractelement <2 x float> %x, i32 1 %x0x0 = fmul float %x0, %x0 %x1x1 = fmul float %x1, %x1 %add = fadd float %x0x0, %x1x1 ``` This can be transformed to ``` %1 = fmul <2 x float> %x, %x %2 = extractelement <2 x float> %1, i32 0 %3 = extractelement <2 x float> %1, i32 1 %add = fadd float %2, %3 ``` because though `%x0` and `%x1` have 2 users each other, these users are part of the vectorized tree and we can consider these `extractelement` instructions as dead. Differential Revision: https://reviews.llvm.org/D29900 llvm-svn: 295056	2017-02-14 15:20:48 +00:00
Karl-Johan Karlsson	ec21b769ec	Revert "[LoopVectorize] Added address space check when analysing interleaved accesses" This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed. I'm reverting to investigate. llvm-svn: 295042	2017-02-14 10:06:16 +00:00
Karl-Johan Karlsson	2ec409cca2	[LoopVectorize] Added address space check when analysing interleaved accesses Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 llvm-svn: 295038	2017-02-14 08:14:06 +00:00
Karl-Johan Karlsson	38cbf5869d	Test commit permission Removing whitespace. llvm-svn: 295037	2017-02-14 07:31:36 +00:00
Mikael Holmen	ece84cd10c	[LSR] Pointers with different address spaces are considered incompatible. Summary: Function isCompatibleIVType is already used as a guard before the call to SE.getMinusSCEV(OperExpr, PrevExpr); in LSRInstance::ChainInstruction. getMinusSCEV requires the expressions to be of the same type, so we now consider two pointers with different address spaces to be incompatible, since it is possible that the pointers in fact have different sizes. Reviewers: qcolombet, eli.friedman Reviewed By: qcolombet Subscribers: nhaehnle, Ka-Ka, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D29885 llvm-svn: 295033	2017-02-14 06:37:42 +00:00
Peter Collingbourne	002c2d5380	ThinLTOBitcodeWriter: Write available_externally copies of VCP eligible functions to merged module. Differential Revision: https://reviews.llvm.org/D29701 llvm-svn: 295021	2017-02-14 03:42:38 +00:00
Philip Reames	b2bca7e309	[LICM] Make store promotion work in the face of unordered atomics Extend our store promotion code to deal with unordered atomic accesses. Ordered atomics continue to be unhandled. Most of the change is straight-forward, the only complicated bit is in the reasoning around mixing of atomic and non-atomic memory access. Rather than trying to reason about the complex semantics in these cases, I simply disallowed promotion when both atomic and non-atomic accesses are present. This is conservatively correct. It seems really tempting to just promote all access to atomics, but the original accesses might have been conditional. Since we can't lower an arbitrary atomic type, it might not be safe to promote all access to atomic. Consider a loop like the following: while(b) { load i128 ... if (can lower i128 atomic) store atomic i128 ... else store i128 } It could be there's no race on the location and thus the code is perfectly well defined even if we can't lower a i128 atomically. It's not clear we need to be this conservative - arguably the program above is brocken since it can't be lowered unless the branch is folded - but I didn't want to have to fix any fallout which might result. Differential Revision: https://reviews.llvm.org/D15592 llvm-svn: 295015	2017-02-14 01:38:31 +00:00
Peter Collingbourne	c45f7f3eb4	FunctionAttrs: Factor out a function for querying memory access of a specific copy of a function. NFC. This will later be used by ThinLTOBitcodeWriter to add copies of readnone functions to the regular LTO module. Differential Revision: https://reviews.llvm.org/D29695 llvm-svn: 295008	2017-02-14 00:28:13 +00:00
Sanjay Patel	4f74216da0	[FunctionAttrs] try to extend nonnull-ness of arguments from a callsite back to its parent function As discussed here: http://lists.llvm.org/pipermail/llvm-dev/2016-December/108182.html ...we should be able to propagate 'nonnull' info from a callsite back to its parent. The original motivation for this patch is our botched optimization of "dyn_cast" (PR28430), but this won't solve that problem. The transform is currently disabled by default while we wait for clang to work-around potential security problems: http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html Differential Revision: https://reviews.llvm.org/D27855 llvm-svn: 294998	2017-02-13 23:10:51 +00:00
Peter Collingbourne	2b33f65317	IR: Type ID summary extensions for WPD; thread summary into WPD pass. Make the whole thing testable by adding YAML I/O support for the WPD summary information and adding some negative tests that exercise the YAML support. Differential Revision: https://reviews.llvm.org/D29782 llvm-svn: 294981	2017-02-13 19:26:18 +00:00
Matthew Simpson	659f92e2aa	Revert "[LV] Extend trunc optimization to all IVs with constant integer steps" This reverts commit r294967. This patch caused execution time slowdowns in a few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot. I'm reverting to investigate. llvm-svn: 294973	2017-02-13 18:02:35 +00:00
Matthew Simpson	7b7f40297f	[LV] Extend trunc optimization to all IVs with constant integer steps This patch extends the optimization of truncations whose operand is an induction variable with a constant integer step. Previously we were only applying this optimization to the primary induction variable. However, the cost model assumes the optimization is applied to the truncation of all integer induction variables (even regardless of step type). The transformation is now applied to the other induction variables, and I've updated the cost model to ensure it is better in sync with the transformation we actually perform. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 294967	2017-02-13 16:48:00 +00:00
Alexey Bataev	e8b1536e21	[SLP] Fix for PR31690: Allow using of extra values in horizontal reductions. Currently, LLVM supports vectorization of horizontal reduction instructions with initial value set to 0. Patch supports vectorization of reduction with non-zero initial values. Also, it supports a vectorization of instructions with some extra arguments, like: ``` float f(float x[], int a, int b) { float p = a % b; p += x[0] + 3; for (int i = 1; i < 32; i++) p += x[i]; return p; } ``` Patch allows vectorization of this kind of horizontal reductions. Differential Revision: https://reviews.llvm.org/D29727 llvm-svn: 294934	2017-02-13 08:01:26 +00:00
Daniel Berlin	4d54796f87	NewGVN: Reverse order of congruence class elimination to maximize trivial deadness llvm-svn: 294926	2017-02-12 23:24:45 +00:00
Daniel Berlin	508a1dec94	NewGVN: Use shouldSwapOperands in one more place llvm-svn: 294925	2017-02-12 23:24:42 +00:00
Daniel Berlin	31e1b8fe48	Revert accidental commit titled "testing" This reverts commit r294919 llvm-svn: 294923	2017-02-12 22:40:10 +00:00
Daniel Berlin	86eab15f2b	NewGVN: Apply the fast math flags fix in r267113 to NewGVN as well. llvm-svn: 294922	2017-02-12 22:25:20 +00:00
Daniel Berlin	dbe8264c93	PredicateInfo: Handle critical edges Summary: This adds support for placing predicateinfo such that it affects critical edges. This fixes the issues mentioned by Nuno on the mailing list. Depends on D29519 Reviewers: davide, nlopes Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29606 llvm-svn: 294921	2017-02-12 22:12:20 +00:00
Daniel Berlin	eccb8740d1	NewGVN: Fix missed call that should be to shouldSwapOperands llvm-svn: 294920	2017-02-12 22:02:47 +00:00
Daniel Berlin	3fecad0d3e	testing llvm-svn: 294919	2017-02-12 22:02:20 +00:00
Sanjay Patel	45b7e69fef	[InstCombine] fold icmp sgt/slt (add nsw X, C2), C --> icmp sgt/slt X, (C - C2) I found one special case of this transform for 'slt 0', so I removed that and added the general transform. Alive code to check correctness: Name: slt_no_overflow Pre: WillNotOverflowSignedSub(C1, C2) %a = add nsw i8 %x, C2 %b = icmp slt %a, C1 => %b = icmp slt %x, C1 - C2 Name: sgt_no_overflow Pre: WillNotOverflowSignedSub(C1, C2) %a = add nsw i8 %x, C2 %b = icmp sgt %a, C1 => %b = icmp sgt %x, C1 - C2 http://rise4fun.com/Alive/MH Differential Revision: https://reviews.llvm.org/D29774 llvm-svn: 294898	2017-02-12 16:40:30 +00:00
Daniel Berlin	22a4a01ffa	NewGVN: Reverse sense of this test to make it clearer llvm-svn: 294851	2017-02-11 15:20:15 +00:00
Daniel Berlin	1529bb93c9	NewGVN: Add missing initialization of NumFuncArgs lost due to bad merge. llvm-svn: 294850	2017-02-11 15:13:49 +00:00
Daniel Berlin	1c08767f88	NewGVN: Rank and order commutative operands consistently. llvm-svn: 294849	2017-02-11 15:07:01 +00:00
Daniel Berlin	b79f53669a	NewGVN: Clean up how we handle the INITIAL class so that everything in it is dead or unreachable, as it should be. This also makes the leader of INITIAL undef, enabling us to handle irreducibility properly. Summary: This lets us verify, more than we do now, that we didn't screw up value numbering. Reviewers: davide Subscribers: Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D29842 llvm-svn: 294844	2017-02-11 12:48:50 +00:00
Benjamin Kramer	efcf06f5f2	Move symbols from the global namespace into (anonymous) namespaces. NFC. llvm-svn: 294837	2017-02-11 11:06:55 +00:00
Evgeny Stupachenko	fe6f548d2d	Fix PR23384 (under "-lsr-insns-cost" option) Summary: The patch adds instructions number generated by a solution to LSR cost under "-lsr-insns-cost" option. Reviewers: qcolombet, hfinkel Differential Revision: http://reviews.llvm.org/D28307 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 294821	2017-02-11 02:57:43 +00:00
Wei Mi	8f20e63a20	[LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with outerloop. The recommit includes some changes of testcases. No functional change to the patch. In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser and dropped. Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only handle inner loop now so only %for.body2 will be handled. Using the logic above, formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1reg({0,+,1}<%for.body2>) will be dropped no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related with outerloop. Only formula like reg(%array) + 1reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept because the SCEVAddRecExpr related with outerloop is folded into the initial value of the SCEVAddRecExpr related with current loop. But in some cases, we do need to share the basic induction variable reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop the formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally. From the existing comment, it tries to avoid considering multiple level loops at the same time. However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other than current loop, it is an invariant and will be simple to handle, and the formula doesn't have to be dropped. Differential Revision: https://reviews.llvm.org/D26429 llvm-svn: 294814	2017-02-11 00:50:23 +00:00
Chandler Carruth	027340f3b9	[PM] Fix a bug in how I ported LoopDeletion to the new PM. This was marking the loop for deletion after the loop was deleted. This almost works, except that when we do any kind of debug logging it starts reading the name of the loop from deleted memory or otherwise blowing up. This can fail in a bunch of ways. I recently added a test that always does this, and it started failing on the sanitizer bots. The fix is to mark the loop as deleted in the loop PM infrastructure before we remove the loop. We can do this by passing the updater into the routine. That also lets us simplify a bunch of other interface components here for a net win. llvm-svn: 294810	2017-02-11 00:09:30 +00:00
Benjamin Kramer	03ab8a366e	[InstCombine] Move class into anonymous namespace. NFC. This is necessary to avoid warnings from GCC. InstCombineLoadStoreAlloca.cpp:238:7: error: 'PointerReplacer' declared with greater visibility than the type of its field 'PointerReplacer::IC' llvm-svn: 294794	2017-02-10 22:26:35 +00:00
Benjamin Kramer	684c87be4f	[InstCombine] Silence unused variable warning in Release builds. llvm-svn: 294788	2017-02-10 22:04:17 +00:00
Yaxun Liu	ba01ed00fe	Fix invalid addrspacecast due to combining alloca with global var For function-scope variables with large initialisation list, FE usually generates a global variable to hold the initializer, then generates memcpy intrinsic to initialize the alloca. InstCombiner::visitAllocaInst identifies such allocas which are accessed only by reading and replaces them with the global variable. This is done by casting the global variable to the type of the alloca and replacing all references. However, when the global variable is in a different address space which is disjoint with addr space 0 (e.g. for IR generated from OpenCL, global variable cannot be in private addr space i.e. addr space 0), casting the global variable to addr space 0 results in invalid IR for certain targets (e.g. amdgpu). To fix this issue, when the global variable is not in addr space 0, instead of casting it to addr space 0, this patch chases down the uses of alloca until reaching the load instructions, then replaces load from alloca with load from the global variable. If during the chasing bitcast and GEP are encountered, new bitcast and GEP based on the global variable are generated and used in the load instructions. Differential Revision: https://reviews.llvm.org/D27283 llvm-svn: 294786	2017-02-10 21:46:07 +00:00
Dehao Chen	fb02f7140a	Encode duplication factor from loop vectorization and loop unrolling to discriminator. Summary: This patch starts the implementation as discuss in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html When optimization duplicates code that will scale down the execution count of a basic block, we will record the duplication factor as part of discriminator so that the offline process tool can find the duplication factor and collect the accurate execution frequency of the corresponding source code. Two important optimization that fall into this category is loop vectorization and loop unroll. This patch records the duplication factor for these 2 optimizations. The recording will be guarded by a flag encode-duplication-in-discriminators, which is off by default. Reviewers: probinson, aprantl, davidxl, hfinkel, echristo Reviewed By: hfinkel Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D26420 llvm-svn: 294782	2017-02-10 21:09:07 +00:00
Matthew Simpson	df124a7569	[LV] Remove type restriction for vector phi creation We previously only created a vector phi node for an induction variable if its type matched the type of the canonical induction variable. Differential Revision: https://reviews.llvm.org/D29776 llvm-svn: 294755	2017-02-10 16:15:26 +00:00
Philip Reames	578dafbd8b	[LoopUnswitch] Remove BFI usage (dead code) Chandler mentioned at the last social that the need for BFI in the new pass manager was causing a slight hiccup for this pass. Given this code has been checked in, but off for over a year, it makes sense to just remove it for now. Note that there's nothing wrong with the general idea - it's actually a quite good one - and once we have the infrastructure in place to implement this without the full recompuation on every loop, we absolutely should. llvm-svn: 294715	2017-02-10 06:12:06 +00:00
Chandler Carruth	addcda483e	[PM] Port ArgumentPromotion to the new pass manager. Now that the call graph supports efficient replacement of a function and spurious reference edges, we can port ArgumentPromotion to the new pass manager very easily. The old PM-specific bits are sunk into callbacks that the new PM simply doesn't use. Unlike the old PM, the new PM simply does argument promotion and afterward does the update to LCG reflecting the promoted function. Differential Revision: https://reviews.llvm.org/D29580 llvm-svn: 294667	2017-02-09 23:46:27 +00:00
Peter Collingbourne	17febdbb25	WholeProgramDevirt: Check that VCP candidate functions are defined before evaluating them. This was crashing before. llvm-svn: 294666	2017-02-09 23:46:26 +00:00
Chandler Carruth	aaad9f84be	[PM/LCG] Teach the LazyCallGraph how to replace a function without disturbing the graph or having to update edges. This is motivated by porting argument promotion to the new pass manager. Because of how LLVM IR Function objects work, in order to change their signature a new object needs to be created. This is efficient and straight forward in the IR but previously was very hard to implement in LCG. We could easily replace the function a node in the graph represents. The challenging part is how to handle updating the edges in the graph. LCG previously used an edge to a raw function to represent a node that had not yet been scanned for calls and references. This was the core of its laziness. However, that model causes this kind of update to be very hard: 1) The keys to lookup an edge need to be `Function`s that would all need to be updated when we update the node. 2) There will be some unknown number of edges that haven't transitioned from `Function` edges to `Node` edges. All of this complexity isn't necessary. Instead, we can always build a node around any function, always pointing edges at it and always using it as the key to lookup an edge. To maintain the laziness, we need to sink the edges* of a node into a secondary object and explicitly model transitioning a node from empty to populated by scanning the function. This design seems much cleaner in a number of ways, but importantly there is now exactly one place where the `Function` has to be updated! Some other cleanups that fall out of this include having something to model the entry* edges more accurately. Rather than hand rolling parts of the node in the graph itself, we have an explicit `EdgeSequence` object that gives us exactly the functionality needed. We also have a consistent place to define the edge iterators and can use them for both the entry edges and the internal edges of the graph. The API used to model the separation between a node and its edges is intentionally very thin as most clients are expected to deal with nodes that have populated edges. We model this exactly as an optional does with an additional method to populate the edges when that is a reasonable thing for a client to do. This is based on API design suggestions from Richard Smith and David Blaikie, credit goes to them for helping pick how to model this without it being either too explicit or too implicit. The patch is somewhat noisy due to shifting around iterator types and new syntax for walking the edges of a node, but most of the functionality change is in the `Edge`, `EdgeSequence`, and `Node` types. Differential Revision: https://reviews.llvm.org/D29577 llvm-svn: 294653	2017-02-09 23:24:13 +00:00
Sanjay Patel	f38bab73aa	[InstCombine] allow (X * C2) << C1 --> X * (C2 << C1) for vectors This fold already existed for vectors but only when 'C1' was a splat constant (but 'C2' could be any constant). There were no tests for any vector constants, so I'm adding a test that shows non-splat constants for both operands. llvm-svn: 294650	2017-02-09 23:13:04 +00:00
Peter Collingbourne	cea1e4e79a	De-duplicate some code for creating an AARGetter suitable for the legacy PM. I'm about to use this in a couple more places. Differential Revision: https://reviews.llvm.org/D29793 llvm-svn: 294648	2017-02-09 23:11:52 +00:00
Michael J. Spencer	714d9d22ad	[LoadCombine] Fix combining of loads which span an aliasing store. Fixes PR31517 Differential Revision: https://reviews.llvm.org/D28922 llvm-svn: 294632	2017-02-09 21:46:49 +00:00
Peter Collingbourne	857aba4410	Rename LowerTypeTestsSummaryAction to PassSummaryAction. NFCI. I intend to use the same type with the same semantics in the WholeProgramDevirt pass. Differential Revision: https://reviews.llvm.org/D29746 llvm-svn: 294629	2017-02-09 21:45:01 +00:00
Sanjay Patel	ae3b43e488	[InstCombine] use m_APInt to allow demanded bits analysis on splat constants llvm-svn: 294628	2017-02-09 21:43:06 +00:00
Sanjoy Das	74bda4d591	[JumpThreading] Thread through guards Summary: This patch allows JumpThreading also thread through guards. Virtually, guard(cond) is equivalent to the following construction: if (cond) { do something } else {deoptimize} Yet it is not explicitly converted into IFs before lowering. This patch enables early threading through guards in simple cases. Currently it covers the following situation: if (cond1) { // code A } else { // code B } // code C guard(cond2) // code D If there is implication cond1 => cond2 or !cond1 => cond2, we can transform this construction into the following: if (cond1) { // code A // code C } else { // code B // code C guard(cond2) } // code D Thus, removing the guard from one of execution branches. Patch by Max Kazantsev! Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29620 llvm-svn: 294617	2017-02-09 19:40:22 +00:00
Mike Aizatsky	4705ae936d	[sancov] using comdat only when it is enabled Differential Revision: https://reviews.llvm.org/D29733 llvm-svn: 294529	2017-02-08 23:12:46 +00:00
Mike Aizatsky	401d369328	[sancov] specifying comdat for sancov constructors Differential Revision: https://reviews.llvm.org/D29662 llvm-svn: 294517	2017-02-08 21:20:33 +00:00
Peter Collingbourne	28ffd3261f	ThinLTOBitcodeWriter: Strip debug info from merged module. This module will contain nothing but vtable definitions and (soon) available_externally function definitions, so there is no point in keeping debug info in the module. Differential Revision: https://reviews.llvm.org/D28913 llvm-svn: 294511	2017-02-08 20:44:00 +00:00
Elena Demikhovsky	5267edd3e3	[Loop Vectorizer] Cost-based decision for vectorization form of memory instruction. Making the cost model selecting between Interleave, GatherScatter or Scalar vectorization form of memory instruction. The right decision should be done for non-consecutive memory access instrcuctions that may have more than one vectorization solution. This patch includes the following changes: - Cost Model calculates the cost of Load/Store vector form and choose the better option between Widening, Interleave, GatherScactter and Scalarization. Cost Model keeps the widening decision. - Arrays of Uniform and Scalar values are moved from Legality to Cost Model. - Cost Model collects Uniforms and Scalars per VF. The collection is based on CM decision map of Loadis/Stores vectorization form. - Vectorization of memory instruction is performed according to the CM decision. Differential Revision: https://reviews.llvm.org/D27919 llvm-svn: 294503	2017-02-08 19:25:23 +00:00
Matt Arsenault	560665250f	NVPTX: Extract mem intrinsic expansions into utilities llvm-svn: 294490	2017-02-08 17:49:52 +00:00
Chad Rosier	e22c992ba9	[Reassociate] Remove an unused argument. NFC. llvm-svn: 294489	2017-02-08 17:45:27 +00:00
Sanjay Patel	6dd2eae76a	[InstCombine] add local name for repeated calls; NFC llvm-svn: 294470	2017-02-08 16:19:36 +00:00
Igor Laevsky	a9b6872908	[InstComobineCalls] Fix buildbot failures after r294453. Some targets don't support uint64_t options. Change type to unsigned. Differential Revision: https://reviews.llvm.org/D28909 llvm-svn: 294461	2017-02-08 15:21:48 +00:00
Igor Laevsky	900ffa34c8	[InstCombineCalls] Unfold element atomic memcpy instruction Differential Revision: https://reviews.llvm.org/D28909 llvm-svn: 294453	2017-02-08 14:32:04 +00:00
Igor Laevsky	4b317fa24e	[InstCombineCalls] Remove zero length atomic memcpy intrinsics Differential Revision: https://reviews.llvm.org/D28909 llvm-svn: 294452	2017-02-08 14:23:47 +00:00
Matt Arsenault	cb3fa37c7e	LSR: Check atomic instruction pointer operands llvm-svn: 294410	2017-02-08 06:44:58 +00:00
Daniel Berlin	73eb7fe478	Revert "CVP: Make CVP iterate in an order that maximizes reuse of LVI cache" This reverts commit r294398, it seems to be failing on the bots. llvm-svn: 294399	2017-02-08 02:48:25 +00:00
Daniel Berlin	b83cfb7724	CVP: Make CVP iterate in an order that maximizes reuse of LVI cache Summary: After the DFS order change for LVI, i have a few testcases that now take forever. The TL;DR - This is mainly due to the overdefined cache, but that requires predicateinfo to fix[1] In order to maximize reuse of the LVI cache for now, change the order we iterate in. This reduces my testcase from 5 minutes to 4 seconds. I have verified cases like gmic do not get slower. I am playing with whether the order should be postorder or idf. [1] In practice, overdefined anywhere should be overdefined everywhere, so this cache should be global. That also fixes this bug. The problem, however, is that LVI relies on this cache being filled in per-block because it wants different values in different blocks due to precisely the naming issue that predicateinfo fixes. With predicateinfo, making the cache global works fine on individual passes, and also resolves this issue. Reviewers: davide, sanjoy, chandlerc Subscribers: llvm-commits, djasper Differential Revision: https://reviews.llvm.org/D29679 llvm-svn: 294398	2017-02-08 02:35:07 +00:00
Sanjoy Das	ec892139bd	[IRCE] Add a missing invariant check Currently IRCE relies on the loops it transforms to be (semantically) of the form: for (i = START; i < END; i++) ... or for (i = START; i > END; i--) ... However, we were not verifying the presence of the START < END entry check (i.e. check before the first iteration). We were only verifying that the backedge was guarded by (i + 1) < END. Usually this would work "fine" since (especially in Java) most loops do actually have the START < END check, but of course that is not guaranteed. llvm-svn: 294375	2017-02-07 23:59:07 +00:00
Daniel Berlin	c763fd1ab4	PredicateInfo: Some compilers are unhappy with naming Use *'s Use. Change the name. llvm-svn: 294364	2017-02-07 22:11:43 +00:00
Daniel Berlin	439042b7ad	Add PredicateInfo utility and printing pass Summary: This patch adds a utility to build extended SSA (see "ABCD: eliminating array bounds checks on demand"), and an intrinsic to support it. This is then used to get functionality equivalent to propagateEquality in GVN, in NewGVN (without having to replace instructions as we go). It would work similarly in SCCP or other passes. This has been talked about a few times, so i built a real implementation and tried to productionize it. Copies are inserted for operands used in assumes and conditional branches that are based on comparisons (see below for more) Every use affected by the predicate is renamed to the appropriate intrinsic result. E.g. %cmp = icmp eq i32 %x, 50 br i1 %cmp, label %true, label %false true: ret i32 %x false: ret i32 1 will become %cmp = icmp eq i32, %x, 50 br i1 %cmp, label %true, label %false true: ; Has predicate info ; branch predicate info { TrueEdge: 1 Comparison: %cmp = icmp eq i32 %x, 50 } %x.0 = call @llvm.ssa_copy.i32(i32 %x) ret i32 %x.0 false: ret i23 1 (you can use -print-predicateinfo to get an annotated-with-predicateinfo dump) This enables us to easily determine what operations are affected by a given predicate, and how operations affected by a chain of predicates. Reviewers: davide, sanjoy Subscribers: mgorny, llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D29519 Update for review comments Fix a bug Nuno noticed where we are giving information about and/or on edges where the info is not useful and easy to use wrong Update for review comments llvm-svn: 294351	2017-02-07 21:10:46 +00:00
David Blaikie	4c01af203e	Fix the -Werror build for some sign-comparisons llvm-svn: 294331	2017-02-07 18:58:17 +00:00
Davide Italiano	2133bf5562	[InstCombine] Make max size array combine a tunable. Requested by Sanjoy/Hal a while ago, and forgotten by me (r283612). llvm-svn: 294323	2017-02-07 17:56:50 +00:00
Reid Kleckner	79e37d517c	Revert "[GVNHoist] Merge DebugLoc metadata on hoisted instructions" This reverts commit r294250. It caused PR31891. Add a test case that shows that inlinable calls retain location information with an accurate scope. llvm-svn: 294317	2017-02-07 17:31:13 +00:00
Peter Collingbourne	1ea1fd893c	LowerTypeTests: Simplify. NFC. llvm-svn: 294273	2017-02-07 03:20:58 +00:00
Dehao Chen	4a9dd70213	Fix the samplepgo indirect call promotion bug: we should not promote a direct call. Summary: Checking CS.getCalledFunction() == nullptr does not necessary indicate indirect call. We also need to check if CS.getCalledValue() is not a constant. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29570 llvm-svn: 294260	2017-02-06 23:33:15 +00:00
Paul Robinson	383c5c228f	Merge DebugLoc on combined stores; in this case, when combining stores from the end of two blocks, merge instead of arbitrarily picking one. Differential Revision: http://reviews.llvm.org/D29504 llvm-svn: 294251	2017-02-06 22:19:04 +00:00
Taewook Oh	44a856f7d5	[GVNHoist] Merge DebugLoc metadata on hoisted instructions Summary: When instructions are hoisted, current implementation keeps DebugLoc metadata of the instruction that chosen as Repl (and its GEP operand if Repl is a load or a store). However, DebugLoc metadata should be updated to the 'merged' location across all hoisted instructions. See the following example code: ``` 1: typedef struct { 2: int a[10]; 3: } S1; 4: 5: extern S1 s1[10]; 6: 7: void foo(int x, int y, int i) { 8: if (y) 9: s1[i]->a[i] = x + y; 10: else 11: s1[i]->a[i] = x; 12: } ``` Below is LLVM IR representation of the program before gvn-hoist: ``` %struct.S1 = type { [10 x i32] } @s1 = external local_unnamed_addr global [10 x %struct.S1], align 16 define void @foo(i32 %x, i32 %y, i32 %i) !dbg !4 { entry: %tobool = icmp ne i32 %y, 0, !dbg !8 br i1 %tobool, label %if.then, label %if.else, !dbg !10 if.then: ; preds = %entry %add = add nsw i32 %x, %y, !dbg !11 %idxprom = sext i32 %i to i64, !dbg !12 %arrayidx = getelementptr inbounds [10 x %struct.S1], [10 x %struct.S1]* @s1, i64 0, i64 %idxprom, !dbg !12 %0 = load %struct.S1, %struct.S1* %arrayidx, align 8, !dbg !12, !tbaa !13 %a = getelementptr inbounds %struct.S1, %struct.S1* %0, i32 0, i32 0, !dbg !17 br label %if.end, !dbg !12 if.else: ; preds = %entry %idxprom3 = sext i32 %i to i64, !dbg !18 %arrayidx4 = getelementptr inbounds [10 x %struct.S1], [10 x %struct.S1]* @s1, i64 0, i64 %idxprom3, !dbg !18 %1 = load %struct.S1, %struct.S1* %arrayidx4, align 8, !dbg !18, !tbaa !13 %a5 = getelementptr inbounds %struct.S1, %struct.S1* %1, i32 0, i32 0, !dbg !19 br label %if.end if.end: ; preds = %if.else, %if.then %a5.sink = phi [10 x i32]* [ %a5, %if.else ], [ %a, %if.then ] %.sink = phi i32 [ %x, %if.else ], [ %add, %if.then ] %idxprom6 = sext i32 %i to i64 %arrayidx7 = getelementptr inbounds [10 x i32], [10 x i32]* %a5.sink, i64 0, i64 %idxprom6 store i32 %.sink, i32* %arrayidx7, align 4, !tbaa !20 ret void, !dbg !22 } ``` where ``` !11 = !DILocation(line: 9, column: 18, scope: !9) !12 = !DILocation(line: 9, column: 5, scope: !9) !18 = !DILocation(line: 11, column: 5, scope: !9) !19 = !DILocation(line: 11, column: 9, scope: !9) ``` . And below is after gvn-hoist: ``` define void @foo(i32 %x, i32 %y, i32 %i) !dbg !4 { entry: %tobool = icmp ne i32 %y, 0, !dbg !8 %idxprom = sext i32 %i to i64, !dbg !10 %0 = getelementptr inbounds [10 x %struct.S1], [10 x %struct.S1]* @s1, i64 0, i64 %idxprom, !dbg !10 %1 = load %struct.S1, %struct.S1* %0, align 8, !dbg !10, !tbaa !11 br i1 %tobool, label %if.then, label %if.else, !dbg !15 if.then: ; preds = %entry %add = add nsw i32 %x, %y, !dbg !16 %arrayidx = getelementptr inbounds [10 x %struct.S1], [10 x %struct.S1]* @s1, i64 0, i64 %idxprom, !dbg !10 %a = getelementptr inbounds %struct.S1, %struct.S1* %1, i32 0, i32 0, !dbg !17 br label %if.end, !dbg !10 if.else: ; preds = %entry %arrayidx4 = getelementptr inbounds [10 x %struct.S1], [10 x %struct.S1]* @s1, i64 0, i64 %idxprom, !dbg !18 %a5 = getelementptr inbounds %struct.S1, %struct.S1* %1, i32 0, i32 0, !dbg !19 br label %if.end if.end: ; preds = %if.else, %if.then %a5.sink = phi [10 x i32]* [ %a5, %if.else ], [ %a, %if.then ] %.sink = phi i32 [ %x, %if.else ], [ %add, %if.then ] %arrayidx7 = getelementptr inbounds [10 x i32], [10 x i32]* %a5.sink, i64 0, i64 %idxprom store i32 %.sink, i32* %arrayidx7, align 4, !tbaa !20 ret void, !dbg !22 } ``` As you see, loads and their GEPs have been hosited from if.then/if.else block to entry block. However, DebugLoc metadata of these new instructions are still same as the instructions in if.then block, as they are moved/cloned from if.then block. This may result incorrect stepping and imprecise sample profile result. Reviewers: majnemer, pcc, sebpop Reviewed By: sebpop Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29377 llvm-svn: 294250	2017-02-06 22:05:04 +00:00
Michael Kuperstein	7a86bb2589	[SLP] Revert "Allow using of extra values in horizontal reductions." This breaks when one of the extra values is also a scalar that participates in the same vectorization tree which we'll end up reducing. llvm-svn: 294245	2017-02-06 21:50:59 +00:00
Sanjay Patel	54656ca7db	[ValueTracking] emit a remark when we detect a conflicting assumption (PR31809) This is a follow-up to D29395 where we try to be good citizens and let the user know that we've probably gone off the rails. This should allow us to resolve: https://llvm.org/bugs/show_bug.cgi?id=31809 Differential Revision: https://reviews.llvm.org/D29404 llvm-svn: 294208	2017-02-06 18:26:06 +00:00
Dehao Chen	c81483d79c	Fix the bug of samplepgo indirect call promption when type casting of the return value is needed. Summary: When type casting of the return value is needed, promoteIndirectCall will return the type casting instruction instead of the direct call. This patch changed to return the direct call instruction instead. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29569 llvm-svn: 294205	2017-02-06 18:10:36 +00:00
Sanjay Patel	cf4c90f3d3	[InstCombine] simplify dyn_cast + isa; NFCI llvm-svn: 294198	2017-02-06 17:16:16 +00:00
Dehao Chen	448b5790f6	Refactor SampleProfile.cpp to make it cleaner. (NFC) llvm-svn: 294118	2017-02-05 07:32:17 +00:00
Davide Italiano	ec49313b11	[IPCP] Don't propagate return value for naked functions. This is pretty much the same change made in SCCP. llvm-svn: 294098	2017-02-04 19:44:14 +00:00
Xinliang David Li	c7db0d0e75	Fix variable name /NFC llvm-svn: 294090	2017-02-04 07:40:43 +00:00
Sanjay Patel	0fe32ac256	[InstCombine] treat i1 as a special type in shouldChangeType() This patch is based on the llvm-dev discussion here: http://lists.llvm.org/pipermail/llvm-dev/2017-January/109631.html Folding to i1 should always be desirable because that's better for value tracking and we have special folds for i1 types. I checked for other users of shouldChangeType() where this might have an effect, but we already handle the i1 case differently than other types in all of those cases. Side note: the default datalayout includes i1, so it seems we only find this gap in shouldChangeType + phi folding for the case when there is (1) an explicit datalayout without i1, (2) casting to i1 from a legal type, and (3) a phi with exactly 2 incoming casted operands (as Björn mentioned). Differential Revision: https://reviews.llvm.org/D29336 llvm-svn: 294066	2017-02-03 23:13:11 +00:00
Xinliang David Li	6144a59b7f	[PGO] Add select instr profile in graph dump Differential Revision: http://reviews.llvm.org/D29474 llvm-svn: 294055	2017-02-03 21:57:51 +00:00
Sanjay Patel	73fc8ddb06	[InstCombine] fix operand-complexity-based canonicalization (PR28296) The code comments didn't match the code logic, and we didn't actually distinguish the fake unary (not/neg/fneg) operators from arguments. Adding another level to the weighting scheme provides more structure and can help simplify the pattern matching in InstCombine and other places. I fixed regressions that would have shown up from this change in: rL290067 rL290127 But that doesn't mean there are no pattern-matching logic holes left; some combines may just be missing regression tests. Should fix: https://llvm.org/bugs/show_bug.cgi?id=28296 Differential Revision: https://reviews.llvm.org/D27933 llvm-svn: 294049	2017-02-03 21:43:34 +00:00
Michael Kuperstein	2a735b71b6	[SLP] Make sortMemAccesses explicitly return an error. NFC. llvm-svn: 294029	2017-02-03 19:32:50 +00:00
Anna Thomas	b555cc8cb6	NFC: [LoopUnroll] More meaningful message in tracing llvm-svn: 294017	2017-02-03 17:12:43 +00:00
Peter Collingbourne	e6fd9ff96a	IRMover: Merge flags LinkModuleInlineAsm and IsPerformingImport. Currently these flags are always the inverse of each other, so there is no need to keep them separate. Differential Revision: https://reviews.llvm.org/D29471 llvm-svn: 294016	2017-02-03 17:01:14 +00:00
Peter Collingbourne	6d8f817f8b	FunctionImport: Use IRMover directly. The importer was previously using ModuleLinker in a sort of "IRMover mode". Use IRMover directly instead in order to remove a level of indirection. I will remove all importing support from ModuleLinker in a separate change. Differential Revision: https://reviews.llvm.org/D29468 llvm-svn: 294014	2017-02-03 16:56:27 +00:00
Alexey Bataev	a16cfe6fa9	[SLP] Fix for PR31690: Allow using of extra values in horizontal reductions. Currently LLVM supports vectorization of horizontal reduction instructions with initial value set to 0. Patch supports vectorization of reduction with non-zero initial values. Also it supports a vectorization of instructions with some extra arguments, like: float f(float x[], int a, int b) { float p = a % b; p += x[0] + 3; for (int i = 1; i < 32; i++) p += x[i]; return p; } Patch allows vectorization of this kind of horizontal reductions. Differential Revision: https://reviews.llvm.org/D28961 llvm-svn: 293994	2017-02-03 08:08:50 +00:00
Mehdi Amini	1380edf4ef	Revert "[ThinLTO] Add an auto-hide feature" This reverts commit r293970. After more discussion, this belongs to the linker side and there is no added value to do it at this level. llvm-svn: 293993	2017-02-03 07:41:43 +00:00
Marcos Pividori	db5a565514	[sanitizer coverage] Fix Instrumentation to work on Windows. On Windows, the symbols "___stop___sancov_guards" and "___start___sancov_guards" are not defined automatically. So, we need to take a different approach. We define 3 sections: Section ".SCOV$A" will only hold a variable ___start___sancov_guard. Section ".SCOV$M" will hold the main data. Section ".SCOV$Z" will only hold a variable ___stop___sancov_guards. When linking, they will be merged sorted by the characters after the $, so we can use the pointers of the variables ___[start\|stop]___sancov_guard to know the actual range of addresses of that section. In this diff, I updated instrumentation to include all the guard arrays in section ".SCOV$M". Differential Revision: https://reviews.llvm.org/D28434 llvm-svn: 293987	2017-02-03 01:08:06 +00:00
Mehdi Amini	b0a8ff71e5	[ThinLTO] Add an auto-hide feature When a symbol is not exported outside of the DSO, it is can be hidden. Usually we try to internalize as much as possible, but it is not always possible, for instance a symbol can be referenced outside of the LTO unit, or there can be cross-module reference in ThinLTO. This is a recommit of r293912 after fixing build failures, and a recommit of r293918 after fixing LLD tests. Differential Revision: https://reviews.llvm.org/D28978 llvm-svn: 293970	2017-02-03 00:32:38 +00:00
Mehdi Amini	21c89dc920	Revert "[ThinLTO] Add an auto-hide feature" This reverts commit r293918, one lld test does not pass. llvm-svn: 293961	2017-02-02 23:20:36 +00:00
Xinliang David Li	58fcc9bdce	[PGO] internal option cleanups 1. Added comments for options 2. Added missing option cl::desc field 3. Uniified function filter option for graph viewing. Now PGO count/raw-counts share the same filter option: -view-bfi-func-name=. llvm-svn: 293938	2017-02-02 21:29:17 +00:00
Xinliang David Li	1eb4ec6a2e	[PGO] make graph view internal options available for all builds Differential Revision: https://reviews.llvm.org/D29259 llvm-svn: 293921	2017-02-02 19:18:56 +00:00
Peter Collingbourne	37e2459186	FunctionImport: Remove the -disable-force-link-odr flag and change importFunctions to never force link. This removes some functionality that was only being used by tests. Differential Revision: https://reviews.llvm.org/D29439 llvm-svn: 293919	2017-02-02 18:42:25 +00:00
Mehdi Amini	97624fb1ec	[ThinLTO] Add an auto-hide feature When a symbol is not exported outside of the DSO, it is can be hidden. Usually we try to internalize as much as possible, but it is not always possible, for instance a symbol can be referenced outside of the LTO unit, or there can be cross-module reference in ThinLTO. This is a recommit of r293912 after fixing build failures. Differential Revision: https://reviews.llvm.org/D28978 llvm-svn: 293918	2017-02-02 18:31:35 +00:00
Mehdi Amini	827600deaf	Revert "[ThinLTO] Add an auto-hide feature" This reverts r293912, bots are broken. llvm-svn: 293914	2017-02-02 18:24:37 +00:00
Mehdi Amini	dc5a7444f0	[ThinLTO] Add an auto-hide feature When a symbol is not exported outside of the DSO, it is can be hidden. Usually we try to internalize as much as possible, but it is not always possible, for instance a symbol can be referenced outside of the LTO unit, or there can be cross-module reference in ThinLTO. Differential Revision: https://reviews.llvm.org/D28978 llvm-svn: 293912	2017-02-02 18:13:46 +00:00
Jun Bum Lim	180bc5a021	[JumpThread] Enhance finding partial redundant loads by continuing scanning single predecessor Summary: While scanning predecessors to find an available loaded value, if the predecessor has a single predecessor, we can continue scanning through the single predecessor. Reviewers: mcrosier, rengolin, reames, davidxl, haicheng Reviewed By: rengolin Subscribers: zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D29200 llvm-svn: 293896	2017-02-02 15:12:34 +00:00
Anna Thomas	7f4b26e189	[LICM] Hoist loads that are dominated by invariant.start intrinsic, and are invariant in the loop. Summary: We can hoist out loads that are dominated by invariant.start, to the preheader. We conservatively assume the load is variant, if we see a corresponding use of invariant.start (it could be an invariant.end or an escaping call). Reviewers: mkuper, sanjoy, reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29331 llvm-svn: 293887	2017-02-02 13:22:03 +00:00
Adam Nemet	0bf1b863b9	[LV] Also port failure remarks to new OptimizationRemarkEmitter API llvm-svn: 293866	2017-02-02 05:41:51 +00:00
Matt Arsenault	300836098f	InferAddressSpaces: Handle more cases with constant select operands llvm-svn: 293859	2017-02-02 03:37:22 +00:00
Davide Italiano	cb68f37184	[IPSCCP] Restore the old behaviour (pre r293799). It's not clear the change I made a good idea, and it definitely needs further discussion. Thanks to Eli for pointing out. llvm-svn: 293846	2017-02-02 00:46:54 +00:00
Matt Arsenault	db6e9e89a9	InferAddressSpaces: clang-format some things llvm-svn: 293843	2017-02-02 00:28:25 +00:00
Sanjay Patel	c56d1ccd79	[InstCombine] move folds for shift-shift pairs; NFCI Although this is 'no-functional-change-intended', I'm adding tests for shl-shl and lshr-lshr pairs because there is no existing test coverage for those folds. It seems like we should be able to remove some code from foldShiftedShift() at this point because we're handling those patterns on the general path. llvm-svn: 293814	2017-02-01 21:31:34 +00:00
Michael Kuperstein	3c6b3ba258	Shut up another GCC warning about operator precedence. NFC. llvm-svn: 293812	2017-02-01 21:06:33 +00:00
Jun Bum Lim	423406fdcb	[JumpThread] No need to erase BB from LoopHeaders. NFC. Summary: No need to try to ease BB from LoopHeaders as we already know that BB is not in LoopHeaders. Reviewers: hsung, majnemer, mcrosier, haicheng, rengolin Reviewed By: rengolin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29232 llvm-svn: 293802	2017-02-01 19:06:55 +00:00
Davide Italiano	6849f20d85	[IPSCCP] Don't propagate return values of functions marked as noinline. This tries to address what Hal defined (in the post-commit review of r293727) a long-standing problem with noinline, where we end up de facto inlining trivial functions e.g. __attribute__((noinline)) int patatino(void) { return 5; } because of return value propagation. llvm-svn: 293799	2017-02-01 18:52:20 +00:00
Matthew Simpson	ba5cf9dfee	[LV] Move interleaved access helper functions to VectorUtils (NFC) This patch moves some helper functions related to interleaved access vectorization out of LoopVectorize.cpp and into VectorUtils.cpp. We would like to use these functions in a follow-on patch that improves interleaved load and store lowering in (ARM/AArch64)ISelLowering.cpp. One of the functions was already duplicated there and has been removed. Differential Revision: https://reviews.llvm.org/D29398 llvm-svn: 293788	2017-02-01 17:45:46 +00:00
Sanjoy Das	e0e5795f6b	[InstCombine] Allow InstCombine to merge adjacent guards Summary: If there are two adjacent guards with different conditions, we can remove one of them and include its condition into the condition of another one. This patch allows InstCombine to merge them by the following pattern: guard(a); guard(b) -> guard(a & b). Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29378 llvm-svn: 293778	2017-02-01 16:34:55 +00:00
Artur Pilipenko	2cbaded5b5	[LoopPredication] Add a new line to debug output in LoopPredication pass llvm-svn: 293762	2017-02-01 12:25:38 +00:00
Florian Hahn	a35b8a4852	[LoopUnroll] Use addClonedBlockToLoopInfo to add loop header to LI (NFC). Summary: I have a similar patch up for review already (D29173). If you prefer I can squash them both together. Also I think there more potential for code sharing between LoopUnroll.cpp and LoopUnrollRuntime.cpp. Do you think patches for that would be worthwhile? Reviewers: mkuper, mzolotukhin Reviewed By: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29311 llvm-svn: 293758	2017-02-01 10:39:35 +00:00
Justin Bogner	41e632bf6b	SanitizerCoverage: Support sanitizer guard section on darwin MachO's sections need a segment as well as a section name, and the section start and end symbols are spelled differently than on ELF. llvm-svn: 293733	2017-02-01 02:38:39 +00:00
Davide Italiano	7343b9f340	[IPSCCP] Teach how to not propagate return values of naked functions. Differential Revision: https://reviews.llvm.org/D29360 llvm-svn: 293727	2017-02-01 01:01:22 +00:00
Matt Arsenault	bdd59e6879	InferAddressSpaces: Handle select This fails to handle some cases where one of the inputs is a constant to be fixed in a later commit. llvm-svn: 293723	2017-02-01 00:08:53 +00:00
Matt Arsenault	864fbacb4a	InferAddressSpaces: Remove dead declaration llvm-svn: 293720	2017-01-31 23:57:20 +00:00
Matt Arsenault	517a290e4f	InferAddressSpaces: Avoid double map lookup llvm-svn: 293719	2017-01-31 23:48:44 +00:00
Matt Arsenault	2a46d81038	InferAddressSpaces: Fix broken casting of constants llvm-svn: 293718	2017-01-31 23:48:40 +00:00
Daniel Berlin	97718e6081	NewGVN: Dead argument cleanup llvm-svn: 293708	2017-01-31 22:32:03 +00:00
Daniel Berlin	ff12c922fe	NewGVN: Cleanup conditions to match reality llvm-svn: 293707	2017-01-31 22:32:01 +00:00
Daniel Berlin	c22aafe5b3	NewGVN: Add basic support for symbolic comparison evaluation llvm-svn: 293706	2017-01-31 22:31:58 +00:00
Daniel Berlin	808e3ff8a2	NewGVN: Formatting cleanup after lookupOperandLeader change llvm-svn: 293705	2017-01-31 22:31:56 +00:00
Daniel Berlin	203f47bbd8	NewGVN: Remove the unsued two arguments from lookupOperandLeader. llvm-svn: 293704	2017-01-31 22:31:53 +00:00
Daniel Berlin	74d300361a	NewGVN: Cleanup header files we are using. llvm-svn: 293703	2017-01-31 22:31:50 +00:00
Davide Italiano	116464a55d	[NewGVN] Preserve TargetLibraryInfo analysis. We can maybe preserve more but this is a first step. Ack'ed by Danny on IRC. llvm-svn: 293694	2017-01-31 21:53:18 +00:00
Taewook Oh	75acec8a14	Do not propagate DebugLoc across basic blocks Summary: DebugLoc shouldn't be propagated across basic blocks to prevent incorrect stepping and imprecise sample profile result. rL288903 addressed the wrong DebugLoc propagation issue by limiting the copy of DebugLoc when GVN removes a fully redundant load that is dominated by some other load. However, DebugLoc is still incorrectly propagated in the following example: ``` 1: extern int g; 2: 3: void foo(int x, int y, int z) { 4: if (x) 5: g = 0; 6: else 7: g = 1; 8: 9: int i = 0; 10: for ( ; i < y ; i++) 11: if (i > z) 12: g++; 13: } ``` Below is LLVM IR representation of the program before GVN: ``` @g = external local_unnamed_addr global i32, align 4 ; Function Attrs: nounwind uwtable define void @foo(i32 %x, i32 %y, i32 %z) local_unnamed_addr #0 !dbg !4 { entry: %not.tobool = icmp eq i32 %x, 0, !dbg !8 %.sink = zext i1 %not.tobool to i32, !dbg !8 store i32 %.sink, i32* @g, align 4, !tbaa !9 %cmp8 = icmp sgt i32 %y, 0, !dbg !13 br i1 %cmp8, label %for.body.preheader, label %for.end, !dbg !17 for.body.preheader: ; preds = %entry br label %for.body, !dbg !19 for.body: ; preds = %for.body.preheader, %for.inc %i.09 = phi i32 [ %inc4, %for.inc ], [ 0, %for.body.preheader ] %cmp1 = icmp sgt i32 %i.09, %z, !dbg !19 br i1 %cmp1, label %if.then2, label %for.inc, !dbg !21 if.then2: ; preds = %for.body %0 = load i32, i32* @g, align 4, !dbg !22, !tbaa !9 %inc = add nsw i32 %0, 1, !dbg !22 store i32 %inc, i32* @g, align 4, !dbg !22, !tbaa !9 br label %for.inc, !dbg !23 for.inc: ; preds = %for.body, %if.then2 %inc4 = add nuw nsw i32 %i.09, 1, !dbg !24 %exitcond = icmp ne i32 %inc4, %y, !dbg !13 br i1 %exitcond, label %for.body, label %for.end.loopexit, !dbg !17 for.end.loopexit: ; preds = %for.inc br label %for.end, !dbg !26 for.end: ; preds = %for.end.loopexit, %entry ret void, !dbg !26 } ``` where ``` !21 = !DILocation(line: 11, column: 9, scope: !15) !22 = !DILocation(line: 12, column: 8, scope: !20) !23 = !DILocation(line: 12, column: 7, scope: !20) !24 = !DILocation(line: 10, column: 20, scope: !25) ``` And below is after GVN: ``` @g = external local_unnamed_addr global i32, align 4 define void @foo(i32 %x, i32 %y, i32 %z) local_unnamed_addr !dbg !4 { entry: %not.tobool = icmp eq i32 %x, 0, !dbg !8 %.sink = zext i1 %not.tobool to i32, !dbg !8 store i32 %.sink, i32* @g, align 4, !tbaa !9 %cmp8 = icmp sgt i32 %y, 0, !dbg !13 br i1 %cmp8, label %for.body.preheader, label %for.end, !dbg !17 for.body.preheader: ; preds = %entry br label %for.body, !dbg !19 for.body: ; preds = %for.inc, %for.body.preheader %0 = phi i32 [ %1, %for.inc ], [ %.sink, %for.body.preheader ], !dbg !21 %i.09 = phi i32 [ %inc4, %for.inc ], [ 0, %for.body.preheader ] %cmp1 = icmp sgt i32 %i.09, %z, !dbg !19 br i1 %cmp1, label %if.then2, label %for.inc, !dbg !22 if.then2: ; preds = %for.body %inc = add nsw i32 %0, 1, !dbg !21 store i32 %inc, i32* @g, align 4, !dbg !21, !tbaa !9 br label %for.inc, !dbg !23 for.inc: ; preds = %if.then2, %for.body %1 = phi i32 [ %inc, %if.then2 ], [ %0, %for.body ] %inc4 = add nuw nsw i32 %i.09, 1, !dbg !24 %exitcond = icmp ne i32 %inc4, %y, !dbg !13 br i1 %exitcond, label %for.body, label %for.end.loopexit, !dbg !17 for.end.loopexit: ; preds = %for.inc br label %for.end, !dbg !26 for.end: ; preds = %for.end.loopexit, %entry ret void, !dbg !26 } ``` As you see, GVN removes the load in if.then2 block and creates a phi instruction in for.body for it. The problem is that DebugLoc of remove load instruction is propagated to the newly created phi instruction, which is wrong. rL288903 cannot handle this case because ValuesPerBlock.size() is not 1 in this example when the load is removed. Reviewers: aprantl, andreadb, wolfgangp Reviewed By: andreadb Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D29254 llvm-svn: 293688	2017-01-31 20:57:13 +00:00
Davide Italiano	aec4617dc8	[Instcombine] Combine consecutive identical fences Differential Revision: https://reviews.llvm.org/D29314 llvm-svn: 293661	2017-01-31 18:09:05 +00:00
Arnold Schwaighofer	c368563bd6	Don't combine stores to a swifterror pointer operand to a different type llvm-svn: 293658	2017-01-31 17:53:49 +00:00
Dehao Chen	274df5ea41	Explicitly promote indirect calls before sample profile annotation. Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation. Reviewers: xur, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29040 llvm-svn: 293657	2017-01-31 17:49:37 +00:00
Sanjay Patel	2217f75ad1	fix formatting; NFC llvm-svn: 293652	2017-01-31 17:25:42 +00:00
Silviu Baranga	c6d21eba0e	[InstCombine] Make sure that LHS and RHS have the same type in transformToIndexedCompare If they don't have the same type, the size of the constant index would need to be adjusted (and this wouldn't be always possible). Alternatively we could try the analysis with the initial RHS value, which would guarantee that the two sides have the same type. However it is unlikely that in practice this would pass our transformation requirements. Fixes PR31808 (https://llvm.org/bugs/show_bug.cgi?id=31808). llvm-svn: 293629	2017-01-31 14:04:15 +00:00
Florian Hahn	5364cf3b56	[LoopUnroll] Use addClonedBlockToLoopInfo to clone the top level loop (NFC) Summary: rL293124 added the necessary infrastructure to properly add the cloned top level loop to LoopInfo, which means we do not have to do it manually in CloneLoopBlocks. @mkuper sorry for not pointing this out during my review of D29156, I just realized that today. Reviewers: mzolotukhin, chandlerc, mkuper Reviewed By: mkuper Subscribers: llvm-commits, mkuper Differential Revision: https://reviews.llvm.org/D29173 llvm-svn: 293615	2017-01-31 11:13:44 +00:00
Matt Arsenault	973c4aebad	InferAddressSpaces: Rename constant llvm-svn: 293594	2017-01-31 02:17:41 +00:00
Matt Arsenault	72f259b8eb	InferAddressSpaces: Handle icmp llvm-svn: 293593	2017-01-31 02:17:32 +00:00
Matt Arsenault	6d5a8d48fd	InferAddressSpaces: Support memory intrinsics llvm-svn: 293587	2017-01-31 01:56:57 +00:00
Matt Arsenault	6c907a9bb3	InferAddressSpaces: Support atomics llvm-svn: 293584	2017-01-31 01:40:38 +00:00
Matt Arsenault	d89a6e11a7	InferAddressSpaces: Don't replace volatile users llvm-svn: 293582	2017-01-31 01:30:16 +00:00
Matt Arsenault	850657a439	NVPTX: Move InferAddressSpaces to generic code llvm-svn: 293579	2017-01-31 01:10:58 +00:00
Sanjay Patel	8c5f236197	[InstCombine] enable (X <<nsw C1) >>s C2 --> X <<nsw (C1 - C2) for vectors with splat constants llvm-svn: 293570	2017-01-30 23:35:52 +00:00
Benjamin Kramer	365c9bd941	[ICP] Fix bool conversion warning and actually write out the reason instead of dropping it. llvm-svn: 293564	2017-01-30 23:11:29 +00:00

... 3 4 5 6 7 ...

17522 Commits