llvm-project

Commit Graph

Author	SHA1	Message	Date
Mehdi Amini	55b06538b5	Fix test after renaming -name-anon-functions pass to -name-anon-globals llvm-svn: 281752	2016-09-16 17:18:16 +00:00
Mehdi Amini	2cac787919	Fix NameAnonFunctions pass: for ThinLTO we need to rename global variables as well A follow-up patch will rename this pass and the source file accordingly, but I figured the non-NFC change will be easier to spot in isolation. Differential Revision: https://reviews.llvm.org/D24641 llvm-svn: 281744	2016-09-16 16:56:25 +00:00
David Majnemer	08566532ae	Add a test for r280191 llvm-svn: 281694	2016-09-16 02:43:36 +00:00
Sanjay Patel	af91d1f81e	[InstCombine] allow icmp (shr/shl) folds for vectors These 2 helper functions were already using APInt internally, so just change the API and caller to allow folds for splats. The scalar regression tests look quite thorough, so I just added a couple of tests to prove that vectors are handled too. These folds should be grouped with the other cmp+shift folds though. That can be an NFC follow-up. llvm-svn: 281663	2016-09-15 21:35:30 +00:00
Sanjay Patel	d590db6eac	regenerate checks llvm-svn: 281655	2016-09-15 20:39:01 +00:00
Mehdi Amini	d880309835	[GlobalOpt] Dead Eliminate declarations GlobalOpt is already dead-code-eliminating global definitions. With this change it also takes care of declarations. Hopefully this should make it now a strict superset of GlobalDCE. This is important for LTO/ThinLTO as we don't want the linker to see "undefined reference" when it processes the input files: it could prevent proper internalization (or even load an extra file from a static archive, changing the behavior of the program!). llvm-svn: 281653	2016-09-15 20:26:27 +00:00
David Majnemer	8b16da8744	[InstCombine] Do not RAUW a constant GEP canRewriteGEPAsOffset expects to process instructions, not constants. This fixes PR30342. llvm-svn: 281650	2016-09-15 20:10:09 +00:00
Sanjay Patel	886a542e23	[InstCombine] allow icmp (sub nsw) folds for vectors Also, clean up the code and comments for the existing folds in foldICmpSubConstant(). llvm-svn: 281631	2016-09-15 18:05:17 +00:00
Sanjay Patel	514068397e	[InstCombine] add vector tests for icmp (sub nsw) llvm-svn: 281630	2016-09-15 17:54:47 +00:00
Sanjay Patel	40c53ea933	[InstCombine] allow (icmp sgt smin(PosA, B), 0) fold for vectors llvm-svn: 281624	2016-09-15 16:23:20 +00:00
Sanjay Patel	78a7c7d26a	[InstCombine] add vector tests for icmp sgt smin llvm-svn: 281623	2016-09-15 16:13:41 +00:00
Sanjay Patel	e957f1fe2c	[InstCombine] auto-generate checks llvm-svn: 281621	2016-09-15 15:48:53 +00:00
Sanjay Patel	7577a3d799	[InstCombine] use m_APInt to allow icmp folds using known bits for splat constant vectors llvm-svn: 281613	2016-09-15 14:15:47 +00:00
NAKAMURA Takumi	6a5bac48cf	llvm/test/Transforms/CorrelatedValuePropagation/alloca.ll REQUIRES +Asserts. llvm-svn: 281598	2016-09-15 09:45:31 +00:00
Wei Mi	f160e345be	Add some shortcuts in LazyValueInfo to reduce compile time of Correlated Value Propagation. The patch is to partially fix PR10584. Correlated Value Propagation queries LVI to check non-null for pointer params of each callsite. If we know the def of param is an alloca instruction, we know it is non-null and can return early from LVI. Similarly, CVP queries LVI to check whether pointer for each mem access is constant. If the def of the pointer is an alloca instruction, we know it is not a constant pointer. These shortcuts can reduce the cost of CVP significantly. Differential Revision: https://reviews.llvm.org/D18066 llvm-svn: 281586	2016-09-15 06:28:34 +00:00
Sanjay Patel	9f036b5a97	[InstCombine] add vector tests for foldICmpUsingKnownBits() llvm-svn: 281559	2016-09-14 23:15:11 +00:00
Matthew Simpson	b25e87fca5	[LV] Process pointer IVs with PHINodes in collectLoopUniforms This patch moves the processing of pointer induction variables in collectLoopUniforms from the consecutive pointer phase of the analysis to the phi node phase. Previously, if a pointer induction variable was used by both a scalarized non-memory instruction as well as a vectorized memory instruction, we would incorrectly identify the pointer as uniform. Pointer induction variables should be treated the same as other phi nodes. That is, they are uniform if all users of the induction variable and induction variable update are uniform. Differential Revision: https://reviews.llvm.org/D24511 llvm-svn: 281485	2016-09-14 14:47:40 +00:00
Andrea Di Biagio	e8e1af3649	[InstCombine] Merged two test files and regenerated checks using update_test_checks.py. NFC. llvm-svn: 281478	2016-09-14 14:18:21 +00:00
Akira Hatanaka	dea090e6b2	[ObjCARC] Traverse chain downwards to replace uses of argument passed to ObjC library call with call return. ARC contraction tries to replace uses of an argument passed to an objective-c library call with the call return value. For example, in the following IR, it replaces uses of argument %9 and uses of the values discovered traversing the chain upwards (%7 and %8) with the call return %10, if they are dominated by the call to @objc_autoreleaseReturnValue. This transformation enables code-gen to tail-call the call to @objc_autoreleaseReturnValue, which is necessary to enable auto release return value optimization. %7 = tail call i8* @objc_loadWeakRetained(i8** %6) %8 = bitcast i8* %7 to %0* %9 = bitcast %0* %8 to i8* %10 = tail call i8* @objc_autoreleaseReturnValue(i8* %9) ret %0* %8 Since r276727, llvm started removing redundant bitcasts and as a result started feeding the following IR to ARC contraction: %7 = tail call i8* @objc_loadWeakRetained(i8** %6) %8 = bitcast i8* %7 to %0* %9 = tail call i8* @objc_autoreleaseReturnValue(i8* %7) ret %0* %8 ARC contraction no longer does the optimization described above since it only traverses the chain upwards and fails to recognize that the function return can be replaced by the call return. This commit changes ARC contraction to traverse the chain downwards too and replace uses of bitcasts with the call return. rdar://problem/28011339 Differential Revision: https://reviews.llvm.org/D24523 llvm-svn: 281419	2016-09-13 23:43:11 +00:00
Sanjay Patel	ab40f9d0ef	add tests for PR28672 I'm not sure if we actually want to transform all of these in InstCombine yet, so I'm not labeling these with FIXME. llvm-svn: 281386	2016-09-13 20:36:13 +00:00
Matt Arsenault	e2e6cfee61	Reapply "InstCombine: Reduce trunc (shl x, K) width." This reapplies r272987 with a fix for infinitely looping when the truncated value is another shift of a constant. llvm-svn: 281379	2016-09-13 19:43:57 +00:00
Andrea Di Biagio	7277afeec1	[ConstantFold] Improve the bitcast folding logic for constant vectors. The constant folder didn't know how to always fold bitcasts of constant integer vectors. In particular, it was unable to handle the case where a constant vector had some undef elements, and the resulting (i.e. bitcasted) vector type had more elements than the original vector type. Example: %cast = bitcast <2 x i64><i64 undef, i64 2> to <4 x i32> On a little endian target, %cast could have been folded to: <4 x i32><i32 undef, i32 undef, i32 2, i32 0> This patch improves the folding logic by teaching how to correctly propagate undef elements in the folded vector. Differential Revision: https://reviews.llvm.org/D24301 llvm-svn: 281343	2016-09-13 14:50:47 +00:00
Andrea Di Biagio	3647a96a44	[InstSimplify] Add tests to show missed bitcast folding opportunities. InstSimplify doesn't always know how to fold a bitcast of a constant vector. In particular, the logic in InstSimplify doesn't know how to handle the case where the constant vector in input contains some undef elements, and the number of elements is smaller than the number of elements of the bitcast vector type. llvm-svn: 281332	2016-09-13 13:17:42 +00:00
Sam Parker	64781ed4bb	Remove InstCombine test file My previous commit should of removed a test file but I missed it. llvm-svn: 281326	2016-09-13 12:33:06 +00:00
Sam Parker	214f7bf5cc	Enable simplify libcalls for ARM PCS Teach SimplifyLibcalls that in can treat functions annotated with apcs, aapcs or aapcs_vfp like normal C functions if they only take and return integer or pointer values, and the target is not iOS. Differential Revision: https://reviews.llvm.org/D24453 llvm-svn: 281322	2016-09-13 12:10:14 +00:00
Peter Collingbourne	d4135bbc30	DebugInfo: New metadata representation for global variables. This patch reverses the edge from DIGlobalVariable to GlobalVariable. This will allow us to more easily preserve debug info metadata when manipulating global variables. Fixes PR30362. A program for upgrading test cases is attached to that bug. Differential Revision: http://reviews.llvm.org/D20147 llvm-svn: 281284	2016-09-13 01:12:59 +00:00
Sanjay Patel	ff00fae8e6	add more tests for PR30273 llvm-svn: 281270	2016-09-12 22:28:29 +00:00
Sanjay Patel	dea26950a0	[InstCombine] add test for PR30327 llvm-svn: 281248	2016-09-12 19:50:08 +00:00
Sanjay Patel	2eea9a1d58	[InstCombine] regenerate checks llvm-svn: 281247	2016-09-12 19:29:26 +00:00
Sanjay Patel	f5887f1fbd	[InstCombine] use m_APInt to allow icmp X, C folds for splat constant vectors isSignBitCheck could be changed to take a pointer param to avoid the 'UnusedBit' ugliness. llvm-svn: 281231	2016-09-12 16:25:41 +00:00
David Majnemer	c83044d9bb	[FunctionAttrs] Don't try to infer returned if it is already on an argument Trying to infer the 'returned' attribute if an argument is already 'returned' can lead to verification failure: inference might determine that a different argument is passed through which would result in two different arguments marked as 'returned'. This fixes PR30350. llvm-svn: 281221	2016-09-12 16:04:59 +00:00
Sanjay Patel	db400baa80	[InstCombine] add tests to show missing vector folds llvm-svn: 281219	2016-09-12 15:51:42 +00:00
Sanjay Patel	f9ca770225	[InstCombine] regenerate checks llvm-svn: 281186	2016-09-12 00:12:56 +00:00
Sanjay Patel	a2aabfcc17	[InstCombine] regenerate checks llvm-svn: 281185	2016-09-12 00:08:33 +00:00
James Molloy	104370ab37	[SimplifyCFG] Be even more conservative in SinkThenElseCodeToEnd This should actually fix PR30244. This cranks up the workaround for PR30188 so that we never sink loads or stores of allocas. The idea is that these should be removed by SROA/Mem2Reg, and any movement of them may well confuse SROA or just cause unwanted code churn. It's not ideal that the midend should be crippled like this, but that unwanted churn can really cause significant regressions in important workloads (tsan). llvm-svn: 281162	2016-09-11 09:00:03 +00:00
James Molloy	18d96e8fa5	[SimplifyCFG] Harden up the profitability heuristic for block splitting during sinking Exposed by PR30244, we will split a block currently if we think we can sink at least one instruction. However this isn't right - the reason we split predecessors is so that we can sink instructions that otherwise couldn't be sunk because it isn't safe to do so - stores, for example. So, change the heuristic to only split if it thinks it can sink at least one non-speculatable instruction. Should fix PR30244. llvm-svn: 281160	2016-09-11 08:07:30 +00:00
Justin Lebar	11a3204355	Add handling of !invariant.load to PropagateMetadata. Summary: This will let e.g. the load/store vectorizer propagate this metadata appropriately. Reviewers: arsenm Subscribers: tra, jholewinski, hfinkel, mzolotukhin Differential Revision: https://reviews.llvm.org/D23479 llvm-svn: 281153	2016-09-11 01:39:08 +00:00
Arnold Schwaighofer	5d335559b9	InstCombine: Don't combine loads/stores from swifterror to a new type This generates invalid IR: the only users of swifterror can be call arguments, loads, and stores. rdar://28242257 llvm-svn: 281144	2016-09-10 18:14:57 +00:00
Arnold Schwaighofer	c9277f40fd	Inliner: Don't mark swifterror allocas with lifetime markers This would create a bitcast use which fails the verifier: swifterror values may only be used by loads, stores, and as function arguments. rdar://28233244 llvm-svn: 281114	2016-09-09 22:40:27 +00:00
Matt Arsenault	950a82047b	LSV: Fix incorrectly increasing alignment If the unaligned access has a dynamic offset, it may be odd which would make the adjusted alignment incorrect to use. llvm-svn: 281110	2016-09-09 22:20:14 +00:00
Sanjay Patel	58109abe91	[InstCombine] use m_APInt to allow icmp ult X, C folds for splat constant vectors llvm-svn: 281107	2016-09-09 21:59:37 +00:00
Dehao Chen	22ce5eb051	Do not widen load for different variable in GVN. Summary: Widening load in GVN is too early because it will block other optimizations like PRE, LICM. https://llvm.org/bugs/show_bug.cgi?id=29110 The SPECCPU2006 benchmark impact of this patch: Reference: o2_nopatch (1): o2_patched Benchmark Base:Reference (1) ------------------------------------------------------- spec/2006/fp/C++/444.namd 25.2 -0.08% spec/2006/fp/C++/447.dealII 45.92 +1.05% spec/2006/fp/C++/450.soplex 41.7 -0.26% spec/2006/fp/C++/453.povray 35.65 +1.68% spec/2006/fp/C/433.milc 23.79 +0.42% spec/2006/fp/C/470.lbm 41.88 -1.12% spec/2006/fp/C/482.sphinx3 47.94 +1.67% spec/2006/int/C++/471.omnetpp 22.46 -0.36% spec/2006/int/C++/473.astar 21.19 +0.24% spec/2006/int/C++/483.xalancbmk 36.09 -0.11% spec/2006/int/C/400.perlbench 33.28 +1.35% spec/2006/int/C/401.bzip2 22.76 -0.04% spec/2006/int/C/403.gcc 32.36 +0.12% spec/2006/int/C/429.mcf 41.04 -0.41% spec/2006/int/C/445.gobmk 26.94 +0.04% spec/2006/int/C/456.hmmer 24.5 -0.20% spec/2006/int/C/458.sjeng 28 -0.46% spec/2006/int/C/462.libquantum 55.25 +0.27% spec/2006/int/C/464.h264ref 45.87 +0.72% geometric mean +0.23% For most benchmarks, it's a wash, but we do see stable improvements on some benchmarks, e.g. 447,453,482,400. Reviewers: davidxl, hfinkel, dberlin, sanjoy, reames Subscribers: gberry, junbuml Differential Revision: https://reviews.llvm.org/D24096 llvm-svn: 281074	2016-09-09 18:42:35 +00:00
Sanjay Patel	6da0fb8c74	[InstCombine] add tests to show pattern matching failures due to commutation I was looking to fix a bug in getComplexity(), and these cases showed up as obvious failures. I'm not sure how to find these in general though. llvm-svn: 281055	2016-09-09 16:35:20 +00:00
Gor Nishanov	faf36c2e0b	[Coroutines] Part13: Handle single edge PHINodes across suspends Summary: If one of the uses of the value is a single edge PHINode, handle it. Original: %val = something <suspend> %p = PHINode [%val] After Spill + Part13: %val = something %slot = gep val.spill.slot store %val, %slot <suspend> %p = load %slot Plus tiny fixes/changes: * use correct index for coro.free in CoroCleanup * fixup id parameter in coro.free to allow authoring coroutine in plain C with __builtins Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D24242 llvm-svn: 281020	2016-09-09 05:39:00 +00:00
Dehao Chen	87823f8e4d	Remove debug info when hoisting instruction from then/else branch. Summary: The hoisted instruction is executed speculatively. It could affect the debugging experience as user would see gdb go into code that may not be expected to execute. It will also affect sample profile accuracy by assigning incorrect frequency to source within then/else branch. Reviewers: davidxl, dblaikie, chandlerc, kcc, echristo Subscribers: mehdi_amini, probinson, eric_niebler, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D24164 llvm-svn: 280995	2016-09-08 21:53:33 +00:00
Sanjay Patel	a4c6223319	[InstCombine] regenerate checks llvm-svn: 280993	2016-09-08 21:40:21 +00:00
Matthew Simpson	bfe5e1817b	[LV] Ensure proper handling of multi-use case when collecting uniforms The test case included in r280979 wasn't checking what it was supposed to be checking for the predicated store case. Fixing the test revealed that the multi-use case (when a pointer is used by both vectorized and scalarized memory accesses) wasn't being handled properly. We can't skip over non-consecutive-like pointers since they may have looked consecutive-like with a different memory access. llvm-svn: 280992	2016-09-08 21:38:26 +00:00
Sanjay Patel	ed9fda01a3	[InstCombine] regenerate checks llvm-svn: 280991	2016-09-08 21:32:21 +00:00
Matthew Simpson	408a3abcfe	[LV] Don't mark pointers used by scalarized memory accesses uniform Previously, all consecutive pointers were marked uniform after vectorization. However, if a consecutive pointer is used by a memory access that is eventually scalarized, the pointer won't remain uniform after all. An example is predicated stores. Even though a predicated store may be consecutive, it will still be scalarized, making it's pointer operand non-uniform. This patch updates the logic in collectLoopUniforms to consider the cases where a memory access may be scalarized. If a memory access may be scalarized, its pointer operand is not marked uniform. The determination of whether a given memory instruction will be scalarized or not has been moved into a common function that is used by the vectorizer, cost model, and legality analysis. Differential Revision: https://reviews.llvm.org/D24271 llvm-svn: 280979	2016-09-08 19:11:07 +00:00
Dehao Chen	ebb715b119	Add unittest for r280760 llvm-svn: 280963	2016-09-08 16:53:40 +00:00
Simon Pilgrim	073472a2bf	[InstCombine][X86] Regenerate masked memory op combine tests llvm-svn: 280960	2016-09-08 16:32:37 +00:00
Simon Pilgrim	cd7b2830b9	[InstCombine][X86] Regenerate vperm2f128/vperm2i128 combine tests llvm-svn: 280959	2016-09-08 16:30:46 +00:00
Simon Pilgrim	3b1ecbe66c	[InstCombine][X86] Regenerate insertps combine tests llvm-svn: 280957	2016-09-08 16:15:21 +00:00
Michael Zolotukhin	e72997a524	Revert "[LoopUnroll] Properly update loop-info when cloning prologues and epilogues." This reverts commit r280901. This caused a bunch of failures, reverting it until I investigate them. llvm-svn: 280905	2016-09-08 03:51:30 +00:00
Michael Zolotukhin	5e0a20697e	[LoopUnroll] Properly update loop-info when cloning prologues and epilogues. Summary: When cloning blocks for prologue/epilogue we need to replicate the loop structure from the original loop. It wasn't a problem for the innermost loops, but it led to an incorrect loop info when we unrolled a loop with a child loop - in this case created prologue-loop had a child loop, but loop info didn't reflect that. This fixes PR28888. Reviewers: chandlerc, sanjoy, hfinkel Subscribers: llvm-commits, silvas Differential Revision: https://reviews.llvm.org/D24203 llvm-svn: 280901	2016-09-08 01:52:26 +00:00
Sanjay Patel	9b40f98357	[InstCombine] use m_APInt to allow icmp (and (sh X, Y), C2), C1 folds for splat constant vectors llvm-svn: 280873	2016-09-07 22:33:03 +00:00
Hal Finkel	ac5803ba91	[SimplifyCFG] Don't try to create metadata-valued PHIs We can't create metadata-valued PHIs; don't try to do so when sinking. I created a test case for this using the @llvm.type.test intrinsic, because it takes a metadata parameter and does not have severe side effects (thus SimplifyCFG is willing to otherwise sink it). Previously, running the test case would crash with: Invalid use of metadata! %.sink = select i1 %flag, metadata <...>, metadata <0x4e45dc0> LLVM ERROR: Broken function found, compilation aborted! llvm-svn: 280866	2016-09-07 21:38:22 +00:00
Sanjay Patel	def931e76a	[InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors This is a revert of r280676 which was a revert of r280637; ie, this is r280637 again. It was speculatively reverted to help debug buildbot failures. llvm-svn: 280861	2016-09-07 20:50:44 +00:00
Justin Lebar	3a5f40c191	[LSV] Use the original loads' names for the extractelement instructions. Summary: LSV replaces multiple adjacent loads with one vectorized load and a bunch of extractelement instructions. This patch makes the extractelement instructions' names match those of the original loads, for (hopefully) improved readability. Reviewers: asbirlea, tstellarAMD Subscribers: arsenm, mzolotukhin Differential Revision: https://reviews.llvm.org/D23748 llvm-svn: 280818	2016-09-07 15:49:48 +00:00
Andrea Di Biagio	bdd576dbb0	Regenerate vector bitcast folding tests using update_test_checks.py. Two tests have been merged together, regenerated and then moved to a more appropriate directory. No functional change. llvm-svn: 280814	2016-09-07 14:50:07 +00:00
Andrea Di Biagio	f3fd316223	[InstCombine][SSE4a] Fix assertion failure in the insertq/insertqi combining logic. This fixes a similar issue to the one already fixed by r280804 (revieved in D24256). Revision 280804 fixed the problem with unsafe dyn_casts in the extrq/extrqi combining logic. However, it turns out that even the insertq/insertqi logic was affected by the same problem. llvm-svn: 280807	2016-09-07 12:47:53 +00:00
Andrea Di Biagio	8df5b9cf48	[InstCombine][SSE4a] Fix assertion failure caused by unsafe dyn_casts on the operands of extrq/extrqi intrinsic calls. This patch fixes an assertion failure caused by unsafe dynamic casts on the constant operands of sse4a intrinsic calls to extrq/extrqi The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently checks if the input operands are constants. Internally, that logic relies on dyn_casts of values returned by calls to method Constant::getAggregateElement. However, method getAggregateElemet may return nullptr if the constant element cannot be retrieved. So, all the dyn_casts can potentially fail. This is what happens for example if a constexpr value is passed in input to an extrq/extrqi intrinsic call. This patch fixes the problem by using a dyn_cast_or_null (instead of a simple dyn_cast) on the result of each call to Constant::getAggregateElement. Added reproducible test cases to x86-sse4a.ll. Differential Revision: https://reviews.llvm.org/D24256 llvm-svn: 280804	2016-09-07 12:03:03 +00:00
James Molloy	ec905a62ae	[SimplifyCFG] Update workaround for PR30188 to also include loads I should have realised this the first time around, but if we're avoiding sinking stores where the operands come from allocas so they don't create selects, we also have to do the same for loads because SROA will be just as defective looking at loads of selected addresses as stores. Fixes PR30188 (again). llvm-svn: 280792	2016-09-07 08:40:20 +00:00
James Molloy	bf1837d9c9	[SimplifyCFG] Check PHI uses more accurately PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG. Fixes PR30292. llvm-svn: 280790	2016-09-07 08:15:54 +00:00
Adam Nemet	c520822dbf	[JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info Currently the pass updates branch weights in the IR if the function has any PGO info (entry frequency is set). However we could still have regions of the CFG that does not have branch weights collected (e.g. a cold region). In this case we'd use static estimates. Since static estimates for branches are determined independently, they are inconsistent. Updating them can "randomly" inflate block frequencies. I've run into this in a completely cold loop of h264ref from SPEC. -Rpass-with-hotness showed the loop to be completely cold during inlining (before JT) but completely hot during vectorization (after JT). The new testcase demonstrate the problem. We check array elements against 1, 2 and 3 in a loop. The check against 3 is the loop-exiting check. The block names should be self-explanatory. In this example, jump threading incorrectly updates the weight of the loop-exiting branch to 0, drastically inflating the frequency of the loop (in the range of billions). There is no run-time profile info for edges inside the loop, so branch probabilities are estimated. These are the resulting branch and block frequencies for the loop body: check_1 (16) (8) / \| eq_1 \| (8) \ \| check_2 (16) (8) / \| eq_2 \| (8) \ \| check_3 (16) (1) / \| (loop exit) \| (15) \| (back edge) First we thread eq_1 -> check_2 to check_3. Frequencies are updated to remove the frequency of eq_1 from check_2 and then from the false edge leaving check_2. Changed frequencies are highlighted with * : check_1 (16) (8) / \| eq_1~ \| (8) / \| / check_2 (8) / (8) / \| \ eq_2 \| (0) \ \ \| ` --- check_3 (16) (1) / \| (loop exit) \| (15) \| (back edge) Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new back edges. Frequencies are updated to remove the frequency of eq_1 and eq_3 from check_3 and then the false edge leaving check_3 (changed frequencies are highlighted with ): check_1 (16) (8) / \| eq_1~ \| (8) / \| / check_2 (8) / (8) / \| /-- eq_2~ \| (0) (back edge) \| check_3 (0) (0) / \| (loop exit) \| (0*) \| (back edge) As a result, the loop exit edge ends up with 0 frequency which in turn makes the loop header to have maximum frequency. There are a few potential problems here: 1. The profile data seems odd. There is a single profile sample of the loop being entered. On the other hand, there are no weights inside the loop. 2. Based on static estimation we shouldn't set edges to "extreme" values, i.e. extremely likely or unlikely. 3. We shouldn't create profile metadata that is calculated from static estimation. I am not sure what policy is but it seems to make sense to treat profile metadata as something that is known to originate from profiling. Estimated probabilities should only be reflected in BPI/BFI. Any one of these would probably fix the immediate problem. I went for 3 because I think it's a good policy to have and added a FIXME about 2. Differential Revision: https://reviews.llvm.org/D24118 llvm-svn: 280713	2016-09-06 16:08:33 +00:00
Sanjay Patel	e341c919b0	fix FileCheck variables for test added with r280677 The script (utils/update_test_checks.py) seems to have problems with variable names that start with the same string. llvm-svn: 280679	2016-09-05 23:49:32 +00:00
Gor Nishanov	ccabaca273	[Coroutines] Part12: Handle alloca address-taken Summary: Move early uses of spilled variables after CoroBegin. For example, if a parameter had address taken, we may end up with the code like: define @f(i32 %n) { %n.addr = alloca i32 store %n, %n.addr ... call @coro.begin This patch fixes the problem by moving uses of spilled variables after CoroBegin. Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D24234 llvm-svn: 280678	2016-09-05 23:45:45 +00:00
Sanjay Patel	eea2ef7862	[InstCombine] don't assert that division-by-constant has been folded (PR30281) This is effectively a revert of: https://reviews.llvm.org/rL280115 And this should fix https://llvm.org/bugs/show_bug.cgi?id=30281: llvm-svn: 280677	2016-09-05 23:38:22 +00:00
Sanjay Patel	46f9df5b71	[InstCombine] revert r280637 because it causes test failures on an ARM bot http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/14952/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aicmp.ll llvm-svn: 280676	2016-09-05 22:36:32 +00:00
Oliver Stannard	ef38d53a7e	[SimplifyCFG] Add test for sinking inline asm in if/else This test code previously caused a failure in the module verifier, because SimplifyCFG created this invalid instruction, which tries to take the address of inline asm: %.sink = select i1 %1, i64 ()* asm "mov $0, #1", "=r", i64 ()* asm %"mov $0, #2", "=r" This has been fixed recently, presumably by James Molloy's patches that re-wrote and changed parts of SimplifyCFG, so this patch just adds a regression test for it. Differential Revision: https://reviews.llvm.org/D24231 llvm-svn: 280660	2016-09-05 13:49:26 +00:00
Gor Nishanov	0e18f75a92	[Coroutines] Part11: Add final suspend handling. Summary: A frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties: * it is possible to check whether a suspended coroutine is at the final suspend point via coro.done intrinsic; * a resumption of a coroutine stopped at the final suspend point leads to undefined behavior. The only possible action for a coroutine at a final suspend point is destroying it via coro.destroy intrinsic. This patch adds final suspend handling logic to CoroEarly and CoroSplit passes. Now, the final suspend point example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex5.ll). Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D24068 llvm-svn: 280646	2016-09-05 04:44:30 +00:00
Sanjay Patel	c641e9d6ff	[InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors The code to calculate 'UsesRemoved' could be simplified. As-is, that code is a victim of PR30273: https://llvm.org/bugs/show_bug.cgi?id=30273 llvm-svn: 280637	2016-09-04 20:58:27 +00:00
Dorit Nuzman	abd15f69b2	[InstCombine] Preserve llvm.mem.parallel_loop_access metadata when replacing memcpy with ld/st. When InstCombine replaces a memcpy with loads+stores it does not copy over the llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes that. Differential Revision: https://reviews.llvm.org/D23499 llvm-svn: 280617	2016-09-04 07:49:39 +00:00
Joseph Tremoulet	e92e0a9042	Fix inliner funclet unwind memoization Summary: The inliner may need to determine where a given funclet unwinds to, and this determination may depend on other funclets throughout the funclet tree. The code that performs this walk in getUnwindDestToken memoizes results to avoid redundant computations. In the case that a funclet's unwind destination is derived from its ancestor, there's code to walk back down the tree from the ancestor updating the memo map of its descendants to record the unwind destination. This change fixes that code to account for the case that some descendant has a different unwind destination, which can happen if that unwind dest is a descendant of the EHPad being queried and thus didn't determine its unwind destination. Also update test inline-funclets.ll, which is supposed to cover such scenarios, to include a case that fails an assertion without this fix but passes with it. Fixes PR29151. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24117 llvm-svn: 280610	2016-09-04 01:23:20 +00:00
Matt Arsenault	46a0382ab2	AMDGPU: Do basic folding of class intrinsic This allows more of the OCML builtin library to be constant folded. llvm-svn: 280586	2016-09-03 07:06:58 +00:00
Wei Mi	c37307a5f4	Fix buildbot error. Add -mtriple=x86_64-unknown-linux-gnu for the test and move it to CodeGen/X86. llvm-svn: 280568	2016-09-03 01:43:28 +00:00
Xinliang David Li	40afd5c9e4	[Profile] handle select instruction in 'expect' lowering Builtin expect lowering currently ignores select. This patch fixes the issue Differential Revision: http://reviews.llvm.org/D24166 llvm-svn: 280547	2016-09-02 22:03:40 +00:00
Sanjay Patel	70277411d3	[InstCombine] auto-generate assertions for tighter checking llvm-svn: 280531	2016-09-02 19:38:37 +00:00
Wei Mi	c54d1298f5	Split the store of a wide value merged from an int-fp pair into multiple stores. For the store of a wide value merged from a pair of values, especially int-fp pair, sometimes it is more efficent to split it into separate narrow stores, which can remove the bitwise instructions or sink them to colder places. Now the feature is only enabled on x86 target, and only store of int-fp pair is splitted. It is possible that the application scope gets extended with perf evidence support in the future. Differential Revision: https://reviews.llvm.org/D22840 llvm-svn: 280505	2016-09-02 17:17:04 +00:00
Sanjay Patel	521f19f249	[InsttCombine] fold insertelement of constant into shuffle with constant operand (PR29126) The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards shrinking that to a single shufflevector. Note that the transform is intentionally limited to shuffles that are equivalent to vector selects to avoid creating arbitrary shuffle masks that may not lower well. This should solve PR29126: https://llvm.org/bugs/show_bug.cgi?id=29126 Differential Revision: https://reviews.llvm.org/D23886 llvm-svn: 280504	2016-09-02 17:05:43 +00:00
Matthew Simpson	b65c230eab	[LV] Ensure reverse interleaved group GEPs remain uniform For uniform instructions, we're only required to generate a scalar value for the first vector lane of each unroll iteration. Thus, if we have a reverse interleaved group, computing the member index off the scalar GEP corresponding to the last vector lane of its pointer operand technically makes the GEP non-uniform. We should compute the member index off the first scalar GEP instead. I've added the updated member index computation to the existing reverse interleaved group test. llvm-svn: 280497	2016-09-02 16:19:22 +00:00
Andrea Di Biagio	805815f407	[instsimplify] Fix incorrect folding of an ordered fcmp with a vector of all NaN. This patch fixes a crash caused by an incorrect folding of an ordered comparison between a packed floating point vector and a splat vector of NaN. An ordered comparison between a vector and a constant vector of NaN, should always be folded into a constant vector where each element is i1 false. Since revision 266175, SimplifyFCmpInst folds the ordered fcmp into a scalar 'false'. Later on, this would cause an assertion failure, since the value type of the folded value doesn't match the expected value type of the uses of the original instruction: "Assertion failed: New->getType() == getType() && "replaceAllUses of value with new value of different type!". This patch fixes the issue and adds a test case to the already existing test InstSimplify/floating-point-compares.ll. Differential Revision: https://reviews.llvm.org/D24143 llvm-svn: 280488	2016-09-02 14:47:43 +00:00
Alexey Bataev	ed27d69037	[InstCombine] Add test for insertelementinsts with constants. Added a tests that shows that several insertelementinsts with constant indexes/data are not folded into a single shuffleinst. llvm-svn: 280474	2016-09-02 09:00:53 +00:00
James Molloy	f3cf2a494b	[SimplifyCFG] Add a workaround to fix PR30188 We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions. The real fix is in SROA, which I'll be looking into. llvm-svn: 280470	2016-09-02 07:29:00 +00:00
NAKAMURA Takumi	9aec5c3797	llvm/test/Transforms/GCOVProfiling/three-element-mdnode.ll: Use %/T instead of %T, not to emit backslashes. llvm-svn: 280451	2016-09-02 01:33:00 +00:00
Sanjay Patel	199a8e6a92	[InstCombine] add tests to show potential shuffle+insert folds llvm-svn: 280403	2016-09-01 19:14:19 +00:00
Sanjay Patel	dd861964d1	[InstCombine] remove fold of an icmp pattern that should never happen While removing a scalar shackle from an icmp fold, I noticed that I couldn't find any tests to trigger this code path. The 'and' shrinking transform should be handled by InstCombiner::foldCastedBitwiseLogic() or eliminated with InstSimplify. The icmp narrowing is part of InstCombiner::foldICmpWithCastAndCast(). Differential Revision: https://reviews.llvm.org/D24031 llvm-svn: 280370	2016-09-01 14:20:43 +00:00
James Molloy	88cad7e5cf	[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted. As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider: if (a) x(1); else if (b) x(2); This produces the following CFG: [if] / \ [x(1)] [if] \| \| \ \| \| \ \| [x(2)] \| \ \| / [ end ] [end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch. We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects). We are now able to detect this case and split off the unconditional arcs to a common successor: [if] / \ [x(1)] [if] \| \| \ \| \| \ \| [x(2)] \| \ / \| [sink.split] \| \ / [ end ] Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases. llvm-svn: 280364	2016-09-01 12:58:13 +00:00
James Molloy	eec6df3193	[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything. This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink. This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example: %a = load i32* %b %d = load i32* %b %c = gep i32* %a, i32 0 %e = gep i32* %d, i32 1 Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor). This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough. Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm. In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging. In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive. This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans. llvm-svn: 280351	2016-09-01 10:44:35 +00:00
Hal Finkel	40d7f5c277	Add a counter-function insertion pass As discussed in https://reviews.llvm.org/D22666, our current mechanism to support -pg profiling, where we insert calls to mcount(), or some similar function, is fundamentally broken. We insert these calls in the frontend, which means they get duplicated when inlining, and so the accumulated execution counts for the inlined-into functions are wrong. Because we don't want the presence of these functions to affect optimizaton, they should be inserted in the backend. Here's a pass which would do just that. The knowledge of the name of the counting function lives in the frontend, so we're passing it here as a function attribute. Clang will be updated to use this mechanism. Differential Revision: https://reviews.llvm.org/D22825 llvm-svn: 280347	2016-09-01 09:42:39 +00:00
Nick Lewycky	97e49ac59e	Add -fprofile-dir= to clang. -fprofile-dir=path allows the user to specify where .gcda files should be emitted when the program is run. In particular, this is the first flag that causes the .gcno and .o files to have different paths, LLVM is extended to support this. -fprofile-dir= does not change the file name in the .gcno (and thus where lcov looks for the source) but it does change the name in the .gcda (and thus where the runtime library writes the .gcda file). It's different from a GCOV_PREFIX because a user can observe that the GCOV_PREFIX_STRIP will strip paths off of -fprofile-dir= but not off of a supplied GCOV_PREFIX. To implement this we split -coverage-file into -coverage-data-file and -coverage-notes-file to specify the two different names. The !llvm.gcov metadata node grows from a 2-element form {string coverage-file, node dbg.cu} to 3-elements, {string coverage-notes-file, string coverage-data-file, node dbg.cu}. In the 3-element form, the file name is already "mangled" with .gcno/.gcda suffixes, while the 2-element form left that to the middle end pass. llvm-svn: 280306	2016-08-31 23:04:32 +00:00
Sanjay Patel	0d70831d73	[InstCombine] allow icmp (shr exact X, C2), C fold for splat constant vectors The enhancement to foldICmpDivConstant ( http://llvm.org/viewvc/llvm-project?view=revision&revision=280299 ) allows us to remove the ConstantInt check; no other changes needed. llvm-svn: 280300	2016-08-31 22:18:43 +00:00
Sanjay Patel	541aef4661	[InstCombine] allow icmp (div X, Y), C folds for splat constant vectors Converting all of the overflow ops to APInt looked risky, so I've left that as a TODO. llvm-svn: 280299	2016-08-31 21:57:21 +00:00
Geoff Berry	8d84605f25	[EarlyCSE] Optionally use MemorySSA. NFC. Summary: Use MemorySSA, if requested, to do less conservative memory dependency checking. This change doesn't enable the MemorySSA enhanced EarlyCSE in the default pipelines, so should be NFC. Reviewers: dberlin, sanjoy, reames, majnemer Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D19821 llvm-svn: 280279	2016-08-31 19:24:10 +00:00
Geoff Berry	64f5ed172a	[EarlyCSE] Allow forwarding a non-invariant load into an invariant load. Reviewers: sanjoy Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D23935 llvm-svn: 280265	2016-08-31 17:45:31 +00:00
Philip Reames	2b1084ac93	[statepoints][experimental] Add support for live-in semantics of values in deopt bundles This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use cases is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes. The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate. Today, we only fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.) My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any known miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default. Differential Revision: https://reviews.llvm.org/D24000 llvm-svn: 280250	2016-08-31 15:12:17 +00:00
James Molloy	cacfc16109	Revert "[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd" This reverts commit r280216 - it caused buildbot failures. llvm-svn: 280234	2016-08-31 13:16:52 +00:00
James Molloy	76c9d423a7	Revert "[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches" This reverts commit r280217. r280216 caused buildbot failures - backing out the entire chain. llvm-svn: 280233	2016-08-31 13:16:45 +00:00
James Molloy	06a45483a1	Revert "[SimplifyCFG] Add a workaround to fix PR30188" This reverts commit r280219. r280216 caused buildbot failures - backing out the entire chain. llvm-svn: 280232	2016-08-31 13:16:36 +00:00
James Molloy	8a66a39cbf	Revert "[SimplifyCFG] Fix bootstrap failure after r280220" This reverts commit r280228. r280216 caused buildbot failures - backing out the entire sequence. llvm-svn: 280231	2016-08-31 13:16:30 +00:00
James Molloy	b7efa6c227	[SimplifyCFG] Fix bootstrap failure after r280220 We check that a sinking candidate is used by only one PHI node during our legality checks. However for instructions that are used by other sinking candidates our heuristic is less conservative. This can result in a candidate actually being illegal when we come to sink it because of how we sunk a predecessor. Do the used-by-only-one-PHI checks again during sinking to ensure we don't crash. llvm-svn: 280228	2016-08-31 12:33:48 +00:00
James Molloy	171fdac7ce	[SimplifyCFG] Add a workaround to fix PR30188 We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions. The real fix is in SROA, which I'll be looking into. llvm-svn: 280219	2016-08-31 10:46:45 +00:00
James Molloy	c53b40b509	[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted. As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider: if (a) x(1); else if (b) x(2); This produces the following CFG: [if] / \ [x(1)] [if] \| \| \ \| \| \ \| [x(2)] \| \ \| / [ end ] [end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch. We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects). We are now able to detect this case and split off the unconditional arcs to a common successor: [if] / \ [x(1)] [if] \| \| \ \| \| \ \| [x(2)] \| \ / \| [sink.split] \| \ / [ end ] Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases. llvm-svn: 280217	2016-08-31 10:46:33 +00:00
James Molloy	55bd04cd20	[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything. This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink. This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example: %a = load i32* %b %d = load i32* %b %c = gep i32* %a, i32 0 %e = gep i32* %d, i32 1 Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor). This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough. Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm. In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging. In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive. This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans. llvm-svn: 280216	2016-08-31 10:46:23 +00:00
James Molloy	923e98c232	[SimplifyCFG] Tail-merge calls with sideeffects This was deliberately disabled during my rewrite of SinkIfThenToEnd to keep behaviour at least vaguely consistent with the previous version and keep it as close to NFC as I could. There's no real reason not to merge sideeffect calls though, so let's do it! Small fixup along the way to ensure we don't create indirect calls. Should fix PR28964. llvm-svn: 280215	2016-08-31 10:46:16 +00:00
Gor Nishanov	50d7fb974f	[Coroutines] Part 10: Add coroutine promise support. Summary: 1) CoroEarly now lowers llvm.coro.promise intrinsic that allows to obtain a coroutine promise pointer from a coroutine frame and vice versa. 2) CoroFrame now interprets Promise argument of llvm.coro.begin to place CoroutinPromise alloca at a deterministic offset from the coroutine frame. Now, the coroutine promise example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex4.ll). Reviewers: majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23993 llvm-svn: 280184	2016-08-31 00:35:41 +00:00
Alina Sbirlea	3f8f7840bf	[LoadStoreVectorizer] Change VectorSet to Vector to match head and tail positions. Resolves PR29148. Summary: LSV was using two vector sets (heads and tails) to track pairs of adjiacent position to vectorize. A recent optimization is trying to obtain the longest chain to vectorize and assumes the positions in heads(H) and tails(T) match, which is not the case is there are multiple tails for the same head. e.g.: i1: store a[0] i2: store a[1] i3: store a[1] Leads to: H: i1 T: i2 i3 Instead of: H: i1 i1 T: i2 i3 So the positions for instructions that follow i3 will have different indexes in H/T. This patch resolves PR29148. This issue also surfaced the fact that if the chain is too long, and TLI returns a "not-fast" answer, the whole chain will be abandoned for vectorization, even though a smaller one would be beneficial. Added a testcase and FIXME for this. Reviewers: tstellarAMD, arsenm, jlebar Subscribers: mzolotukhin, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D24057 llvm-svn: 280179	2016-08-30 23:53:59 +00:00
Sanjay Patel	ddb53dd080	[InstCombine] add tests to show type limitations of InsertRangeTest and callers llvm-svn: 280175	2016-08-30 23:16:59 +00:00
Michael Kuperstein	2954d1db77	[LoopVectorizer] Predicate instructions in blocks with several incoming edges We don't need to limit predication to blocks that have a single incoming edge, we just need to use the right mask. This fixes PR30172. Differential Revision: https://reviews.llvm.org/D24009 llvm-svn: 280148	2016-08-30 20:22:21 +00:00
Daniel Berlin	c943d72d94	IntrArgMemOnly is only defined (and current AA machinery only sanely supports) pointer arguments, and these intrinsics have vector of pointer arguments. Remove ArgMemOnly until we either have the machinery, define a new attribute, or something similar llvm-svn: 280143	2016-08-30 19:58:48 +00:00
Sanjay Patel	b37145712e	[InstCombine] replace divide-by-constant checks with asserts; NFC These folds already have tests for scalar and vector types, except for the vector div-by-0 case, so I'm adding tests for that. llvm-svn: 280115	2016-08-30 17:31:34 +00:00
James Molloy	d13b1239e4	[SimplifyCFG] Properly CSE metadata in SinkThenElseCodeToEnd This was missing, meaning the metadata in sunk instructions was potentially bogus and could cause miscompiles. llvm-svn: 280072	2016-08-30 10:56:08 +00:00
Teresa Johnson	8c1bc986b5	[ThinLTO] Indirect call promotion fixes for promoted local functions Summary: Fix a couple issues limiting the application of indirect call promotion in ThinLTO mode: - Invoke indirect call promotion before globalopt, since it may eliminate imported functions which appear unreferenced. - Invoke indirect call promotion with InLTO=true so that the PGOFuncName metadata is used to get the name for locals which would have been renamed during promotion. Reviewers: davidxl, mehdi_amini Subscribers: Prazek, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D24004 llvm-svn: 280024	2016-08-29 22:46:56 +00:00
Matthew Simpson	df19502b16	[LV] Move insertelement sequence after scalar definitions After r279649 when getting a vector value from VectorLoopValueMap, we create an insertelement sequence on-demand if the value has been scalarized instead of vectorized. We previously inserted this insertelement sequence before the value's first vector user. However, this insert location is problematic if that user is the phi node of a first-order recurrence. With this patch, we move the insertelement sequence after the last scalar instruction we created when scalarizing the value. Thus, the value's vector definition in the new loop will immediately follow its scalar definitions. This should fix PR30183. Reference: https://llvm.org/bugs/show_bug.cgi?id=30183 llvm-svn: 280001	2016-08-29 20:14:04 +00:00
David Majnemer	e8fd5f9ffd	[SimplifyCFG] Hoisting invalidates metadata We forgot to remove optimization metadata when performing hosting during FoldTwoEntryPHINode. This fixes PR29163. llvm-svn: 279980	2016-08-29 17:14:08 +00:00
Anna Thomas	2bc129c5fd	[StatepointsForGC] Rematerialize in the presence of PHIs Summary: While walking the use chain for identifying rematerializable values in RS4GC, add the case where the current value and base value are the same PHI nodes. This will aid rematerialization of geps and casts instead of relocating. Reviewers: sanjoy, reames, igor Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23920 llvm-svn: 279975	2016-08-29 15:41:59 +00:00
Sanjay Patel	25475bcc0c	[Constant] remove fdiv and frem from canTrap() Assuming the default FP env, we should not treat fdiv and frem any differently in terms of trapping behavior than any other FP op. Ie, FP ops do not trap with the default FP env. This matches how we treat the fdiv/frem in IR with isSafeToSpeculativelyExecute() and in the backend after: https://reviews.llvm.org/rL279970 llvm-svn: 279973	2016-08-29 15:27:17 +00:00
Sanjay Patel	20f02b271b	[SimplifyCFG] rename test file, regenerate checks, and add test The fdiv test shows a problem similar to: https://reviews.llvm.org/rL279970 llvm-svn: 279972	2016-08-29 14:57:53 +00:00
Gor Nishanov	dce9b02677	[Coroutines] Part 9: Add cleanup subfunction. Summary: [Coroutines] Part 9: Add cleanup subfunction. This patch completes coroutine heap allocation elision. Now, the heap elision example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex3.ll) Intrinsic Changes: * coro.free gets a token parameter tying it to coro.id to allow reliably discovering all coro.frees associated with a particular coroutine. * coro.id gets an extra parameter that points back to a coroutine function. This allows to check whether a coro.id describes the enclosing function or it belongs to a different function that was later inlined. CoroSplit now creates three subfunctions: # f$resume - resume logic # f$destroy - cleanup logic, followed by a deallocation code # f$cleanup - just the cleanup code CoroElide pass during devirtualization replaces coro.destroy with either f$destroy or f$cleanup depending whether heap elision is performed or not. Other fixes, improvements: * Fixed buglet in Shape::buildFrame that was not creating coro.save properly if coroutine has more than one suspend point. * Switched to using variable width suspend index field (no longer limited to 32 bit index field can be as little as i1 or as large as i<whatever-size_t-is>) Reviewers: majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23844 llvm-svn: 279971	2016-08-29 14:34:12 +00:00
Sanjay Patel	5c5311f4e5	[InstCombine] use m_APInt to allow icmp (and X, Y), C folds for splat constant vectors llvm-svn: 279937	2016-08-28 18:18:00 +00:00
Elena Demikhovsky	3622fbfc68	[Loop Vectorizer] Fixed memory confilict checks. Fixed a bug in run-time checks for possible memory conflicts inside loop. The bug is in Low <-> High boundaries calculation. The High boundary should be calculated as "last memory access pointer + element size". Differential revision: https://reviews.llvm.org/D23176 llvm-svn: 279930	2016-08-28 08:53:53 +00:00
Adam Nemet	cef3314156	[Inliner] Report when inlining fails because callee's def is unavailable Summary: This is obviously an interesting case because it may motivate code restructuring or LTO. Reporting this requires instantiation of ORE in the loop where the call sites are first gathered. I've checked compile-time overhead with -Rpass-with-hotness and the worst slow-down was 6% in mcf and quickly tailing off. As before without -Rpass-with-hotness there is no overhead. Because this could be a pretty noisy diagnostics, it is currently qualified as 'verbose'. As of this patch, 'verbose' diagnostics are only emitted with -Rpass-with-hotness, i.e. when the output is expected to be filtered. Reviewers: eraman, chandlerc, davidxl, hfinkel Subscribers: tejohnson, Prazek, davide, llvm-commits Differential Revision: https://reviews.llvm.org/D23415 llvm-svn: 279860	2016-08-26 20:21:05 +00:00
Tim Shen	3ad8b43cc2	[MemCpy] Add comments for r279769 Differential Revision: https://reviews.llvm.org/D23846 llvm-svn: 279778	2016-08-25 21:03:46 +00:00
Tim Shen	a3dbead2d6	[MemCpy] Check for alias in performMemCpyToMemSetOptzn, instead of the identity of two operands Summary: This fixes pr29105. The reason is that lifetime marks creates new aliasing pointers the original ones, but before this patch aliases were not checked in performMemCpyToMemSetOptzn. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23846 llvm-svn: 279769	2016-08-25 19:27:26 +00:00
Sebastian Pop	5f0d0e60d1	GVN-hoist: fix hoistingFromAllPaths for loops (PR29034) It is invalid to hoist stores or loads if they are not executed on all paths from the hoisting point to the exit of the function. In the testcase, there are paths in the loop that do not execute the stores or the loads, and so hoisting them within the loop is unsafe. The problem is that the current implementation of hoistingFromAllPaths is incomplete: it walks all blocks dominated by the hoisting point, and does not return false when the loop contains a path on which the hoisted ld/st is not executed. Differential Revision: https://reviews.llvm.org/D23843 llvm-svn: 279732	2016-08-25 11:55:47 +00:00
Xinliang David Li	cad3a995a4	[Profile] Propagate branch metadata properly in instcombine Differential Revision: http://reviews.llvm.org/D23590 llvm-svn: 279693	2016-08-25 00:26:32 +00:00
Sanjay Patel	d398d4a39e	[InstCombine] use m_APInt to allow icmp eq/ne (shr X, C2), C folds for splat constant vectors llvm-svn: 279677	2016-08-24 22:22:06 +00:00
Matthew Simpson	abd2be1e2e	[LV] Unify vector and scalar maps This patch unifies the data structures we use for mapping instructions from the original loop to their corresponding instructions in the new loop. Previously, we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap. WidenMap maintained the vector values each instruction from the old loop was represented with, and ScalarIVMap maintained the scalar values each scalarized induction variable was represented with. With this patch, all values created for the new loop are maintained in VectorLoopValueMap. The change allows for several simplifications. Previously, when an instruction was scalarized, we had to insert the scalar values into vectors in order to maintain the mapping in WidenMap. Then, if a user of the scalarized value was also scalar, we had to extract the scalar values from the temporary vector we created. We now aovid these unnecessary scalar-to-vector-to-scalar conversions. If a scalarized value is used by a scalar instruction, the scalar value is used directly. However, if the scalarized value is needed by a vector instruction, we generate the needed insertelement instructions on-demand. A common idiom in several locations in the code (including the scalarization code), is to first get the vector values an instruction from the original loop maps to, and then extract a particular scalar value. This patch adds getScalarValue for this purpose along side getVectorValue as an interface into VectorLoopValueMap. These functions work together to return the requested values if they're available or to produce them if they're not. The mapping has also be made less permissive. Entries can be added to VectorLoopValue map with the new initVector and initScalar functions. getVectorValue has been modified to return a constant reference to the mapped entries. There's no real functional change with this patch; however, in some cases we will generate slightly different code. For example, instead of an insertelement sequence following the definition of an instruction, it will now precede the first use of that instruction. This can be seen in the test case changes. Differential Revision: https://reviews.llvm.org/D23169 llvm-svn: 279649	2016-08-24 18:23:17 +00:00
Sanjoy Das	ff855b6020	[SCCP] Don't delete side-effecting instructions I'm not sure if the `!isa<CallInst>(Inst) && !isa<TerminatorInst>(Inst))` bit is correct either, but this fixes the case we know is broken. llvm-svn: 279647	2016-08-24 18:10:21 +00:00
Gil Rapaport	550148b2f6	[Loop Vectorizer] Support predication of div/rem div/rem instructions in basic blocks that require predication currently prevent vectorization. This patch extends the existing mechanism for predicating stores to handle other instructions and leverages it to predicate divs and rems. Differential Revision: https://reviews.llvm.org/D22918 llvm-svn: 279620	2016-08-24 11:37:57 +00:00
Gor Nishanov	241b041fba	[Coroutines] Part 8: Coroutine Frame Building algorithm Summary: This patch adds coroutine frame building algorithm. Now, simple coroutines such as ex0.ll and ex1.ll (first examples from docs\Coroutines.rst can be compiled). Documentation and overview is here: http://llvm.org/docs/Coroutines.html. Upstreaming sequence (rough plan) 1.Add documentation. (https://reviews.llvm.org/D22603) 2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659) ... 7. Split coroutine into subfunctions. (https://reviews.llvm.org/D23461) 8. Coroutine Frame Building algorithm <= we are here 9. Add f.cleanup subfunction. 10+. The rest of the logic Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D23586 llvm-svn: 279609	2016-08-24 04:44:35 +00:00
Matthew Simpson	df2ab917ad	[SLP] Avoid signed integer overflow The test case included with r279125 exposed an existing signed integer overflow. Since getTreeCost can return INT_MAX, we can't sum this cost together with other costs, such as getReductionCost. This patch removes the possibility of assigning a cost of INT_MAX. Since we were previously using INT_MAX as an indicator for "should not vectorize", we now explicitly check this condition with "isTreeTinyAndNotFullyVectorizable" before computing a cost. This patch adds a run-line to the test case used for r279125 that ensures we don't vectorize. Previously, this line would vectorize the test case by chance due to undefined behavior in the cost calculation. Differential Revision: https://reviews.llvm.org/D23723 llvm-svn: 279562	2016-08-23 20:48:50 +00:00
Sanjay Patel	6946e2ade3	[InstSimplify] allow icmp with constant folds for splat vectors, part 2 Completes the m_APInt changes for simplifyICmpWithConstant(). Other commits in this series: https://reviews.llvm.org/rL279492 https://reviews.llvm.org/rL279530 https://reviews.llvm.org/rL279534 https://reviews.llvm.org/rL279538 llvm-svn: 279543	2016-08-23 18:00:51 +00:00
Sanjay Patel	200e3cbfb0	[InstSimplify] allow icmp with constant folds for splat vectors, part 1 llvm-svn: 279538	2016-08-23 17:30:56 +00:00
Sanjay Patel	ada2bb3d5d	[InstSimplify] add tests to show missing vector icmp folds llvm-svn: 279534	2016-08-23 17:13:38 +00:00
Sanjay Patel	5c269d0b7a	[InstSimplify] move icmp with constant tests to another file; NFC ...because like the corresponding code, this is just too big to keep adding to. And the next step is to add a vector version of each of these tests to show missed folds. Also, auto-generate CHECK lines and add comments for the tests that correspond to the source code. llvm-svn: 279530	2016-08-23 16:46:53 +00:00
Sanjay Patel	a392049419	[InstCombine] use m_APInt to allow icmp (shr exact X, Y), 0 folds for splat constant vectors llvm-svn: 279472	2016-08-22 20:45:06 +00:00
James Molloy	5bf2114265	[SimplifyCFG] Rewrite SinkThenElseCodeToEnd [Recommitting now an unrelated assertion in SROA is sorted out] The new version has several advantages: 1) IMSHO it's more readable and neater 2) It handles loads and stores properly 3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch. With this change we can now finally sink load-modify-store idioms such as: if (a) return b += 3; else return b += 4; => %z = load i32, i32* %y %.sink = select i1 %a, i32 5, i32 7 %b = add i32 %z, %.sink store i32 %b, i32* %y ret i32 %b When this works for switches it'll be even more powerful. Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables. This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup. llvm-svn: 279460	2016-08-22 19:07:15 +00:00
Jun Bum Lim	ec8b8cc595	[InstCombine] Allow sinking from unique predecessor with multiple edges Summary: We can allow sinking if the single user block has only one unique predecessor, regardless of the number of edges. Note that a switch statement with multiple cases can have the same destination. Reviewers: mcrosier, majnemer, spatel, reames Subscribers: reames, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D23722 llvm-svn: 279448	2016-08-22 18:21:56 +00:00
James Molloy	475f4a763f	Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd" This reverts commit r279443. It caused buildbot failures. llvm-svn: 279447	2016-08-22 18:13:12 +00:00
James Molloy	353052698a	[SimplifyCFG] Rewrite SinkThenElseCodeToEnd The new version has several advantages: 1) IMSHO it's more readable and neater 2) It handles loads and stores properly 3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch. With this change we can now finally sink load-modify-store idioms such as: if (a) return b += 3; else return b += 4; => %z = load i32, i32* %y %.sink = select i1 %a, i32 5, i32 7 %b = add i32 %z, %.sink store i32 %b, i32* %y ret i32 %b When this works for switches it'll be even more powerful. Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables. This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup. llvm-svn: 279443	2016-08-22 17:40:23 +00:00
Artur Pilipenko	bc76ecada0	Revert -r278267 [ValueTracking] An improvement to IR ValueTracking on Non-negative Integers This change cause performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other bechmarks. See https://reviews.llvm.org/D18777 for details. llvm-svn: 279433	2016-08-22 13:14:07 +00:00
Artur Pilipenko	b78ad9d41f	Revert -r278269 [IndVarSimplify] Eliminate zext of a signed IV when the IV is known to be non-negative This change needs to be reverted in order to revert -r278267 which cause performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other bechmarks. See comments on https://reviews.llvm.org/D18777 for details. llvm-svn: 279432	2016-08-22 13:12:07 +00:00
Balaram Makam	a927aa4ad0	[PM] Port LoopDataPrefetch AArch64 tests to new pass manager Reviewers: mcrosier, tejohnson Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D23724 llvm-svn: 279431	2016-08-22 12:59:58 +00:00
Sanjay Patel	643d21a62c	[InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 4 This concludes the fixes for icmp+shl in this series: https://reviews.llvm.org/rL279339 https://reviews.llvm.org/rL279398 https://reviews.llvm.org/rL279399 llvm-svn: 279401	2016-08-21 17:10:07 +00:00
Sanjay Patel	163a5ab799	remove FIXME comment; fixed by previous commit llvm-svn: 279400	2016-08-21 16:40:42 +00:00
Sanjay Patel	7ffcde7422	[InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 3 This is a partial enablement (move the ConstantInt guard down). llvm-svn: 279399	2016-08-21 16:35:34 +00:00
Sanjay Patel	7e09f13fed	[InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 2 This is a partial enablement (move the ConstantInt guard down). llvm-svn: 279398	2016-08-21 16:28:22 +00:00
Matthew Simpson	235e479984	Reapply "[SLP] Initialize VectorizedValue when gathering" The test case included in r279125 exposed existing undefined behavior in the SLP vectorizer that it did not introduce. This patch reapplies the original patch, but modifies the test case to avoid hitting the undefined behavior. This allows us to close PR28330 while keeping the UBSan bot happy. The undefined behavior the original test uncovered will be addressed in a follow-on patch. Reference: https://llvm.org/bugs/show_bug.cgi?id=28330 llvm-svn: 279370	2016-08-20 14:49:02 +00:00
Vitaly Buka	cc7db13bf0	Revert "[SLP] Initialize VectorizedValue when gathering" to fix ubsan bot. This reverts commit r279125. https://reviews.llvm.org/D23410 llvm-svn: 279363	2016-08-20 07:09:39 +00:00

1 2 3 4 5 ...

7662 Commits