llvm-project

Commit Graph

Author	SHA1	Message	Date
David Majnemer	de55c606d1	[InstCombine] Fold ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2) This further extends r292179 to support additional binary operators beyond subtraction. llvm-svn: 292238	2017-01-17 18:08:06 +00:00
Chad Rosier	8520429bdd	[ValueTracking] Extend known bits to understand @llvm.bitreverse. Differential Revision: https://reviews.llvm.org/D28780 llvm-svn: 292233	2017-01-17 17:23:51 +00:00
Simon Pilgrim	d4eb800b03	[InstCombine][X86][AVX] Add DemandedElts support for VPERMILPD/VPERMILPS instructions Simplify a vpermilvar shuffle mask based on the elements of the mask that are actually demanded. llvm-svn: 292209	2017-01-17 11:35:03 +00:00
Sanjoy Das	679bc32c6a	[InstCombine] Don't DSE across readnone functions that may throw Summary: Depends on D28740 Reviewers: dberlin, chandlerc, hfinkel, majnemer Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D28742 llvm-svn: 292197	2017-01-17 05:45:09 +00:00
David Majnemer	36d382b773	[InstCombine] Fold ((C1-zext(X)) & C2) -> zext((C1-X) & C2) This is valid if C2 fits within the bitwidth of X thanks to two's complement modulo arithmetic. llvm-svn: 292179	2017-01-17 00:45:57 +00:00
Matt Arsenault	c8cc2be9f8	Add comment to test file I forgot to save llvm-svn: 292178	2017-01-17 00:35:28 +00:00
Matt Arsenault	b948b4d8df	SimplifyLibCalls: Remove checks for fabs Use the intrinsic instead of emitting the libcall which will be replaced by the intrinsic. llvm-svn: 292176	2017-01-17 00:30:31 +00:00
Matt Arsenault	7233344c28	SimplifyLibCalls: Replace fabs libcalls with intrinsics Add missing fabs(fpext) optimization that worked with the call, and also fixes it creating a second fpext when there were multiple uses. llvm-svn: 292172	2017-01-17 00:10:40 +00:00
Simon Pilgrim	a0b0b96d83	[InstCombine][AVX] Tests showing missed opportunities to pass demanded elts through a permilpd/permilps shuffle mask llvm-svn: 292165	2017-01-16 21:34:22 +00:00
Sanjay Patel	ab8b32de71	[InstCombine] use m_APInt to allow shift-shift folds for vectors with splat constants Some existing 'FIXME' tests are still not folded because of splat holes in value tracking. llvm-svn: 292151	2017-01-16 19:35:45 +00:00
Sanjay Patel	cd06f6fe10	[InstCombine] add tests to show missed vector folds; NFC The shift-shift possibilities became easier to see after: https://reviews.llvm.org/rL292145 llvm-svn: 292150	2017-01-16 19:23:34 +00:00
Simon Pilgrim	87eddf9aaf	[InstCombine][SSE] Tests showing missed opportunities to pass demanded elts through a packss/packus truncation llvm-svn: 292144	2017-01-16 17:26:23 +00:00
Simon Pilgrim	73a68c25a0	[InstCombine][SSE] Add DemandedElts support for PSHUFB instructions Simplify a pshufb shuffle mask based on the elements of the mask that are actually demanded. Differential Revision: https://reviews.llvm.org/D28745 llvm-svn: 292101	2017-01-16 11:30:41 +00:00
Sanjay Patel	21347ffddf	[InstCombine] add tests to show missed vector folds; NFC Also, add comments and remove bogus comment. llvm-svn: 292082	2017-01-15 23:45:03 +00:00
Simon Pilgrim	21c2a50237	[InstCombine][SSE] Tests showing missed opportunities to pass demanded elts through a pshufb shuffle mask llvm-svn: 292072	2017-01-15 17:49:04 +00:00
Sanjay Patel	5f8451afad	[InstCombine] use m_APInt to allow ashr folds for vectors with splat constants llvm-svn: 292064	2017-01-15 16:38:19 +00:00
Sanjay Patel	fba2df8d0f	[InstCombine] add explanatory comments to tests; NFC llvm-svn: 292063	2017-01-15 16:22:26 +00:00
Chandler Carruth	0952750fae	[PM] Clean up the testing for IVUsers, especially with the new PM. First, I've moved a test of IVUsers from the LSR tree to a dedicated IVUsers test directory. I've also simplified its RUN line now that the new pass manager's loop PM is providing analyses on their own. No functionality changed, but it makes subsequent changes cleaner. llvm-svn: 292060	2017-01-15 09:29:27 +00:00
Chandler Carruth	5edfd4d99e	[PM] Fix instcombine's analysis preservation in the new pass manager to cover domtree and alias analysis. These are the pretty clear analyses that we would always want to survive this pass. To make these survive, we also need to preserve the assumption cache. Added a test that verifies the important bits of this preservation. llvm-svn: 292037	2017-01-14 23:25:22 +00:00
Sanjay Patel	f8ed0a5e93	[InstCombine] add test to show missed vector fold; NFC llvm-svn: 292035	2017-01-14 23:12:29 +00:00
Daniel Berlin	01831b7db2	NewGVN: Fix PR31613 test regex naming llvm-svn: 291979	2017-01-13 23:54:10 +00:00
Sanjay Patel	40f401776b	[InstCombine] optimize unsigned icmp of increment Allows LLVM to optimize sequences like the following: %add = add nuw i32 %x, 1 %cmp = icmp ugt i32 %add, %y Into: %cmp = icmp uge i32 %x, %y Previously, only signed comparisons were being handled. Decrements could also be handled, but 'sub nuw %x, 1' is currently canonicalized to 'add %x, -1' in InstCombineAddSub, losing the nuw flag. Removing that canonicalization seems like it might have far-reaching ramifications so I kept this simple for now. Patch by Matti Niemenmaa! Differential Revision: https://reviews.llvm.org/D24700 llvm-svn: 291975	2017-01-13 23:25:46 +00:00
Sanjay Patel	2d4b456427	[InstCombine] use m_APInt to allow lshr folds for vectors with splat constants llvm-svn: 291972	2017-01-13 23:04:10 +00:00
Sanjay Patel	d511dde2ec	[InstCombine / InstSimplify] add and move tests for lshr transforms; NFC llvm-svn: 291970	2017-01-13 22:54:12 +00:00
Daniel Berlin	c0431fd02d	NewGVN: Move leaders around properly to ensure we have a canonical dominating leader. Fixes PR 31613. Summary: This is a testcase where phi node cycling happens, and because we do not order the leaders by domination or anything similar, the leader keeps changing. Using std::set for the members is too expensive, and we actually don't need them sorted all the time, only at leader changes. We could keep both a set and a vector, and keep them mostly sorted and resort as necessary, or use a set and a fibheap, but all of this seems premature. After running some statistics, we are able to avoid the vast majority of sorting by keeping a "next leader" field. Most congruence classes only have leader changes once or twice during GVN. Reviewers: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28594 llvm-svn: 291968	2017-01-13 22:40:01 +00:00
David Majnemer	bba17390c7	[LoopStrengthReduce] Don't bother rewriting PHIs in catchswitch blocks The catchswitch instruction cannot be split, don't bother trying to rewrite it. This fixes PR31627. llvm-svn: 291966	2017-01-13 22:24:27 +00:00
Sanjay Patel	b22f6c5f26	[InstCombine] use m_APInt to allow shl folds for vectors with splat constants llvm-svn: 291934	2017-01-13 18:39:09 +00:00
Sanjay Patel	bbc1c1e46b	[InstCombine] add tests to show missing transforms for vector shl; NFC llvm-svn: 291926	2017-01-13 18:27:23 +00:00
Sanjay Patel	5178363687	[InstCombine] if the condition of a select may be known via assumes, eliminate the select This is a limited solution for PR31512: https://llvm.org/bugs/show_bug.cgi?id=31512 The motivation is that we will need to increase usage of llvm.assume and/or metadata to solve PR28430: https://llvm.org/bugs/show_bug.cgi?id=28430 ...and this kind of simplification is needed to take advantage of that extra information. The 'not' test case would be handled by: https://reviews.llvm.org/D28485 Differential Revision: https://reviews.llvm.org/D28337 llvm-svn: 291915	2017-01-13 17:02:42 +00:00
Adam Nemet	6117caab58	Move test of lazy BFI with ORE to a generic directory llvm-svn: 291862	2017-01-13 00:16:23 +00:00
Piotr Padlewski	9530883e8c	[Devirtualization] MemDep returns non-local !invariant.group dependencies Summary: Memory Dependence Analysis was limited to returning only local dependencies for invariant.group handling. Now it returns NonLocal when it finds one, and by then asking getNonLocalPointerDependency we get the found dependency. Thanks to this we are able to devirtualize loops! void indirect(A &a, int n) { for (int i = 0 ; i < n; i++) a.foo(); } void test(int n) { A a; indirect(a, n); } After inlining, a.foo() will be changed to a direct call, even if foo and A::A() are external (but only if the vtable definition is available). Reviewers: nlewycky, dberlin, chandlerc, rsmith Subscribers: mehdi_amini, davide, llvm-commits Differential Revision: https://reviews.llvm.org/D28137 llvm-svn: 291762	2017-01-12 11:33:58 +00:00
Chandler Carruth	b4d9a310d2	Make a test actually test what it set out to test. This test seems to have largely been relying on asserts being tripped. It had a very specific and somewhat uninteresting grep of the output, but it never really did anything to cause SCEV to be preserved across loop simplify, certainly not explicitly. And a later addition to it actually added CHECK lines despite the test never running FileCheck. Now we actually print SCEV before and after loop simplify to make sure it is changing and being updated. Which seems to be much more likely the point of the test. llvm-svn: 291740	2017-01-12 03:49:07 +00:00
Michael Kuperstein	991c2e0e57	Add test that verifies we don't peel loops in optsize functions. NFC. llvm-svn: 291708	2017-01-11 21:42:51 +00:00
Peter Collingbourne	7636532c1b	LowerTypeTests: Represent the memory region size with the constant size-1. This means that we can use a shorter instruction sequence in the case where the size is a power of two and on the boundary between two representations. Differential Revision: https://reviews.llvm.org/D28421 llvm-svn: 291706	2017-01-11 21:32:10 +00:00
Peter Collingbourne	6bca5a0d82	Re-apply r291205, "LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase.", with a fix for an off-by-one error. llvm-svn: 291699	2017-01-11 20:28:46 +00:00
Daniel Berlin	f6eba4be2c	NewGVN: Fix PR31594, by tracking the store count of congruence classes, and updating checking to allow for equivalence through reachability. (Sadly, the checking here is not perfect, and can't be made perfect, so we'll have to disable it after we are satisfied with correctness. Right now it is just "very unlikely" to happen.) llvm-svn: 291698	2017-01-11 20:22:36 +00:00
Rong Xu	20f5df1d70	Resubmit "[PGO] Turn off comdat renaming in IR PGO by default" This patch resubmits the changes in r291588. llvm-svn: 291696	2017-01-11 20:19:41 +00:00
Michael Kuperstein	f69e64662b	[SLP] Remove bogus assert. The removed assert seems bogus - it's perfectly legal for the roots of the vectorized subtrees to be equal even if the original scalar values aren't, if the original scalars happen to be equivalent. This fixes PR31599. Differential Revision: https://reviews.llvm.org/D28539 llvm-svn: 291692	2017-01-11 19:23:57 +00:00
Ivan Krasin	42e6b4fd98	Revert rL291205 because it breaks Chrome tests under CFI. Summary: Revert LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase. This change separates how type identifiers are resolved from how intrinsic calls are lowered. All information required to lower an intrinsic call is stored in a new TypeIdLowering data structure. The idea is that this data structure can either be initialized using the module itself during regular LTO, or using the module summary in ThinLTO backends. Original URL: https://reviews.llvm.org/D28341 Reviewers: pcc Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D28532 llvm-svn: 291684	2017-01-11 16:54:04 +00:00
Mohammed Agabaria	81d0f17055	[X86] fixing failed test in commit: r291657 Missing Requires asserts. llvm-svn: 291659	2017-01-11 09:03:11 +00:00
Mohammed Agabaria	2c96c43388	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch. updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657	2017-01-11 08:23:37 +00:00
Adam Nemet	e2aaf3a35e	[LICM] Report failing to hoist conditionally-executed loads These are interesting again because the user may not be aware that this is a common reason preventing LICM. A const is removed from an instruction pointer declaration in order to pass it to ORE. Differential Revision: https://reviews.llvm.org/D27940 llvm-svn: 291649	2017-01-11 04:39:49 +00:00
Adam Nemet	81941b3195	[LICM] Report failing to hoist a load with an invariant address These are interesting because lack of precision in alias information could be standing in the way of this optimization. An example is the case in the test suite that I showed in the DevMeeting talk: http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/MultiSource/Benchmarks/FreeBench/distray/CMakeFiles/distray.dir/html/_org_test-suite_MultiSource_Benchmarks_FreeBench_distray_distray.c.html#L236 canSinkOrHoistInst is also used from LoopSink, which does not use opt-remarks so we need to take ORE as an optional argument. Differential Revision: https://reviews.llvm.org/D27939 llvm-svn: 291648	2017-01-11 04:39:45 +00:00
Adam Nemet	358433ce1b	[LICM] Report successful hoist/sink/promotion Differential Revision: https://reviews.llvm.org/D27938 llvm-svn: 291646	2017-01-11 04:39:35 +00:00
Matt Arsenault	1e0edbf03c	InstSimplify: Eliminate fabs on known positive llvm-svn: 291624	2017-01-11 00:33:24 +00:00
Rong Xu	acd6360251	Revert "[PGO] Turn off comdat renaming in IR PGO by default" This patch reverts r291588: [PGO] Turn off comdat renaming in IR PGO by default, as we are seeing some hash mismatches in our internal tests. llvm-svn: 291621	2017-01-10 23:54:31 +00:00
Matt Arsenault	fdb78f8bae	InstCombine: fdiv -x, -y -> fdiv x, y llvm-svn: 291611	2017-01-10 23:08:54 +00:00
Michael Kuperstein	ee31cbe35f	[LV] Don't panic when encountering the IV of an outer loop. Bail out instead of asserting when we encounter this situation, which can actually happen. The reason the test uses the new PM is that the "bad" phi, incidentally, gets cleaned up by LoopSimplify. But LICM can create this kind of phi and preserve loop simplify form, so the cleanup has no chance to run. This fixes PR31190. We may want to solve this in a less conservative manner, since this phi is actually uniform within the inner loop (or we may want LICM to output a cleaner promotion to begin with). Differential Revision: https://reviews.llvm.org/D28490 llvm-svn: 291589	2017-01-10 19:32:30 +00:00
Rong Xu	ef1adad938	[PGO] Turn off comdat renaming in IR PGO by default Summary: In IR PGO we append the function hash to comdat functions to avoid the potential hash mismatch. This turns out not to be legal in some cases: if the comdat function is address-taken and used in a comparison, renaming changes the semantics. This patch turns off comdat renaming by default. To alleviate the hash mismatch issue, we now rename the profile variable for comdat functions. The profile allows multiple versions of a profile with different hash values to coexist. The inlined copy will always have the correct profile counter. The out-of-line copy might not have the correct count, but we will not get the bogus mismatch warning. Reviewers: davidxl Subscribers: llvm-commits, xur Differential Revision: https://reviews.llvm.org/D28416 llvm-svn: 291588	2017-01-10 19:30:20 +00:00
Davide Italiano	f8711f093e	[SimplifyLibCalls] Propagate fast math flags while optimizing pow(). llvm-svn: 291577	2017-01-10 18:02:05 +00:00
Serge Pavlov	0668cd2c95	[StructurizeCfg] Update dominator info. In some cases StructurizeCfg updates the root node but the dominator info remains unchanged, which causes a crash when expensive checks are enabled. To cope with this problem, a new method was added to DominatorTreeBase that allows adding new root nodes; it is called in StructurizeCfg to keep the dominator tree in sync. This change fixes PR27488. Differential Revision: https://reviews.llvm.org/D28114 llvm-svn: 291530	2017-01-10 02:50:47 +00:00
Davide Italiano	472684eaf5	[SimplifyLibCalls] pow(x, -0.5) -> 1.0 / sqrt(x). Differential Revision: https://reviews.llvm.org/D28479 llvm-svn: 291486	2017-01-09 21:55:23 +00:00
Sanjay Patel	8f4910e26a	[InstCombine] add test to show missed fold using llvm.assume; NFC llvm-svn: 291472	2017-01-09 20:18:30 +00:00
Sanjay Patel	eaa143c98c	[InstCombine] regenerate checks; NFC llvm-svn: 291469	2017-01-09 19:43:26 +00:00
Sanjay Patel	87495eb8ef	[InstCombine] regenerate checks; NFC llvm-svn: 291464	2017-01-09 19:18:46 +00:00
Sanjay Patel	ced8fdd42a	[InstCombine] remove unnecessary attribute comments from test files; NFC llvm-svn: 291463	2017-01-09 19:13:38 +00:00
Matthew Simpson	cf796478e9	[LV] Fix-up external IV users after updating dominator tree This patch delays the fix-up step for external induction variable users until after the dominator tree has been properly updated. This should fix PR30742. The SCEVExpander in InductionDescriptor::transform can generate code in the wrong location if the dominator tree is not up-to-date. We should work towards keeping the dominator tree up-to-date throughout the transformation. Reference: https://llvm.org/bugs/show_bug.cgi?id=30742 Differential Revision: https://reviews.llvm.org/D28168 llvm-svn: 291462	2017-01-09 19:05:29 +00:00
Xin Tong	c13a8e84d1	Intrinsic::Bitreverse is safe to speculate Summary: Intrinsic::Bitreverse is safe to speculate Reviewers: hfinkel, mkuper, arsenm, jmolloy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28471 llvm-svn: 291456	2017-01-09 17:57:08 +00:00
Daniel Berlin	b755aea8eb	NewGVN: Fix PR 31573, a failure to verify memory congruency due to not excluding ourselves when checking if any equivalent stores exist. llvm-svn: 291421	2017-01-09 05:34:29 +00:00
Piotr Padlewski	09ad678bc4	[MemDep] NFC walk invariant.group graph only down Summary: By using stripPointerCasts we can get to the root value and then walk down the bitcast graph Reviewers: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28181 llvm-svn: 291405	2017-01-08 22:26:06 +00:00
Matt Arsenault	a7d2194168	SimplifyLibCalls: Remove incorrect optimization of fabs fabs(x * x) is not generally safe to assume x is positive if x is a NaN. This is also less general than it could be, so this will be replaced with a transformation on the intrinsic. llvm-svn: 291359	2017-01-07 19:55:12 +00:00
Daniel Berlin	32f8d560dd	NewGVN: Make sure we properly lookup operand leaders while creating congruence classes for stores, and then keep them up to date. Add testcases. llvm-svn: 291351	2017-01-07 16:55:14 +00:00
Daniel Berlin	d92e7f9f74	NewGVN: Fix PR 31501. Summary: LLVM's non-standard notion of phi nodes means we can't both try to substitute for undef in phi nodes and use phi nodes as leaders all the time. This changes NewGVN to use the same semantics as SimplifyPHINode to decide which phi nodes are equivalent. Reviewers: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28312 llvm-svn: 291308	2017-01-07 00:01:42 +00:00
David Majnemer	63da0c238b	[InstSimplify] Optimize away udivs in the presence of range metadata We know that udiv %V, C can be optimized away to 0 if %V is ult C. llvm-svn: 291296	2017-01-06 22:58:02 +00:00
David Majnemer	8c0e62f507	[InstSimplify] Optimize away urems in the presence of range metadata We know that urem %V, C can be optimized away to %V if %V is ult C. llvm-svn: 291282	2017-01-06 21:23:51 +00:00
Mehdi Amini	27d224fbbb	Fix LoopLoadElimination to keep original alignment on the initial hoisted store This is fixing a bug where Loop Vectorization is widening a load but with a lower alignment. Hoisting the load without propagating the alignment will allow inst-combine to later deduce a higher alignment than what the pointer actually has. Differential Revision: https://reviews.llvm.org/D28408 llvm-svn: 291281	2017-01-06 21:06:51 +00:00
Sanjay Patel	2715d92389	[InstCombine] add a vector version of a test added in r291262; NFC llvm-svn: 291265	2017-01-06 19:14:05 +00:00
Sanjay Patel	8d4aa10960	[InstCombine] move and add tests for icmp + shl nsw; NFC As discussed here: http://lists.llvm.org/pipermail/llvm-dev/2017-January/108749.html ...we should be able to better optimize this pattern. llvm-svn: 291262	2017-01-06 18:57:54 +00:00
Peter Collingbourne	81271b7bd2	LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase. This change separates how type identifiers are resolved from how intrinsic calls are lowered. All information required to lower an intrinsic call is stored in a new TypeIdLowering data structure. The idea is that this data structure can either be initialized using the module itself during regular LTO, or using the module summary in ThinLTO backends. Differential Revision: https://reviews.llvm.org/D28341 llvm-svn: 291205	2017-01-06 02:22:47 +00:00
Michael Kuperstein	c9acad12e9	[LICM] Allow promotion of some stores that are not guaranteed to execute. Promotion is always legal when a store within the loop is guaranteed to execute. However, this is not a necessary condition - for promotion to be memory model semantics-preserving, it is enough to have a store that dominates every exit block. This is because if the store dominates every exit block, the fact the exit block was executed implies the original store was executed as well. Differential Revision: https://reviews.llvm.org/D28147 llvm-svn: 291171	2017-01-05 20:42:06 +00:00
Mohammed Agabaria	23599ba794	Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and needs some extra cost for address computation handling. This code seems to be target dependent and may not be the same for all targets. Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'. Specifically, on X86 targets we don't see any significant address computation cost for strided access in general. Differential Revision: https://reviews.llvm.org/D27518 llvm-svn: 291106	2017-01-05 14:03:41 +00:00
Peter Collingbourne	b2ce2b6805	IR: Module summary representation for type identifiers; summary test scaffolding for lowertypetests. Set up basic YAML I/O support for module summaries, plumb the summary into the pass and add a few command line flags to test YAML I/O support. Bitcode support to come separately, as will the code in LowerTypeTests that actually uses the summary. Also add a couple of tests that pass by virtue of the pass doing nothing with the summary (which happens to be the correct thing to do for those tests). Differential Revision: https://reviews.llvm.org/D28041 llvm-svn: 291069	2017-01-05 03:39:00 +00:00
Sanjay Patel	95faecb766	[InstSimplify] add tests to show missing select simplifications; NFC llvm-svn: 291043	2017-01-05 00:40:52 +00:00
Wolfgang Pieb	ce13e716c5	[DWARF] Null out the debug locs of load instructions that have been moved by GVN performing partial redundancy elimination (PRE). Not doing so can cause jumpy line tables and confusing (though correct) source attributions. Differential Revision: https://reviews.llvm.org/D27857 llvm-svn: 291037	2017-01-04 23:58:26 +00:00
Matt Arsenault	3bdd75d01e	InstCombine: Fold cos(-x) -> cos(x) Also cos(fabs(x)) -> cos(x) llvm-svn: 291022	2017-01-04 22:49:03 +00:00
Michael Kuperstein	f381f35977	Add positive test for sqrt "partial inlining". NFC. llvm-svn: 291009	2017-01-04 21:24:56 +00:00
Michael Kuperstein	020af9c258	Remove accidentally target-dependent test and pacify bots. llvm-svn: 291004	2017-01-04 21:08:53 +00:00
Michael Kuperstein	fc74da13a9	Add positive test for sqrt "partial inlining". NFC. llvm-svn: 291001	2017-01-04 20:48:30 +00:00
Simon Pilgrim	6cfb5caf05	Revert r290970 [SLPVectorizer] Regenerate test. The check script will use var names before they are declared, which filecheck doesn't like. llvm-svn: 290971	2017-01-04 16:12:07 +00:00
Simon Pilgrim	4629b46bba	[SLPVectorizer] Regenerate test. Missed var name llvm-svn: 290970	2017-01-04 16:01:55 +00:00
Simon Pilgrim	1d5b0377af	Regenerate test. llvm-svn: 290969	2017-01-04 15:52:41 +00:00
Chandler Carruth	96809ae7ea	[Inliner] Fix a test where I typo'ed 'CHECK' as 'CHCEK' when converting to FileCheck. Fortunately, it passes. =] Spotted in review by Bob Wilson! llvm-svn: 290953	2017-01-04 11:15:01 +00:00
David Majnemer	b5e365c970	[InstCombine] Add a test for r290733 llvm-svn: 290929	2017-01-04 02:21:37 +00:00
David Majnemer	cb892e9066	[InstCombine] Move casts around shift operations It is possible to perform a left shift before zero extending if the shift would only shift out zeros. llvm-svn: 290928	2017-01-04 02:21:34 +00:00
David Majnemer	022d2a563b	[InstCombine] Combine adds across a zext We can perform the following: (add (zext (add nuw X, C1)), C2) -> (zext (add nuw X, C1+C2)) This is only possible if C2 is negative and C2 is greater than or equal to negative C1. llvm-svn: 290927	2017-01-04 02:21:31 +00:00
Matt Arsenault	56ff4839ae	InstCombine: Fold fabs on select of constants llvm-svn: 290913	2017-01-03 22:40:34 +00:00
Sanjay Patel	ada846aff0	[InstCombine] tighten checks for tests of assume -> metadata transform; NFC llvm-svn: 290903	2017-01-03 19:32:11 +00:00
Xin Tong	883dd1b6c4	Enable disabled loopidiom test. Apparently we handle it now Summary: Enable disabled loopidiom test. Apparently we handle it now. Maybe due to improvements to AA. Reviewers: atrick, danielcdh, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28171 llvm-svn: 290900	2017-01-03 19:08:05 +00:00
Matt Arsenault	b264c94963	InstCombine: Add fma with constant transforms DAGCombine already does these. llvm-svn: 290860	2017-01-03 04:32:35 +00:00
Matt Arsenault	1cc294c85d	InstCombine: Add fma + fabs/fneg transforms fma (fneg x), (fneg y), z -> fma x, y, z fma (fabs x), (fabs x), z -> fma x, x, z llvm-svn: 290859	2017-01-03 04:32:31 +00:00
Xin Tong	2940231ff0	Make sure total loop body weight is preserved in loop peeling Summary: Regardless how the loop body weight is distributed, we should preserve total loop body weight. i.e. we should have same weight reaching the body of the loop or its duplicates in peeled and unpeeled case. Reviewers: mkuper, davidxl, anemet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28179 llvm-svn: 290833	2017-01-02 20:27:23 +00:00
Daniel Berlin	aa0ec1e992	NewGVN: Add a test case for equivalent phis. llvm-svn: 290830	2017-01-02 19:55:13 +00:00
Daniel Berlin	43a5f998df	NewGVN: Add forgotten testcase for PR 31483 llvm-svn: 290829	2017-01-02 19:49:20 +00:00
Sanjay Patel	0e3ae439cf	[InstCombine] add explanatory comment to test; NFC The test was added at r290797, and a patch to enable the transform is proposed in D28204. llvm-svn: 290798	2017-01-01 18:20:49 +00:00
Sanjay Patel	07537c2b6e	[InstCombine] add test to show potential nonnull attribute propagation; NFC This will change with the current draft of: https://reviews.llvm.org/D28204 llvm-svn: 290797	2017-01-01 17:18:00 +00:00
Craig Topper	d00db69227	[InstCombine][AVX-512] Teach InstCombine that llvm.x86.avx512.vcomi.sd and llvm.x86.avx512.vcomi.ss don't use the upper elements of their input. This was already done for the SSE/SSE2 version of the intrinsics. llvm-svn: 290776	2016-12-31 00:45:06 +00:00
Piotr Padlewski	da36215017	[MemDep] Handle gep with zeros for invariant.group Summary: gep 0, 0 is equivalent to bitcast. LLVM canonicalizes it to getelementptr because SROA can then handle it. A simple case like void g(A &a) { z(a); if (glob) a.foo(); } void testG() { A a; g(a); } was not devirtualized with -fstrict-vtable-pointers because of the lack of handling for gep 0 in Memory Dependence Analysis Reviewers: dberlin, nlewycky, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28126 llvm-svn: 290763	2016-12-30 18:45:07 +00:00
Michael Kuperstein	76e06c8858	[LICM] When promoting scalars, allow inserting stores to thread-local allocas. This is similar to the allocfn case - if an alloca is not captured, then it's necessarily thread-local. Differential Revision: https://reviews.llvm.org/D28170 llvm-svn: 290738	2016-12-30 01:03:17 +00:00
Dehao Chen	cc76344ef5	Use continuous boosting factor for complete unroll. Summary: The current loop complete unroll algorithm checks if complete unrolling will reduce the runtime by a certain percentage. If yes, it will apply a fixed boosting factor to the threshold (by discounting cost). The problem with this approach is that the threshold changes abruptly. This patch makes the boosting factor a function of runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously. The patch also simplifies the code by removing one parameter in UP. The patch only affects code-gen of two speccpu2006 benchmarks: 445.gobmk binary size decreases 0.08%, no performance change. 464.h264ref binary size increases 0.24%, no performance change. Reviewers: mzolotukhin, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26989 llvm-svn: 290737	2016-12-30 00:50:28 +00:00
Daniel Berlin	e0bd37e78f	NewGVN: Fix PR 31491 by ensuring that we touch the right instructions. Change to one based numbering so we can assert we don't cause the same bug again. llvm-svn: 290724	2016-12-29 22:15:12 +00:00
Craig Topper	b57a84dace	[InstCombine] Fix some of the AVX-512 scalar arithmetic test cases to do a better job of testing what they intended to test. They accidentally had trivially dead code. Also needed to adjust the rounding mode to not CUR_DIRECTION so the intrinsics don't get converted to native operations before going through SimplifyDemandedVectorElts. llvm-svn: 290702	2016-12-29 02:29:04 +00:00
Chandler Carruth	05ca5acc9e	[PM] Introduce a devirtualization iteration layer for the new PM. This is an orthogonal and separated layer instead of being embedded inside the pass manager. While it adds a small amount of complexity, it is fairly minimal and the composability and control seems worth the cost. The logic for this ends up being nicely isolated and targeted. It should be easy to experiment with different iteration strategies wrapped around the CGSCC bottom-up walk using this kind of facility. The mechanism used to track devirtualization is the simplest one I came up with. I think it handles most of the cases the existing iteration machinery handles, but I haven't done a very in depth analysis. It does however match the basic intended semantics, and we can tweak or tune its exact behavior incrementally as necessary. One thing that we may want to revisit is freshly building the value handle set on each iteration. While I don't think this will be a significant cost (it is strictly fewer value handles but more churn of value handles than the old call graph), it is conceivable that we'll want a somewhat more clever tracking mechanism. My hope is to layer that on as a follow up patch with data supporting any implementation complexity it adds. This code also provides for a basic count heuristic: if the number of indirect calls decreases and the number of direct calls increases for a given function in the SCC, we assume devirtualization is responsible. This matches the heuristics currently used in the legacy pass manager. Differential Revision: https://reviews.llvm.org/D23114 llvm-svn: 290665	2016-12-28 11:07:33 +00:00
Chandler Carruth	443e57e01d	[PM] Teach the CGSCC's CG update utility to more carefully invalidate analyses when we're about to break apart an SCC. We can't wait until after breaking apart the SCC to invalidate things: 1) Which SCC do we then invalidate? All of them? 2) Even if we invalidate all of them, a newly created SCC may not have a proxy that will convey the invalidation to functions! Previously we only invalidated one of the SCCs and too late. This led to stale analyses remaining in the cache. And because the caching strategy actually works, they would get used and chaos would ensue. Doing invalidation early is somewhat pessimizing though if we know that the SCC structure won't change. So it turns out that the design to make the mutation API force the caller to know the kind of mutation in advance was indeed 100% correct and we didn't do enough of it. So this change also splits two cases of switching a call edge to a ref edge into two separate APIs so that callers can clearly test for this and take the easy path without invalidating when appropriate. This is particularly important in this case as we expect most inlines to be between functions in separate SCCs and so the common case is that we don't have to so aggressively invalidate analyses. The LCG API change in turn needed some basic cleanups and better testing in its unittest. No interesting functionality changed there other than more coverage of the returned sequence of SCCs. While this seems like an obvious improvement over the current state, I'd like to revisit the core concept of invalidating within the CG-update layer at all. I'm wondering if we would be better served forcing the callers to handle the invalidation beforehand in the cases that they can handle it. An interesting example is when we want to teach the inliner to update and preserve analyses. But we can cross that bridge when we get there. 
With this patch, the new pass manager can build all of the LLVM test suite at -O3 and everything passes. =D I haven't bootstrapped yet and I'm sure there are still plenty of bugs, but this gives a nice baseline so I'm going to increasingly focus on fleshing out the missing functionality, especially the bits that are just turned off right now in order to let us establish this baseline. llvm-svn: 290664	2016-12-28 10:34:50 +00:00
Chandler Carruth	9900d18bab	[PM] Teach the inliner's call graph update to handle inserting new edges when they are call edges at the leaf but may (transitively) be reached via ref edges. It turns out there is a simple rule: insert everything as a ref edge which is a safe conservative default. Then we let the existing update logic handle promoting some of those to call edges. Note that it would be fairly cheap to make these call edges right away if that is desirable by testing whether there is some existing call path from the source to the target. It just seemed like slightly more complexity in this code path that isn't strictly necessary. If anyone feels strongly about handling this differently I'm happy to change it. llvm-svn: 290649	2016-12-28 03:13:12 +00:00
Michael Kuperstein	cd7ad7130f	[InstCombine] Canonicalize insert splat sequences into an insert + shuffle This adds a combine that canonicalizes a chain of inserts which broadcasts a value into a single insert + a splat shufflevector. This fixes PR31286. Differential Revision: https://reviews.llvm.org/D27992 llvm-svn: 290641	2016-12-28 00:18:08 +00:00
Bryant Wong	7cb744621b	[MemCpyOpt] Don't sink LoadInst below possible clobber. Differential Revision: https://reviews.llvm.org/D26811 llvm-svn: 290611	2016-12-27 17:58:12 +00:00
Chandler Carruth	625038d5d5	[PM] Turn on the new PM's inliner in addition to the current one for most of the inliner test cases. The inliner involves a bunch of interesting code and tends to be where most of the issues I've seen experimenting with the new PM lie. All of these test cases pass, but I'd like to keep some more thorough coverage here so doing a fairly blanket enabling. There are a handful of interesting tests I've not enabled yet because they're focused on the always inliner, or on functionality that doesn't (yet) exist in the inliner. llvm-svn: 290592	2016-12-27 07:18:43 +00:00
Chandler Carruth	141bf5d14d	[PM] Add one of the features left out of the initial inliner patch: skipping indirectly recursive inline chains. To do this, we implicitly build an inline stack for each callsite and check prior to inlining that doing so would not form a cycle. This uses the exact same technique and even shares some code with the legacy PM inliner. This solution remains deeply unsatisfying to me because it means we cannot actually iterate the inliner externally. Doing so would not be able to easily detect and avoid such cycles. Some day I would very much like to have a solution that works without this internal state to detect cycles, but this is not that day. llvm-svn: 290590	2016-12-27 06:46:20 +00:00
Chandler Carruth	db6ced8484	[PM] Wire up another test to the new pass manager. Nothing really interesting here, but I had to improve the test to use variables rather than hard coding value names as we happen to end up with different value names in the new PM. llvm-svn: 290589	2016-12-27 06:46:16 +00:00
George Burgess IV	ed16024a9b	[Analysis] Ignore `nobuiltin` on `allocsize` function calls. We currently ignore the `allocsize` attribute on function calls with the `nobuiltin` attribute when trying to lower `@llvm.objectsize`. We shouldn't care about `nobuiltin` here: `allocsize` is explicitly added by the user, not inferred based on a function's symbol. llvm-svn: 290588	2016-12-27 06:32:14 +00:00
Craig Topper	72f2d4e8d6	[InstCombine][X86] Add DemandedElts support for 512-bit PMULDQ/PMULUDQ instructions PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use. This builds on r290554 which added support for 128 and 256-bit. llvm-svn: 290582	2016-12-27 05:30:09 +00:00
Chandler Carruth	03130d981c	[PM] Teach the inliner in the new PM to merge attributes after inlining. Also enable the new PM in the attributes test case which caught this issue. llvm-svn: 290572	2016-12-27 03:39:54 +00:00
Chandler Carruth	62c8b81ea8	[Inliner] Modernize all of the inliner tests that were using grep. This mostly involved converting from grep to FileCheck and tidying up the IR used. In one case (invoke_test-3.ll) the test had become completely pointless as we use 'resume' rather than 'unwind' now, and even then it did not occur at the end of the line. llvm-svn: 290570	2016-12-27 02:47:37 +00:00
Craig Topper	7f8540b5e7	[AVX-512][InstCombine] Teach InstCombine to turn masked scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION. An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed. llvm-svn: 290566	2016-12-27 01:56:30 +00:00
Craig Topper	a0439377e6	[InstCombine][AVX-512] Add masked scalar add/sub/mul/div intrinsic test cases that don't have a CUR_DIRECTION rounding mode. The CUR_DIRECTION case will be optimized in a future commit so this provides coverage for the other cases. llvm-svn: 290565	2016-12-27 01:56:27 +00:00
Chandler Carruth	0ee8bb11c3	[PM] Move the collection of call sites to a more appropriate place inside of `InlineFunction`. Prior to this, call instructions are specifically being rewritten and replaced within the inlined region, invalidating some of the call sites. Several of these regions are using the same technique to walk the inlined region so this seems clearly safe up to this point. I've also added a short circuit to the scan for call sites based on what other code is doing. With this, the most common crash I've found in the new inliner code is fixed. I've turned it on for another test case that covers this scenario. I'll make my way through most of the other inliner test cases just to get some easy coverage next. llvm-svn: 290562	2016-12-27 01:24:50 +00:00
Craig Topper	020b228155	[AVX-512][InstCombine] Teach InstCombine to turn packed add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION. llvm-svn: 290559	2016-12-27 00:23:16 +00:00
Chandler Carruth	6e9bb7e064	[PM] Teach the always inliner in the new pass manager to support removing fully-dead comdats without removing dead entries in comdats with live members. This factors the core logic out of the current inliner's internals to a reusable utility and leverages that in both places. The factored out code should also be (minorly) more efficient in cases where we have very few dead functions or dead comdats to consider. I've added a test case to cover this behavior of the always inliner. This is the last significant bug in the new PM's always inliner I've found (so far). llvm-svn: 290557	2016-12-26 23:43:27 +00:00
Simon Pilgrim	c9cf7fc7a4	[InstCombine][X86] Add DemandedElts support for PMULDQ/PMULUDQ instructions PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use. Differential Revision: https://reviews.llvm.org/D28119 llvm-svn: 290554	2016-12-26 23:28:17 +00:00
Daniel Berlin	d59e8010c5	Don't use our own incorrect version of isTriviallyDeadInstruction in NewGVN. Fixes PR/31472 llvm-svn: 290549	2016-12-26 18:44:36 +00:00
Davide Italiano	8ea5e4fcae	[NewGVN] Change test to reflect difference between GVN and NewGVN. The current GVN algorithm folds unconditional branches to, it claims, expose more PRE opportunities. The folding, if really needed (which is uncertain, as it has not been shown to improve analysis), can be done by an earlier cleanup pass instead of GVN itself. Ack'ed/SGTM'd by Daniel Berlin. Differential Revision: https://reviews.llvm.org/D28117 llvm-svn: 290546	2016-12-26 18:10:09 +00:00
Bryant Wong	b5e03b61e2	[InstCombiner] Simplify lib calls to `round{,f}` Differential Revision: https://reviews.llvm.org/D28110 llvm-svn: 290542	2016-12-26 14:29:29 +00:00
Chandler Carruth	80db76d556	Test the different scenarios of GlobalDCE and comdats more systematically and document in the test what all is going on. This replaces the PR-named test that was the only coverage for GlobalDCE and comdats previously. I wrote this because I wasn't certain how comdat DCE was supposed to work and wanted to step through what GlobalDCE did to fully understand it. After talking to folks and reading the code and really staring at things it all makes sense but it seemed good to help write down some of this in a more explicit and fully covering test case. For example, it seemed like a bug that GlobalDCE didn't consider comdat participation of ifuncs. Specifically it seemed like an accident because testing didn't really cover that case. But in fact, ifuncs specifically cannot participate in a comdat despite having that API. The new test case covers this and explicitly documents that DCE gets to fire here even though there are comdats involved. Also, we didn't have any positive tests for the challenging cases such as usage cycles between comdat participants that might make them seem alive except that there is no external edge into the cycle. llvm-svn: 290537	2016-12-26 08:54:01 +00:00
Craig Topper	7b788ada2d	[AVX-512][InstCombine] Teach InstCombine to turn scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION. Summary: I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon. I'll do the same thing for packed add/sub/mul/div in a future patch. Reviewers: delena, RKSimon, zvi, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27879 llvm-svn: 290535	2016-12-26 06:33:19 +00:00
Craig Topper	e328045711	[AVX-512][InstCombine] Teach InstCombine to convert masked vpermv intrinsics into shufflevector instructions Summary: This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants. We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that. Reviewers: zvi, delena, spatel, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27825 llvm-svn: 290530	2016-12-25 23:58:57 +00:00
Bryant Wong	a07d9b1460	[AliasAnalysis] Teach BasicAA about memcpy. Differential Revision: https://reviews.llvm.org/D27034 llvm-svn: 290526	2016-12-25 22:42:27 +00:00
Daniel Berlin	d7c12ee54c	Value number stores and memory states so we can detect when memory states are equivalent (IE store of same value to memory). Reviewers: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28084 llvm-svn: 290525	2016-12-25 22:23:49 +00:00
Simon Pilgrim	3265d951b6	[InstCombine][X86] Add tests showing missed opportunities to simplify PMULUDQ/PMULDQ inputs. PMULUDQ/PMULDQ - only the even elements (0, 2, 4, 6) of the vXi32 inputs are required. llvm-svn: 290502	2016-12-24 17:30:19 +00:00
Chandler Carruth	cdfdd4330a	[PM] Remove a bunch of junk that snuck in when I failed at manipulating my editor to close and commit the patch. Sorry for the noise. llvm-svn: 290460	2016-12-23 23:39:31 +00:00
Chandler Carruth	4eaff12ba2	[PM] Teach the always inlining test case to be much more strict about whether functions are removed, and fix the new PM's always inliner to actually pass this test. Without this, the new PM's always inliner leaves all the functions kicking around which won't work out very well given the semantics of always inline. Doing this really highlights how frustrating the current alwaysinline semantic contract is though -- why can we put it on external functions, etc? Also I've added a number of tricky and interesting test cases for removing functions with the always inliner. There is one remaining case not handled -- fully removing comdats -- and I've left a FIXME about this. llvm-svn: 290457	2016-12-23 23:33:35 +00:00
Chandler Carruth	f32f63f222	[PM] Clean up test case and comments a bit. NFC. llvm-svn: 290456	2016-12-23 23:33:32 +00:00
Davide Italiano	34f94384a5	[LICM] Work around LICM needs to maintain state across loops. The pass creates some state which expects to be cleaned up by a later instance of the same pass. opt-bisect happens to expose this non-ideal design because calling skipLoop() will result in this state not being cleaned up at times and an assertion firing in `doFinalization()`. Chandler tells me the new pass manager will give us options to avoid these design traps, but until it is ready, we need a workaround for the current pass infrastructure. Fix provided by Andy Kaylor, see the review for a complete discussion. Differential Revision: https://reviews.llvm.org/D25848 llvm-svn: 290427	2016-12-23 13:12:50 +00:00
Chandler Carruth	eb119ece4a	Fix some DOS-style line endings that I suspect snuck in from one of the frustrating Subversion clients that fails to do line ending translation of text files. llvm-svn: 290404	2016-12-23 02:02:26 +00:00
Evgeniy Stepanov	27d4c9b71b	[cfi] Emit jump tables as a function-level inline asm. Use a dummy private function with inline asm calls instead of module level asm blocks for CFI jumptables. The main advantage is that now jumptable codegen can be affected by the function attributes (like target_cpu on ARM). Module level asm gets the default subtarget based on the target triple, which is often not good enough. This change also uses asm constraints/arguments to reference jumptable targets and aliases directly. We no longer do asm name mangling in an IR pass. Differential Revision: https://reviews.llvm.org/D28012 llvm-svn: 290384	2016-12-22 22:22:35 +00:00
Davide Italiano	e05e3306a3	[NewGVN] Add the pass to PassRegistry.def. We need to hook up here to get it working with the new PM. Add a test while here (and remove a typo). llvm-svn: 290350	2016-12-22 16:35:02 +00:00
Davide Italiano	7e274e02ae	[GVN] Initial check-in of a new global value numbering algorithm. The code has been developed by Daniel Berlin over the years, and the goal of the new implementation is to address shortcomings of the current GVN infrastructure, i.e. long compile time for large testcases, lack of phi predication, no load/store value numbering etc... The current code just implements the "core" GVN algorithm, although other pieces (load coercion, phi handling, predicate system) are already implemented in a branch out of tree. Once the core is stable, we'll start adding pieces on top of the base framework. The tests currently living in test/Transform/NewGVN are copies of the ones in GVN, with proper `XFAIL` (missing features in NewGVN). A flag will be added in a future commit to enable NewGVN, so that interested parties can exercise this code easily. Differential Revision: https://reviews.llvm.org/D26224 llvm-svn: 290346	2016-12-22 16:03:48 +00:00
Adrian Prantl	1eadba1c8c	Renumber testcase metadata nodes after r290153. This patch renumbers the metadata nodes in debug info testcases after https://reviews.llvm.org/D26769. This is a separate patch because it causes so much churn. This was implemented with a python script that pipes the testcases through llvm-as - | llvm-dis - and then goes through the original and new output side-by-side to insert all comments at a close-enough location. Differential Revision: https://reviews.llvm.org/D27765 llvm-svn: 290292	2016-12-22 00:45:21 +00:00
Adrian Prantl	ec9ebba778	Legalize metadata in legacy testcases llvm-svn: 290288	2016-12-21 23:38:17 +00:00
Adrian Prantl	762e4b72c6	Legalize metadata in legacy testcases llvm-svn: 290287	2016-12-21 23:36:06 +00:00
Adrian Prantl	aad5df484c	Legalize metadata in legacy testcases llvm-svn: 290286	2016-12-21 23:30:35 +00:00
David Majnemer	b0761a0c1b	Revert "[InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp" This reverts commit r289813, it caused PR31449. llvm-svn: 290266	2016-12-21 19:21:59 +00:00
Adam Nemet	32e6a34c02	[LDist] Match behavior between invoking via optimization pipeline or opt -loop-distribute In r267672, where the loop distribution pragma was introduced, I tried hard to keep the old behavior for opt: when opt is invoked with -loop-distribute, it should distribute the loop (it's off by default when run via the optimization pipeline). As MichaelZ has discovered this has the unintended consequence of breaking a very common developer work-flow to reproduce compilations using opt: First you print the pass pipeline of clang with -debug-pass=Arguments and then invoke opt with the returned arguments. clang -debug-pass will include -loop-distribute but the pass is invoked with default=off so nothing happens unless the loop carries the pragma. While through opt (default=on) we will try to distribute all loops. This changes opt's default to off as well to match clang. The tests are modified to explicitly enable the transformation. llvm-svn: 290235	2016-12-21 04:07:40 +00:00
George Burgess IV	3f08914e7e	[Analysis] Centralize objectsize lowering logic. We're currently doing nearly the same thing for @llvm.objectsize in three different places: two of them are missing checks for overflow, and one of them could subtly break if InstCombine gets much smarter about removing alloc sites. Seems like a good idea to not do that. llvm-svn: 290214	2016-12-20 23:46:36 +00:00
Chandler Carruth	1d96311447	[PM] Provide an initial, minimal port of the inliner to the new pass manager. This doesn't implement every feature of the existing inliner, but tries to implement the most important ones for building a functional optimization pipeline and beginning to sort out bugs, regressions, and other problems. Notable, but intentional omissions: - No alloca merging support. Why? Because it isn't clear we want to do this at all. Active discussion and investigation is going on to remove it, so for simplicity I omitted it. - No support for trying to iterate on "internally" devirtualized calls. Why? Because it adds what I suspect is inappropriate coupling for little or no benefit. We will have an outer iteration system that tracks devirtualization including that from function passes and iterates already. We should improve that rather than approximate it here. - Optimization remarks. Why? Purely to make the patch smaller, no other reason at all. The last one I'll probably work on almost immediately. But I wanted to skip it in the initial patch to try to focus the change as much as possible as there is already a lot of code moving around and both of these could be skipped without really disrupting the core logic. A summary of the different things happening here: 1) Adding the usual new PM class and rigging. 2) Fixing minor underlying assumptions in the inline cost analysis or inline logic that don't generally hold in the new PM world. 3) Adding the core pass logic which is in essence a loop over the calls in the nodes in the call graph. This is a bit duplicated from the old inliner, but only a handful of lines could realistically be shared. (I tried at first, and it really didn't help anything.) All told, this is only about 100 lines of code, and most of that is the mechanics of wiring up analyses from the new PM world. 4) Updating the LazyCallGraph (in the new PM) based on the newly inlined calls and references. 
This is very minimal because we cannot form cycles. 5) When inlining removes the last use of a function, eagerly nuking the body of the function so that any "one use remaining" inline cost heuristics are immediately refined, and queuing these functions to be completely deleted once inlining is complete and the call graph updated to reflect that they have become dead. 6) After all the inlining for a particular function, updating the LazyCallGraph and the CGSCC pass manager to reflect the function-local simplifications that are done immediately and internally by the inline utilities. These are the exact same fundamental set of CG updates done by arbitrary function passes. 7) Adding a bunch of test cases to specifically target CGSCC and other subtle aspects in the new PM world. Many thanks to the careful review from Easwaran and Sanjoy and others! Differential Revision: https://reviews.llvm.org/D24226 llvm-svn: 290161	2016-12-20 03:15:32 +00:00
Adrian Prantl	bceaaa9643	[IR] Remove the DIExpression field from DIGlobalVariable. This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariable holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGlobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. This reapplies r289902 with additional testcase upgrades and a change to the Bitcode record for DIGlobalVariable, that makes upgrading the old format unambiguous also for variables without DIExpressions. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 290153	2016-12-20 02:09:43 +00:00
Sanjay Patel	5a443ac000	[InstCombine] use commutative matcher for pattern with commutative operators This is a case that was missed in: https://reviews.llvm.org/rL290067 ...and it would regress if we fix operand complexity (PR28296). llvm-svn: 290127	2016-12-19 18:35:37 +00:00
Sanjay Patel	dd46b52942	[InstCombine] add folds for icmp (umin|umax X, Y), X This is a follow-up to: https://reviews.llvm.org/rL289855 (https://reviews.llvm.org/D27531) https://reviews.llvm.org/rL290111 llvm-svn: 290118	2016-12-19 17:32:37 +00:00
Florian Hahn	2e03213f90	[LoopVersioning] Require loop-simplify form for loop versioning. Summary: Requiring loop-simplify form for loop versioning ensures that the runtime check block always dominates the exit block. This patch closes #30958 (https://llvm.org/bugs/show_bug.cgi?id=30958). Reviewers: silviu.baranga, hfinkel, anemet, ashutosh.nema Subscribers: ashutosh.nema, mzolotukhin, efriedma, hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D27469 llvm-svn: 290116	2016-12-19 17:13:37 +00:00
Sanjay Patel	8296c6c96f	[InstCombine] add folds for icmp (smax X, Y), X This is a follow-up to: https://reviews.llvm.org/rL289855 (D27531) llvm-svn: 290111	2016-12-19 16:28:53 +00:00
Daniel Jasper	aec2fa352f	Revert @llvm.assume with operator bundles (r289755-r289757) This creates non-linear behavior in the inliner (see more details in r289755's commit thread). llvm-svn: 290086	2016-12-19 08:22:17 +00:00
Sanjay Patel	2b9d4b4daf	[InstCombine] use commutative matchers for patterns with commutative operators Background/motivation - I was circling back around to: https://llvm.org/bugs/show_bug.cgi?id=28296 I made a simple patch for that and noticed some regressions, so added test cases for those with rL281055, and this is hopefully the minimal fix for just those cases. But as you can see from the surrounding untouched folds, we are missing commuted patterns all over the place, and of course there are no regression tests to cover any of those cases. We could sprinkle "m_c_" dust all over this file and catch most of the missing folds, but then we still wouldn't have test coverage, and we'd still miss some fraction of commuted patterns because they require adjustments to the match order. I'm aware of the concern about the potential compile-time performance impact of adding matches like this (currently being discussed on llvm-dev), but I don't think there's any evidence yet to suggest that handling commutative pattern matching more thoroughly is not a worthwhile goal of InstCombine. Differential Revision: https://reviews.llvm.org/D24419 llvm-svn: 290067	2016-12-18 18:49:48 +00:00
Evgeniy Stepanov	95294127d0	Revert "[GVNHoist] Move GVNHoist to function simplification part of pipeline." This reverts r289696, which caused TSan perf regression. See PR31382. llvm-svn: 290030	2016-12-17 01:53:15 +00:00
Michael Kuperstein	3ca147ea3d	Preserve loop metadata when folding branches to a common destination. Differential Revision: https://reviews.llvm.org/D27830 llvm-svn: 289992	2016-12-16 21:23:59 +00:00
Jun Bum Lim	90b6b5074a	[CodeGenPrep] Skip merging empty case blocks This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly : Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 289988	2016-12-16 20:38:39 +00:00
Adrian Prantl	73ec065604	Revert "[IR] Remove the DIExpression field from DIGlobalVariable." This reverts commit 289920 (again). I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable has no DIExpression. Unfortunately it is not possible to safely upgrade these variables without adding a flag to the bitcode record indicating which version they are. My plan of record is to roll the planned follow-up patch that adds a unit: field to DIGlobalVariable into this patch before recommitting. This way we only need one Bitcode upgrade for both changes (with a version flag in the bitcode record to safely distinguish the record formats). Sorry for the churn! llvm-svn: 289982	2016-12-16 19:39:01 +00:00
Matthew Simpson	a4964f291a	Reapply "[LV] Enable vectorization of loops with conditional stores by default" This patch reapplies r289863. The original patch was reverted because it exposed a bug causing the loop vectorizer to crash in the Python runtime on PPC. The underlying issue was fixed with r289958. llvm-svn: 289975	2016-12-16 19:12:02 +00:00
Sanjoy Das	089c699743	Fix CodeGenPrepare::stripInvariantGroupMetadata `dropUnknownNonDebugMetadata` takes a list of "known" metadata IDs. The only reason it worked at all is that `getMetadataID` returns something unrelated -- it returns the subclass ID of the receiver (which is used in `dyn_cast` etc.). That does not numerically match `LLVMContext::MD_invariant_group` and ends up dropping `invariant_group` along with every other metadata that does not numerically match `LLVMContext::MD_invariant_group`. llvm-svn: 289973	2016-12-16 18:52:33 +00:00
Jun Bum Lim	f9416af191	Revert "[CodeGenPrep] Skip merging empty case blocks" This reverts commit r289951. llvm-svn: 289960	2016-12-16 17:06:14 +00:00
Sanjay Patel	2d82aa7af7	[InstCombine] auto-generate checks; NFC llvm-svn: 289959	2016-12-16 16:58:54 +00:00
Matthew Simpson	099af810de	[LV] Don't attempt to type-shrink scalarized instructions After r288909, instructions feeding predicated instructions may be scalarized if profitable. Since these instructions will remain scalar, we shouldn't attempt to type-shrink them. We should only truncate vector types to their minimal bit widths. This bug was exposed by enabling the vectorization of loops containing conditional stores by default. llvm-svn: 289958	2016-12-16 16:52:35 +00:00
Jun Bum Lim	85347dde27	[CodeGenPrep] Skip merging empty case blocks This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block: Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 289951	2016-12-16 16:03:31 +00:00
Chandler Carruth	48b4e614d8	Revert r289863: [LV] Enable vectorization of loops with conditional stores by default This uncovers a crasher in the loop vectorizer on PPC when building the Python runtime. I'll send the testcase to the review thread for the original commit. llvm-svn: 289934	2016-12-16 11:31:39 +00:00
Adrian Prantl	74a835cda0	[IR] Remove the DIExpression field from DIGlobalVariable. This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariable holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGlobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. This reapplies r289902 with additional testcase upgrades. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 289920	2016-12-16 04:25:54 +00:00
Adrian Prantl	03c6d31a3b	Revert "[IR] Remove the DIExpression field from DIGlobalVariable." This reverts commit 289902 while investigating bot breakage. llvm-svn: 289906	2016-12-16 01:00:30 +00:00
Adrian Prantl	ce13935776	[IR] Remove the DIExpression field from DIGlobalVariable. This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariable holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGlobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 289902	2016-12-16 00:36:43 +00:00
Peter Collingbourne	1398a32e28	IPO: Introduce ThinLTOBitcodeWriter pass. This pass prepares a module containing type metadata for ThinLTO by splitting it into regular and thin LTO parts if possible, and writing both parts to a multi-module bitcode file. Modules that do not contain type metadata are written unmodified as a single module. All globals with type metadata are added to the regular LTO module, and the rest are added to the thin LTO module. Differential Revision: https://reviews.llvm.org/D27324 llvm-svn: 289899	2016-12-16 00:26:30 +00:00
Davide Italiano	67e979e086	[SimplifyLibCalls] Add a test to make sure we lower fls(0) correctly. llvm-svn: 289895	2016-12-15 23:48:07 +00:00
Davide Italiano	85ad36b0e0	[SimplifyLibCalls] Lower fls() to llvm.ctlz(). Differential Revision: https://reviews.llvm.org/D14590 llvm-svn: 289894	2016-12-15 23:45:11 +00:00
Matthew Simpson	6a98bcfe33	[LV] Enable vectorization of loops with conditional stores by default This patch sets the default value of the "-enable-cond-stores-vec" command line option to "true". Differential Revision: https://reviews.llvm.org/D27814 llvm-svn: 289863	2016-12-15 20:11:05 +00:00
Sanjay Patel	d640641a61	[InstCombine] add folds for icmp (smin X, Y), X Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns. This patch won't solve the example that was attached to that thread, so something else still needs fixing. The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that we want to fold to already exists, but sometimes it's the swapped form of what we want. Corresponding changes for smax/umin/umax to follow. Differential Revision: https://reviews.llvm.org/D27531 llvm-svn: 289855	2016-12-15 19:13:37 +00:00
Teresa Johnson	1b859a2306	[ThinLTO] Ensure callees get hot threshold when first seen on cold path This is split out from D27696, since it turned out to be a bug fix and not part of the NFC efficiency change. Keep the same adjusted (possibly decayed) threshold in both the worklist and the ImportList. Otherwise if we encountered it first along a cold path, the callee would be added to the worklist with a lower decayed threshold than when it is later encountered along a hot path. But the logic uses the threshold recorded in the ImportList entry to check if we should re-add it, and without this patch the threshold recorded there is the same along both paths so we don't re-add it. Using the same possibly decayed threshold in the ImportList ensures we re-add it later with the higher non-decayed hot path threshold. llvm-svn: 289843	2016-12-15 18:21:01 +00:00
Alexey Bataev	4160264e30	[TEST] Initial commit of tests for minmax horizontal reductions. llvm-svn: 289817	2016-12-15 13:21:29 +00:00
Ehsan Amiri	795b0671c5	[InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp A number of new patterns for simplifying and/xor of icmp: (icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true: 1- (%x = and %a, %mask) and (%y = and %b, %mask) 2- %mask is a power of 2. (icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true: 1- (%x = and %a, %mask1) and (%y = and %b, %mask2) 2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any %s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t. For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8 violates condition (2) above. So this optimization cannot be applied. llvm-svn: 289813	2016-12-15 12:25:13 +00:00
Craig Topper	ab5f355d8c	[AVX-512][InstCombine] Add masked scalar FMA intrinsics to SimplifyDemandedVectorElts. llvm-svn: 289759	2016-12-15 03:49:45 +00:00
Hal Finkel	3ca4a6bcf1	Remove the AssumptionCache After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756	2016-12-15 03:02:15 +00:00
Hal Finkel	cb9f78e1c3	Make processing @llvm.assume more efficient by using operand bundles There was an efficiency problem with how we processed @llvm.assume in ValueTracking (and other places). The AssumptionCache tracked all of the assumptions in a given function. In order to find assumptions relevant to computing known bits, etc. we searched every assumption in the function. For ValueTracking, that means that we did O(#assumes * #values) work in InstCombine and other passes (with a constant factor that can be quite large because we'd repeat this search at every level of recursion of the analysis). Several of us discussed this situation at the last developers' meeting, and this implements the discussed solution: Make the values that an assume might affect operands of the assume itself. To avoid exposing this detail to frontends and passes that need not worry about it, I've used the new operand-bundle feature to add these extra call "operands" in a way that does not affect the intrinsic's signature. I think this solution is relatively clean. InstCombine adds these extra operands based on what ValueTracking, LVI, etc. will need and then those passes need only search the users of the values under consideration. This should fix the computational-complexity problem. At this point, no passes depend on the AssumptionCache, and so I'll remove that as a follow-up change. Differential Revision: https://reviews.llvm.org/D27259 llvm-svn: 289755	2016-12-15 02:53:42 +00:00
Dehao Chen	40dd8c5109	Only sets profile summary when it was not preset. Summary: SampleProfileLoader pass may be invoked twice by LTO. The 2nd pass should not append more summary info as it is already preset by the 1st pass. Reviewers: eraman, davidxl Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D27733 llvm-svn: 289725	2016-12-14 22:06:49 +00:00
Geoff Berry	ca11a1e147	[GVNHoist] Move GVNHoist to function simplification part of pipeline. Summary: Move GVNHoist to later in the optimization pipeline, specifically, to the function simplification part of the pipeline. The new pipeline location allows GVNHoist to run on a function after its callees have been inlined but before the function has been considered for inlining into its callers, exposing more opportunities for hoisting. Performance results on AArch64 kryo: Improvements: Benchmarks/CoyoteBench/fftbench -24.952% spec2006/bzip2 -4.071% internal bmark -3.177% Benchmarks/PAQ8p/paq8p -1.754% spec2000/perlbmk -1.328% spec2006/h264ref -1.140% Regressions: internal bmark +1.818% Benchmarks/mafft/pairlocalalign +1.084% Reviewers: sebpop, dberlin, hiraditya Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D27722 llvm-svn: 289696	2016-12-14 19:38:22 +00:00
Craig Topper	aeaa52cc11	[X86][InstCombine] Handle demanded elements for operand of AVX-512 scalar floating point to integer conversion intrinsics. llvm-svn: 289639	2016-12-14 07:46:12 +00:00
Craig Topper	268b3abe6d	[X86][InstCombine] Teach SimplifyDemandedVectorElts to handle masked scalar add/sub/mul/div/max/min intrinsics better. Now we can remove these intrinsics if element 0 isn't used. Also fix undef element tracking. llvm-svn: 289636	2016-12-14 06:06:58 +00:00
Anna Thomas	65ca8e91cc	[IRCE] Avoid loop optimizations on pre and post loops Summary: This patch will add loop metadata on the pre and post loops generated by IRCE. Currently, we have metadata for disabling optimizations such as vectorization, unrolling, loop distribution and LICM versioning (and confirmed that these optimizations check for the metadata before proceeding with the transformation). The pre and post loops generated by IRCE need not go through loop opts (since these are slow paths). Added two test cases as well. Reviewers: sanjoy, reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26806 llvm-svn: 289588	2016-12-13 21:05:21 +00:00
Michael Kuperstein	3d23d4a234	[LV] Don't vectorize when we have a small static bound on trip count We currently check if the exact trip count is known and is smaller than the "tiny loop" bound. We should be checking the maximum bound on the trip count instead. Differential Revision: https://reviews.llvm.org/D27690 llvm-svn: 289583	2016-12-13 20:38:18 +00:00
Rong Xu	3462cac9af	Fix the test cases committed in r289521. llvm-svn: 289556	2016-12-13 17:34:29 +00:00
David Callahan	ebcf916c5a	[ADCE] Add code to remove dead branches Summary: This is the last in a series of patches to evolve ADCE.cpp to support removing unnecessary control flow. This patch adds the code to update the control and data flow graphs to remove the dead control flow. Also update unit tests to test the capability to remove dead, may-be-infinite loops, which is enabled by the switch -adce-remove-loops. Previous patches: D23824 [ADCE] Add handling of PHI nodes when removing control flow D23559 [ADCE] Add control dependence computation D23225 [ADCE] Modify data structures to support removing control flow D23065 [ADCE] Refactor anticipating new functionality (NFC) D23102 [ADCE] Refactoring for new functionality (NFC) Reviewers: dberlin, majnemer, nadav, mehdi_amini Subscribers: llvm-commits, david2050, freik, twoh Differential Revision: https://reviews.llvm.org/D24918 llvm-svn: 289548	2016-12-13 16:42:18 +00:00
Craig Topper	ac75bca1eb	[X86][InstCombine] Fix SimplifyDemandedVectorElts to handle frcz scalar intrinsics correctly. Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed. Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support. llvm-svn: 289523	2016-12-13 07:45:45 +00:00
NAKAMURA Takumi	b8ea75a010	llvm/test/Transforms/PGOProfile/noreturncall.ll REQUIRES asserts due to -debug-only. llvm-svn: 289522	2016-12-13 07:04:03 +00:00
Rong Xu	51a1e3c430	[PGO] Fix insane counts due to nonreturn calls Summary: Since we don't break BBs for function calls, we might get some insane counts (wrap of unsigned) in the presence of noreturn calls. This patch sets these counts to zero instead of the wrapped number. Reviewers: davidxl Subscribers: xur, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D27602 llvm-svn: 289521	2016-12-13 06:41:14 +00:00
Matthew Simpson	92ce0230b5	[SLP] Fix sign-extends for type-shrinking This patch ensures the correct minimum bit width during type-shrinking. Previously when type-shrinking, we always sign-extended values back to their original width. However, if we are going to sign-extend, and the sign bit is unknown, we have to increase the minimum bit width by one bit so the sign-extend will fill the upper bits correctly. If the sign bit is known to be zero, we can perform a zero-extend instead. This should fix PR31243. Reference: https://llvm.org/bugs/show_bug.cgi?id=31243 Differential Revision: https://reviews.llvm.org/D27466 llvm-svn: 289470	2016-12-12 21:11:04 +00:00
Reid Kleckner	30422eea0f	Revert "[SCEVExpand] do not hoist divisions by zero (PR30935)" Reverts r289412. It caused an OOB PHI operand access in instcombine when ASan is enabled. Reduction in progress. Also reverts "[SCEVExpander] Add a test case related to r289412" llvm-svn: 289453	2016-12-12 18:52:32 +00:00
Sanjay Patel	052220c5c8	remove stale FIXME note from test; NFC llvm-svn: 289445	2016-12-12 16:20:21 +00:00
Sanjay Patel	e730ce87a5	[InstCombine] fix bug when offsetting case values of a switch (PR31260) We could truncate the condition and then try to fold the add into the original condition value causing wrong case constants to be used. Move the offset transform ahead of the truncate transform and return after each transform, so there's no chance of getting confused values. Fix for: https://llvm.org/bugs/show_bug.cgi?id=31260 llvm-svn: 289442	2016-12-12 16:13:52 +00:00
Sanjay Patel	2b060c7700	[InstCombine] add test to show PR31260 miscompile; NFC llvm-svn: 289437	2016-12-12 15:28:44 +00:00
Sebastian Pop	8c9cc8c86b	[SCEVExpand] do not hoist divisions by zero (PR30935) SCEVExpand computes the insertion point for the components of a SCEV to be code generated. When it comes to generating code for a division, SCEVexpand would not be able to check (at compilation time) all the conditions necessary to avoid a division by zero. The patch disables hoisting of expressions containing divisions by anything other than non-zero constants in order to avoid hoisting these expressions past conditions that should hold before doing the division. The patch passes check-all on x86_64-linux. Differential Revision: https://reviews.llvm.org/D27216 llvm-svn: 289412	2016-12-12 02:52:51 +00:00
Craig Topper	7fc6d34ed1	[InstCombine][XOP] The instructions for the scalar frcz intrinsics are defined to put 0 in the upper bits, not pass bits through like other intrinsics. So we should return a zero vector instead. llvm-svn: 289411	2016-12-11 22:32:38 +00:00
Sanjoy Das	3336f681e3	[Verifier] Add verification for TBAA metadata Summary: This change adds some verification in the IR verifier around struct path TBAA metadata. Other than some basic sanity checks (e.g. we get constant integers where we expect constant integers), this checks: - That by the time a struct access tuple `(base-type, offset)` is "reduced" to a scalar base type, the offset is `0`. For instance, in C++ you can't start from, say `("struct-a", 16)`, and end up with `("int", 4)` -- by the time the base type is `"int"`, the offset had better be zero. In particular, a variant of this invariant is needed for `llvm::getMostGenericTBAA` to be correct. - That there are no cycles in a struct path. - That struct type nodes have their offsets listed in ascending order. - That when generating the struct access path, you eventually reach the access type listed in the tbaa tag node. Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D26438 llvm-svn: 289402	2016-12-11 20:07:15 +00:00
Sanjay Patel	81ed3499cd	[Constants] don't die processing non-ConstantInt GEP indices in isGEPWithNoNotionalOverIndexing() (PR31262) This should fix: https://llvm.org/bugs/show_bug.cgi?id=31262 llvm-svn: 289401	2016-12-11 20:07:02 +00:00
Craig Topper	23ebd9564f	[X86][InstCombine] Add support for scalar FMA intrinsics to SimplifyDemandedVectorElts. This teaches SimplifyDemandedElts that the FMA can be removed if the lower element isn't used. It also teaches it that if upper elements of the first operand aren't used then we can simplify them. llvm-svn: 289377	2016-12-11 08:54:52 +00:00
Craig Topper	7a230f4225	[X86][InstCombine] Add the test cases for r289370, r289371, and r289372. I forgot to add the new files before committing. llvm-svn: 289374	2016-12-11 08:00:51 +00:00
Craig Topper	58917f3508	[AVX-512][InstCombine] Add 512-bit vpermilvar intrinsics to InstCombineCalls to match 128 and 256-bit. llvm-svn: 289354	2016-12-11 01:59:36 +00:00
Craig Topper	9a63d7ade5	[X86][InstCombine] Teach InstCombineCalls to turn pshufb intrinsic into a shufflevector if the indices are constant. llvm-svn: 289348	2016-12-11 00:23:50 +00:00
Davide Italiano	f8f391db16	[SCCP] Make the test added in r289175 more meaningful. Add a comment while here. llvm-svn: 289182	2016-12-09 03:49:20 +00:00
Davide Italiano	824d695231	[SCCP] Teach the pass about `mul %x 0` even if %x is overdefined. The motivating example is: extern int patatino; int goo() { int x = 0; for (int i = 0; i < 1000000; ++i) { x *= patatino; } return x; } Currently SCCP will not realize that this function returns always zero, therefore will try to unroll and vectorize the loop at -O3 producing an awful lot of (useless) code. With this change, it will just produce: 0000000000000000 <g>: xor %eax,%eax retq llvm-svn: 289175	2016-12-09 03:08:42 +00:00
Peter Collingbourne	8786754cc3	WholeProgramDevirt: Teach the pass to handle structs of arrays. This will become necessary in some cases once D22296 lands. llvm-svn: 289165	2016-12-09 01:10:11 +00:00
Peter Collingbourne	7a1e5bbe4e	Make WholeProgramDevirt understand ConstStruct vtables. Based on a patch by LemonBoy! Differential Revision: https://reviews.llvm.org/D26581 llvm-svn: 289162	2016-12-09 00:33:27 +00:00
Sanjay Patel	568196bf7b	[InstCombine] add tests for umin+icmp; NFC llvm-svn: 289157	2016-12-08 23:44:58 +00:00
Sanjay Patel	73d8bd9905	[InstCombine] add tests for umax+icmp; NFC llvm-svn: 289156	2016-12-08 23:36:57 +00:00
Zia Ansari	394cef803a	[InstSimplify] Add "X / 1.0" to SimplifyFDivInst. Differential Revision: https://reviews.llvm.org/D27587 llvm-svn: 289153	2016-12-08 23:27:40 +00:00
Sanjay Patel	b641aa3f14	[InstCombine] add tests for smax+icmp; NFC llvm-svn: 289151	2016-12-08 23:16:06 +00:00
Davide Italiano	54c683f9e7	[SCCP] Make sure SCCP and ConstantFolding agree on undef >> a. Currently SCCP folds the value to -1, while ConstantProp folds to 0. This changes SCCP to do what ConstantFolding does. llvm-svn: 289147	2016-12-08 22:28:53 +00:00
Sanjay Patel	2580c95dc1	[InstSimplify] add fdiv x/1.0 test and update checks; NFC llvm-svn: 289098	2016-12-08 20:23:56 +00:00
Peter Collingbourne	235c275b20	IR, X86: Understand !absolute_symbol metadata on global variables. Summary: Attaching !absolute_symbol to a global variable does two things: 1) Marks it as an absolute symbol reference. 2) Specifies the value range of that symbol's address. Teach the X86 backend to allow absolute symbols to appear in place of immediates by extending the relocImm and mov64imm32 matchers. Start using relocImm in more places where it is legal. As previously proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html Differential Revision: https://reviews.llvm.org/D25878 llvm-svn: 289087	2016-12-08 19:01:00 +00:00
Alexey Bataev	4f0d469d45	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 289043	2016-12-08 11:57:51 +00:00
Evgeniy Stepanov	0c8957c198	CFI-icall on Thumb Replace @progbits in the section directive with %progbits, because "@" starts a comment on arm/thumb. Use b.w branch instruction. Use .thumb_function and .thumb_set for proper arm/thumb interwork. This way jumptable entry addresses on thumb have bit 0 set (correctly). This does not affect CFI check math, because the address of the jumptable start also has that bit set. This does not work on thumbv5, because it does not support b.w, and the linker would not insert a veneer (trampoline?) to extend the range of b.n. We may need to do full-range plt-style jumptables on thumbv54, which are 12 bytes per entry. Another option is "push lr; bl; pop pc" (4 bytes) but that needs unwinding instructions, etc. Differential Revision: https://reviews.llvm.org/D27499 llvm-svn: 289008	2016-12-08 00:32:26 +00:00
Davide Italiano	1ed5396304	[BDCE] Skip metadata while replacing uses. The fix committed in r288851 doesn't cover all the cases. In particular, if we have an instruction with side effects which has a no non-dbg use not depending on the bits, we still perform RAUW destroying the dbg.value's first argument. Prevent metadata from being replaced here to avoid the issue. Differential Revision: https://reviews.llvm.org/D27534 llvm-svn: 288987	2016-12-07 21:47:32 +00:00
Matt Arsenault	624e1b348c	InstCombine: Fold bitcast of vector to FP scalar llvm-svn: 288978	2016-12-07 20:56:11 +00:00
Eli Friedman	c6885fc369	[GVNHoist] Invalidate MemDep when an instruction is moved. See also r279907. Fixes https://llvm.org/bugs/show_bug.cgi?id=30991 . Differential Revision: https://reviews.llvm.org/D27493 llvm-svn: 288968	2016-12-07 19:55:59 +00:00
Sanjay Patel	964c735f86	[InstCombine] add tests for smin+icmp; NFC The tests that already work are folded in InstSimplify, so those tests should be redundant and we can remove them if they don't seem worthwhile for completeness. llvm-svn: 288957	2016-12-07 18:56:55 +00:00
Matthew Simpson	364da7e527	[LV] Scalarize operands of predicated instructions This patch attempts to scalarize the operand expressions of predicated instructions if they were conditionally executed in the original loop. After scalarization, the expressions will be sunk inside the blocks created for the predicated instructions. The transformation essentially performs un-if-conversion on the operands. The cost model has been updated to determine if scalarization is profitable. It compares the cost of a vectorized instruction, assuming it will be if-converted, to the cost of the scalarized instruction, assuming that the instructions corresponding to each vector lane will be sunk inside a predicated block, possibly avoiding execution. If it's more profitable to scalarize the entire expression tree feeding the predicated instruction, the expression will be scalarized; otherwise, it will be vectorized. We only consider the cost of the entire expression to accurately estimate the cost of the required insertelement and extractelement instructions. Differential Revision: https://reviews.llvm.org/D26083 llvm-svn: 288909	2016-12-07 15:03:32 +00:00
Andrea Di Biagio	ae5780104f	When GVN removes a redundant load, it should not modify the debug location of the dominating load. In the case of a fully redundant load LI dominated by an equivalent load V, GVN should always preserve the original debug location of V. Otherwise, we risk introducing incorrect stepping. If V has debug info, then clearly it should not be modified. If V has a null debugloc, then it is still potentially incorrect to propagate LI's debugloc because LI may not post-dominate V. Differential Revision: https://reviews.llvm.org/D27468 llvm-svn: 288903	2016-12-07 12:31:36 +00:00
Peter Collingbourne	67ec0eb531	LowerTypeTests: Add a test that covers "unsatisfiable" type metadata. llvm-svn: 288881	2016-12-07 03:04:34 +00:00
Sanjay Patel	5369775a84	[InstSimplify] fixed (?) to not mutate icmps As Eli noted in the post-commit thread for r288833, the use of swapOperands() may not be allowed in InstSimplify, so I'm removing those calls here pending further review. The swap mutates the icmp, and there doesn't appear to be precedent for instruction mutation in InstSimplify. I didn't actually have any tests for those cases, so I'm adding a few here. llvm-svn: 288855	2016-12-06 22:09:52 +00:00
Davide Italiano	043e66137c	[BDCE/DebugInfo] Preserve llvm.dbg.value's argument. BDCE has two phases: 1. It asks SimplifyDemandedBits if all the bits of an instruction are dead, and if so, replaces all its uses with the constant zero. 2. Then, it asks SimplifyDemandedBits again if the instruction is really dead (no side effects etc.) and if so, eliminates it. Now, in 1) if all the bits of an instruction are dead, we may end up replacing a dbg use: %call = tail call i32 (...) @g() #4, !dbg !15 tail call void @llvm.dbg.value(metadata i32 %call, i64 0, metadata !8, metadata !16), !dbg !17 -> %call = tail call i32 (...) @g() #4, !dbg !15 tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !8, metadata !16), !dbg !17 but not eliminating the call because it may have arbitrary side effects. In other words, we lose some debug information. This patch fixes the problem by making sure that BDCE does nothing with the instruction if it has side effects and no non-dbg uses. Differential Revision: https://reviews.llvm.org/D27471 llvm-svn: 288851	2016-12-06 21:52:47 +00:00
Sanjay Patel	9b1b2de348	[InstSimplify] add folds for and-of-icmps with same operands All of these (and a few more) are already handled by InstCombine, but we shouldn't have to wait until then to simplify these because they're cheap to deal with here in InstSimplify. This is the 'and' sibling of the earlier 'or' patch: https://reviews.llvm.org/rL288833 llvm-svn: 288841	2016-12-06 19:05:46 +00:00
Sanjay Patel	827414876f	[InstSimplify] add tests for and-of-icmps; NFC llvm-svn: 288837	2016-12-06 18:46:54 +00:00
Sanjay Patel	d0ccdb46b9	[InstSimplify] add folds for or-of-icmps with same operands All of these (and a few more) are already handled by InstCombine, but we shouldn't have to wait until then to simplify these because they're cheap to deal with here in InstSimplify. llvm-svn: 288833	2016-12-06 18:09:37 +00:00
Sanjay Patel	6d4444f931	[InstSimplify] add tests for or-of-icmps; NFC llvm-svn: 288830	2016-12-06 17:49:10 +00:00
Simon Pilgrim	e633741c3a	[SLPVectorizer][X86] Tests to show missed buildvector sitofp/fptosi vectorizations e.g. buildvector(sitofp(i32), sitofp(i32), sitofp(i32), sitofp(i32)) --> sitofp(buildvector(i32, i32, i32, i32)) llvm-svn: 288807	2016-12-06 13:29:55 +00:00
Keno Fischer	92f377bd74	[LAA] Prevent invalid IR for loop-invariant bound in loop body Summary: If LAA expands a bound that is loop invariant, but not hoisted out of the loop body, it used to use that value anyway, causing a non-domination error, because the memcheck block is of course not dominated by the scalar loop body. Detect this situation and expand the SCEV expression instead. Fixes PR31251 Reviewers: anemet Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D27397 llvm-svn: 288705	2016-12-05 21:25:03 +00:00
Adrian Prantl	941fa7588b	[DIExpression] Introduce a dedicated DW_OP_LLVM_fragment operation so we can stop using DW_OP_bit_piece with the wrong semantics. The entire back story can be found here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161114/405934.html The gist is that in LLVM we've been misinterpreting DW_OP_bit_piece's offset field to mean the offset into the source variable rather than the offset into the location at the top of the DWARF expression stack. In order to be able to fix this in a subsequent patch, this patch introduces a dedicated DW_OP_LLVM_fragment operation with the semantics that we used to apply to DW_OP_bit_piece, which is what we actually need while inside of LLVM. This patch is complete with a bitcode upgrade for expressions using the old format. It does not yet fix the DWARF backend to use DW_OP_bit_piece correctly. Implementation note: We discussed several options for implementing this, including reserving a dedicated field in DIExpression for the fragment size and offset, but using a custom operator at the end of the expression works just fine and is more efficient because we then only pay for it when we need it. Differential Revision: https://reviews.llvm.org/D27361 rdar://problem/29335809 llvm-svn: 288683	2016-12-05 18:04:47 +00:00
Sanjay Patel	b7f8cb698c	[InstCombine] change select type to eliminate bitcasts This solves a secondary problem seen in PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137#c6 This is similar to the bitwise logic op fold added with: https://reviews.llvm.org/rL287707 And like that patch, I'm artificially restricting the transform from vector <-> scalar types until we're sure that the backend can handle that. llvm-svn: 288584	2016-12-03 15:25:16 +00:00
Guozhi Wei	835de1f3ab	[ppc] Correctly compute the cost of loading 32/64 bit memory into VSR VSX has instructions lxsiwax/lxsdx that can load 32/64 bit value into VSX register cheaply. That patch makes it known to memory cost model, so the vectorization of the test case in pr30990 is beneficial. Differential Revision: https://reviews.llvm.org/D26713 llvm-svn: 288560	2016-12-03 00:41:43 +00:00
Rong Xu	a5b5745a62	[PGO] Fix PGO use ICE when there are unreachable BBs For -O0 there might be unreachable BBs, which breaks the assumption that all the BBs have an auxiliary data structure. In this patch, we add another interface called findBBInfo() so that a nullptr can be returned for the unreachable BBs (and the callers can ignore those BBs). This fixes the bug reported at https://llvm.org/bugs/show_bug.cgi?id=31209 Differential Revision: https://reviews.llvm.org/D27280 llvm-svn: 288528	2016-12-02 19:10:29 +00:00
Matt Arsenault	d4da0edd98	AMDGPU: Implement isCheapAddrSpaceCast llvm-svn: 288523	2016-12-02 18:12:53 +00:00
Simon Pilgrim	b2116d9b94	[InstCombine] Add vector urem tests Demonstrate missed opportunity for urem -> and combine for powerof2 or zero non-uniform constant dividers llvm-svn: 288510	2016-12-02 17:16:21 +00:00
Simon Pilgrim	43bc269ffa	[InstCombine] Regenerate vector srem tests llvm-svn: 288509	2016-12-02 17:12:56 +00:00
Renato Golin	5b8e7ecdb3	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements." This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's builtins (twice: once in r288412 and once in r288497). We should investigate this offline. llvm-svn: 288508	2016-12-02 16:56:26 +00:00
Alexey Bataev	e8e94a7176	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288497	2016-12-02 12:20:22 +00:00
Simon Pilgrim	c70d3796fb	[SLPVectorizer][X86] Add tests for vectorization of buildvector of scalar fp-ops (PR6246) llvm-svn: 288492	2016-12-02 10:54:46 +00:00
Artem Belevich	704395a25a	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements." This reverts r288412 which causes severe compile-time regression. llvm-svn: 288431	2016-12-01 22:52:15 +00:00
Philip Reames	89e92d21b4	[PR29121] Don't fold if it would produce atomic vector loads or stores The instcombine code which folds loads and stores into their use types can trip up if the use is a bitcast to a type which we can't directly load or store in the IR. In principle, such types shouldn't exist, but in practice they do today. This is a workaround to avoid a bug while we work towards the long term goal. Differential Revision: https://reviews.llvm.org/D24365 llvm-svn: 288415	2016-12-01 20:17:06 +00:00
Alexey Bataev	2c01af5904	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288412	2016-12-01 20:06:53 +00:00
Alexey Bataev	62af7252f1	[SLP] Fixed cost model for horizontal reduction. Currently, when the cost of scalar operations is evaluated, the vector type is used for scalar operations. This patch fixes that issue and fixes evaluation of the vector operations cost. Several tests showed that the vector cost model is too optimistic. It allowed vectorization of 8 or fewer add/fadd operations, though scalar code is faster. Actually, only for 16 or more operations does vector code provide better performance. Differential Revision: https://reviews.llvm.org/D26277 llvm-svn: 288398	2016-12-01 18:42:42 +00:00
Adam Nemet	4ddb8c01b1	[GVN, OptDiag] Print the interesting instructions involved in missed load-elimination [recommitting after the fix in r288307] This includes the intervening store and the load/store that we're trying to forward from in the optimization remark for the missed load elimination. This is hooked up under a new mode in ORE that allows for compile-time budget for a bit more analysis to print more insightful messages. This mode is currently enabled for -fsave-optimization-record (-Rpass is trickier since it is controlled in the front-end). With this we can now print the red remark in http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446 Differential Revision: https://reviews.llvm.org/D26490 llvm-svn: 288381	2016-12-01 17:34:50 +00:00
Adam Nemet	8b5fba8081	[GVN, OptDiag] Include the value that is forwarded in load elimination [recommitting after the fix in r288307] This requires some changes to the opt-diag API. Hal and I have discussed this at the Dev Meeting and came up with a streaming delimiter (setExtraArgs) to solve this. Arguments after this delimiter are only included in the optimization records and not in the remarks printed in the compiler output. (Note, how in the test the content of the YAML file changes but the remarks on the compiler output don't.) This implements the green GVN message with a bug fix at line http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446 The fix is that now we properly include the constant value in the message: "load of type i32 eliminated in favor of 7" Differential Revision: https://reviews.llvm.org/D26489 llvm-svn: 288380	2016-12-01 17:34:44 +00:00
Alexey Bataev	fc617690ab	[SLP] Additional tests with the cost of vector operations. llvm-svn: 288377	2016-12-01 17:26:54 +00:00
Alexey Bataev	e59a8351d0	Revert "[SLP] Additional tests with the cost of vector operations." This reverts commit a61718435fc4118c82f8aa6133fd81f803789c1e. llvm-svn: 288371	2016-12-01 16:45:04 +00:00
Adam Nemet	4d2a6e5998	[GVN] Basic optimization remark support [recommitting after the fix in r288307] Follow-on patches will add more interesting cases. The goal of this patch-set is to get the GVN messages printed in opt-viewer from Dhrystone as was presented in my Dev Meeting talk. This is the optimization view for the function (the last remark in the function has a bug which is fixed in this series): http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L430 Differential Revision: https://reviews.llvm.org/D26488 llvm-svn: 288370	2016-12-01 16:40:32 +00:00
Alexey Bataev	2ff768475d	[SLP] Additional tests with the cost of vector operations. llvm-svn: 288369	2016-12-01 16:11:48 +00:00
Adam Nemet	feafcd9688	[GVN] When merging blocks update LoopInfo if it's available If LoopInfo is available during GVN, BasicAA will use it. However MergeBlockIntoPredecessor does not update LI as it merges blocks. This didn't use to cause problems because LI was freed before GVN/BasicAA. Now with OptimizationRemarkEmitter, the lifetime of LI is extended so LI needs to be kept up-to-date during GVN. Differential Revision: https://reviews.llvm.org/D27288 llvm-svn: 288307	2016-12-01 03:56:43 +00:00
Michael Kuperstein	b151a641aa	[LoopUnroll] Implement profile-based loop peeling This implements PGO-driven loop peeling. The basic idea is that when the average dynamic trip-count of a loop is known, based on PGO, to be low, we can expect a performance win by peeling off the first several iterations of that loop. Unlike unrolling based on a known trip count, or a trip count multiple, this doesn't save us the conditional check and branch on each iteration. However, it does allow us to simplify the straight-line code we get (constant-folding, etc.). This is important given that we know that we will usually only hit this code, and not the actual loop. This is currently disabled by default. Differential Revision: https://reviews.llvm.org/D25963 llvm-svn: 288274	2016-11-30 21:13:57 +00:00

8297 Commits