llvm-project

Commit Graph

Author	SHA1	Message	Date
Alexey Bataev	e59a8351d0	Revert "[SLP] Additional tests with the cost of vector operations." This reverts commit a61718435fc4118c82f8aa6133fd81f803789c1e. llvm-svn: 288371	2016-12-01 16:45:04 +00:00
Alexey Bataev	2ff768475d	[SLP] Additional tests with the cost of vector operations. llvm-svn: 288369	2016-12-01 16:11:48 +00:00
Sanjay Patel	8ca30ab0c5	[InstSimplify] allow integer vector types to use computeKnownBits Note that the non-splat lshr+lshr test folded, but that does not work in general. Something is missing or wrong in computeKnownBits as the non-splat shl+shl test still shows. llvm-svn: 288005	2016-11-27 21:07:28 +00:00
Sanjay Patel	dc2917b969	add tests to show missing analysis; NFC llvm-svn: 287998	2016-11-27 15:54:45 +00:00
Simon Pilgrim	841d7ca463	[X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287882	2016-11-24 14:46:55 +00:00
Simon Pilgrim	4e9b9cbee9	[X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287762	2016-11-23 14:01:18 +00:00
Simon Pilgrim	03cd8f887c	[CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs llvm-svn: 287760	2016-11-23 13:42:09 +00:00
Simon Pilgrim	e5dbdbefca	[CostModel][X86] Add v2f32 -> v2i64 fptosi/fptoui cost tests llvm-svn: 287756	2016-11-23 11:43:00 +00:00
Simon Pilgrim	d1aed9a9e6	[CostModel][X86] Updated sitofp/uitofp scalar/vector cost tests Better coverage of all legal types + special cases. Removed old fptoui tests which are all handled in fptoui.ll llvm-svn: 287678	2016-11-22 18:55:49 +00:00
Yaxun Liu	02f75f31e0	Fix known zero bits for addrspacecast. Currently LLVM assumes that a pointer addrspacecasted to a different addr space is equivalent to trunc or zext bitwise, which is not true. For example, in amdgcn target, when a null pointer is addrspacecasted from addr space 4 to 0, its value is changed from i64 0 to i32 -1. This patch teaches LLVM not to assume known bits of addrspacecast instruction to its operand. Differential Revision: https://reviews.llvm.org/D26803 llvm-svn: 287545	2016-11-21 15:42:31 +00:00
Craig Topper	07f1c15995	[AVX-512] Support FCOPYSIGN for v16f32 and v8f64 Summary: This extends FCOPYSIGN support to 512-bit vectors. I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads. Reviewers: delena, zvi, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26791 llvm-svn: 287298	2016-11-18 02:25:34 +00:00
Simon Pilgrim	779da8e5ea	[CostModel][X86] Added mul costs for vXi8 vectors More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result llvm-svn: 286838	2016-11-14 15:54:24 +00:00
Simon Pilgrim	27fed8e5d6	[X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason. This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit) llvm-svn: 286832	2016-11-14 14:45:16 +00:00
Peter Collingbourne	d93620bf4d	IR: Introduce inrange attribute on getelementptr indices. If the inrange keyword is present before any index, loading from or storing to any pointer derived from the getelementptr has undefined behavior if the load or store would access memory outside of the bounds of the element selected by the index marked as inrange. This can be used, e.g. for alias analysis or to split globals at element boundaries where beneficial. As previously proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html Differential Revision: https://reviews.llvm.org/D22793 llvm-svn: 286514	2016-11-10 22:34:55 +00:00
Tobias Grosser	455b9bd65c	[RegionInfo] Add three tests that include infinite loops These examples are variations that were inspired from a small subgraph taken from paper.ll which are interesting as they show certain issues with infinite loops. llvm-svn: 286450	2016-11-10 13:56:19 +00:00
Andrew Kaylor	9604f34996	[BasicAA] Teach BasicAA to handle the inaccessiblememonly and inaccessiblemem_or_argmemonly attributes Differential Revision: https://reviews.llvm.org/D26382 llvm-svn: 286294	2016-11-08 21:07:42 +00:00
Simon Pilgrim	d02c55204b	[VectorLegalizer] Expansion of CTLZ using CTPOP when possible This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available. This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful. Differential Revision: https://reviews.llvm.org/D25910 llvm-svn: 286233	2016-11-08 14:10:28 +00:00
Chad Rosier	8f348017b0	[AliasSetTracker] Make AST smarter about assume intrinsics that don't actually affect memory. Differential Revision: https://reviews.llvm.org/D26252 llvm-svn: 286108	2016-11-07 14:11:45 +00:00
Alexey Bataev	d07c731d86	Improved cost model for FDIV and FSQRT, by Andrew Tischenko There is a bug describing poor cost model for floating point operations: Bug 29083 - [X86][SSE] Improve costs for floating point operations. This patch is the second one in series of patches dealing with cost model. Differential Revision: https://reviews.llvm.org/D25722 llvm-svn: 285564	2016-10-31 12:10:53 +00:00
Tom Stellard	13068995b9	[Loads] Fix crash in is isDereferenceableAndAlignedPointer() Summary: We were trying to add APInt values with different bit sizes after visiting an addrspacecast instruction which changed the bit width of the pointer. Reviewers: majnemer, hfinkel Subscribers: hfinkel, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D24774 llvm-svn: 285407	2016-10-28 15:32:28 +00:00
Simon Pilgrim	d23219b9ee	[X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targets llvm-svn: 285329	2016-10-27 18:32:06 +00:00
Simon Pilgrim	820e1326d7	[X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64 With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq). Updated cost table accordingly. Differential Revision: https://reviews.llvm.org/D26011 llvm-svn: 285304	2016-10-27 15:27:00 +00:00
Chad Rosier	4447d7a816	Revert "[AliasSetTracker] Make AST smarter about intrinsics that don't actually affect memory." This reverts commit r285191. LICM appears to rely on the Alias Set Tracker hitting lifetime markers to prevent code from being moved outside of the original scope. llvm-svn: 285227	2016-10-26 19:18:19 +00:00
Chad Rosier	1408628ffa	[AliasSetTracker] Make AST smarter about intrinsics that don't actually affect memory. Differential Revision: https://reviews.llvm.org/D25969 llvm-svn: 285191	2016-10-26 12:42:11 +00:00
Eli Friedman	c5b7262073	Fix regression from my recent GlobalsAA fix. There are two fixes here: one, AnalyzeUsesOfPointer can't return false until it has checked all the uses of the pointer. Two, if a global uses another global, we have to assume the address of the first global escapes. Fixes https://llvm.org/bugs/show_bug.cgi?id=30707 . Differential Revision: https://reviews.llvm.org/D25798 llvm-svn: 285034	2016-10-24 21:47:44 +00:00
Simon Pilgrim	d09c04d267	[CostModel][X86] Added tests for current integer signed/unsigned remainder costs llvm-svn: 284940	2016-10-23 18:35:02 +00:00
Simon Pilgrim	6ac1e98b09	[X86][SSE] Add SSE41/AVX1 costs for vector shifts. We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results. llvm-svn: 284939	2016-10-23 16:49:04 +00:00
Simon Pilgrim	e16b1e2271	[CostModel][X86] Added tests for current integer trunc costs llvm-svn: 284938	2016-10-23 15:17:52 +00:00
Gerolf Hoflehner	9e2afa8bd7	[BasicAA] Fix - missed alias in GEP expressions In BasicAA GEP operand values get adjusted ("wrap-around") based on the pointersize. Otherwise, in non-64b modes, AA could report false negatives. However, a wrap-around is valid only for a fully evaluated expression. It had been introduced to fix an alias problem in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160118/326163.html. This commit restricts the wrap-around to constant gep operands only where the value is known at compile-time. llvm-svn: 284908	2016-10-22 02:41:39 +00:00
Li Huang	faa857dba7	[SCEV] Memoize visitMulExpr results in SCEVRewriteVisitor. Summary: When SCEVRewriteVisitor traverses the SCEV DAG, it may visit the same SCEV multiple times if this SCEV is referenced by multiple other SCEVs. This has exponential time complexity in the worst case. Memoizing the results will avoid re-visiting the same SCEV. Add a map to save the results, and override the visit function of SCEVVisitor. Now SCEVRewriteVisitor only visit each SCEV once and thus returns the same result for the same input SCEV. This patch fixes PR18606, PR18607. Reviewers: Sanjoy Das, Mehdi Amini, Michael Zolotukhin Differential Revision: https://reviews.llvm.org/D25810 llvm-svn: 284868	2016-10-21 20:05:21 +00:00
John Brawn	84b21835f1	[LoopUnroll] Keep the loop test only on the first iteration of max-or-zero loops When we have a loop with a known upper bound on the number of iterations, and furthermore know that either the number of iterations will be either exactly that upper bound or zero, then we can fully unroll up to that upper bound keeping only the first loop test to check for the zero iteration case. Most of the work here is in plumbing this 'max-or-zero' information from the part of scalar evolution where it's detected through to loop unrolling. I've also gone for the safe default of 'false' everywhere but howManyLessThans which could probably be improved. Differential Revision: https://reviews.llvm.org/D25682 llvm-svn: 284818	2016-10-21 11:08:48 +00:00
Li Huang	fcfe8cd3ae	[SCEV] Add a threshold to restrict number of mul operands to be inlined into SCEV This is to avoid inlining too many multiplication operands into a SCEV, which could take exponential time in the worst case. Reviewers: Sanjoy Das, Mehdi Amini, Michael Zolotukhin Differential Revision: https://reviews.llvm.org/D25794 llvm-svn: 284784	2016-10-20 21:38:39 +00:00
Simon Pilgrim	365be4f95c	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 bit integer vectors We weren't checking for uniform const costs before the general cost, resulting in very high estimates. llvm-svn: 284755	2016-10-20 18:00:35 +00:00
Simon Pilgrim	1388c0acc1	[CostModel][X86] Added tests for sdiv/udiv costs for uniform const and uniform const power-of-2 Shows poor costings in AVX1/AVX512BW for certain vector types llvm-svn: 284748	2016-10-20 17:16:38 +00:00
Simon Pilgrim	025e26dd32	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv general costs for 256/512 bit integer vectors We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults. We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases. llvm-svn: 284744	2016-10-20 16:39:11 +00:00
Simon Pilgrim	16cc616ebc	[CostModel][X86] Added tests for sdiv/udiv costs for scalar and 128/256/512 bit integer vectors Shows current bug in AVX1/AVX512BW costs for 256 bit vector types llvm-svn: 284723	2016-10-20 12:34:00 +00:00
Chad Rosier	6e3a92ec88	[AliasSetTracker] Add support for memcpy and memmove. Differential Revision: https://reviews.llvm.org/D25776 llvm-svn: 284630	2016-10-19 19:09:03 +00:00
John Brawn	ecf79300dd	[SCEV] More accurate calculation of max backedge count of some less-than loops In loops that look something like i = n; do { ... } while(i++ < n+k); where k is a constant, the maximum backedge count is k (in fact the backedge count will be either 0 or k, depending on whether n+k wraps). More generally for LHS < RHS if RHS-(LHS of first comparison) is a constant then the loop will iterate either 0 or that constant number of times. This allows for more loop unrolling with the recent upper bound loop unrolling changes, and I'm working on a patch that will let loop unrolling additionally make use of the loop being executed either 0 or k times (we need to retain the loop comparison only on the first unrolled iteration). Differential Revision: https://reviews.llvm.org/D25607 llvm-svn: 284465	2016-10-18 10:10:53 +00:00
Simon Pilgrim	4ddc92b6cd	[X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32 As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types. This patch adds support for fptosi to 2i32 from both 2f64 and 2f32. It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch. Differential Revision: https://reviews.llvm.org/D23808 llvm-svn: 284459	2016-10-18 07:42:15 +00:00
Tobias Grosser	2bbec0ee7f	[SCEV] Consider delinearization pattern with extension with identity factor Summary: The delinearization algorithm did not consider terms which had an extension without a multiply factor, i.e. a identify factor. We lose cases where size is char type where there will no multiply factor. Reviewers: sanjoy, grosser Subscribers: mzolotukhin, Eugene.Zelenko, llvm-commits, mssimpso, sanjoy, grosser Differential Revision: https://reviews.llvm.org/D16492 llvm-svn: 284378	2016-10-17 11:56:26 +00:00
Tom Stellard	17eb3413cd	[ValueTracking] Fix crash in GetPointerBaseWithConstantOffset() Summary: While walking defs of pointer operands we were assuming that the pointer size would remain constant. This is not true, because addresspacecast instructions may cast the pointer to an address space with a different pointer width. This partial reverts r282612, which was a more conservative solution to this problem. Reviewers: reames, sanjoy, apilipenko Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D24772 llvm-svn: 283557	2016-10-07 14:23:29 +00:00
Bjorn Pettersson	3961603921	[ValueTracking] Teach computeKnownBits and ComputeNumSignBits to look through ExtractElement. Summary: The computeKnownBits and ComputeNumSignBits functions in ValueTracking can now do a simple look-through of ExtractElement. Reviewers: majnemer, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24955 llvm-svn: 283434	2016-10-06 09:56:21 +00:00
Eli Friedman	74bed9d757	Make GlobalsAA ignore dead constant expressions. Slightly improves the precision of GlobalsAA in certain situations, and makes the behavior of optimization passes more predictable. Differential Revision: https://reviews.llvm.org/D24104 llvm-svn: 283165	2016-10-04 00:03:55 +00:00
Sanjay Patel	d27a21874b	[x86, SSE/AVX] allow 128/256-bit lowering for copysign vector intrinsics (PR30433) This should fix: https://llvm.org/bugs/show_bug.cgi?id=30433 There are a couple of open questions about the codegen: 1. Should we let scalar ops be scalars and avoid vector constant loads/splats? 2. Should we have a pass to combine constants such as the inverted pair that we have here? Differential Revision: https://reviews.llvm.org/D25165 llvm-svn: 283119	2016-10-03 16:38:27 +00:00
Simon Pilgrim	1ec20e9b0a	[CostModel][X86] Added tests for current fptosi/fptoui costs llvm-svn: 283047	2016-10-01 19:09:59 +00:00
Simon Pilgrim	e0ec5c1f05	[CostModel][X86] Added fcopysign costs llvm-svn: 283044	2016-10-01 16:41:52 +00:00
Simon Pilgrim	8b021c382d	[CostModel][X86] Added fabs costs llvm-svn: 283042	2016-10-01 16:30:13 +00:00
Jun Bum Lim	3822939ba7	Enhance calcColdCallHeuristics for InvokeInst Summary: When identifying cold blocks, consider only the edge to the normal destination if the terminator is InvokeInst and let calcInvokeHeuristics() decide edge weights for the InvokeInst. Reviewers: mcrosier, hfinkel, davidxl Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D24868 llvm-svn: 282262	2016-09-23 17:26:14 +00:00
Simon Pilgrim	9178059826	[CostModel][X86] Added scalar float op costs llvm-svn: 281864	2016-09-18 21:01:20 +00:00
Sanjay Patel	e104a5f35a	auto-generate checks llvm-svn: 281756	2016-09-16 17:54:52 +00:00
David L Kreitzer	8bbabee21a	Reapplying r278731 after fixing the problem that caused it to be reverted. Enhance SCEV to compute the trip count for some loops with unknown stride. Patch by Pankaj Chawla Differential Revision: https://reviews.llvm.org/D22377 llvm-svn: 281732	2016-09-16 14:38:13 +00:00
Wei Mi	24662395df	Create a getelementptr instead of sub expr for ValueOffsetPair if the value is a pointer. This patch is to fix PR30213. When expanding an expr based on ValueOffsetPair, if the value is of pointer type, we can only create a getelementptr instead of sub expr. Differential Revision: https://reviews.llvm.org/D24088 llvm-svn: 281439	2016-09-14 04:39:50 +00:00
Elena Demikhovsky	3622fbfc68	[Loop Vectorizer] Fixed memory confilict checks. Fixed a bug in run-time checks for possible memory conflicts inside loop. The bug is in Low <-> High boundaries calculation. The High boundary should be calculated as "last memory access pointer + element size". Differential revision: https://reviews.llvm.org/D23176 llvm-svn: 279930	2016-08-28 08:53:53 +00:00
Wei Mi	59ca96636d	[UNROLL] Postpone ScalarEvolution::forgetLoop after TripCountSC is expanded when unroll runtime iteration loop. In llvm::UnrollRuntimeLoopRemainder, if the loop to be unrolled is the inner loop inside a loop nest, the scalar evolution needs to be dropped for its parent loop which is done by ScalarEvolution::forgetLoop. However, we can postpone forgetLoop to the end of UnrollRuntimeLoopRemainder so TripCountSC expansion can still reuse existing value. Differential Revision: https://reviews.llvm.org/D23572 llvm-svn: 279748	2016-08-25 16:17:18 +00:00
Evgeny Stupachenko	d7f9c3564a	The patch improves ValueTracking on left shift with nsw flag. Summary: The patch fixes PR28946. Reviewers: majnemer, sanjoy Differential Revision: http://reviews.llvm.org/D23296 From: Li Huang llvm-svn: 279684	2016-08-24 23:01:33 +00:00
Artur Pilipenko	a1d9a67496	Remove missing file from r279433 reversal llvm-svn: 279434	2016-08-22 13:18:19 +00:00
Simon Pilgrim	89e375a95e	[CostModel][X86] Removed shift tests There are more thorough tests found in vshift-*-cost.ll llvm-svn: 279406	2016-08-21 19:56:02 +00:00
Simon Pilgrim	6ad12ec629	[CostModel][X86] Added costs for vXi16 and vXi8 vectors for add/sub/mul/and/or/xor tests llvm-svn: 279405	2016-08-21 19:44:44 +00:00
Simon Pilgrim	b0a0576ffc	[CostModel][X86] Replaced SSSE3 with SSE2 costs to create a better baseline llvm-svn: 279404	2016-08-21 19:14:48 +00:00
Simon Pilgrim	07d7a21ea1	[CostModel][X86] Added fsqrt and fma costs llvm-svn: 279403	2016-08-21 19:06:25 +00:00
Simon Pilgrim	3cd61a084f	[CostModel][X86] Split off float arithmetic cost tests llvm-svn: 279402	2016-08-21 18:34:47 +00:00
Simon Pilgrim	054e7d2ec1	[CostModel][X86] Added sub, or, and, fadd and fsub costs and missing 512-bit mul costs llvm-svn: 279301	2016-08-19 19:07:10 +00:00
Simon Pilgrim	fbfa3ee4f6	[CostModel][X86] Added some AVX512 and 512-bit vector cost tests llvm-svn: 279291	2016-08-19 18:24:10 +00:00
Simon Pilgrim	e309d2d0c3	[CostModel][X86] Add fdiv + frem cost tests llvm-svn: 279283	2016-08-19 17:39:00 +00:00
Michael Kuperstein	41898f0396	[AliasSetTracker] Degrade AliasSetTracker when may-alias sets get too large. Repeated inserts into AliasSetTracker have quadratic behavior - inserting a pointer into AST is linear, since it requires walking over all "may" alias sets and running an alias check vs. every pointer in the set. We can avoid this by tracking the total number of pointers in "may" sets, and when that number exceeds a threshold, declare the tracker "saturated". This lumps all pointers into a single "may" set that aliases every other pointer. (This is a stop-gap solution until we migrate to MemorySSA) This fixes PR28832. Differential Revision: https://reviews.llvm.org/D23432 llvm-svn: 279274	2016-08-19 17:05:22 +00:00
Hans Wennborg	3879035e66	SCEV: Don't assert about non-SCEV-able value in isSCEVExprNeverPoison() (PR28932) Differential Revision: https://reviews.llvm.org/D23594 llvm-svn: 278999	2016-08-17 22:50:18 +00:00
Reid Kleckner	b99b709068	Revert "Enhance SCEV to compute the trip count for some loops with unknown stride." This reverts commit r278731. It caused http://crbug.com/638314 llvm-svn: 278853	2016-08-16 21:02:04 +00:00
Sanjoy Das	78db2963f6	Revert "[ValueTracking] Improve ValueTracking on left shift with nsw flag" This reverts commit r278172. It causes PR28946. llvm-svn: 278740	2016-08-15 21:01:31 +00:00
David L Kreitzer	7fe18251a5	Enhance SCEV to compute the trip count for some loops with unknown stride. Patch by Pankaj Chawla Differential Revision: https://reviews.llvm.org/D22377 llvm-svn: 278731	2016-08-15 20:21:41 +00:00
Eli Friedman	a6707f56b5	[DSE] Don't remove stores made live by a call which unwinds. Issue exposed by noalias or more aggressive alias analysis. Fixes http://llvm.org/PR25422. Differential revision: https://reviews.llvm.org/D21007 llvm-svn: 278451	2016-08-12 01:09:53 +00:00
Andrew Kaylor	b10f6876cd	[ValueTracking] An improvement to IR ValueTracking on Non-negative Integers Patch by Li Huang Differential Revision: https://reviews.llvm.org/D18777 llvm-svn: 278267	2016-08-10 18:47:19 +00:00
Andrew Kaylor	3c05edfd5e	[ValueTracking] Improve ValueTracking on left shift with nsw flag Patch by Li Huang Differential Revison: https://reviews.llvm.org/D23296 llvm-svn: 278172	2016-08-09 22:41:35 +00:00
Wei Mi	575435012c	Fix the runtime error caused by "Use ValueOffsetPair to enhance value reuse during SCEV expansion". The patch is to fix the bug in PR28705. It was caused by setting wrong return value for SCEVExpander::findExistingExpansion. The return values of findExistingExpansion have different meanings when the function is used in different ways so it is easy to make mistake. The fix creates two new interfaces to replace SCEVExpander::findExistingExpansion, and specifies where each interface is expected to be used. Differential Revision: https://reviews.llvm.org/D22942 llvm-svn: 278161	2016-08-09 20:40:03 +00:00
Wei Mi	785858cf6c	Recommit "Use ValueOffsetPair to enhance value reuse during SCEV expansion". The fix for PR28705 will be committed consecutively. In D12090, the ExprValueMap was added to reuse existing value during SCEV expansion. However, const folding and sext/zext distribution can make the reuse still difficult. A simplified case is: suppose we know S1 expands to V1 in ExprValueMap, and S1 = S2 + C_a S3 = S2 + C_b where C_a and C_b are different SCEVConstants. Then we'd like to expand S3 as V1 - C_a + C_b instead of expanding S2 literally. It is helpful when S2 is a complex SCEV expr and S2 has no entry in ExprValueMap, which is usually caused by the fact that S3 is generated from S1 after const folding. In order to do that, we represent ExprValueMap as a mapping from SCEV to ValueOffsetPair. We will save both S1->{V1, 0} and S2->{V1, C_a} into the ExprValueMap when we create SCEV for V1. When S3 is expanded, it will first expand S2 to V1 - C_a because of S2->{V1, C_a} in the map, then expand S3 to V1 - C_a + C_b. Differential Revision: https://reviews.llvm.org/D21313 llvm-svn: 278160	2016-08-09 20:37:50 +00:00
Sanjoy Das	d4c85af7fd	[SCEV] Un-grep'ify tests; NFC llvm-svn: 277861	2016-08-05 20:33:49 +00:00
Sanjoy Das	b0b4e86215	[SCEV] Don't infinitely recurse on unreachable code llvm-svn: 277848	2016-08-05 18:34:14 +00:00
Michael Kuperstein	3ceac2bbd5	[LV, X86] Be more optimistic about vectorizing shifts. Shifts with a uniform but non-constant count were considered very expensive to vectorize, because the splat of the uniform count and the shift would tend to appear in different blocks. That made the splat invisible to ISel, and we'd scalarize the shift at codegen time. Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we are able to select the appropriate vector shifts. This updates the cost model to to take this into account by making shifts by a uniform cheap again. Differential Revision: https://reviews.llvm.org/D23049 llvm-svn: 277782	2016-08-04 22:48:03 +00:00
Simon Pilgrim	c8fe132756	[X86] Dropped XOP ctbits checks - they match the AVX checks llvm-svn: 277718	2016-08-04 11:04:13 +00:00
Simon Pilgrim	5d5ca9c0cb	[X86][SSE] Add initial costs for vector CTTZ/CTLZ llvm-svn: 277716	2016-08-04 10:51:41 +00:00
George Burgess IV	5f0e76dca6	[CFLAA] Remove modref queries from CFLAA. As it turns out, modref queries are broken with CFLAA. Specifically, the data source we were using for determining modref behaviors explicitly ignores operations on non-pointer values. So, it wouldn't note e.g. storing an i32 to an i32* (or loading an i64 from an i64). It also ignores external function calls, rather than acting conservatively for them. (N.B. These operations, where necessary, are* tracked by CFLAA; we just use a different mechanism to do so. Said mechanism is relatively imprecise, so it's unlikely that we can provide reasonably good modref answers with it as implemented.) Patch by Jia Chen. Differential Revision: https://reviews.llvm.org/D22978 llvm-svn: 277366	2016-08-01 18:47:28 +00:00
Adam Nemet	aa3506c5f0	[BPI] Add new LazyBPI analysis Summary: The motivation is the same as in D22141: In order to add the hotness attribute to optimization remarks we need BFI to be available in all passes that emit optimization remarks. BFI depends on BPI so unless we make this lazy as well we would still compute BPI unconditionally. The solution is to use the new LazyBPI pass in LazyBFI and only compute BPI when computation of BFI is requested by the client. I extended the laziness test using a LoopDistribute test to also cover BPI. Reviewers: hfinkel, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22835 llvm-svn: 277083	2016-07-28 23:31:12 +00:00
George Burgess IV	dbd35c44d4	[CFLAA] Add getModRefBehavior to CFLAnders. This patch lets CFLAnders respond to mod-ref queries. It also includes a small bugfix to CFLSteens. Patch by Jia Chen. Differential Revision: https://reviews.llvm.org/D22823 llvm-svn: 276939	2016-07-27 23:07:07 +00:00
Hans Wennborg	685e8ff953	Revert r276136 "Use ValueOffsetPair to enhance value reuse during SCEV expansion." It causes Clang tests to fail after Windows self-host (PR28705). (Also reverts follow-up r276139.) llvm-svn: 276822	2016-07-26 23:25:13 +00:00
Wei Mi	97de034e18	Remove useless pass from the pipeline in test/Analysis/Dominators/2007-01-14-BreakCritEdges.ll. llvm-svn: 276644	2016-07-25 16:27:34 +00:00
Sanjoy Das	a7d9ec8751	[SCEV] Make isImpliedCondOperandsViaRanges smarter This change lets us prove things like "{X,+,10} s< 5000" implies "{X+7,+,10} does not sign overflow" It does this by replacing replacing getConstantDifference by computeConstantDifference (which is smarter) in isImpliedCondOperandsViaRanges. llvm-svn: 276505	2016-07-23 00:54:36 +00:00
Wei Mi	e04d0eff29	[PM] Port BreakCriticalEdges to the new PM. Differential Revision: https://reviews.llvm.org/D22688 llvm-svn: 276449	2016-07-22 18:04:25 +00:00
Wei Mi	481232e991	Fix test/Analysis/ScalarEvolution/scev-expander-existing-value-offset.ll for rL276136. The content in this testcase was accidentally duplicated. Fix the error. llvm-svn: 276139	2016-07-20 16:54:58 +00:00
Wei Mi	db80c0c77f	Use ValueOffsetPair to enhance value reuse during SCEV expansion. In D12090, the ExprValueMap was added to reuse existing value during SCEV expansion. However, const folding and sext/zext distribution can make the reuse still difficult. A simplified case is: suppose we know S1 expands to V1 in ExprValueMap, and S1 = S2 + C_a S3 = S2 + C_b where C_a and C_b are different SCEVConstants. Then we'd like to expand S3 as V1 - C_a + C_b instead of expanding S2 literally. It is helpful when S2 is a complex SCEV expr and S2 has no entry in ExprValueMap, which is usually caused by the fact that S3 is generated from S1 after const folding. In order to do that, we represent ExprValueMap as a mapping from SCEV to ValueOffsetPair. We will save both S1->{V1, 0} and S2->{V1, C_a} into the ExprValueMap when we create SCEV for V1. When S3 is expanded, it will first expand S2 to V1 - C_a because of S2->{V1, C_a} in the map, then expand S3 to V1 - C_a + C_b. Differential Revision: https://reviews.llvm.org/D21313 llvm-svn: 276136	2016-07-20 16:40:33 +00:00
Simon Pilgrim	1b4f511aaa	[X86][SSE] Add cost model values for CTPOP of vectors This patch adds costs for the vectorized implementations of CTPOP, the default values were seriously underestimating the cost of these and was encouraging vectorization on targets where serialized use of POPCNT would be much better. Differential Revision: https://reviews.llvm.org/D22456 llvm-svn: 276104	2016-07-20 10:41:28 +00:00
George Burgess IV	8b85321bae	[CFLAA] Make a test tell the truth. NFC. Dishonesty noted by Jia Chen. llvm-svn: 276028	2016-07-19 20:56:41 +00:00
George Burgess IV	3b059841ff	[CFLAA] Add some interproc. analysis to CFLAnders. This patch adds function summary support to CFLAnders. It also comes with a lot of tests! Woohoo! Patch by Jia Chen. Differential Revision: https://reviews.llvm.org/D22450 llvm-svn: 276026	2016-07-19 20:47:15 +00:00
Simon Pilgrim	47638635cc	[X86] Add CTPOP/CTLZ/CTTZ scalar cost tests llvm-svn: 275725	2016-07-17 18:29:19 +00:00
George Burgess IV	22682e293b	[CFLAA] Add attributes handling for CFLAnders. This patch adds proper handling of stratified attributes into our anders-style CFLAA implementation. It also comes bundled with more CFLAnders tests. :) Patch by Jia Chen. Differential Revision: https://reviews.llvm.org/D22325 llvm-svn: 275604	2016-07-15 20:02:49 +00:00
George Burgess IV	6d30aa03a0	[CFLAA] Add an initial CFLAnders implementation. This adds an incomplete anders-style implementation for CFLAA. It's incomplete in that it's missing interprocedural analysis, attrs handling, etc. and that it needs more tests. More tests and features will be added in future commits. Patch by Jia Chen. Differential Revision: https://reviews.llvm.org/D22291 llvm-svn: 275602	2016-07-15 19:53:25 +00:00
Tom Stellard	1b5cf6217e	GlobalsAA: Functions with the argmemonly attribute won't read arbitrary globals Summary: In preparation for changing GlobalsAA to stop assuming that intrinsics can't read arbitrary globals, we need to make sure GlobalsAA is querying function attributes rather than relying on this assumption. This patch was inspired by: http://reviews.llvm.org/D20206 Reviewers: jmolloy, hfinkel Subscribers: eli.friedman, llvm-commits Differential Revision: https://reviews.llvm.org/D21318 llvm-svn: 275433	2016-07-14 15:50:27 +00:00
Adam Nemet	c2f791d8a7	[BFI] Add new LazyBFI analysis pass Summary: This is necessary for D21771. In order to add the hotness attribute to optimization remarks we need BFI to be available in all passes that emit optimization remarks. However we don't want to pay for computing BFI unless the hotness attribute is requested. This is achieved by making BFI lazy at the very high-level through a new analysis pass -- BFI is not calculated unless requested. I am adding a test to check the laziness under D21771 where the first user of the analysis is added. Reviewers: hfinkel, dexonsmith, davidxl Subscribers: davidxl, dexonsmith, llvm-commits Differential Revision: http://reviews.llvm.org/D22141 llvm-svn: 275250	2016-07-13 05:01:48 +00:00
Keno Fischer	1efc3b70c5	Fix ScalarEvolutionExpander step scaling bug The expandAddRecExprLiterally function incorrectly transforms `[Start + Step * X]` into `Step * [Start + X]` instead of the correct transform of `[Step * X] + Start`. This caused https://github.com/JuliaLang/julia/issues/14704#issuecomment-174126219 due to what appeared to be sufficiently complicated loop interactions. Patch by Jameson Nash (jameson@juliacomputing.com). Reviewers: sanjoy Differential Revision: http://reviews.llvm.org/D16505 llvm-svn: 275239	2016-07-13 01:28:12 +00:00
Michael Kuperstein	f0c59330e9	[X86] Make some cast costs more precise Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604). Differential Revision: http://reviews.llvm.org/D22064 llvm-svn: 275106	2016-07-11 21:39:44 +00:00
Hal Finkel	bf3957a553	Teach isDereferenceablePointer to look through returned-argument functions For functions which are known to return their argument, isDereferenceableAndAlignedPointer can examine the argument value. Differential Revision: http://reviews.llvm.org/D9384 llvm-svn: 275038	2016-07-11 03:08:49 +00:00
Hal Finkel	e186debb8b	Teach SCEV to look through returned-argument functions When building SCEVs, if a function is known to return its argument, then we can build the SCEV using the corresponding argument value. Differential Revision: http://reviews.llvm.org/D9381 llvm-svn: 275037	2016-07-11 02:48:23 +00:00
Hal Finkel	5c12d8fe8f	BasicAA should look through functions with returned arguments Motivated by the work on the llvm.noalias intrinsic, teach BasicAA to look through returned-argument functions when answering queries. This is essential so that we don't loose all other AA information when supplementing with llvm.noalias. Differential Revision: http://reviews.llvm.org/D9383 llvm-svn: 275035	2016-07-11 01:32:20 +00:00
Adam Nemet	f836067cc0	[LAA] Port test to the new PM This is a follow-on to r274452. The LAA with the new PM is a loop pass so we go from inner to outer loops. Also using a CHECK-NOT didn't make much sense because we print something in either case; whether an invariant is 'found' or 'not found'. llvm-svn: 274935	2016-07-08 21:24:06 +00:00
Justin Bogner	a466cc33fa	NVPTX: Remove the legacy ptx intrinsics - Rename the ptx.read.* intrinsics to nvvm.read.ptx.sreg.* - some but not all of these registers were already accessible via the nvvm name. - Rename ptx.bar.sync nvvm.bar.sync, to match nvvm.bar0. There's a fair amount of code motion here, but it's all very mechanical. llvm-svn: 274769	2016-07-07 16:40:17 +00:00
Sean Silva	284b0324e2	[PM] Avoid getResult on a higher level in LoopAccessAnalysis Note that require<domtree> and require<loops> aren't needed because they come in implicitly via the loop pass manager. llvm-svn: 274712	2016-07-07 01:01:53 +00:00
Sanjay Patel	04b3496d9b	[x86] fix cost of SINT_TO_FP for i32 --> float (PR21356, PR28434) This is "cvtdq2ps" which does not appear to be particularly slow on any CPU according to Agner's tables. Choosing "5" as a cost here as suggested in: https://llvm.org/bugs/show_bug.cgi?id=21356 ...but it seems very conservative given that the instruction is fully pipelined, and I think these costs are supposed to model throughput. Note that related costs are also most likely too high, but this fixes PR21356 and partly fixes PR28434. llvm-svn: 274658	2016-07-06 19:15:54 +00:00
Michael Kuperstein	aa71bdd3af	[TTI] The cost model should not assume vector casts get completely scalarized The cost model should not assume vector casts get completely scalarized, since on targets that have vector support, the common case is a partial split up to the legal vector size. So, when a vector cast gets split, the resulting casts end up legal and cheap. Instead of pessimistically assuming scalarization, base TTI can use the costs the concrete TTI provides for the split vector, plus a fudge factor to account for the cost of the split itself. This fudge factor is currently 1 by default, except on AMDGPU where inserts and extracts are considered free. Differential Revision: http://reviews.llvm.org/D21251 llvm-svn: 274642	2016-07-06 17:30:56 +00:00
George Burgess IV	bfa401e5ad	[CFLAA] Split into Anders+Steens analysis. StratifiedSets (as implemented) is very fast, but its accuracy is also limited. If we take a more aggressive andersens-like approach, we can be way more accurate, but we'll also end up being slower. So, we've decided to split CFLAA into CFLSteensAA and CFLAndersAA. Long-term, we want to end up in a place where CFLSteens is queried first; if it can provide an answer, great (since queries are basically map lookups). Otherwise, we'll fall back to CFLAnders, BasicAA, etc. This patch splits everything out so we can try to do something like that when we get a reasonable CFLAnders implementation. Patch by Jia Chen. Differential Revision: http://reviews.llvm.org/D21910 llvm-svn: 274589	2016-07-06 00:26:41 +00:00
Nemanja Ivanovic	44513e545f	[PowerPC] - Legalize vector types by widening instead of integer promotion This patch corresponds to review: http://reviews.llvm.org/D20443 It changes the legalization strategy for illegal vector types from integer promotion to widening. This only applies for vectors with elements of width that is a multiple of a byte since we have hardware support for vectors with 1, 2, 3, 8 and 16 byte elements. Integer promotion for vectors is quite expensive on PPC due to the sequence of breaking apart the vector, extending the elements and reconstituting the vector. Two of these operations are expensive. This patch causes between minor and major improvements in performance on most benchmarks. There are very few benchmarks whose performance regresses. These regressions can be handled in a subsequent patch with a DAG combine (similar to how this patch handles int -> fp conversions of illegal vector types). llvm-svn: 274535	2016-07-05 09:22:29 +00:00
Nicolai Haehnle	84c9f9919a	Add writeonly IR attribute Summary: This complements the earlier addition of IntrWriteMem and IntrWriteArgMem LLVM intrinsic properties, see D18291. Also start using the attribute for memset, memcpy, and memmove intrinsics, and remove their special-casing in BasicAliasAnalysis. Reviewers: reames, joker.eph Subscribers: joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D18714 llvm-svn: 274485	2016-07-04 08:01:29 +00:00
Xinliang David Li	8a021317a2	[PM] Port LoopAccessInfo analysis to new PM It is implemented as a LoopAnalysis pass as discussed and agreed upon. llvm-svn: 274452	2016-07-02 21:18:40 +00:00
Sanjoy Das	0da2d14766	[SCEV] Compute max be count from shift operator only if all else fails In particular, check to see if we can compute a precise trip count by exhaustively simulating the loop first. llvm-svn: 274199	2016-06-30 02:47:28 +00:00
George Burgess IV	d86e38e1db	[CFLAA] Add support for ModRef queries. This patch makes CFLAA answer some ModRef queries. Because we don't distinguish between reading/writing when making StratifiedSets, we're unable to offer any of the readonly-related answers. Patch by Jia Chen. Differential Revision: http://reviews.llvm.org/D21858 llvm-svn: 274197	2016-06-30 02:11:26 +00:00
Artur Pilipenko	7ad95ec22d	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 274043	2016-06-28 18:27:25 +00:00
Artur Pilipenko	72f76b8805	Revert -r273892 "Support arbitrary addrspace pointers in masked load/store intrinsics" since some of the clang tests don't expect to see the updated signatures. llvm-svn: 273895	2016-06-27 16:54:33 +00:00
Artur Pilipenko	a36aa41519	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 273892	2016-06-27 16:29:26 +00:00
George Burgess IV	a3d62be733	[CFLAA] Propagate StratifiedAttrs in interproc. analysis. This patch also has a refactor that kills StratifiedAttr, and leaves us with StratifiedAttrs, because having both was mildly redundant. This patch makes us correctly handle stratified attributes when doing interprocedural analysis. It also adds another attribute, AttrCaller, which acts like AttrUnknown. We can filter out AttrCaller values when during interprocedural analysis, since the caller should have information about what arguments it's passing to its callee. Patch by Jia Chen. Differential Revision: http://reviews.llvm.org/D21645 llvm-svn: 273636	2016-06-24 01:00:03 +00:00
George Burgess IV	1f99da54c2	[CFLAA] Use better interprocedural function summaries. Previously, we just unified any arguments that seemed to be related to each other. With this patch, we now respect dereference levels, etc. which should make us substantially more accurate. Proper handling of StratifiedAttrs will be done in a later patch. Patch by Jia Chen. Differential Revision: http://reviews.llvm.org/D21536 llvm-svn: 273596	2016-06-23 18:55:23 +00:00
Michael Kuperstein	78028b84d2	[X86] Make arithmetic operations cost model test saner. NFC. llvm-svn: 273316	2016-06-21 20:41:40 +00:00
George Burgess IV	9fdbfe17a8	[CFLAA] Be more aggressive with interprocedural analysis. This patch makes us perform interprocedural analysis on functions that don't have internal linkage. It also removes a test that should've been deleted in an earlier commit (since other tests now cover everything that the newly-removed test covers). Patch by Jia Chen. Differential Revision: http://reviews.llvm.org/D21513 llvm-svn: 273229	2016-06-21 01:42:47 +00:00
George Burgess IV	87b2e41416	[CFLAA] Add interprocedural function summaries. This patch adds function summaries, so that we don't need to recompute various properties about function parameters/return values at each callsite of a function. It also adds many interprocedural tests for CFLAA. Patch by Jia Chen. Differential Revision: http://reviews.llvm.org/D21475#inline-182390 llvm-svn: 273219	2016-06-20 23:10:56 +00:00
Simon Pilgrim	356e823b51	[X86][SSE] Add cost model for BSWAP of vectors The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization. Differential Revision: http://reviews.llvm.org/D21521 llvm-svn: 273217	2016-06-20 23:08:21 +00:00
Sanjoy Das	e8fd9561cb	[SCEV] Fix incorrect trip count computation The way we elide max expressions when computing trip counts is incorrect -- it breaks cases like this: ``` static int wrapping_add(int a, int b) { return (int)((unsigned)a + (unsigned)b); } void test() { volatile int end_buf = 2147483548; // INT_MIN - 100 int end = end_buf; unsigned counter = 0; for (int start = wrapping_add(end, 200); start < end; start++) counter++; print(counter); } ``` Note: the `NoWrap` variable that was being tested has little to do with the values flowing into the max expression; it is a property of the induction variable. test/Transforms/LoopUnroll/nsw-tripcount.ll was added to solely test functionality I'm reverting in this change, so I've deleted the test fully. llvm-svn: 273079	2016-06-18 04:38:31 +00:00
George Burgess IV	24eb0daf7c	[CFLAA] Tag arguments as escaped instead of unknown. This patch also includes some refactoring. Prior to this patch, we tagged all CFLAA attributes as unknown. This is suboptimal, since it meant that any Value used as an argument would be considered to alias any other Value that existed. Now that we have the machinery to tag sets below the set for an arbitrary value with attributes, it's okay to be less conservative with arguments. (Specifically, we still tag the set under an argument with unknown). Patch by Jia Chen. Differential Revision: http://reviews.llvm.org/D21262 llvm-svn: 272690	2016-06-14 18:12:28 +00:00
Simon Pilgrim	3fc09f7be6	[CostModel][X86][SSE] Updated costs for vector BITREVERSE ops on SSSE3+ targets To account for the fast PSHUFB implementation now available llvm-svn: 272484	2016-06-11 19:23:02 +00:00
Michael Kuperstein	9a0542a792	[X86] Add costs for SSE zext/sext to v4i64 to TTI The costs are somewhat hand-wavy, but should be much closer to the truth than what we get from BasicTTI. Differential Revision: http://reviews.llvm.org/D21156 llvm-svn: 272406	2016-06-10 17:01:05 +00:00
George Burgess IV	652ec4f595	[CFLAA] Handle global/arg attrs more sanely. Prior to this patch, we used argument/global stratified attributes in order to note that a value could have come from either dereferencing a global/arg, or from the assignment from a global/arg. Now, AttrUnknown is placed on sets when we see a dereference, instead of the global/arg attributes. This allows us to be more aggressive in the future when we see global/arg attributes without AttrUnknown. Patch by Jia Chen. Differential Revision: http://reviews.llvm.org/D21110 llvm-svn: 272335	2016-06-09 23:15:04 +00:00
Sanjoy Das	c7f69b921f	Be wary of abnormal exits from loop when exploiting UB We can safely rely on a NoWrap add recurrence causing UB down the road only if we know the loop does not have a exit expressed in a way that is opaque to ScalarEvolution (e.g. by a function call that conditionally calls exit(0)). I believe with this change PR28012 is fixed. Note: I had to change some llvm-lit tests in LoopReroll, since it looks like they were depending on this incorrect behavior. llvm-svn: 272237	2016-06-09 01:13:59 +00:00
Sanjoy Das	8598412e24	[SCEV] Track no-abnormal-exits instead of no-throw calls Absence of may-unwind calls is not enough to guarantee that a UB-generating use of an add-rec poison in the loop latch will actually cause UB. We also need to guard against calls that terminate the thread or infinite loop themselves. This partially addresses PR28012. llvm-svn: 272181	2016-06-08 17:48:42 +00:00
Sanjoy Das	9a65cd214d	Teach isGuarantdToTransferExecToSuccessor about debug info intrinsics Calls to `@llvm.dbg.*` can be assumed to terminate. llvm-svn: 272180	2016-06-08 17:48:36 +00:00
Sanjoy Das	a19edc4d15	Fix a bug in SCEV's poison value propagation The worklist algorithm introduced in rL271151 didn't check to see if the direct users of the post-inc add recurrence propagates poison. This change fixes the problem and makes the code structure more obvious. Note for release managers: correctness wise, this bug wasn't a regression introduced by rL271151 -- the behavior of SCEV around post-inc add recurrences was strictly improved (in terms of correctness) in rL271151. llvm-svn: 272179	2016-06-08 17:48:31 +00:00
George Burgess IV	a1f9a2daeb	[CFLAA] Add AttrEscaped, remove bit twiddling functions. This patch does a few things: - Unifies AttrAll and AttrUnknown (since they were used for more or less the same purpose anyway). - Introduces AttrEscaped, an attribute that notes that a value escapes our analysis for a given set, but not that an unknown value flows into said set. - Removes functions that take bit indices, since we also had functions that took bitsets, and the use of both (with similar names) was unclear and bug-prone. Patch by Jia Chen. Differential Revision: http://reviews.llvm.org/D21000 llvm-svn: 272040	2016-06-07 18:35:37 +00:00
Andrey Turetskiy	9f02c58670	[LAA] Improve non-wrapping pointer detection by handling loop-invariant case. This fixes PR26314. This patch adds new helper “isNoWrap” with detection of loop-invariant pointer case. Patch by Roman Shirokiy. Ref: https://llvm.org/bugs/show_bug.cgi?id=26314 Differential Revision: http://reviews.llvm.org/D17268 llvm-svn: 272014	2016-06-07 14:55:27 +00:00
Easwaran Raman	019e0bf592	Reapply r271728 after adding move cobstructor for ProfileSummaryInfo llvm-svn: 271745	2016-06-03 22:54:26 +00:00
Easwaran Raman	94edaaaefb	Revert r271728 as it breaks Windows build llvm-svn: 271738	2016-06-03 21:14:26 +00:00
Easwaran Raman	d142050f3a	Analysis pass to access profile summary info Differential Revision: http://reviews.llvm.org/D20648 llvm-svn: 271728	2016-06-03 20:37:19 +00:00
Daniel Berlin	73694bb92b	Revert "Claim NoAlias if two GEPs index different fields of the same struct" This reverts commit 2d5d6493f43eb68493a3852b8c226ac9fafdc7eb. llvm-svn: 271422	2016-06-01 18:55:32 +00:00
George Burgess IV	18b83fe6cf	[CFLAA] Recognize builtin allocation functions. This patch extends CFLAA to recognize allocation functions such as malloc, free, etc, so we can treat them more aggressively. Patch by Jia Chen. Differential Revision: http://reviews.llvm.org/D20776 llvm-svn: 271421	2016-06-01 18:39:54 +00:00
Daniel Berlin	e846c9dc52	Claim NoAlias if two GEPs index different fields of the same struct Patch by Taewook Oh Summary: Patch for Bug 27478. Make BasicAliasAnalysis claims NoAlias if two GEPs index different fields of the same structure. Reviewers: hfinkel, dberlin Subscribers: dberlin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D20665 llvm-svn: 271415	2016-06-01 18:12:01 +00:00
Sanjoy Das	10df497a1f	Reduce dependence on pointee types when deducing dereferenceability Summary: Change some of the internal interfaces in Loads.cpp to keep track of the number of bytes we're trying to prove dereferenceable using an explicit `Size` parameter. Before this, the `Size` parameter was implicitly inferred from the pointee type of the pointer whose dereferenceability we were trying to prove, causing us to be conservative around bitcasts. This was unfortunate since bitcast instructions are no-ops and should never break optimizations. With an explicit `Size` parameter, we're more precise (as shown in the test cases), and the code is simpler. We should eventually move towards a `DerefQuery` struct that groups together a base pointer, an offset, a size and an alignment; but this patch is a first step. Reviewers: apilipenko, dblaikie, hfinkel, reames Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D20764 llvm-svn: 271406	2016-06-01 16:47:45 +00:00
George Burgess IV	a880146925	[CFLAA] Don't link GEP pointers to GEP indices. Code like the following is considered broken, and doesn't need to be supported by our AA magicks: void getFoo(int P) { int PAlias = (int )((char )NULL + (uintptr_t)P); } This patch makes CFLAA drop support for code like this. Patch by Jia Chen. Differential Revision: http://reviews.llvm.org/D20775 llvm-svn: 271322	2016-05-31 19:55:05 +00:00
Sanjoy Das	f49ca52b9d	[SCEV] See through op.with.overflow intrinsics (re-apply) Summary: This change teaches SCEV to see reduce `(extractvalue 0 (op.with.overflow X Y))` into `op X Y` (with a no-wrap tag if possible). This was first checked in at r265912 but reverted in r265950 because it exposed some issues around how SCEV handled post-inc add recurrences. Those issues have now been fixed. Reviewers: atrick, regehr Subscribers: mcrosier, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D18684 llvm-svn: 271152	2016-05-29 00:34:42 +00:00
Sanjoy Das	7e4a64167d	[SCEV] Don't always add no-wrap flags to post-inc add recs Fixes PR27315. The post-inc version of an add recurrence needs to "follow the same rules" as a normal add or subtract expression. Otherwise we miscompile programs like ``` int main() { int a = 0; unsigned a_u = 0; volatile long last_value; do { a_u += 3; last_value = (long) ((int) a_u); if (will_add_overflow(a, 3)) { // Leave, and don't actually do the increment, so no UB. printf("last_value = %ld\n", last_value); exit(0); } a += 3; } while (a != 46); return 0; } ``` This patch changes SCEV to put no-wrap flags on post-inc add recurrences only when the poison from a potential overflow will go ahead to cause undefined behavior. To avoid regressing performance too much, I've assumed infinite loops without side effects is undefined behavior to prove poison<->UB equivalence in more cases. This isn't ideal, but is not new to LLVM as a whole, and far better than the situation I'm trying to fix. llvm-svn: 271151	2016-05-29 00:32:17 +00:00
Sanjoy Das	70c2bbd29c	[ValueTracking] ICmp instructions propagate poison This is a stripped down version of D19211, leaving out the questionable "branching in poison is UB" bit. llvm-svn: 271150	2016-05-29 00:31:18 +00:00
Michael Kuperstein	ae21491819	[BasicAA] Extend inbound GEP negative offset logic to GlobalVariables r270777 improved the precision of alloca vs. inbounbds GEP alias queries: if we have (a) an inbounds GEP and (b) a pointer based on an alloca, and the beginning of the object the GEP points to would have a negative offset with respect to the alloca, then the GEP can not alias pointer (b). This makes the same logic fire when (b) is based on a GlobalVariable instead of an alloca. Differential Revision: http://reviews.llvm.org/D20652 llvm-svn: 270893	2016-05-26 19:30:49 +00:00
Michael Kuperstein	82069c44ca	[BasicAA] Improve precision of alloca vs. inbounds GEP alias queries If a we have (a) a GEP and (b) a pointer based on an alloca, and the beginning of the object the GEP points would have a negative offset with repsect to the alloca, then the GEP can not alias pointer (b). For example, consider code like: struct { int f0, int f1, ...} foo; ... foo alloca; foo random = bar(alloca); int f0 = &alloca.f0 int f1 = &random->f1; Which is lowered, approximately, to: %alloca = alloca %struct.foo %random = call %struct.foo @random(%struct.foo* %alloca) %f0 = getelementptr inbounds %struct, %struct.foo* %alloca, i32 0, i32 0 %f1 = getelementptr inbounds %struct, %struct.foo* %random, i32 0, i32 1 Assume %f1 and %f0 alias. Then %f1 would point into the object allocated by %alloca. Since the %f1 GEP is inbounds, that means %random must also point into the same object. But since %f0 points to the beginning of %alloca, the highest %f1 can be is (%alloca + 3). This means %random can not be higher than (%alloca - 1), and so is not inbounds, a contradiction. Differential Revision: http://reviews.llvm.org/D20495 llvm-svn: 270777	2016-05-25 22:23:08 +00:00
Oleg Ranevskyy	eb4eccae5c	[SCEV] No-wrap flags are not propagated when folding "{S,+,X}+T ==> {S+T,+,X}" Summary: Description This makes `WidenIV::widenIVUse` (IndVarSimplify.cpp) fail to widen narrow IV uses in some cases. The latter affects IndVarSimplify which may not eliminate narrow IV's when there actually exists such a possibility, thereby producing ineffective code. When `WidenIV::widenIVUse` gets a NarrowUse such as `{(-2 + %inc.lcssa),+,1}<nsw><%for.body3>`, it first tries to get a wide recurrence for it via the `getWideRecurrence` call. `getWideRecurrence` returns recurrence like this: `{(sext i32 (-2 + %inc.lcssa) to i64),+,1}<nsw><%for.body3>`. Then a wide use operation is generated by `cloneIVUser`. The generated wide use is evaluated to `{(-2 + (sext i32 %inc.lcssa to i64))<nsw>,+,1}<nsw><%for.body3>`, which is different from the `getWideRecurrence` result. `cloneIVUser` sees the difference and returns nullptr. This patch also fixes the broken LLVM tests by adding missing <nsw> entries introduced by the correction. Minimal reproducer: ``` int foo(int a, int b, int c); int baz(); void bar() { int arr[20]; int i = 0; for (i = 0; i < 4; ++i) arr[i] = baz(); for (; i < 20; ++i) arr[i] = foo(arr[i - 4], arr[i - 3], arr[i - 2]); } ``` Clang command line: ``` clang++ -mllvm -debug -S -emit-llvm -O3 --target=aarch64-linux-elf test.cpp -o test.ir ``` Expected result: The ` -mllvm -debug` log shows that all the IV's for the second `for` loop have been eliminated. Reviewers: sanjoy Subscribers: atrick, asl, aemerson, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D20058 llvm-svn: 270695	2016-05-25 13:01:33 +00:00
Simon Pilgrim	14000b3cea	[CostModel][X86][XOP] Added XOP costmodel for BITREVERSE Now that we have a nice fast VPPERM solution. Added framework for future intrinsic costs as well. llvm-svn: 270537	2016-05-24 08:17:50 +00:00
Matthew Simpson	6feebe9847	[LAA] Check independence of strided accesses before forward case This patch changes the order in which we attempt to prove the independence of strided accesses. We previously did this after we knew the dependence distance was positive. With this change, we check for independence before handling the negative distance case. The patch prevents LAA from reporting forward dependences for independent strided accesses. This change was requested in the review of D19984. llvm-svn: 270072	2016-05-19 15:37:19 +00:00
Sanjoy Das	f5d40d5350	[SCEV] Be more aggressive in proving NUW ... for AddRec's in loops for which SCEV is unable to compute a max tripcount. This is the NUW variant of r269211 and fixes PR27691. (Note: PR27691 is not a correct or stability bug, it was created to track a pending task). llvm-svn: 269790	2016-05-17 17:51:14 +00:00
Simon Pilgrim	2ea513847c	[CostModel][X86] Tidied up checks llvm-svn: 269770	2016-05-17 14:43:41 +00:00
Simon Pilgrim	6e9898f362	[CostModel][X86] Added scalar bitreverse tests llvm-svn: 269594	2016-05-15 17:40:48 +00:00
Adam Nemet	c62e554e9a	[LAA] Include MaxSafeDepDistBytes in the analysis print-out llvm-svn: 269508	2016-05-13 22:49:13 +00:00
Sanjoy Das	4e8c80382f	[SCEVExpander] Fix a failed cast<> assertion SCEVExpander::replaceCongruentIVs assumes the backedge value of an SCEV-analysable PHI to always be an instruction, when this is not necessarily true. For now address this by bailing out of the optimization if the backedge value of the PHI is a non-Instruction. llvm-svn: 269213	2016-05-11 17:41:41 +00:00
Sanjoy Das	abb7b93eb9	[SCEVExpander] Don't break SSA in replaceCongruentIVs `SCEVExpander::replaceCongruentIVs` bypasses `hoistIVInc` if both the original and the isomorphic increments are PHI nodes. Doing this can break SSA if the isomorphic increment is not dominated by the original increment. Get rid of the bypass, and let `hoistIVInc` do the right thing. Fixes PR27232 (compile time crash/hang). llvm-svn: 269212	2016-05-11 17:41:34 +00:00
Sanjoy Das	787c2460c2	[SCEV] Be more aggressive around proving no-wrap ... for AddRec's in loops for which SCEV is unable to compute a max tripcount. This is not a problem for "normal" loops[0] that don't have guards or assumes, but helps in cases where we have guards or assumes in the loop that can be used to constrain incoming values over the backedge. This partially fixes PR27691 (we still don't handle the NUW case). [0]: for "normal" loops, in the cases where we'd be able to prove no-wrap via isKnownPredicate, we'd also be able to compute a max tripcount. llvm-svn: 269211	2016-05-11 17:41:26 +00:00
Vedant Kumar	ee20294af5	[BasicAA] Compare GEP indices based on value (Fix PR27418) Equivalent GEP indices with different types are treated as different indices altogether, leading to an incorrect AA result. Fix the issue by comparing indices based on their values. Thanks to Mikael Holmén for reporting the issue! Differential Revision: http://reviews.llvm.org/D19935 llvm-svn: 269197	2016-05-11 15:45:43 +00:00
Sanjoy Das	d47f42435a	[BasicAA] Guard intrinsics don't write to memory Summary: The idea is very close to what we do for assume intrinsics: we mark the guard intrinsics as writing to arbitrary memory to maintain control dependence, but under the covers we teach AA that they do not mod any particular memory location. Reviewers: chandlerc, hfinkel, gbiv, reames Subscribers: george.burgess.iv, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D19575 llvm-svn: 269007	2016-05-10 02:35:41 +00:00
Sanjoy Das	2512d0c837	[SCEV] Use guards to prove predicates We can use calls to @llvm.experimental.guard to prove predicates, relying on the fact that in all locations domianted by a call to @llvm.experimental.guard the predicate it is guarding is known to be true. llvm-svn: 268997	2016-05-10 00:31:49 +00:00
Simon Pilgrim	eec3a95f95	[X86][SSE] Improve cost model for i64 vector comparisons on pre-SSE42 targets As discussed on PR24888, until SSE42 we don't have access to PCMPGTQ for v2i64 comparisons, but the cost models don't reflect this, resulting in over-optimistic vectorizaton. This patch adds SSE2 'base level' costs that match what a typical target is capable of and only reduces the v2i64 costs at SSE42. Technically SSE41 provides a PCMPEQQ v2i64 equality test, but as getCmpSelInstrCost doesn't give us a way to discriminate between comparison test types we can't easily make use of this, otherwise we could split the cost of integer equality and greater-than tests to give better costings of each. Differential Revision: http://reviews.llvm.org/D20057 llvm-svn: 268972	2016-05-09 21:14:38 +00:00
Matt Arsenault	1af53a91c0	DivergenceAnalysis: Fix crash with no return blocks The post dominator tree does not have a root node in this case. llvm-svn: 268933	2016-05-09 16:57:08 +00:00
Simon Pilgrim	4a9d32c5ba	[CostModel][X86] Extended comparison instruction cost model tests to include SSE2/SSE3/SSSE3/SSE41/SSE42 targets llvm-svn: 268877	2016-05-08 15:24:53 +00:00
Simon Pilgrim	420852e8d4	[CostModel][X86] Split BSWAP/BITREVERSE cost tests from CTPOP/CTLZ/CTTZ 'bit count' cost tests llvm-svn: 268859	2016-05-07 16:34:16 +00:00
Simon Pilgrim	b3f5cb7a65	[CostModel][X86] Tweak 'SSE2-only' test CPU as it was only disabling SSE41 not SSE3/SSSE3 etc. llvm-svn: 268763	2016-05-06 17:50:07 +00:00
Simon Pilgrim	93d9b96bdb	[CostModel][X86] Added ctlz/cttz undef-zero costmodel tests llvm-svn: 268761	2016-05-06 17:48:35 +00:00
Simon Pilgrim	5122c64fd8	[CostModel][X86] Added costmodel tests for vector ctpop/ctlz/cttz/bitreverse/bswap llvm-svn: 268738	2016-05-06 14:38:14 +00:00
Xinliang David Li	28a932742c	[PM] port Branch Frequency Analaysis pass to new PM llvm-svn: 268687	2016-05-05 21:13:27 +00:00
Xinliang David Li	6e5dd41481	[PM] Port Branch Probability Analysis pass to the new pass manager. Differential Revision: http://reviews.llvm.org/D19839 llvm-svn: 268601	2016-05-05 02:59:57 +00:00
Sanjoy Das	013a4ac4aa	[SCEV] Tweak the output format and content of -analyze In the "LoopDispositions:" section: - Instead of printing out a list, print out a "dictionary" to make it obvious by inspection which disposition is for which loop. This is just a cosmetic change. - Print dispositions for parent _and_ sibling loops. I will use this to write a test case. llvm-svn: 268405	2016-05-03 17:49:57 +00:00
Nicolai Haehnle	119d3d80cb	AMDGPU: llvm.SI.fs.constant is a source of divergence Summary: This intrinsic is used to get flat-shaded fragment shader inputs. Those are uniform across a primitive, but a fragment shader wave may process pixels from multiple primitives (as indicated by the prim_mask), and so that's where divergence can arise. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19747 llvm-svn: 268259	2016-05-02 17:37:01 +00:00
Sanjoy Das	f2f00fb11a	[SCEV] When printing via -analysis, dump loop disposition There are currently some bugs in tree around SCEV caching an incorrect loop disposition. Printing out loop dispositions will let us write whitebox tests as those are fixed. The dispositions are printed as a list in "inside out" order, i.e. innermost loop first. llvm-svn: 268177	2016-05-01 04:51:05 +00:00
Matt Arsenault	790eb1c490	DivergenceAnalysis: Fix crash with unreachable blocks Unreachable blocks may not be in the dominator tree, so don't crash on them. llvm-svn: 268001	2016-04-29 06:17:47 +00:00
Artur Pilipenko	c97eac6555	Use DL preferred alignment for alloca in Value::getPointerAlignment Teach Value::getPointerAlignment that allocas with no explicit alignment are aligned to preferred alignment of the allocated type. Reviewed By: hfinkel Differential Revision: http://reviews.llvm.org/D17569 llvm-svn: 267689	2016-04-27 10:42:29 +00:00
Silviu Baranga	795c629ec9	[SCEV] Improve the run-time checking of the NoWrap predicate Summary: This implements a new method of run-time checking the NoWrap SCEV predicates, which should be easier to optimize and nicer for targets that don't correctly handle multiplication/addition of large integer types (like i128). If the AddRec is {a,+,b} and the backedge taken count is c, the idea is to check that \|b\| * c doesn't have unsigned overflow, and depending on the sign of b, that: a + \|b\| * c >= a (b >= 0) or a - \|b\| * c <= a (b <= 0) where the comparisons above are signed or unsigned, depending on the flag that we're checking. The advantage of doing this is that we avoid extending to a larger type and we avoid the multiplication of large types (multiplying i128 can be expensive). Reviewers: sanjoy Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D19266 llvm-svn: 267389	2016-04-25 09:27:16 +00:00
Sanjoy Das	a6155b659a	Have isKnownNotFullPoison be smarter around control flow Summary: (... while still not using a PostDomTree) The way we use isKnownNotFullPoison from SCEV today, the new CFG walking logic will not trigger for any realistic cases -- it will kick in only for situations where we could have merged the contiguous basic blocks anyway[0], since the poison generating instruction dominates all of its non-PHI uses (which are the only uses we consider right now). However, having this change in place will allow a later bugfix to break fewer llvm-lit tests. [0]: i.e. cases where block A branches to block B and B is A's only successor and A is B's only predecessor. Reviewers: broune, bjarke.roune Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D19212 llvm-svn: 267175	2016-04-22 17:41:06 +00:00
Ashutosh Nema	468558a061	[X86]: Changing cost for “TRUNCATE v16i32 to v16i8” in SSE4.1 mode. Summary: rL256194 transforms truncations between vectors of integers into PACKUS/PACKSS operations during DAG combine. This generates better code for truncate, so cost of truncate needs to be changed but looks like it got changed only in SSE2 table Whereas this change is also applicable for SSE4.1, so the cost of truncate needs to be changed for that as well. Cost of “TRUNCATE v16i32 to v16i8” & “TRUNCATE v16i16 to v16i8” should be same in SSE4.1 & SSE2 table. Removing their cost from SSE4.1, so it will fall back to SSE2. Reviewers: Simon Pilgrim llvm-svn: 267123	2016-04-22 08:34:05 +00:00
Michael Kuperstein	de16b44f74	Port DemandedBits to the new pass manager. Differential Revision: http://reviews.llvm.org/D18679 llvm-svn: 266699	2016-04-18 23:55:01 +00:00
Sanjoy Das	432c1c3fb3	[BPI] Consider deoptimize calls as "unreachable" Summary: Calls to @llvm.experimental.deoptimize are expected to "never execute", so optimize them as such. Reviewers: chandlerc Subscribers: junbuml, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D19095 llvm-svn: 266654	2016-04-18 19:01:28 +00:00
Nicolai Haehnle	13d90f324c	[DivergenceAnalysis] Treat PHI with incoming undef as constant Summary: If a PHI has an incoming undef, we can pretend that it is equal to one non-undef, non-self incoming value. This is particularly relevant in combination with the StructurizeCFG pass, which introduces PHI nodes with undefs. Previously, this lead to branch conditions that were uniform before StructurizeCFG to become non-uniform afterwards, which confused the SIAnnotateControlFlow pass. This fixes a crash when Mesa radeonsi compiles a shader from dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex Reviewers: arsenm, tstellarAMD, jingyue Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19013 llvm-svn: 266347	2016-04-14 17:42:47 +00:00
Silviu Baranga	b77365b595	[SCEV][LAA] Add tests for SCEV expression transformations performed during LAA Summary: Add a print method to Predicated Scalar Evolution which prints all interesting transformations done by PSE. Loop Access Analysis will now print this as part of the analysis output. We now use this to check the exact expression transformations that were done by PSE in LAA. The additional checking also acts as white-box testing for the getAsAddRec method. Reviewers: anemet, sanjoy Subscribers: sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D18792 llvm-svn: 266334	2016-04-14 16:08:45 +00:00
Adam Nemet	7aab648831	Revert "Support arbitrary addrspace pointers in masked load/store intrinsics" This reverts commit r266086. It breaks the LTO build of gcc in SPEC2000. llvm-svn: 266282	2016-04-14 08:47:17 +00:00
Matt Arsenault	b34eea9cb5	AMDGPU: Remove leftover ShaderType attributes in tests llvm-svn: 266155	2016-04-13 00:39:48 +00:00
Artur Pilipenko	dbe0bc8df4	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmittion of 263158 change. This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 266086	2016-04-12 15:58:04 +00:00
Sanjoy Das	f9d88e650b	This reverts commit r265913 and r265912 See PR27315 r265913: "[IndVars] Eliminate op.with.overflow when possible" r265912: "[SCEV] See through op.with.overflow intrinsics" llvm-svn: 265950	2016-04-11 15:26:18 +00:00
Sanjoy Das	3c529a40ca	[SCEV] See through op.with.overflow intrinsics Summary: This change teaches SCEV to see reduce `(extractvalue 0 (op.with.overflow X Y))` into `op X Y` (with a no-wrap tag if possible). Reviewers: atrick, regehr Subscribers: mcrosier, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D18684 llvm-svn: 265912	2016-04-10 22:50:26 +00:00
Silviu Baranga	6f444dfd55	Re-commit [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV This re-commits r265535 which was reverted in r265541 because it broke the windows bots. The problem was that we had a PointerIntPair which took a pointer to a struct allocated with new. The problem was that new doesn't provide sufficient alignment guarantees. This pattern was already present before r265535 and it just happened to work. To fix this, we now separate the PointerToIntPair from the ExitNotTakenInfo struct into a pointer and a bool. Original commit message: Summary: When the backedge taken codition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. However, we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it. In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'. We use new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops. Reviewers: anemet, mzolotukhin, hfinkel, sanjoy Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17201 llvm-svn: 265786	2016-04-08 14:29:09 +00:00
Sanjoy Das	5ce3272833	Don't IPO over functions that can be de-refined Summary: Fixes PR26774. If you're aware of the issue, feel free to skip the "Motivation" section and jump directly to "This patch". Motivation: I define "refinement" as discarding behaviors from a program that the optimizer has license to discard. So transforming: ``` void f(unsigned x) { unsigned t = 5 / x; (void)t; } ``` to ``` void f(unsigned x) { } ``` is refinement, since the behavior went from "if x == 0 then undefined else nothing" to "nothing" (the optimizer has license to discard undefined behavior). Refinement is a fundamental aspect of many mid-level optimizations done by LLVM. For instance, transforming `x == (x + 1)` to `false` also involves refinement since the expression's value went from "if x is `undef` then { `true` or `false` } else { `false` }" to "`false`" (by definition, the optimizer has license to fold `undef` to any non-`undef` value). Unfortunately, refinement implies that the optimizer cannot assume that the implementation of a function it can see has all of the behavior an unoptimized or a differently optimized version of the same function can have. This is a problem for functions with comdat linkage, where a function can be replaced by an unoptimized or a differently optimized version of the same source level function. For instance, FunctionAttrs cannot assume a comdat function is actually `readnone` even if it does not have any loads or stores in it; since there may have been loads and stores in the "original function" that were refined out in the currently visible variant, and at the link step the linker may in fact choose an implementation with a load or a store. As an example, consider a function that does two atomic loads from the same memory location, and writes to memory only if the two values are not equal. The optimizer is allowed to refine this function by first CSE'ing the two loads, and the folding the comparision to always report that the two values are equal. Such a refined variant will look like it is `readonly`. However, the unoptimized version of the function can still write to memory (since the two loads //can// result in different values), and selecting the unoptimized version at link time will retroactively invalidate transforms we may have done under the assumption that the function does not write to memory. Note: this is not just a problem with atomics or with linking differently optimized object files. See PR26774 for more realistic examples that involved neither. This patch: This change introduces a new set of linkage types, predicated as `GlobalValue::mayBeDerefined` that returns true if the linkage type allows a function to be replaced by a differently optimized variant at link time. It then changes a set of IPO passes to bail out if they see such a function. Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18634 llvm-svn: 265762	2016-04-08 00:48:30 +00:00
Nicolai Haehnle	df3a20cd80	AMDGPU: Add a shader calling convention This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 llvm-svn: 265589	2016-04-06 19:40:20 +00:00
Silviu Baranga	a393baf1fd	Revert r265535 until we know how we can fix the bots llvm-svn: 265541	2016-04-06 14:06:32 +00:00
Silviu Baranga	72b4a4a330	[SCEV] Introduce a guarded backedge taken count and use it in LAA and LV Summary: When the backedge taken codition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. However, we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it. In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'. We use new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops. Reviewers: anemet, mzolotukhin, hfinkel, sanjoy Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17201 llvm-svn: 265535	2016-04-06 13:18:26 +00:00
George Burgess IV	7e5404cc20	[CFLAA] Fix PR27213; incorrect tagging of args/globals Prior to this patch, CFLAA wouldn't tag arguments/globals properly if it didn't find any "interesting" edges on them. This means that, if all you do is store constants to a global or argument, we would never actually treat it as a global/argument. Test case: define void @foo(i32* %A, i32* %B) #0 { entry: store i32 0, i32* %A, align 4 store i32 0, i32* %B, align 4 ret void } CFLAA would say that %A can't alias %B, because neither pointer was used in an interesting way. This patch makes us note whether something is an argument, global, ... regardless of how interesting CFLAA thinks its uses are. (For the record, using a value in an interesting way means loading from it, using it in a GEP, ...) llvm-svn: 265474	2016-04-05 21:40:45 +00:00
Brendon Cahoon	86f783e315	[DependenceAnalysis] Check if result of getConstantPart is null A seg-fault occurs due to a reference of a null pointer, which is the value returned by getConstantPart. This function returns null if the constant part is not found. The code that calls this function needs to check for the null return value. Differential Revision: http://reviews.llvm.org/D18718 llvm-svn: 265319	2016-04-04 18:13:18 +00:00
Benjamin Kramer	cad9a8a6bb	[TTI] Let the cost model estimate ctpop costs based on legality PPC has a vector popcount, this lets the vectorizer use the correct cost for it. Tweak X86 test to use an intrinsic that's actually scalarized (we have a somewhat efficient lowering for vector popcount using SSE, the cost model finds that now). llvm-svn: 265005	2016-03-31 10:42:40 +00:00
Matt Arsenault	8c8fcb2585	AMDGPU: Cost model for basic integer operations This resolves bug 21148 by preventing promotion to i64 induction variables. llvm-svn: 264376	2016-03-25 01:16:40 +00:00
Matt Arsenault	9651813ee0	AMDGPU: Partially implement getArithmeticInstrCost for FP ops llvm-svn: 264374	2016-03-25 01:00:32 +00:00
Matt Arsenault	51d702812d	TTI: Report 0 cost for free addrspacecasts llvm-svn: 264369	2016-03-25 00:26:29 +00:00
Matt Arsenault	8e9aa0acc8	TTI: Use 0 for cost of fabs if free Ideally this would also happen for fneg, but that isn't a distinct operation in the IR. llvm-svn: 264368	2016-03-25 00:26:22 +00:00
Matt Arsenault	59767cea79	AMDGPU: TTI: Make insertelement free. We don't want to have a cost to scalarizing operations. llvm-svn: 264364	2016-03-25 00:14:11 +00:00
Adam Nemet	279784ffc4	[LAA] Support memchecks involving loop-invariant addresses We used to only allow SCEVAddRecExpr for pointer expressions in order to be able to compute the bounds. However this is also trivially possible for loop-invariant addresses (scUnknown) since then the bounds are the address itself. Interestingly, we used allow this for the special case when the loop-invariant address happens to also be an SCEVAddRecExpr (in an outer loop). There are a couple more loops that are vectorized in SPEC after this. My guess is that the main reason we don't see more because for example a loop-invariant load is vectorized into a splat vector with several vector-inserts. This is likely to make the vectorization unprofitable. I.e. we don't notice that a later LICM will move all of this out of the loop so the cost estimate should really be 0. llvm-svn: 264243	2016-03-24 04:28:47 +00:00
Matthias Braun	68bb2931cc	Revert "Support arbitrary addrspace pointers in masked load/store intrinsics" This commit broke LTO builds. Reverting it to unbreak the bots while the issue is investigated. See also: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160321/341002.html This reverts r263158 llvm-svn: 264088	2016-03-22 20:24:34 +00:00
Nicolai Haehnle	ad63638f6d	AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics Summary: These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used by Mesa to implement atomics with buffer semantics. The intrinsic interface matches that of buffer.load.format and buffer.store.format, except that the GLC bit is not exposed (it is automatically deduced based on whether the return value is used). The change of hasSideEffects is required for TableGen to accept the pattern that matches the intrinsic. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, rivanvx, llvm-commits Differential Revision: http://reviews.llvm.org/D18151 llvm-svn: 263791	2016-03-18 16:24:31 +00:00
Nicolai Haehnle	79cad857a0	AMDGPU: mark atomic instructions as sources of divergence Summary: As explained by the comment, threads will typically see different values returned by atomic instructions even if the arguments are equal. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18156 llvm-svn: 263719	2016-03-17 16:21:59 +00:00
Nicolai Haehnle	74127fe8d7	AMDGPU: mark llvm.amdgcn.image.atomic.* as a source of divergence Summary: When multiple threads perform an atomic op with the same arguments, they will usually see different return values. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18101 llvm-svn: 263440	2016-03-14 15:37:18 +00:00
Chandler Carruth	45a9c203a0	[PM/AA] Teach the AAManager how to handle module analyses in addition to function analyses, and use it to wire up globals-aa to the new pass manager. llvm-svn: 263211	2016-03-11 09:15:11 +00:00
Artur Pilipenko	3c8fc57e16	Support arbitrary addrspace pointers in masked load/store intrinsics This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 263158	2016-03-10 20:39:22 +00:00
Chandler Carruth	4c660f7087	[CG] Add a new pass manager printer pass for the old call graph and actually finish wiring up the old call graph. There were bugs in the old call graph that hadn't been caught because it wasn't being tested. It wasn't being tested because it wasn't in the pipeline system and we didn't have a printing pass to run in tests. This fixes all of that. As for why I'm still keeping the old call graph alive its so that I can port GlobalsAA to the new pass manager with out forking it to work with the lazy call graph. That's clearly the right eventual design, but it seems pragmatic to defer that until its necessary. The old call graph works just fine for GlobalsAA. llvm-svn: 263104	2016-03-10 11:24:11 +00:00
Chandler Carruth	b95def7491	[LCG] Spell the printing pass pipeline name for the lazy call graph 'lcg' instead of just 'cg'. This makes it consistent with the analysis name of 'lcg'. No functionality changed. llvm-svn: 263103	2016-03-10 11:24:06 +00:00
Matthias Braun	c31032d607	InstCombine: Restrict computeKnownBits() on all Values to OptLevel > 2 As part of r251146 InstCombine was extended to call computeKnownBits on every value in the function to determine whether it happens to be constant. This increases typical compiletime by 1-3% (5% in irgen+opt time) in my measurements. On the other hand this case did not trigger once in the whole llvm-testsuite. This patch introduces the notion of ExpensiveCombines which are only enabled for OptLevel > 2. I removed the check in InstructionSimplify as that is called from various places where the OptLevel is not known but given the rarity of the situation I think a check in InstCombine is enough. Differential Revision: http://reviews.llvm.org/D16835 llvm-svn: 263047	2016-03-09 18:47:11 +00:00
Sanjoy Das	84216672da	Remove trailing newline from test case; NFC llvm-svn: 262980	2016-03-09 01:51:44 +00:00
Sanjoy Das	97d19bd95f	[SCEV] Slightly generalize getRangeViaFactoring Building on the previous change, this generalizes ScalarEvolution::getRangeViaFactoring to work with {Ext(C?A:B)+k0,+,Ext(C?A:B)+k1} where Ext can be a zero extend, sign extend or truncate operation, and k0 and k1 are constants. llvm-svn: 262979	2016-03-09 01:51:02 +00:00
Sanjoy Das	d3488c6060	[SCEV] Slightly generalize getRangeViaFactoring This change generalizes ScalarEvolution::getRangeViaFactoring to work with {Ext(C?A:B),+,Ext(C?A:B)} where Ext can be a zero extend, sign extend or truncate operation. llvm-svn: 262978	2016-03-09 01:50:57 +00:00
Adam Nemet	4896c7a82a	[ScopedNoAliasAA] Make test basic.ll less confusing Summary: This testcase had me confused. It made me believe that you can use alias scopes and alias scopes list interchangeably with alias.scope and noalias. Both langref and the other testcase use scope lists so I went looking. Turns out using scope directly only happens to work by chance. When ScopedNoAliasAAResult::mayAliasInScopes traverses this as a scope list: !1 = !{!1, !0, !"some scope"} , the first entry is in fact a scope but only because the scope is happened to be defined self-referentially to make it unique globally. The remaining elements in the tuple (!0, !"some scope") are considered as scopes but AliasScopeNode::getDomain will just bail on those without any error. This change avoids this ambiguity in the test but I've also been wondering if we should issue some sort of a diagnostics. Reviewers: dexonsmith, hfinkel Subscribers: mssimpso, llvm-commits Differential Revision: http://reviews.llvm.org/D16670 llvm-svn: 262841	2016-03-07 17:49:10 +00:00
Philip Reames	146307eb52	[ValueTracking] Remove dead code from an old experiment This experiment was originally about trying to use facts implied dominating conditions to infer more precise known bits. While the compile time was found to be acceptable on several large code bases, we never found sufficiently profitable examples to justify turning on the code by default. Given this, it's time to abandon the experiment. Several folks have commented that they've found this useful for experimentation, but nothing has come of those experiments. Given how easy the patch is to apply, there's no reason to leave the code in tree. For anyone interested in further investigation in this area, I recommend finding the summary email I sent on one of the original review threads. In particular, I now believe the use-list based approach is strictly worse than the dom-tree-walking approach. llvm-svn: 262646	2016-03-03 19:44:06 +00:00
Sanjoy Das	724f5cf278	[SCEV] Prove no-overflow via constant ranges Exploit ScalarEvolution::getRange's newly acquired smartness (since r262438) by using that to infer nsw and nuw when possible. llvm-svn: 262639	2016-03-03 18:31:29 +00:00
Sanjoy Das	11ef606f1d	[SCEV] Be less eager about demoting zexts to sexts After r262438 we can have provably positive NSW SCEV expressions whose zero extensions cannot be simplified (since r262438 makes SCEV better at computing constant ranges). This means demoting sexts of positive add recurrences eagerly can result in an unsimplified zero extension where we could have had a simplified sign extension. This change fixes the issue by teaching SCEV to demote sext of a positive SCEV expression to a zext only if the sext could not be simplified. llvm-svn: 262638	2016-03-03 18:31:23 +00:00
Sanjoy Das	bf73098472	[SCEV] Make getRange smarter around selects Have ScalarEvolution::getRange re-consider cases like "{C?A:B,+,C?P:Q}" by factoring out "C" and computing RangeOf{A,+,P} union RangeOf({B,+,Q}) instead. The latter can be easier to compute precisely in cases like "{C?0:N,+,C?1:-1}" N is the backedge taken count of the loop; since in such cases the latter form simplifies to [0,N+1) union [0,N+1). llvm-svn: 262438	2016-03-02 00:57:54 +00:00
Hongbin Zheng	b8bb0d8813	Another fix the testcase introduced by r261903 - Add the missing matches llvm-svn: 261971	2016-02-26 03:41:47 +00:00
Hongbin Zheng	8c70ab75a0	Use regex in testcase, do not fail windows bots llvm-svn: 261922	2016-02-25 19:16:40 +00:00
Hongbin Zheng	bc53977a0d	Introduce RegionInfoAnalysis, which compute Region Tree in the new PassManager. NFC Differential Revision: http://reviews.llvm.org/D17571 llvm-svn: 261904	2016-02-25 17:54:25 +00:00
Hongbin Zheng	751337faa7	Introduce DominanceFrontierAnalysis to the new PassManager to compute DominanceFrontier. NFC Differential Revision: http://reviews.llvm.org/D17570 llvm-svn: 261903	2016-02-25 17:54:15 +00:00
Hongbin Zheng	3f97840721	Introduce analysis pass to compute PostDominators in the new pass manager. NFC Differential Revision: http://reviews.llvm.org/D17537 llvm-svn: 261902	2016-02-25 17:54:07 +00:00
Hongbin Zheng	66b19fbc4e	Revert "Introduce analysis pass to compute PostDominators in the new pass manager. NFC" This reverts commit a3e5cc6a51ab5ad88d1760c63284294a4e34c018. llvm-svn: 261891	2016-02-25 16:45:53 +00:00
Hongbin Zheng	ad782ce3f7	Revert "Introduce DominanceFrontierAnalysis to the new PassManager to compute DominanceFrontier. NFC" This reverts commit 109c38b2226a87b0be73fa7a0a8c1a81df20aeb2. llvm-svn: 261890	2016-02-25 16:45:46 +00:00
Hongbin Zheng	921fabf34b	Revert "Introduce RegionInfoAnalysis, which compute Region Tree in the new PassManager. NFC" This reverts commit 8228b4d374edeb4cc0c5fddf6e1ab876918ee126. llvm-svn: 261889	2016-02-25 16:45:37 +00:00
Hongbin Zheng	2fa386fd6c	Introduce RegionInfoAnalysis, which compute Region Tree in the new PassManager. NFC Differential Revision: http://reviews.llvm.org/D17571 llvm-svn: 261884	2016-02-25 16:33:26 +00:00
Hongbin Zheng	237197ba63	Introduce DominanceFrontierAnalysis to the new PassManager to compute DominanceFrontier. NFC Differential Revision: http://reviews.llvm.org/D17570 llvm-svn: 261883	2016-02-25 16:33:15 +00:00
Hongbin Zheng	a0273a04f5	Introduce analysis pass to compute PostDominators in the new pass manager. NFC Differential Revision: http://reviews.llvm.org/D17537 llvm-svn: 261882	2016-02-25 16:33:06 +00:00
Chandler Carruth	c1dc384b54	[PM/AA] Wire up TBAA to the new pass manager's registry and test it. llvm-svn: 261411	2016-02-20 04:04:52 +00:00
Chandler Carruth	d6091a0344	[PM/AA] Wire up the scoped-no-alias AA to the new pass manager's registry and test it. llvm-svn: 261410	2016-02-20 04:03:06 +00:00
Chandler Carruth	2b3d0446f4	[PM/AA] Wire up SCEVAA to the new pass manager's registry and test it. llvm-svn: 261409	2016-02-20 04:01:45 +00:00
Chandler Carruth	342c671b66	[PM/AA] Wire up CFLAA to the new pass manager fully, and port one of its tests over to exercise this code. This uncovered a few missing bits here and there in the analysis, but nothing interesting. llvm-svn: 261404	2016-02-20 03:52:02 +00:00
Chandler Carruth	4f846a5f15	[PM/AA] Port alias analysis evaluator to the new pass manager, and use it to actually test the new pass manager AA wiring. This patch was extracted from the (somewhat too large) D12357 and rebosed on top of the slightly different design of the new pass manager AA wiring that I just landed. With this we can start testing the AA in a thorough way with the new pass manager. Some minor cleanups to the code in the pass was necessitated here, but otherwise it is a very minimal change. Differential Revision: http://reviews.llvm.org/D17372 llvm-svn: 261403	2016-02-20 03:46:03 +00:00
Matthew Simpson	921ad01a1d	[AArch64] Reduce vector insert/extract cost for Kryo Differential Revision: http://reviews.llvm.org/D17379 llvm-svn: 261237	2016-02-18 18:35:45 +00:00
Chandler Carruth	e5944d97d8	[LCG] Construct an actual call graph with call-edge SCCs nested inside reference-edge SCCs. This essentially builds a more normal call graph as a subgraph of the "reference graph" that was the old model. This allows both to exist and the different use cases to use the aspect which addresses their needs. Specifically, the pass manager and other ordering constrained logic can use the reference graph to achieve conservative order of visit, while analyses reasoning about attributes and other properties derived from reachability can reason about the direct call graph. Note that this isn't necessarily complete: it doesn't model edges to declarations or indirect calls. Those can be found by scanning the instructions of the function if desirable, and in fact every user currently does this in order to handle things like calls to instrinsics. If useful, we could consider caching this information in the call graph to save the instruction scans, but currently that doesn't seem to be important. An important realization for why the representation chosen here works is that the call graph is a formal subset of the reference graph and thus both can live within the same data structure. All SCCs of the call graph are necessarily contained within an SCC of the reference graph, etc. The design is to build 'RefSCC's to model SCCs of the reference graph, and then within them more literal SCCs for the call graph. The formation of actual call edge SCCs is not done lazily, unlike reference edge 'RefSCC's. Instead, once a reference SCC is formed, it directly builds the call SCCs within it and stores them in a post-order sequence. This is used to provide a consistent platform for mutation and update of the graph. The post-order also allows for very efficient updates in common cases by bounding the number of nodes (and thus edges) considered. There is considerable common code that I'm still looking for the best way to factor out between the various DFS implementations here. So far, my attempts have made the code harder to read and understand despite reducing the duplication, which seems a poor tradeoff. I've not given up on figuring out the right way to do this, but I wanted to wait until I at least had the system working and tested to continue attempting to factor it differently. This also requires introducing several new algorithms in order to handle all of the incremental update scenarios for the more complex structure involving two edge colorings. I've tried to comment the algorithms sufficiently to make it clear how this is expected to work, but they may still need more extensive documentation. I know that there are some changes which are not strictly necessarily coupled here. The process of developing this started out with a very focused set of changes for the new structure of the graph and algorithms, but subsequent changes to bring the APIs and code into consistent and understandable patterns also ended up touching on other aspects. There was no good way to separate these out without causing massive merge conflicts. Ultimately, to a large degree this is a rewrite of most of the core algorithms in the LCG class and so I don't think it really matters much. Many thanks to the careful review by Sanjoy Das! Differential Revision: http://reviews.llvm.org/D16802 llvm-svn: 261040	2016-02-17 00:18:16 +00:00
Chandler Carruth	632d208c78	[attrs] Move the norecurse deduction to operate on the node set rather than the SCC object, and have it scan the instruction stream directly rather than relying on call records. This makes the behavior of this routine consistent between libc routines and LLVM intrinsics for libc routines. We can go and start teaching it about those being norecurse, but we should behave the same for the intrinsic and the libc routine rather than differently. I chatted with James Molloy and the inconsistency doesn't seem intentional and likely is due to intrinsic calls not being modelled in the call graph analyses. This also fixes a bug where we would deduce norecurse on optnone functions, when generally we try to handle optnone functions as-if they were replaceable and thus unanalyzable. llvm-svn: 260813	2016-02-13 08:47:51 +00:00
Matt Arsenault	fe26def35c	AMDGPU: Fix not handling new workitem intrinsics in DivergenceAnalysis llvm-svn: 260491	2016-02-11 05:32:51 +00:00
Sanjoy Das	1c481f50d2	Add an "addUsedAAAnalyses" helper function Summary: Passes that call `getAnalysisIfAvailable<T>` also need to call `addUsedIfAvailable<T>` in `getAnalysisUsage` to indicate to the legacy pass manager that it uses `T`. This contract was being violated by passes that used `createLegacyPMAAResults`. This change fixes this by exposing a helper in AliasAnalysis.h, `addUsedAAAnalyses`, that is complementary to createLegacyPMAAResults and does the right thing when called from `getAnalysisUsage`. Reviewers: chandlerc Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17010 llvm-svn: 260183	2016-02-09 01:21:57 +00:00
Wei Mi	fc1cab305f	This patch is to fix PR26529 caused by r259736. IndVarSimplify assumes scAddRecExpr to be expanded in literal form instead of canonical form by calling disableCanonicalMode after it creates SCEVExpander. When CanonicalMode is disabled, SCEVExpander::expand should always return PHI node for scAddRecExpr. r259736 broke the assumption. The fix is to let SCEVExpander::expand skip the reuse Value logic if CanonicalMode is false. In addition, Besides IndVarSimplify, LSR pass also calls disableCanonicalMode before doing rewrite. We can remove the original check of LSRMode in reuse Value logic and use CanonicalMode instead. llvm-svn: 260174	2016-02-09 00:07:08 +00:00
Silviu Baranga	ea63a7f512	[SCEV][LAA] Re-commit r260085 and r260086, this time with a fix for the memory sanitizer issue. The PredicatedScalarEvolution's copy constructor wasn't copying the Generation value, and was leaving it un-initialized. Original commit message: [SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection Summary: This change adds no wrap SCEV predicates with: - support for runtime checking - support for expression rewriting: (sext ({x,+,y}) -> {sext(x),+,sext(y)} (zext ({x,+,y}) -> {zext(x),+,sext(y)} Note that we are sign extending the increment of the SCEV, even for the zext case. This is needed to cover the fairly common case where y would be a (small) negative integer. In order to do this, this change adds two new flags: nusw and nssw that are applicable to AddRecExprs and permit the transformations above. We also change isStridedPtr in LAA to be able to make use of these predicates. With this feature we should now always be able to work around overflow issues in the dependence analysis. Reviewers: mzolotukhin, sanjoy, anemet Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel Differential Revision: http://reviews.llvm.org/D15412 llvm-svn: 260112	2016-02-08 17:02:45 +00:00
Silviu Baranga	41b4973329	Revert r260086 and r260085. They have broken the memory sanitizer bots. llvm-svn: 260087	2016-02-08 11:56:15 +00:00
Silviu Baranga	70a98bb9e8	[LoopVersioning] Don't assert when there are no memchecks We shouldn't assert when there are no memchecks, since we can have SCEV checks. There is already an assert covering the case where there are no SCEV checks or memchecks. This also changes the LAA pointer wrapping versioning test to use the loop versioning pass (this was how I managed to trigger the assert in the loop versioning pass). llvm-svn: 260086	2016-02-08 11:15:29 +00:00
Silviu Baranga	a35fadc7c4	[SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection Summary: This change adds no wrap SCEV predicates with: - support for runtime checking - support for expression rewriting: (sext ({x,+,y}) -> {sext(x),+,sext(y)} (zext ({x,+,y}) -> {zext(x),+,sext(y)} Note that we are sign extending the increment of the SCEV, even for the zext case. This is needed to cover the fairly common case where y would be a (small) negative integer. In order to do this, this change adds two new flags: nusw and nssw that are applicable to AddRecExprs and permit the transformations above. We also change isStridedPtr in LAA to be able to make use of these predicates. With this feature we should now always be able to work around overflow issues in the dependence analysis. Reviewers: mzolotukhin, sanjoy, anemet Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel Differential Revision: http://reviews.llvm.org/D15412 llvm-svn: 260085	2016-02-08 10:45:50 +00:00
Wei Mi	a49559befb	[SCEV] Try to reuse existing value during SCEV expansion Current SCEV expansion will expand SCEV as a sequence of operations and doesn't utilize the value already existed. This will introduce redundent computation which may not be cleaned up throughly by following optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. The original commit triggered regressions in Polly tests. The regressions exposed two problems which have been fixed in current version. 1. Polly will generate a new function based on the old one. To generate an instruction for the new function, it builds SCEV for the old instruction, applies some tranformation on the SCEV generated, then expands the transformed SCEV and insert the expanded value into new function. Because SCEV expansion may reuse value cached in ExprValueMap, the value in old function may be inserted into new function, which is wrong. In SCEVExpander::expand, there is a logic to check the cached value to be used should dominate the insertion point. However, for the above case, the check always passes. That is because the insertion point is in a new function, which is unreachable from the old function. However for unreachable node, DominatorTreeBase::dominates thinks it will be dominated by any other node. The fix is to simply add a check that the cached value to be used in expansion should be in the same function as the insertion point instruction. 2. When the SCEV is of scConstant type, expanding it directly is cheaper than reusing a normal value cached. Although in the cached value set in ExprValueMap, there is a Constant type value, but it is not easy to find it out -- the cached Value set is not sorted according to the potential cost. Existing reuse logic in SCEVExpander::expand simply chooses the first legal element from the cached value set. The fix is that when the SCEV is of scConstant type, don't try the reuse logic. simply expand it. Differential Revision: http://reviews.llvm.org/D12090 llvm-svn: 259736	2016-02-04 01:27:38 +00:00
Wei Mi	97de385868	Revert r259662, which caused regressions on polly tests. llvm-svn: 259675	2016-02-03 18:05:57 +00:00
Wei Mi	ed133978a0	[SCEV] Try to reuse existing value during SCEV expansion Current SCEV expansion will expand SCEV as a sequence of operations and doesn't utilize the value already existed. This will introduce redundent computation which may not be cleaned up throughly by following optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. Differential Revision: http://reviews.llvm.org/D12090 llvm-svn: 259662	2016-02-03 17:05:12 +00:00
James Molloy	6e518a3b50	[DemandedBits] Revert r249687 due to PR26071 This regresses a test in LoopVectorize, so I'll need to go away and think about how to solve this in a way that isn't broken. From the writeup in PR26071: What's happening is that ComputeKnownZeroes is telling us that all bits except the LSB are zero. We're then deciding that only the LSB needs to be demanded from the icmp's inputs. This is where we're wrong - we're assuming that after simplification the bits that were known zero will continue to be known zero. But they're not - during trivialization the upper bits get changed (because an XOR isn't shrunk), so the icmp fails. The fault is in demandedbits - its contract does clearly state that a non-demanded bit may either be zero or one. llvm-svn: 259649	2016-02-03 15:05:06 +00:00
Chandler Carruth	a4499e9f73	[LCG] Build an edge abstraction for the LazyCallGraph and use it to differentiate between indirect references to functions an direct calls. This doesn't do a whole lot yet other than change the print out produced by the analysis, but it lays the groundwork for a very major change I'm working on next: teaching the call graph to actually be a call graph, modeling both the indirect reference graph and the call graph simultaneously. More details on that in the next patch though. The rest of this is essentially a bunch of over-engineering that won't be interesting until the next patch. But this also isolates essentially all of the churn necessary to introduce the edge abstraction from the very important behavior change necessary in order to separately model the two graphs. So it should make review of the subsequent patch a bit easier at the cost of making this patch seem poorly motivated. ;] Differential Revision: http://reviews.llvm.org/D16038 llvm-svn: 259463	2016-02-02 03:57:13 +00:00
Gerolf Hoflehner	87ddb65fa6	[BasicAA] Fix for missing must alias (D16343) llvm-svn: 259299	2016-01-30 05:52:53 +00:00
James Molloy	121de0bcfa	[DemandedBits] Fix computation of demanded bits for ICmps The computation of ICmp demanded bits is independent of the individual operand being evaluated. We simply return a mask consisting of the minimum leading zeroes of both operands. We were incorrectly passing "I" to ComputeKnownBits - this should be "UserI->getOperand(0)". In cases where we were evaluating the 1th operand, we were taking the minimum leading zeroes of it and itself. This should fix PR26266. llvm-svn: 258690	2016-01-25 14:49:36 +00:00
James Molloy	c5eded5c1e	Revert "[ValueTracking] Understand more select patterns in ComputeKnownBits" This reverts commit r257769. Backing this out because of stage2 failures. llvm-svn: 257773	2016-01-14 15:49:32 +00:00
James Molloy	a9497f53c9	[ValueTracking] Understand more select patterns in ComputeKnownBits Some patterns of select+compare allow us to know exactly the value of the uppermost bits in the select result. For example: %b = icmp ugt i32 %a, 5 %c = select i1 %b, i32 2, i32 %a Here we know that %c is bounded by 5, and therefore KnownZero = ~APInt(5).getActiveBits() = ~7. There are several such patterns, and this patch attempts to understand a reasonable subset of them - namely when the base values are the same (as above), and when they are related by a simple (add nsw), for example (add nsw %a, 4) and %a. llvm-svn: 257769	2016-01-14 15:23:19 +00:00

... 3 4 5 6 7 ...

1302 Commits