llvm-project

Commit Graph

Author	SHA1	Message	Date
Jun Bum Lim	51a247065e	Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst When identifying blocks post-dominated by an unreachable-terminated block in BranchProbabilityInfo, consider only the edge to the normal destination block if the terminator is InvokeInst and let calcInvokeHeuristics() decide edge weights for the InvokeInst. llvm-svn: 256028	2015-12-18 20:53:47 +00:00
Vaivaswatha Nagaraj	ed237938da	GlobalsAA: Take advantage of ArgMemOnly, InaccessibleMemOnly and InaccessibleMemOrArgMemOnly attributes Summary: 1. Modify AnalyzeCallGraph() to retain function info for external functions if the function has [InaccessibleMemOr]ArgMemOnly flags. 2. When analyzing the use of a global is function parameter at a call site, mark the callee also as modifying the global appropriately. 3. Add additional test cases. Depends on D15499 Reviewers: hfinkel, jmolloy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15605 llvm-svn: 255994	2015-12-18 11:02:52 +00:00
Matt Arsenault	e05ff15186	AMDGPU: Override getCFInstrCost The default cost was 0 with the assumption that it is predictable. llvm-svn: 255796	2015-12-16 18:37:19 +00:00
Cong Hou	59898d8c68	[X86][SSE] Update the cost table for integer-integer conversions on SSE2/SSE4.1. Previously in the conversion cost table there are no entries for integer-integer conversions on SSE2. This will result in imprecise costs for certain vectorized operations. This patch adds those entries for SSE2 and SSE4.1. The cost numbers are counted from the result of running llc on the new test case in this patch. Differential revision: http://reviews.llvm.org/D15132 llvm-svn: 255315	2015-12-11 00:31:39 +00:00
Cong Hou	94620278a4	Don't punish vectorized arithmetic instruction whose type will be split to multiple registers Currently in LLVM's cost model, a vectorized arithmetic instruction will have high cost if its type is split into multiple registers. However, this punishment is too heavy and unnecessary. The overhead of the split should not be on arithmetic instructions but instructions that implement the split. Note that during vectorization we have calculated the register pressure, and we only choose proper interleaving factor (and also vectorization factor) so that we don't use more registers than the maximum number. Here is a very simple example: if a vadd has the cost 1, and if we double VF so that we need two registers to perform it, then its cost will become 4 with the current implementation, which will prevent us to use larger VF. Differential revision: http://reviews.llvm.org/D15159 llvm-svn: 254671	2015-12-04 00:36:58 +00:00
Elena Demikhovsky	a1a40cce9f	AVX-512: Updated cost of FP/SINT/UINT conversion operations I checked and updated the cost of AVX-512 conversion operations. Added cost of conversion operations in DQ mode. Conversion of illegal types that requires vector split is not calculated right now (like for other X86 targets). Differential Revision: http://reviews.llvm.org/D15074 llvm-svn: 254494	2015-12-02 08:59:47 +00:00
Matt Arsenault	e830f5427b	AMDGPU: Report extractelement as free in cost model The cost for scalarized operations is computed as N * (scalar operation cost + 1 extractelement + 1 insertelement). This partially fixes inflating the cost of scalarized operations since every operation is scalarized and free. I don't think we want any cost asociated with scalarization, but for now insertelement is still counted. I'm not sure if we should pretend that insertelement is also free, or add a way to compute a custom scalarization cost. llvm-svn: 254438	2015-12-01 19:08:39 +00:00
Elena Demikhovsky	0781d7b2b4	Fixed a failure in cost calculation for vector GEP Cost calculation for vector GEP failed with due to invalid cast to GEP index operand. The bug is fixed, added a test. http://reviews.llvm.org/D14976 llvm-svn: 254408	2015-12-01 12:08:36 +00:00
Pete Cooper	67cf9a723b	Revert "Change memcpy/memset/memmove to have dest and source alignments." This reverts commit r253511. This likely broke the bots in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202 http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787 llvm-svn: 253543	2015-11-19 05:56:52 +00:00
Pete Cooper	72bc23ef02	Change memcpy/memset/memmove to have dest and source alignments. Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html These intrinsics currently have an explicit alignment argument which is required to be a constant integer. It represents the alignment of the source and dest, and so must be the minimum of those. This change allows source and dest to each have their own alignments by using the alignment attribute on their arguments. The alignment argument itself is removed. There are a few places in the code for which the code needs to be checked by an expert as to whether using only src/dest alignment is safe. For those places, they currently take the minimum of src/dest alignments which matches the current behaviour. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false) will now read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false) For out of tree owners, I was able to strip alignment from calls using sed by replacing: (call.llvm\.memset.)i32\ [0-9]\,\ i1 false\) with: $1i1 false) and similarly for memmove and memcpy. I then added back in alignment to test cases which needed it. A similar commit will be made to clang which actually has many differences in alignment as now IRBuilder can generate different source/dest alignments on calls. In IRBuilder itself, a new argument was added. Instead of calling: CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, / isVolatile / false) you now call CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, / isVolatile */ false) There is a temporary class (IntegerAlignment) which takes the source alignment and rejects implicit conversion from bool. This is to prevent isVolatile here from passing its default parameter to the source alignment. Note, changes in future can now be made to codegen. I didn't change anything here, but this change should enable better memcpy code sequences. Reviewed by Hal Finkel. llvm-svn: 253511	2015-11-18 22:17:24 +00:00
Igor Laevsky	7310c68e85	Revert "Revert "Strip metadata when speculatively hoisting instructions (r252604)" Failing clang test is now fixed by the r253458. llvm-svn: 253459	2015-11-18 14:50:18 +00:00
Charlie Turner	7968b981bf	[ARM] Don't pessimize i32 vselect. The underlying issues surrounding codegen for 32-bit vselects have been resolved. The pessimistic costs for 64-bit vselects remain due to the bad scalarization that is still happening there. I tested this on A57 in T32, A32 and A64 modes. I saw no regressions, and some improvements. From my benchmarks, I saw these improvements in A57 (T32) spec.cpu2000.ref.177_mesa 5.95% lnt.SingleSource/Benchmarks/Shootout/strcat 12.93% lnt.MultiSource/Benchmarks/MiBench/telecomm-CRC32/telecomm-CRC32 11.89% I also measured A57 A32, A53 T32 and A9 T32 and found no performance regressions. I see much bigger wins in third-party benchmarks with this change Differential Revision: http://reviews.llvm.org/D14743 llvm-svn: 253349	2015-11-17 17:25:15 +00:00
James Molloy	7e9bdd5d01	Revert "Revert "[FunctionAttrs] Identify norecurse functions"" This reapplies this patch, with test fixes. llvm-svn: 252871	2015-11-12 10:55:20 +00:00
James Molloy	9a32da74f7	Revert "[FunctionAttrs] Identify norecurse functions" This reverts commit r252862. This introduced test failures and I'm reverting while I investigate how this happened. llvm-svn: 252863	2015-11-12 09:05:43 +00:00
James Molloy	b14994e752	[FunctionAttrs] Identify norecurse functions A function can be marked as norecurse if: * The SCC to which it belongs has cardinality 1; and either a) It does not call any non-norecurse function. This includes self-recursion; or b) It only has one callsite and the function that callsite is within is marked norecurse. a) is best propagated bottom-up and b) is best propagated top-down. We build up the norecurse attributes bottom-up using the existing SCC pass, and mark functions with no obvious recursion (but not provably norecurse) to sweep later, top-down. llvm-svn: 252862	2015-11-12 08:53:04 +00:00
Akira Hatanaka	5dda592643	Sort the enums in Attributes.h in case insensitive alphabetical order. Sort the enums in preparation for moving the attributes to a table-gen file. rdar://problem/19836465 llvm-svn: 252692	2015-11-11 02:11:46 +00:00
Renato Golin	0e77d72b0a	Revert "Strip metadata when speculatively hoisting instructions" This reverts commit r252604, as it broke all ARM and AArch64 buildbots, as well as some x86, et al. llvm-svn: 252623	2015-11-10 18:01:16 +00:00
Igor Laevsky	01c3692a10	Strip metadata when speculatively hoisting instructions This is fix for PR24059. When we are hoisting instruction above some condition it may turn out that metadata on this instruction was control dependant on the condition. This metadata becomes invalid and we need to drop it. This patch should cover most obvious places of speculative execution (which I have found by greping isSafeToSpeculativelyExecute). I think there are more cases but at least this change covers the severe ones. Differential Revision: http://reviews.llvm.org/D14398 llvm-svn: 252604	2015-11-10 14:10:31 +00:00
Mehdi Amini	afd135197b	Fix LoopAccessAnalysis when potentially nullptr check are involved Summary: GetUnderlyingObjects() can return "null" among its list of objects, we don't want to deduce that two pointers can point to the same memory in this case, so filter it out. Reviewers: anemet Subscribers: dexonsmith, llvm-commits From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 252149	2015-11-05 05:49:43 +00:00
Adam Nemet	a2df750fb3	[LAA] LLE 3/6: Rename InterestingDependence to Dependences, NFC Summary: We now collect all types of dependences including lexically forward deps not just "interesting" ones. Reviewers: hfinkel Subscribers: rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D13256 llvm-svn: 251985	2015-11-03 21:39:52 +00:00
Adam Nemet	d7037c56d3	[LAA] LLE 2/6: Fix a NoDep case that should be a Forward dependence Summary: When the dependence distance in zero then we have a loop-independent dependence from the earlier to the later access. No current client of LAA uses forward dependences so other than potentially hitting the MaxDependences threshold earlier, this change shouldn't affect anything right now. This and the previous patch were tested together for compile-time regression. None found in LNT/SPEC. Reviewers: hfinkel Subscribers: rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D13255 llvm-svn: 251973	2015-11-03 20:13:43 +00:00
Adam Nemet	b45516e875	[LAA] LLE 1/6: Expose Forward dependences Summary: Before this change, we didn't use to collect forward dependences since none of the current clients (LV, LDist) required them. The motivation to also collect forward dependences is a new pass LoopLoadElimination (LLE) which discovers store-to-load forwarding opportunities across the loop's backedge. The pass uses both lexically forward or backward loop-carried dependences to detect these opportunities. The new pass also analyzes loop-independent (forward) dependences since they can conflict with the loop-carried dependences in terms of how the data flows through memory. The newly added test only covers loop-carried forward dependences because loop-independent ones are currently categorized as NoDep. The next patch will fix this. The two patches were tested together for compile-time regression. None found in LNT/SPEC. Note that with this change LAA provides all dependences rather than just "interesting" ones. A subsequent NFC patch will remove the now trivial isInterestingDependence and rename the APIs. Reviewers: hfinkel Subscribers: jmolloy, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D13254 llvm-svn: 251972	2015-11-03 20:13:23 +00:00
Sanjoy Das	52bfa0faa4	[SCEV] Fix PR25369 Have `getConstantEvolutionLoopExitValue` work correctly with multiple entry loops. As far as I can tell, `getConstantEvolutionLoopExitValue` never did the right thing for multiple entry loops; and before r249712 it would silently return an incorrect answer. r249712 changed SCEV to fail an assert on a multiple entry loop, and this change fixes the underlying issue. llvm-svn: 251770	2015-11-02 02:06:01 +00:00
Sanjoy Das	337d4786e1	[SCEV] Don't create SCEV expressions that break LCSSA Prevent `createNodeFromSelectLikePHI` from creating SCEV expressions that break LCSSA. A better fix for the same issue is to teach SCEVExpander to not break LCSSA by inserting PHI nodes at appropriate places. That's planned for the future. Fixes PR25360. llvm-svn: 251756	2015-10-31 23:21:40 +00:00
Silviu Baranga	f91c80720d	[SCEV] Generalize the SCEV algorithm for creating expressions for PHI nodes Summary: When forming expressions for phi nodes having an incoming value from outside the loop A and a value coming from the previous iteration B we were forming an AddRec if: - B was an AddRec - the value A was equal to the value for B at iteration -1 (or equal to the value of B shifted by one iteration, at iteration 0) In this case, we were computing the expression to be the expression of B, shifted by one iteration. This changes generalizes the logic above by removing the restriction that B needs to be an AddRec. For this we introduce two expression rewriters that allow us to - shift an expression by one iteration - get the value of an expression at iteration 0 This allows us to get SCEV expressions for PHI nodes when these expressions are not AddRecExprs. Reviewers: sanjoy Subscribers: llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D14175 llvm-svn: 251700	2015-10-30 15:02:28 +00:00
Sanjoy Das	c88f5d3c2c	[SCEV] Compute max backedge count for loops with "shift ivs" This teaches SCEV to compute //max// backedge taken counts for loops like for (int i = k; i != 0; i >>>= 1) whatever(); SCEV yet cannot represent the exact backedge count for these loops, and this patch does not change that. This is really geared towards teaching SCEV that loops like the above are not infinite. llvm-svn: 251558	2015-10-28 21:27:14 +00:00
Igor Laevsky	559d170021	[AliasAnalysis] Take into account readnone attribute for the function arguments Differential Revision: http://reviews.llvm.org/D13992 llvm-svn: 251535	2015-10-28 17:54:48 +00:00
Igor Laevsky	36e84c0fc7	[AliasAnalysis] Take into account readonly attribute for the function arguments In getArgModRefInfo we consider all arguments as having MRI_ModRef. However for arguments marked with readonly attribute we can return more precise answer - MRI_Ref. Differential Revision: http://reviews.llvm.org/D13992 llvm-svn: 251525	2015-10-28 16:42:00 +00:00
Chad Rosier	855233655c	Add newline to test. NFC. llvm-svn: 251511	2015-10-28 12:30:08 +00:00
James Molloy	c67dec690f	[GlobalsAA] An indirect global that is initialized is not fair game When checking if an indirect global (a global with pointer type) is only assigned by allocation functions, first check if the global is itself initialized. If it is, it's not only assigned by allocation functions. This fixes PR25309. Thanks to David Majnemer for reducing the test case! llvm-svn: 251508	2015-10-28 10:41:29 +00:00
Sanjoy Das	1d1929aace	[ValueTracking] Use !range metadata more aggressively in KnownBits Summary: Teach `computeKnownBitsFromRangeMetadata` to use `!range` metadata more aggressively. Reviewers: majnemer, nlewycky, jingyue Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14100 llvm-svn: 251487	2015-10-28 03:20:15 +00:00
James Molloy	493e57de01	[ValueTracking] Extend r251146 to catch a fairly common case Even though we may not know the value of the shifter operand, it's possible we know the shifter operand is non-zero. This can allow us to infer more known bits - for example: %1 = load %p !range {1, 5} %2 = shl %q, %1 We don't know %1, but we do know that it is nonzero so %2[0] is known zero, and importantly %2 is known non-zero. Calling isKnownNonZero is nontrivially expensive so use an Optional to run it lazily and cache its result. llvm-svn: 251294	2015-10-26 14:10:46 +00:00
James Molloy	05a896a8d1	[BasicAA] Bugfix for r251016 If the loaded type sizes don't match the element type of the sequential type, all bets are off and the addresses may, indeed, overlap. Surprisingly, this just got caught in one test, on one builder, out of the 30+ builders testing this change. Congratulations go to http://lab.llvm.org:8011/builders/clang-aarch64-lnt/builds/5205. llvm-svn: 251112	2015-10-23 14:17:03 +00:00
Sanjoy Das	eeca9f6fd4	[SCEV] Commute zero extends through <nuw> additions llvm-svn: 251052	2015-10-22 19:57:38 +00:00
Sanjoy Das	a060e602fd	[SCEV] Commute sign extends through nsw additions Summary: Depends on D13613. Reviewers: atrick, hfinkel, reames, nlewycky Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13685 llvm-svn: 251049	2015-10-22 19:57:25 +00:00
Sanjoy Das	8f27415c05	[SCEV] Mark AddExprs as nsw or nuw if legal Summary: This uses `ScalarEvolution::getRange` and not potentially control dependent `nsw` and `nuw` bits on the arithmetic instruction. Reviewers: atrick, hfinkel, nlewycky Subscribers: llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D13613 llvm-svn: 251048	2015-10-22 19:57:19 +00:00
James Molloy	5b2a732fac	[GlobalsAA] Loosen an overly conservative bailout Instead of bailing out when we see loads, analyze them. If we can prove that the loaded-from address must escape, then we can conclude that a load from that address must escape too and therefore cannot alias a non-addr-taken global. When checking if a Value can alias a non-addr-taken global, if the Value is a LoadInst of a non-global, recurse instead of bailing. If we can follow a trail of loads up to some base that is captured, we know by inference that all the loads we followed are also captured. llvm-svn: 251017	2015-10-22 13:44:26 +00:00
James Molloy	5a4d8cd519	[BasicAA] Non-equal indices in a GEP of a SequentialType don't overlap If the final indices of two GEPs can be proven to not be equal, and the GEP is of a SequentialType (not a StructType), then the two GEPs do not alias. llvm-svn: 251016	2015-10-22 13:28:18 +00:00
James Molloy	1d88d6f289	[ValueTracking] Add a new predicate: isKnownNonEqual() isKnownNonEqual(A, B) returns true if it can be determined that A != B. At the moment it only knows two facts, that a non-wrapping add of nonzero to a value cannot be that value: A + B != A [where B != 0, addition is nsw or nuw] and that contradictory known bits imply two values are not equal. This patch also hooks this up to InstSimplify; InstSimplify had a peephole for the first fact but not the second so this teaches InstSimplify a new trick too (alas no measured performance impact!) llvm-svn: 251012	2015-10-22 13:18:42 +00:00
Simon Pilgrim	a18ae9bd70	[CostModel] Fixed AVX integer shift costs Targets with AVX but without AVX2 were incorrectly reporting costs of 256-bit integer shifts. llvm-svn: 250611	2015-10-17 13:23:38 +00:00
James Molloy	860507f838	[GlobalsAA] Don't assume anything about functions that may be overridden Weak linkage and friends allow a symbol to be overriden outside the code generator's model, so GlobalsAA shouldn't assume that anything it can compute about such a symbol is valid. llvm-svn: 250156	2015-10-13 10:43:33 +00:00
Tobias Grosser	374bce0c22	SCEV: Allow simple AddRec * Parameter products in delinearization This patch also allows the -delinearize pass to delinearize expressions that do not have an outermost SCEVAddRec expression. The SCEV::delinearize infrastructure allowed this since r240952, but the -delinearize pass was not updated yet. llvm-svn: 250018	2015-10-12 08:02:00 +00:00
Simon Pilgrim	18a048e1cd	[X86] Completed SHL cost model tests As discussed in D8690. llvm-svn: 249990	2015-10-11 18:33:48 +00:00
Simon Pilgrim	3bcf5bb79e	[X86] Renamed SHL cost model tests Matches naming conventions for ASHR/LSHR cost tests As discussed in D8690. llvm-svn: 249984	2015-10-11 17:34:32 +00:00
Simon Pilgrim	acbf51ab60	[X86] Added LSHR cost model tests There are several dodgy costings due to AVX1 legalizing 256-bit integer vectors that need fixing. As discussed in D8690. llvm-svn: 249983	2015-10-11 17:29:26 +00:00
Simon Pilgrim	602b0e1f0b	[X86] Added ASHR cost model tests There are several dodgy costings due to AVX1 legalizing 256-bit integer vectors that need fixing. As discussed in D8690. llvm-svn: 249981	2015-10-11 17:08:05 +00:00
Reid Kleckner	14e773500e	[WinEH] Delete the old landingpad implementation of Windows EH The new implementation works at least as well as the old implementation did. Also delete the associated preparation tests. They don't exercise interesting corner cases of the new implementation. All the codegen tests of the EH tables have already been ported. llvm-svn: 249918	2015-10-09 23:34:53 +00:00
James Molloy	e9d50dc9f7	Compute demanded bits for icmp instructions Instead of bailing out when we see an icmp, we can instead at least say that if the upper bits of both operands are known zero, they are not demanded. This doesn't help with signed comparisons, but it's at least better than bailing out. llvm-svn: 249687	2015-10-08 12:40:06 +00:00
James Molloy	bcd7f0ac98	Treat Mul just like Add and Subtract Like adds and subtracts, muls ripple only to the left so we can use the same logic. While we're here, add a print method to DemandedBits so it can be used with -analyze, which we'll use in the testcase. llvm-svn: 249686	2015-10-08 12:39:59 +00:00
Mehdi Amini	044cb34bdc	Revert "Revert "This patch builds on top of D13378 to handle constant condition."" This reverts commit r249528 and reapply r249431. The fix for the fallout has been commited in r249575. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 249581	2015-10-07 18:14:25 +00:00
James Molloy	47efaeb36e	Revert "This patch builds on top of D13378 to handle constant condition." This reverts commit r249431. This caused failures in sqlite3: http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/14453 llvm-svn: 249528	2015-10-07 09:03:34 +00:00
Mehdi Amini	cf2513b352	This patch builds on top of D13378 to handle constant condition. With this patch, clang -O3 optimizes correctly providing > 1000x speedup on this artificial benchmark): for (a=0; a<n; a++) for (b=0; b<n; b++) for (c=0; c<n; c++) for (d=0; d<n; d++) for (e=0; e<n; e++) for (f=0; f<n; f++) x++; From test-suite/SingleSource/Benchmarks/Shootout/nestedloop.c Reviewers: sanjoyd Differential Revision: http://reviews.llvm.org/D13390 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 249431	2015-10-06 17:19:20 +00:00
Sanjoy Das	55015d210f	[SCEV] Recognize simple br-phi patterns Summary: Teach SCEV to match patterns like ``` br %cond, label %left, label %right left: br label %merge right: br label %merge merge: V = phi [ %x, %left ], [ %y, %right ] ``` as "select %cond, %x, %y". Before this SCEV would match PHI nodes exclusively to add recurrences. This addresses PR25005. Reviewers: joker.eph, joker-eph, atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13378 llvm-svn: 249211	2015-10-02 23:09:44 +00:00
Jeroen Ketema	ab99b59e8c	[ARM][NEON] Use address space in vld([1234]\|[234]lane) and vst([1234]\|[234]lane) instructions This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234], vst[234]lane ARM neon intrinsics and associates an address space with the pointer that these intrinsics take. This changes, e.g., <2 x i32> @llvm.arm.neon.vld1.v2i32(i8, i32) to <2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8, i32) This change ensures that address spaces are fully taken into account in the ARM target during lowering of interleaved loads and stores. Differential Revision: http://reviews.llvm.org/D12985 llvm-svn: 248887	2015-09-30 10:56:37 +00:00
Simon Pilgrim	3d11c994f7	[X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes. Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases. Differential Revision: http://reviews.llvm.org/D8690 llvm-svn: 248878	2015-09-30 08:17:50 +00:00
James Molloy	897048bee3	[ValueTracking] Teach isKnownNonZero about monotonically increasing PHIs If a PHI starts at a non-negative constant, monotonically increases (only adds of a constant are supported at the moment) and that add does not wrap, then the PHI is known never to be zero. llvm-svn: 248796	2015-09-29 14:08:45 +00:00
Artur Pilipenko	b4d009042b	Introduce !align metadata for load instruction Reviewed By: hfinkel Differential Revision: http://reviews.llvm.org/D12853 llvm-svn: 248721	2015-09-28 17:41:08 +00:00
Sanjoy Das	b174f9a316	[SCEV] Reapply 'Teach isLoopBackedgeGuardedByCond to exploit trip counts' Summary: If the trip count of a specific backedge is `N`, then we know that backedge is effectively guarded by the condition `{0,+,1} u< N`. This change teaches SCEV to use this condition to prove things in `isLoopBackedgeGuardedByCond`. Depends on D12948 Depends on D12949 The original checkin, r248608 had to be backed out due to an issue with a ObjCXX unit test. That issue is now fixed, so re-landing. Reviewers: atrick, reames, majnemer, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12950 llvm-svn: 248638	2015-09-25 23:53:50 +00:00
Cong Hou	15ea016346	Use fixed-point representation for BranchProbability. BranchProbability now is represented by its numerator and denominator in uint32_t type. This patch changes this representation into a fixed point that is represented by the numerator in uint32_t type and a constant denominator 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons of this change: Pros: 1. It uses only a half space of the current one. 2. Some operations are much faster like plus, subtraction, comparison, and scaling by an integer. Cons: 1. Constructing a probability using arbitrary numerator and denominator needs additional calculations. 2. It is a little less precise than before as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 1 / 3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probability. If we use it like a rational value for some precise calculations we may need another construct like ValueRatio. One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probability instead of weight when adding successors to a MBB. The current BranchProbability has more space which may be a concern. Differential revision: http://reviews.llvm.org/D12603 llvm-svn: 248633	2015-09-25 23:09:59 +00:00
Sanjoy Das	4a39b97671	Revert two SCEV changes that caused test failures in clang. r248606: "[SCEV] Exploit A < B => (A+K) < (B+K) when possible" r248608: "[SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts." llvm-svn: 248614	2015-09-25 21:16:50 +00:00
Sanjoy Das	d706fa8a0c	[SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts. Summary: If the trip count of a specific backedge is `N`, then we know that backedge is effectively guarded by the condition `{0,+,1} u< N`. This change teaches SCEV to use this condition to prove things in `isLoopBackedgeGuardedByCond`. Depends on D12948 Depends on D12949 Reviewers: atrick, reames, majnemer, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12950 llvm-svn: 248608	2015-09-25 19:59:57 +00:00
James Molloy	eb46641c28	[GlobalsAA] Teach GlobalsAA about nocapture Arguments to function calls marked "nocapture" can be marked as non-escaping. However, nocapture is defined in terms of the lifetime of the callee, and if the callee can directly or indirectly recurse to the caller, the semantics of nocapture are invalid. Therefore, we eagerly discover which SCC each function belongs to, and later can check if callee and caller of a callsite belong to the same SCC, in which case there could be recursion. This means that we can't be so optimistic in getModRefInfo(ImmutableCallsite) - previously we assumed all call arguments never aliased with an escaping global. Now we need to check, because a global could now be passed as an argument but still not escape. This also solves a related conformance problem: MemCpyOptimizer can turn non-escaping stores of globals into calls to intrinsics like llvm.memcpy/llvm/memset. This confuses GlobalsAA, which knows the global can't escape and so returns NoModRef when queried, when obviously a memcpy/memset call does indeed reference and modify its arguments. This fixes PR24800, PR24801, and PR24802. llvm-svn: 248576	2015-09-25 15:39:29 +00:00
James Molloy	b6be1ebb7d	[ValueTracking] Teach isKnownNonZero a new trick If the shifter operand is a constant, and all of the bits shifted out are known to be zero, then if X is known non-zero at least one non-zero bit must remain. llvm-svn: 248508	2015-09-24 16:06:32 +00:00
Philip Reames	963febd4f8	Fix for pr24866 Turns out that not every basic block is guaranteed to have a node within the DominatorTree. This is really hard to trigger, but the test case from the PR managed to do so. There's active discussion continuing about what documentation and/or invariants needed cleaned up. llvm-svn: 248216	2015-09-21 22:04:10 +00:00
Artur Pilipenko	84bc62f7a3	Support align attribute for return values Reviewed By: reames Differential Revision: http://reviews.llvm.org/D12844 llvm-svn: 247984	2015-09-18 12:33:31 +00:00
David Blaikie	2f40830dde	[opaque pointer type] Add textual IR support for explicit type parameter for global aliases update.py: import fileinput import sys import re alias_match_prefix = r"(.(?:=\|:\|^)\s(?:external \|)(?:(?:private\|internal\|linkonce\|linkonce_odr\|weak\|weak_odr\|common\|appending\|extern_weak\|available_externally) )?(?:default \|hidden \|protected )?(?:dllimport \|dllexport )?(?:unnamed_addr \|)(?:thread_local(?:$[a-z]$)? )?alias" plain = re.compile(alias_match_prefix + r" (.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|addrspacecast\|\[\[[a-zA-Z]\|\{\{).$)") cast = re.compile(alias_match_prefix + r") ((?:bitcast\|inttoptr\|addrspacecast)\s$. to (.?)(\| addrspace\(\d+$ )\\)\s(?:;.)?$)") gep = re.compile(alias_match_prefix + r") ((?:getelementptr)\s(?:inbounds)?\s$(?P<type>.), (?P=type)(?:\saddrspace\(\d+$\s)?\* .\)\s(?:;.)?$)") def conv(line): m = re.match(cast, line) if m: return m.group(1) + " " + m.group(3) + ", " + m.group(2) m = re.match(gep, line) if m: return m.group(1) + " " + m.group(3) + ", " + m.group(2) m = re.match(plain, line) if m: return m.group(1) + ", " + m.group(2) + m.group(3) + "" + m.group(4) + "\n" return line for line in sys.stdin: sys.stdout.write(conv(line)) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name .ll \| xargs ./apply.sh From llvm/src/tools/clang: find test/ -name .mm -o -name .m -o -name .cpp -o -name .c \| xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name .ll \| xargs ./apply.sh llvm-svn: 247378	2015-09-11 03:22:04 +00:00
Matthew Simpson	ddb4d9741f	[SCEV] Consistently Handle Expressions That Cannot Be Divided This patch addresses the issue of SCEV division asserting on some input expressions (e.g., non-affine expressions) and quietly giving up on others. When giving up, we set the quotient to be equal to zero and the remainder to be equal to the numerator. With this patch, we always quietly give up when we cannot perform the division. This patch also adds a test case for DependenceAnalysis that previously caused an assertion. Differential Revision: http://reviews.llvm.org/D11725 llvm-svn: 247314	2015-09-10 18:12:47 +00:00
Sanjoy Das	f3132d3b03	[ScalarEvolution] Fix PR24757. Summary: PR24757 was caused by some incorect math in `ScalarEvolution::HowFarToZero` -- the smallest unsigned solution for X in 2^N * A = 2^N * X is not necessarily A. Reviewers: atrick, majnemer, meheff Subscribers: llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D12721 llvm-svn: 247242	2015-09-10 05:27:38 +00:00
Piotr Padlewski	0dde00d239	ScalarEvolution assume hanging bugfix http://reviews.llvm.org/D12719 llvm-svn: 247184	2015-09-09 20:47:30 +00:00
Chandler Carruth	7b560d40bd	[PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible with the new pass manager, and no longer relying on analysis groups. This builds essentially a ground-up new AA infrastructure stack for LLVM. The core ideas are the same that are used throughout the new pass manager: type erased polymorphism and direct composition. The design is as follows: - FunctionAAResults is a type-erasing alias analysis results aggregation interface to walk a single query across a range of results from different alias analyses. Currently this is function-specific as we always assume that aliasing queries are within a function. - AAResultBase is a CRTP utility providing stub implementations of various parts of the alias analysis result concept, notably in several cases in terms of other more general parts of the interface. This can be used to implement only a narrow part of the interface rather than the entire interface. This isn't really ideal, this logic should be hoisted into FunctionAAResults as currently it will cause a significant amount of redundant work, but it faithfully models the behavior of the prior infrastructure. - All the alias analysis passes are ported to be wrapper passes for the legacy PM and new-style analysis passes for the new PM with a shared result object. In some cases (most notably CFL), this is an extremely naive approach that we should revisit when we can specialize for the new pass manager. - BasicAA has been restructured to reflect that it is much more fundamentally a function analysis because it uses dominator trees and loop info that need to be constructed for each function. All of the references to getting alias analysis results have been updated to use the new aggregation interface. All the preservation and other pass management code has been updated accordingly. The way the FunctionAAResultsWrapperPass works is to detect the available alias analyses when run, and add them to the results object. This means that we should be able to continue to respect when various passes are added to the pipeline, for example adding CFL or adding TBAA passes should just cause their results to be available and to get folded into this. The exception to this rule is BasicAA which really needs to be a function pass due to using dominator trees and loop info. As a consequence, the FunctionAAResultsWrapperPass directly depends on BasicAA and always includes it in the aggregation. This has significant implications for preserving analyses. Generally, most passes shouldn't bother preserving FunctionAAResultsWrapperPass because rebuilding the results just updates the set of known AA passes. The exception to this rule are LoopPass instances which need to preserve all the function analyses that the loop pass manager will end up needing. This means preserving both BasicAAWrapperPass and the aggregating FunctionAAResultsWrapperPass. Now, when preserving an alias analysis, you do so by directly preserving that analysis. This is only necessary for non-immutable-pass-provided alias analyses though, and there are only three of interest: BasicAA, GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is preserved when needed because it (like DominatorTree and LoopInfo) is marked as a CFG-only pass. I've expanded GlobalsAA into the preserved set everywhere we previously were preserving all of AliasAnalysis, and I've added SCEVAA in the intersection of that with where we preserve SCEV itself. One significant challenge to all of this is that the CGSCC passes were actually using the alias analysis implementations by taking advantage of a pretty amazing set of loop holes in the old pass manager's analysis management code which allowed analysis groups to slide through in many cases. Moving away from analysis groups makes this problem much more obvious. To fix it, I've leveraged the flexibility the design of the new PM components provides to just directly construct the relevant alias analyses for the relevant functions in the IPO passes that need them. This is a bit hacky, but should go away with the new pass manager, and is already in many ways cleaner than the prior state. Another significant challenge is that various facilities of the old alias analysis infrastructure just don't fit any more. The most significant of these is the alias analysis 'counter' pass. That pass relied on the ability to snoop on AA queries at different points in the analysis group chain. Instead, I'm planning to build printing functionality directly into the aggregation layer. I've not included that in this patch merely to keep it smaller. Note that all of this needs a nearly complete rewrite of the AA documentation. I'm planning to do that, but I'd like to make sure the new design settles, and to flesh out a bit more of what it looks like in the new pass manager first. Differential Revision: http://reviews.llvm.org/D12080 llvm-svn: 247167	2015-09-09 17:55:00 +00:00
Silviu Baranga	a3e27edb5d	[CostModel][AArch64] Remove amortization factor for some of the vector select instructions Summary: We are not scalarizing the wide selects in codegen for i16 and i32 and therefore we can remove the amortization factor. We still have issues with i64 vectors in codegen though. Reviewers: mcrosier Subscribers: mcrosier, aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D12724 llvm-svn: 247156	2015-09-09 15:35:02 +00:00
Diego Novillo	f9aa39b0cf	Fix PR 24723 - Handle 0-mass backedges in irreducible loops This corner case happens when we have an irreducible SCC that is deeply nested. As we work down the tree, the backedge masses start getting smaller and smaller until we reach one that is down to 0. Since we distribute the incoming mass using the backedge masses as weight, the distributor does not allow zero weights. So, we simply ignore them (which will just use the weights of the non-zero nodes). llvm-svn: 247050	2015-09-08 19:22:17 +00:00
Hal Finkel	f11bc761d8	[PowerPC] Include the permutation cost for unaligned vector loads Pre-P8, when we generate code for unaligned vector loads (for Altivec and QPX types), even when accounting for the combining that takes place for multiple consecutive such loads, there is at least one load instructions and one permutation for each load. Make sure the cost reported reflects the cost of the permutes as well. llvm-svn: 246807	2015-09-03 21:23:18 +00:00
Hal Finkel	79dbf5b562	[PowerPC] Cleanup cost model for unaligned vector loads/stores I'm adding a regression test to better cover code generation for unaligned vector loads and stores, but there's no functional change to the code generation here. There is an improvement to the cost model for unaligned vector loads and stores, mostly for QPX (for which we were not previously accounting for the permutation-based loads), and the cost model implementation is cleaner. llvm-svn: 246712	2015-09-02 21:03:28 +00:00
Quentin Colombet	5989bc6f41	[BasicAA] Fix the handling of sext and zext in the analysis of GEPs. Hopefully this will end the GEPs saga! This commit reverts r245394, i.e., it reapplies r221876 while incorporating the fixes from D11847. r221876 was not reapplied alone because it was not safe and D11847 was not applied alone because it needs r221876 to produce correct results. This should fix PR24596. Original commit message for r221876: Let's try this again... This reverts r219432, plus a bug fix. Description of the bug in r219432 (by Nick): The bug was using AllPositive to break out of the loop; if the loop break condition i != e is changed to i != e && AllPositive then the test_modulo_analysis_with_global test I've added will fail as the Modulo will be calculated incorrectly (as the last loop iteration is skipped, so Modulo isn't updated with its Scale). Nick also adds this comment: ComputeSignBit is safe to use in loops as it takes into account phi nodes, and the == EK_ZeroEx check is safe in loops as, no matter how the variable changes between iterations, zero-extensions will always guarantee a zero sign bit. The isValueEqualInPotentialCycles check is therefore definitely not needed as all the variable analysis holds no matter how the variables change between loop iterations. And this patch also adds another enhancement to GetLinearExpression - basically to convert ConstantInts to Offsets (see test_const_eval and test_const_eval_scaled for the situations this improves). Original commit message: This reverts r218944, which reverted r218714, plus a bug fix. Description of the bug in r218714 (by Nick): The original patch forgot to check if the Scale in VariableGEPIndex flipped the sign of the variable. The BasicAA pass iterates over the instructions in the order they appear in the function, and so BasicAliasAnalysis::aliasGEP is called with the variable it first comes across as parameter GEP1. Adding a %reorder label puts the definition of %a after %b so aliasGEP is called with %b as the first parameter and %a as the second. aliasGEP later calculates that %a == %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) - ignoring that %idxprom is scaled by -1 here lead the patch to incorrectly conclude that %a > %b. Revised patch by Nick White, thanks! Thanks to Lang to isolating the bug. Slightly modified by me to add an early exit from the loop and avoid unnecessary, but expensive, function calls. Original commit message: Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! Message from D11847: Un-revert of r241981 and fix for PR23626. The 'Or' case of GetLinearExpression delegates to 'Add' if possible, and if not it returns an Opaque value. Unfortunately the Scale and Offsets weren't being set (and so defaulted to 0) - and a scale of zero effectively removes the variable from the GEP instruction. This meant that BasicAA would return MustAliases when it should have been returning PartialAliases (and PR23626 was an example of the GVN pass using an incorrect MustAlias to merge loads from what should have been different pointers). Differential Revision: http://reviews.llvm.org/D11847 Patch by Nick White <n.j.white@gmail.com>! llvm-svn: 246502	2015-08-31 22:32:47 +00:00
George Burgess IV	68b36e01da	Fix: CFLAA -- Mark no-args returns as unknown Prior to this patch, we hadn't been marking StratifiedSets with the appropriate StratifiedAttrs when handling the result of no-args call instructions. This caused us to report NoAlias when handed, for example, an escaped alloca and a result from an opaque function. Now we properly mark the return value of said functions. Thanks again to Chandler, Richard, and Nick for pinging me about this. Differential review: http://reviews.llvm.org/D12408 llvm-svn: 246240	2015-08-28 00:16:18 +00:00
Hal Finkel	0ef2b10f16	Fix how DependenceAnalysis calls delinearization Fix how DependenceAnalysis calls delinearization, mirroring what is done in Delinearization.cpp (mostly by making sure to call getSCEVAtScope before delinearizing, and by removing the unnecessary 'Pairs == 1' check). Patch by Vaivaswatha Nagaraj! llvm-svn: 245408	2015-08-19 02:56:36 +00:00
Quentin Colombet	861ad97e6f	[BasicAA] Add a test for PR24468 to be sure we won't regress when we finally get the GEP aliasing right. llvm-svn: 245395	2015-08-19 00:08:26 +00:00
Quentin Colombet	b700e357b5	[BasicAA] Revert r221876 because it can produce incorrect aliasing information: see PR24468. llvm-svn: 245394	2015-08-19 00:07:20 +00:00
Silviu Baranga	d5ac26937c	[CostModel][ARM] Increase cost of insert/extract operations Summary: This change limits the minimum cost of an insert/extract element operation to 2 in cases where this would result in mixing of NEON and VFP code. Reviewers: rengolin Subscribers: mssimpso, aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D12030 llvm-svn: 245225	2015-08-17 15:57:05 +00:00
Artur Pilipenko	34d8ba84c8	Take alignment into account in isSafeToSpeculativelyExecute and isSafeToLoadUnconditionally. Reviewed By: hfinkel, sanjoy, MatzeB Differential Revision: http://reviews.llvm.org/D9791 llvm-svn: 245223	2015-08-17 15:54:26 +00:00
Michael Kuperstein	adc4e9c414	[GMR] isNonEscapingGlobalNoAlias() should look through Bitcasts/GEPs when looking at loads. This fixes yet another case from PR24288. Differential Revision: http://reviews.llvm.org/D12064 llvm-svn: 245207	2015-08-17 10:06:08 +00:00
Chandler Carruth	2f1fd1658f	[PM] Port ScalarEvolution to the new pass manager. This change makes ScalarEvolution a stand-alone object and just produces one from a pass as needed. Making this work well requires making the object movable, using references instead of overwritten pointers in a number of places, and other refactorings. I've also wired it up to the new pass manager and added a RUN line to a test to exercise it under the new pass manager. This includes basic printing support much like with other analyses. But there is a big and somewhat scary change here. Prior to this patch ScalarEvolution was never actually invalidated!!! Re-running the pass just re-wired up the various other analyses and didn't remove any of the existing entries in the SCEV caches or clear out anything at all. This might seem OK as everything in SCEV that can uses ValueHandles to track updates to the values that serve as SCEV keys. However, this still means that as we ran SCEV over each function in the module, we kept accumulating more and more SCEVs into the cache. At the end, we would have a SCEV cache with every value that we ever needed a SCEV for in the entire module!!! Yowzers. The releaseMemory routine would dump all of this, but that isn't realy called during normal runs of the pipeline as far as I can see. To make matters worse, there is actually a key that we don't update with value handles -- there is a map keyed off of Loops. Because LoopInfo does* release its memory from run to run, it is entirely possible to run SCEV over one function, then over another function, and then lookup a Loop* from the second function but find an entry inserted for the first function! Ouch. To make matters still worse, there are plenty of updates that don't trip a value handle. It seems incredibly unlikely that today GVN or another pass that invalidates SCEV can update values in just such a way that a subsequent run of SCEV will incorrectly find lookups in a cache, but it is theoretically possible and would be a nightmare to debug. With this refactoring, I've fixed all this by actually destroying and recreating the ScalarEvolution object from run to run. Technically, this could increase the amount of malloc traffic we see, but then again it is also technically correct. ;] I don't actually think we're suffering from tons of malloc traffic from SCEV because if we were, the fact that we never clear the memory would seem more likely to have come up as an actual problem before now. So, I've made the simple fix here. If in fact there are serious issues with too much allocation and deallocation, I can work on a clever fix that preserves the allocations (while clearing the data) between each run, but I'd prefer to do that kind of optimization with a test case / benchmark that shows why we need such cleverness (and that can test that we actually make it faster). It's possible that this will make some things faster by making the SCEV caches have higher locality (due to being significantly smaller) so until there is a clear benchmark, I think the simple change is best. Differential Revision: http://reviews.llvm.org/D12063 llvm-svn: 245193	2015-08-17 02:08:17 +00:00
Bjarke Hammersholt Roune	9791ed4705	[SCEV] Apply NSW and NUW flags via poison value analysis for sub, mul and shl Summary: http://reviews.llvm.org/D11212 made Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs for add instructions. This patch expands that to sub, mul and shl instructions. This change makes LSR able to generate pointer induction variables for loops like these, where the index is 32 bit and the pointer is 64 bit: for (int i = 0; i < numIterations; ++i) sum += ptr[i - offset]; for (int i = 0; i < numIterations; ++i) sum += ptr[i * stride]; for (int i = 0; i < numIterations; ++i) sum += ptr[3 * (i << 7)]; Reviewers: atrick, sanjoy Subscribers: sanjoy, majnemer, hfinkel, llvm-commits, meheff, jingyue, eliben Differential Revision: http://reviews.llvm.org/D11860 llvm-svn: 245118	2015-08-14 22:45:26 +00:00
Igor Laevsky	30143aee11	Emit argmemonly attribute for intrinsics. Differential Revision: http://reviews.llvm.org/D11352 llvm-svn: 244920	2015-08-13 17:40:04 +00:00
Adam Nemet	abc794d3db	[LAA] Fix typo in test llvm-svn: 244690	2015-08-11 23:03:09 +00:00
Michael Kuperstein	07f31d92ca	[GMR] Be a bit smarter about which globals don't alias when doing recursive lookups Should hopefully fix the remainder of PR24288. Differential Revision: http://reviews.llvm.org/D11900 llvm-svn: 244575	2015-08-11 08:06:44 +00:00
Jonathan Roelofs	49e46ce8e2	Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI I looked into adding a warning / error for this to FileCheck, but there doesn't seem to be a good way to avoid it triggering on the instances of it in RUN lines. llvm-svn: 244481	2015-08-10 19:01:27 +00:00
Chandler Carruth	405e4f9051	[GMR] Teach the conservative path of GMR to catch even more easy cases. In PR24288 it was pointed out that the easy case of a non-escaping global and something that obviously required an escape sometimes is hidden behind PHIs (or selects in theory). Because we have this binary test, we can easily just check that all possible input values satisfy the requirement. This is done with a (very small) recursion through PHIs and selects. With this, the specific example from the PR is correctly folded by GVN. Differential Revision: http://reviews.llvm.org/D11707 llvm-svn: 244078	2015-08-05 17:58:30 +00:00
Simon Pilgrim	86478c6909	[X86][SSE] Vectorize i64 ASHR operations This patch vectorizes the v2i64/v4i64 ASHR shift operations - the last remaining integer vector shifts that are still being transferred to/from the scalar unit to be completed. Differential Revision: http://reviews.llvm.org/D11439 llvm-svn: 243569	2015-07-29 20:31:45 +00:00
Jingyue Wu	42f1d67a45	[SCEV] Apply NSW and NUW flags via poison value analysis Summary: Make Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs in some cases. This is based on reasoning about when poison from instructions with these flags would trigger undefined behavior. This gives a 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX. There does not seem to be clear agreement about when poison should be considered to propagate through instructions. In this analysis, poison propagates only in cases where that should be uncontroversial. This change makes LSR able to create induction variables for expressions like &ptr[i + offset] for loops like this: for (int i = 0; i < limit; ++i) { sum += ptr[i + offset]; } Here ptr is a 64 bit pointer and offset is a 32 bit integer. For NVPTX, LSR currently creates an induction variable for i + offset instead, which is not as fast. Improving this situation is what brings the 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX. There are more details in this discussion on llvmdev. June: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-June/thread.html#87234 July: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/thread.html#87392 Patch by Bjarke Roune Reviewers: eliben, atrick, sanjoy Subscribers: majnemer, hfinkel, jingyue, meheff, llvm-commits Differential Revision: http://reviews.llvm.org/D11212 llvm-svn: 243460	2015-07-28 18:22:40 +00:00
Silviu Baranga	4825060059	[LAA] Add clarifying comments for the checking pointer grouping algorithm. NFC llvm-svn: 243416	2015-07-28 13:44:08 +00:00
Chandler Carruth	99ad7bb49c	[GMR] Teach GlobalsModRef to distinguish an important and safe case of no-alias with non-addr-taken globals: they cannot alias a captured pointer. If the non-global underlying object would have been a capture were it to alias the global, we can firmly conclude no-alias. It isn't reasonable for a transformation to introduce a capture in a way observable by an alias analysis. Consider, even if it were to temporarily capture one globals address into another global and then restore the other global afterward, there would be no way for the load in the alias query to observe that capture event correctly. If it observes it then the temporary capturing would have changed the meaning of the program, making it an invalid transformation. Even instrumentation passes or a pass which is synthesizing stores to global variables to expose race conditions in programs could not trigger this unless it queried the alias analysis infrastructure mid-transform, in which case it seems reasonable to return results from before the transform started. See the comments in the change for a more detailed outlining of the theory here. This should address the primary performance regression found when the non-conservatively-correct path of the alias query was disabled. Differential Revision: http://reviews.llvm.org/D11410 llvm-svn: 243405	2015-07-28 11:11:11 +00:00
Adam Nemet	54f0b83ee2	[LAA] Split out a helper to print a collection of memchecks This is effectively an NFC but we can no longer print the index of the pointer group so instead I print its address. This still lets us cross-check the section that list the checks against the section that list the groups (see how I modified the test). E.g. before we printed this: Run-time memory checks: Check 0: Comparing group 0: %arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind %arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc Against group 1: %arrayidxA = getelementptr i16, i16* %a, i64 %ind %arrayidxA1 = getelementptr i16, i16* %a, i64 %add ... Grouped accesses: Group 0: (Low: %c High: (78 + %c)) Member: {%c,+,4}<%for.body> Member: {(2 + %c),+,4}<%for.body> Now we print this (changes are underlined): Run-time memory checks: Check 0: Comparing group (0x7f9c6040c320): ~~~~~~~~~~~~~~ %arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc %arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind Against group (0x7f9c6040c358): ~~~~~~~~~~~~~~ %arrayidxA1 = getelementptr i16, i16* %a, i64 %add %arrayidxA = getelementptr i16, i16* %a, i64 %ind ... Grouped accesses: Group 0x7f9c6040c320: ~~~~~~~~~~~~~~ (Low: %c High: (78 + %c)) Member: {(2 + %c),+,4}<%for.body> Member: {%c,+,4}<%for.body> llvm-svn: 243354	2015-07-27 23:54:41 +00:00
Jingyue Wu	bfefff555e	Roll forward r243250 r243250 appeared to break clang/test/Analysis/dead-store.c on one of the build slaves, but I couldn't reproduce this failure locally. Probably a false positive as I saw this test was broken by r243246 or r243247 too but passed later without people fixing anything. llvm-svn: 243253	2015-07-26 19:10:03 +00:00
Jingyue Wu	84879b71a9	Revert r243250 breaks tests llvm-svn: 243251	2015-07-26 18:30:13 +00:00
Jingyue Wu	bf485f059c	[TTI/CostModel] improve TTI::getGEPCost and use it in CostModel::getInstructionCost Summary: This patch updates TargetTransformInfoImplCRTPBase::getGEPCost to consider addressing modes. It now returns TCC_Free when the GEP can be completely folded to an addresing mode. I started this patch as I refactored SLSR. Function isGEPFoldable looks common and is indeed used by some WIP of mine. So I extracted that logic to getGEPCost. Furthermore, I noticed getGEPCost wasn't directly tested anywhere. The best testing bed seems CostModel, but its getInstructionCost method invokes getAddressComputationCost for GEPs which provides very coarse estimation. So this patch also makes getInstructionCost call the updated getGEPCost for GEPs. This change inevitably breaks some tests because the cost model changes, but nothing looks seriously wrong -- if we believe the new cost model is the right way to go, these tests should be updated. This patch is not perfect yet -- the comments in some tests need to be updated. I want to know whether this is a right approach before fixing those details. Reviewers: chandlerc, hfinkel Subscribers: aschwaighofer, llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D9819 llvm-svn: 243250	2015-07-26 17:28:13 +00:00
Igor Laevsky	2ba8374612	NFC. Explicitly specify attributes in BasicAA/cs-cs.ll test. This will simplify verifying correctness for a changes which modify attributes. llvm-svn: 243016	2015-07-23 14:31:18 +00:00
Jingyue Wu	d058ea927f	[MDA] change BlockScanLimit into a command line option. Summary: In the benchmark (https://github.com/vetter/shoc) we are researching, the duplicated load is not eliminated because MemoryDependenceAnalysis hit the BlockScanLimit. This patch change it into a command line option instead of a hardcoded value. Patched by Xuetian Weng. Test Plan: test/Analysis/MemoryDependenceAnalysis/memdep-block-scan-limit.ll Reviewers: jingyue, reames Subscribers: reames, llvm-commits Differential Revision: http://reviews.llvm.org/D11366 llvm-svn: 242842	2015-07-21 21:50:39 +00:00
Simon Pilgrim	59764dccfb	[X86][SSE] Updated SHL/LSHR i64 vectorization costs. This was missed in D8416. llvm-svn: 242621	2015-07-18 20:06:30 +00:00
Chandler Carruth	f55803f761	[PM/AA] Disable the core unsafe aspect of GlobalsModRef in the face of basic changes to the IR such as folding pointers through PHIs, Selects, integer casts, store/load pairs, or outlining. This leaves the feature available behind a flag. This flag's default could be flipped if necessary, but the real-world performance impact of this particular feature of GMR may not be sufficiently significant for many folks to want to run the risk. Currently, the risk here is somewhat mitigated by half-hearted attempts to update GlobalsModRef when the rest of the optimizer changes something. However, I am currently trying to remove that update mechanism as it makes migrating the AA infrastructure to a form that can be readily shared between new and old pass managers very challenging. Without this update mechanism, it is possible that this still unlikely failure mode will start to trip people, and so I wanted to try to proactively avoid that. There is a lengthy discussion on the mailing list about why the core approach here is flawed, and likely would need to look totally different to be both reasonably effective and resilient to basic IR changes occuring. This patch is essentially the first of two which will enact the result of that discussion. The next patch will remove the current update mechanism. Thanks to lots of folks that helped look at this from different angles. Especial thanks to Michael Zolotukhin for doing some very prelimanary benchmarking of LTO without GlobalsModRef to get a rough idea of the impact we could be facing here. So far, it looks very small, but there are some concerns lingering from other benchmarking. The default here may get flipped if performance results end up pointing at this as a more significant issue. Also thanks to Pete and Gerolf for reviewing! Differential Revision: http://reviews.llvm.org/D11213 llvm-svn: 242512	2015-07-17 06:58:24 +00:00
Silviu Baranga	0e5804a6af	Fix memcheck interval ends for pointers with negative strides Summary: The checking pointer grouping algorithm assumes that the starts/ends of the pointers are well formed (start <= end). The runtime memory checking algorithm also assumes this by doing: start0 < end1 && start1 < end0 to detect conflicts. This check only works if start0 <= end0 and start1 <= end1. This change correctly orders the interval ends by either checking the stride (if it is constant) or by using min/max SCEV expressions. Reviewers: anemet, rengolin Subscribers: rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D11149 llvm-svn: 242400	2015-07-16 14:02:58 +00:00
Sean Silva	6dd10bade7	Add a test for r242281 from an old patch of mine. This isn't thorough, but should serve as a sanity check. llvm-svn: 242356	2015-07-15 23:23:02 +00:00
Tobias Edler von Koch	d8ce16b1e6	Analyze recursive PHI nodes in BasicAA Summary: This patch allows phi nodes like %x = phi [ %incptr, ... ] [ %var, ... ] %incptr = getelementptr %x, 1 to be analyzed by BasicAliasAnalysis. In aliasPHI, we can detect incoming values that are recursive GEPs with a constant offset. Instead of trying to analyze a recursive GEP (and failing), we now ignore it and instead set the size of the memory referenced by the PHINode to UnknownSize. This represents all the possible memory locations the pointer represented by the PHINode could be advanced to by the GEP. For now, this new behavior is turned off by default to allow debugging of performance degradations seen with SPEC/x86 and Hexagon benchmarks. The flag -basicaa-recphi turns it on. Reviewers: hfinkel, sanjoy Subscribers: tobiasvk_caf, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D10368 llvm-svn: 242320	2015-07-15 19:32:22 +00:00
Silviu Baranga	a647c30f88	Cleanup after r241809 - remove uncessary call to std::sort Summary: The iteration order within a member of DepCands is deterministic and therefore we don't have to sort the accesses within a member. We also don't have to copy the indices of the pointers into a vector, since we can iterate over the members of the class. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11145 llvm-svn: 242033	2015-07-13 14:48:24 +00:00
Manuel Klimek	779cf85a4f	Revert r241981 "Revert "Revert r236894 "[BasicAA] Fix zext & sext handling""" The repros from PR23626 still fail. llvm-svn: 242025	2015-07-13 13:50:55 +00:00
Simon Pilgrim	64cc4ad0a2	[X86][SSE] Vectorized v4i32 non-uniform shifts. While the v4i32 shl operation is already vectorized using a cvttps2dq/pmulld pattern, the lshr/ashr opeations are still scalarized. This patch adds vectorization support for non-uniform v4i32 shift operations - it splats constant shift amounts to allow them to use the immediate sse shift instructions, or extracts/zero-extends non-constant shift amounts. The individual results are then blended together. Differential Revision: http://reviews.llvm.org/D11063 llvm-svn: 241989	2015-07-12 11:15:19 +00:00
Hal Finkel	ef28aad9f4	Revert "Revert r236894 "[BasicAA] Fix zext & sext handling"" r236894 caused PR23626 (Clang miscompiles webkit's base64 decoder), and was reverted in r237984. This reapplies the patch with an additional test case for PR23626 and the associated fix (both scales and offsets in the BasicAliasAnalysis::constantOffsetHeuristic should initially be zero). Patch by Nick White, thanks! llvm-svn: 241981	2015-07-11 11:04:54 +00:00
Igor Laevsky	39d662f7ba	Add argmemonly attribute. This change adds new attribute called "argmemonly". Function marked with this attribute can only access memory through it's argument pointers. This attribute directly corresponds to the "OnlyAccessesArgumentPointees" ModRef behaviour in alias analysis. Differential Revision: http://reviews.llvm.org/D10398 llvm-svn: 241979	2015-07-11 10:30:36 +00:00
Silviu Baranga	3e3095c53a	Add a test of a regression discovered during testing of r241673 Summary: We were missing a corner case where DepCands was not available, but we were using DepCands to compute the checking pointer groups. This adds a test for that regression. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11068 llvm-svn: 241818	2015-07-09 16:40:25 +00:00
Silviu Baranga	ce3877fc8c	Don't rely on the DepCands iteration order when constructing checking pointer groups Summary: The checking pointer group construction algorithm relied on the iteration on DepCands. We would need the same leaders across runs and the same iteration order over the underlying std::set for determinism. This changes the algorithm to process the pointers in the order in which they were added to the runtime check, which is deterministic. We need to update the tests, since the order in which pointers appear has changed. No new tests were added, since it is impossible to test for non-determinism. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11064 llvm-svn: 241809	2015-07-09 15:18:25 +00:00
Adam Nemet	424edc6c80	[LAA] Revert a small part of r239295 This commit ([LAA] Fix estimation of number of memchecks) regressed the logic a bit. We shouldn't quit the analysis if we encounter a pointer without known bounds unless we actually need to emit a memcheck for it. The original code was using NumComparisons which is now computed differently. Instead I compute NeedRTCheck from NumReadPtrChecks and NumWritePtrChecks. As side note, I find the separation of NeedRTCheck and CanDoRT confusing, so I will try to merge them in a follow-up patch. llvm-svn: 241756	2015-07-08 22:58:48 +00:00
Silviu Baranga	1b6b50a921	[LAA] Merge memchecks for accesses separated by a constant offset Summary: Often filter-like loops will do memory accesses that are separated by constant offsets. In these cases it is common that we will exceed the threshold for the allowable number of checks. However, it should be possible to merge such checks, sice a check of any interval againt two other intervals separated by a constant offset (a,b), (a+c, b+c) will be equivalent with a check againt (a, b+c), as long as (a,b) and (a+c, b+c) overlap. Assuming the loop will be executed for a sufficient number of iterations, this will be true. If not true, checking against (a, b+c) is still safe (although not equivalent). As long as there are no dependencies between two accesses, we can merge their checks into a single one. We use this technique to construct groups of accesses, and then check the intervals associated with the groups instead of checking the accesses directly. Reviewers: anemet Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10386 llvm-svn: 241673	2015-07-08 09:16:33 +00:00
Simon Pilgrim	8fbf1c1f4a	[X86][SSE] Vectorized i64 uniform constant SRA shifts This patch adds vectorization support for uniform constant i64 arithmetic shift right operators. Differential Revision: http://reviews.llvm.org/D9645 llvm-svn: 241514	2015-07-06 22:35:19 +00:00
Sanjoy Das	7869d4b846	[LazyCallGraph] Port test case from r240039 to LCG. Summary: r240039 adds a test case to check that CallGraph does the right thing with respect to non-leaf intrinsics like statepoint and patchpoint. This ports the same test case to LazyCallGraph. LazyCallGraph already does the right thing with respect to escaping function pointers so there is no need to change any code. Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10582 llvm-svn: 241226	2015-07-02 02:03:58 +00:00
Adam Nemet	c4866d29dd	[LAA] Try to prove non-wrapping of pointers if SCEV cannot Summary: Scalar evolution does not propagate the non-wrapping flags to values that are derived from a non-wrapping induction variable because the non-wrapping property could be flow-sensitive. This change is a first attempt to establish the non-wrapping property in some simple cases. The main idea is to look through the operations defining the pointer. As long as we arrive to a non-wrapping AddRec via a small chain of non-wrapping instruction, the pointer should not wrap either. I believe that this essentially is what Andy described in http://article.gmane.org/gmane.comp.compilers.llvm.cvs/220731 as the way forward. Reviewers: aschwaighofer, nadav, sanjoy, atrick Reviewed By: atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10472 llvm-svn: 240798	2015-06-26 17:25:43 +00:00
Simon Pilgrim	d862e0f33a	[X86][SSE][CostModel] Added full set of sitofp/uitofp costings for SSE2/AVX/AVX2/AVX512F. Merged separate (but equivalent) SSE2/AVX512F tests. Removed codegen tests since these are already done better in test/CodeGen/X86. The actual cost values still need to be updated to match recent codegen improvements. llvm-svn: 240219	2015-06-20 14:58:01 +00:00
Sanjoy Das	18c9dd31de	[CallGraph] Given -print-callgraph a stable printing order. Summary: Since FunctionMap has llvm::Function pointers as keys, the order in which the traversal happens can differ from run to run, causing spurious FileCheck failures. Have CallGraph::print sort the CallGraphNodes by name before printing them. Reviewers: bogner, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10575 llvm-svn: 240191	2015-06-19 23:20:31 +00:00
Simon Pilgrim	de94fa6438	[X86][SSE][CostModel] Fixed uitofp/sitofp cost target tests to specify sse2/avx2/avx512f directly instead of via a cpu model. llvm-svn: 240062	2015-06-18 21:26:01 +00:00
Sanjoy Das	c65d43e649	[CallGraph] Teach the CallGraph about non-leaf intrinsics. Summary: Currently intrinsics don't affect the creation of the call graph. This is not accurate with respect to statepoint and patchpoint intrinsics -- these do call (or invoke) LLVM level functions. This change fixes this inconsistency by adding a call to the external node for call sites that call these non-leaf intrinsics. This coupled with the fact that these intrinsics also escape the function pointer they call gives us a conservatively correct call graph. Reviewers: reames, chandlerc, atrick, pgavlin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10526 llvm-svn: 240039	2015-06-18 19:28:26 +00:00
David Majnemer	7fddeccb8b	Move the personality function from LandingPadInst to Function The personality routine currently lives in the LandingPadInst. This isn't desirable because: - All LandingPadInsts in the same function must have the same personality routine. This means that each LandingPadInst beyond the first has an operand which produces no additional information. - There is ongoing work to introduce EH IR constructs other than LandingPadInst. Moving the personality routine off of any one particular Instruction and onto the parent function seems a lot better than have N different places a personality function can sneak onto an exceptional function. Differential Revision: http://reviews.llvm.org/D10429 llvm-svn: 239940	2015-06-17 20:52:32 +00:00
Diego Novillo	8c49a57266	Add documentation for new backedge mass propagation in irregular loops. Tweak test cases and rename headerIndexFor -> getHeaderIndex. llvm-svn: 239915	2015-06-17 16:28:22 +00:00
Diego Novillo	9a779623d9	Fix PR 23525 - Separate header mass propagation in irregular loops. Summary: When propagating mass through irregular loops, the mass flowing through each loop header may not be equal. This was causing wrong frequencies to be computed for irregular loop headers. Fixed by keeping track of masses flowing through each of the headers in an irregular loop. To do this, we now keep track of per-header backedge weights. After the loop mass is distributed through the loop, the backedge weights are used to re-distribute the loop mass to the loop headers. Since each backedge will have a mass proportional to the different branch weights, the loop headers will end up with a more approximate weight distribution (as opposed to the current distribution that assumes that every loop header is the same). Reviewers: dexonsmith Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10348 llvm-svn: 239843	2015-06-16 19:10:58 +00:00
Jingyue Wu	12b0c2835e	[ValueTracking] do not overwrite analysis results already computed Summary: ValueTracking used to overwrite the analysis results computed from assumes and dominating conditions. This patch fixes this issue. Test Plan: test/Analysis/ValueTracking/assume.ll Reviewers: hfinkel, majnemer Reviewed By: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10283 llvm-svn: 239718	2015-06-15 05:46:29 +00:00
Simon Pilgrim	5965680d53	[X86][SSE] Vectorized i8 and i16 shift operators This patch ensures that SHL/SRL/SRA shifts for i8 and i16 vectors avoid scalarization. It builds on the existing i8 SHL vectorized implementation of moving the shift bits up to the sign bit position and separating the 4, 2 & 1 bit shifts with several improvements: 1 - SSE41 targets can use (v)pblendvb directly with the sign bit instead of performing a comparison to feed into a VSELECT node. 2 - pre-SSE41 targets were masking + comparing with an 0x80 constant - we avoid this by using the fact that a set sign bit means a negative integer which can be compared against zero to then feed into VSELECT, avoiding the need for a constant mask (zero generation is much cheaper). 3 - SRA i8 needs to be unpacked to the upper byte of a i16 so that the i16 psraw instruction can be correctly used for sign extension - we have to do more work than for SHL/SRL but perf tests indicate that this is still beneficial. The i16 implementation is similar but simpler than for i8 - we have to do 8, 4, 2 & 1 bit shifts but less shift masking is involved. SSE41 use of (v)pblendvb requires that the i16 shift amount is splatted to both bytes however. Tested on SSE2, SSE41 and AVX machines. Differential Revision: http://reviews.llvm.org/D9474 llvm-svn: 239509	2015-06-11 07:46:37 +00:00
Artur Pilipenko	7fad7e57e8	Minor refactoring of GEP handling in isDereferenceablePointer For GEP instructions isDereferenceablePointer checks that all indices are constant and within bounds. Replace this index calculation logic to a call to accumulateConstantOffset. Separated from the http://reviews.llvm.org/D9791 Reviewed By: sanjoy Differential Revision: http://reviews.llvm.org/D9874 llvm-svn: 239299	2015-06-08 11:58:13 +00:00
Silviu Baranga	98a137196a	[LAA] Fix estimation of number of memchecks Summary: We need to add a runtime memcheck for pair of accesses (x,y) where at least one of x and y are writes. Assuming we have w writes and r reads, currently this number is estimated as being w* (w+r-1). This estimation will count (write,write) pairs twice and will overestimate the number of checks required. This change adds a getNumberOfChecks method to RuntimePointerCheck, which will count the number of runtime checks needed (similar in implementation to needsAnyChecking) and uses it to produce the correct number of runtime checks. Test Plan: llvm test suite spec2k spec2k6 Performance results: no changes observed (not surprising since the formula for 1 writer is basically the same, which would covers most cases - at least with the current check limit). Reviewers: anemet Reviewed By: anemet Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D10217 llvm-svn: 239295	2015-06-08 10:27:06 +00:00
Hao Liu	751004a67d	[LoopAccessAnalysis] Teach LAA to check the memory dependence between strided accesses. Differential Revision: http://reviews.llvm.org/D9368 llvm-svn: 239285	2015-06-08 04:48:37 +00:00
Jingyue Wu	a84feb1727	[DependenceAnalysis] Extend unifySubscriptType for handling coupled subscript groups. Summary: In continuation to an earlier commit to DependenceAnalysis.cpp by jingyue (r222100), the type for all subscripts in a coupled group need to be the same since constraints from one subscript may be propagated to another during testing. During testing, new SCEVs may be created and the operands for these need to be the same. This patch extends unifySubscriptType() to work on lists of subscript pairs, ensuring a common extended type for all of them. Test Plan: Added a test case to NonCanonicalizedSubscript.ll which causes dependence analysis to crash without this fix. All regression tests pass. Reviewers: spop, sebpop, jingyue Reviewed By: jingyue Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9698 llvm-svn: 238573	2015-05-29 16:58:08 +00:00
Hans Wennborg	2ecb8dc39c	Revert r236894 "[BasicAA] Fix zext & sext handling" This seems to have caused PR23626: Clang miscompiles webkit's base64 decoder llvm-svn: 237984	2015-05-22 01:27:37 +00:00
Artur Pilipenko	31619a847e	Fix memory-dereferenceable.ll test One of the testcases introduced by D9365 had incorrect !dereferenceable metadata on load. It must fail but it doesn't due to incorrect order of CHECK/CHECK-NOT commands in test. Fixed both. Reviewed By: sanjoy Differential Revision: http://reviews.llvm.org/D9877 llvm-svn: 237897	2015-05-21 12:51:38 +00:00
Sanjoy Das	f999547d11	Dereferenceable, dereferenceable_or_null metadata for loads Summary: Introduce dereferenceable, dereferenceable_or_null metadata for loads with the same semantic as corresponding attributes. This patch depends on http://reviews.llvm.org/D9253 Patch by Artur Pilipenko! Reviewers: hfinkel, sanjoy, reames Reviewed By: sanjoy, reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9365 llvm-svn: 237720	2015-05-19 20:10:19 +00:00
Adam Nemet	df3dc5b9ca	[LoopAccesses] If shouldRetryWithRuntimeCheck, reset InterestingDependences When dependence analysis encounters a non-constant distance between memory accesses it aborts the analysis and falls back to run-time checks only. In this case we weren't resetting the array of dependences. llvm-svn: 237574	2015-05-18 15:37:03 +00:00
Adam Nemet	c3384320f2	[LoopAccesses] Rearrange printed lines in -analyze "Store to invariant address..." is moved as the last line. This is not the prime result of the analysis. Plus it simplifies some of the tests. llvm-svn: 237573	2015-05-18 15:36:57 +00:00
James Molloy	c0661aeaf8	[DependenceAnalysis] Fix for PR21585: collectUpperBound triggers asserts collectUpperBound hits an assertion when the back edge count is wider then the desired type. If that happens, truncate the backedge count. Patch by Philip Pfaffe! llvm-svn: 237439	2015-05-15 12:17:22 +00:00
Sanjoy Das	a1d39ba940	[Statepoints] Support for "patchable" statepoints. Summary: This change adds two new parameters to the statepoint intrinsic, `i64 id` and `i32 num_patch_bytes`. `id` gets propagated to the ID field in the generated StackMap section. If the `num_patch_bytes` is non-zero then the statepoint is lowered to `num_patch_bytes` bytes of nops instead of a call (the spill and reload code remains unchanged). A non-zero `num_patch_bytes` is useful in situations where a language runtime requires complete control over how a call is lowered. This change brings statepoints one step closer to patchpoints. With some additional work (that is not part of this patch) it should be possible to get rid of `TargetOpcode::STATEPOINT` altogether. PlaceSafepoints generates `statepoint` wrappers with `id` set to `0xABCDEF00` (the old default value for the ID reported in the stackmap) and `num_patch_bytes` set to `0`. This can be made more sophisticated later. Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9546 llvm-svn: 237214	2015-05-12 23:52:24 +00:00
Adam Nemet	0fd96dddf5	[Testsuite] Renumber metadata in ScopedNoAliasAA test to match CHECK lines Summary: Now it's much easier to follow what's happening in this test. Also removed some unused metadata entries. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9601 llvm-svn: 236981	2015-05-11 09:10:14 +00:00
Sanjoy Das	14f5080aa1	[BasicAA] Fix zext & sext handling Summary: There are several unhandled edge cases in BasicAA's GetLinearExpression method. This changes fixes outstanding issues, including zext / sext of a constant with the sign bit set, and the refusal to decompose zexts or sexts of wrapping arithmetic. Test Plan: Unit tests added in //q.ext.ll//. Patch by Nick White. Reviewers: hfinkel, sanjoy Reviewed By: hfinkel, sanjoy Subscribers: sanjoy, llvm-commits, hfinkel Differential Revision: http://reviews.llvm.org/D6682 llvm-svn: 236894	2015-05-08 18:58:55 +00:00
Pat Gavlin	cc0431d1c0	Extend the statepoint intrinsic to allow statepoints to be marked as transitions from GC-aware code to code that is not GC-aware. This changes the shape of the statepoint intrinsic from: @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args) to: @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args) This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back. In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation. Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops. Differential Revision: http://reviews.llvm.org/D9501 llvm-svn: 236888	2015-05-08 18:07:42 +00:00
Diego Novillo	de5b8016ab	Fix information loss in branch probability computation. Summary: This addresses PR 22718. When branch weights are too large, they were being clamped to the range [1, MaxWeightForBB]. But this clamping is only applied to edges that go outside the range, so it distorts the relative branch probabilities. This patch changes the weight calculation to scale every branch so the relative probabilities are preserved. The scaling is done differently now. First, all the branch weights are added up, and if the sum exceeds 32 bits, it computes an integer scale to bring all the weights within the range. The patch fixes an existing test that had slightly wrong branch probabilities due to the previous clamping. It now gets branch weights scaled accordingly. Reviewers: dexonsmith Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9442 llvm-svn: 236750	2015-05-07 17:22:06 +00:00
Adam Nemet	e2b885c4bc	[getUnderlyingOjbects] Analyze loop PHIs further to remove false positives Specifically, if a pointer accesses different underlying objects in each iteration, don't look through the phi node defining the pointer. The motivating case is the underlyling-objects-2.ll testcase. Consider the loop nest: int *A; for (i) for (j) A[i][j] = A[i-1][j] B[j] This loop is transformed by Load-PRE to stash away A[i] for the next iteration of the outer loop: Curr = A[0]; // Prev_0 for (i: 1..N) { Prev = Curr; // Prev = PHI (Prev_0, Curr) Curr = A[i]; for (j: 0..N) Curr[j] = Prev[j] * B[j] } Since A[i] and A[i-1] are likely to be independent pointers, getUnderlyingObjects should not assume that Curr and Prev share the same underlying object in the inner loop. If it did we would try to dependence-analyze Curr and Prev and the analysis of the corresponding SCEVs would fail with non-constant distance. To fix this, the getUnderlyingObjects API is extended with an optional LoopInfo parameter. This is effectively what controls whether we want the above behavior or the original. Currently, I only changed to use this approach for LoopAccessAnalysis. The other testcase is to guard the opposite case where we do want to look through the loop PHI. If we step through an array by incrementing a pointer, the underlying object is the incoming value of the phi as the loop is entered. Fixes rdar://problem/19566729 llvm-svn: 235634	2015-04-23 20:09:20 +00:00
Brendon Cahoon	f9751ad1b0	Fix a type mismatch assert in SCEV division An assert was triggered when attempting to create a new SCEV with operands of different types in the visitAddRecExpr. In this test case, the operand types of the numerator and denominator are different. The SCEV division code should generate a conservative answer when this happens. Differential Revision: http://reviews.llvm.org/D9021 llvm-svn: 235511	2015-04-22 15:06:40 +00:00
Brendon Cahoon	a57cc8bc81	Recognize n/1 in the SCEV divide function n/1 generates a quotient equal to n and a remainder of 0. If this case is not recognized, then the SCEV divide() function can return a remainder that is greater than or equal to the denominator, which means the delinearized subscripts for the test case will be incorrect. Differential Revision: http://reviews.llvm.org/D9003 llvm-svn: 235311	2015-04-20 16:03:28 +00:00
David Blaikie	23af64846f	[opaque pointer type] Add textual IR support for explicit type parameter to the call instruction See r230786 and r230794 for similar changes to gep and load respectively. Call is a bit different because it often doesn't have a single explicit type - usually the type is deduced from the arguments, and just the return type is explicit. In those cases there's no need to change the IR. When that's not the case, the IR usually contains the pointer type of the first operand - but since typed pointers are going away, that representation is insufficient so I'm just stripping the "pointerness" of the explicit type away. This does make the IR a bit weird - it /sort of/ reads like the type of the first operand: "call void () %x(" but %x is actually of type "void ()" and will eventually be just of type "ptr". But this seems not too bad and I don't think it would benefit from repeating the type ("void (), void () %x(" and then eventually "void (), ptr %x(") as has been done with gep and load. This also has a side benefit: since the explicit type is no longer a pointer, there's no ambiguity between an explicit type and a function that returns a function pointer. Previously this case needed an explicit type (eg: a function returning a void() function was written as "call void () () * @x(" rather than "call void () * @x(" because of the ambiguity between a function returning a pointer to a void() function and a function returning void). No ambiguity means even function pointer return types can just be written alone, without writing the whole function's type. This leaves /only/ the varargs case where the explicit type is required. Given the special type syntax in call instructions, the regex-fu used for migration was a bit more involved in its own unique way (as every one of these is) so here it is. Use it in conjunction with the apply.sh script and associated find/xargs commands I've provided in rr230786 to migrate your out of tree tests. Do let me know if any of this doesn't cover your cases & we can iterate on a more general script/regexes to help others with out of tree tests. About 9 test cases couldn't be automatically migrated - half of those were functions returning function pointers, where I just had to manually delete the function argument types now that we didn't need an explicit function type there. The other half were typedefs of function types used in calls - just had to manually drop the * from those. import fileinput import sys import re pat = re.compile(r'((?:=\|:\|^\|\s)call\s(?:[^@]?))(\s$\|\s(?:(?:\[\[[a-zA-Z0-9_]+\]\]\|[@%](?:(")?[\\\?@a-zA-Z0-9_.]?(?(3)"\|)\|{{.}}))(?:$\|$)\|undef\|inttoptr\|bitcast\|null\|asm).$)') addrspace_end = re.compile(r"addrspace\(\d+$\s\$") func_end = re.compile("(?:void.\|\)\s)\$") def conv(match, line): if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)): return line return line[:match.start()] + match.group(1)[:match.group(1).rfind('')].rstrip() + match.group(2) + line[match.end():] for line in sys.stdin: sys.stdout.write(conv(re.search(pat, line), line)) llvm-svn: 235145	2015-04-16 23:24:18 +00:00
Daniel Jasper	a73f3d51ac	Re-apply r234898 and fix tests. This commit makes LLVM not estimate branch probabilities when doing a single bit bitmask tests. The code that originally made me discover this is: if ((a & 0x1) == 0x1) { .. } In this case we don't actually have any branch probability information and should not assume to have any. LLVM transforms this into: %and = and i32 %a, 1 %tobool = icmp eq i32 %and, 0 So, in this case, the result of a bitwise and is compared against 0, but nevertheless, we should not assume to have probability information. CodeGen/ARM/2013-10-11-select-stalls.ll started failing because the changed probabilities changed the results of ARMBaseInstrInfo::isProfitableToIfCvt() and led to an Ifcvt of the diamond in the test. AFAICT, the test was never meant to test this and thus changing the test input slightly to not change the probabilities seems like the best way to preserve the meaning of the test. llvm-svn: 234979	2015-04-15 06:24:07 +00:00
Rafael Espindola	2defea0efa	Revert "The code that originally made me discover this is:" This reverts commit r234898. CodeGen/ARM/2013-10-11-select-stalls.ll was faling. llvm-svn: 234903	2015-04-14 15:56:33 +00:00
Daniel Jasper	8229ebb926	The code that originally made me discover this is: if ((a & 0x1) == 0x1) { .. } In this case we don't actually have any branch probability information and should not assume to have any. LLVM transforms this into: %and = and i32 %a, 1 %tobool = icmp eq i32 %and, 0 So, in this case, the result of a bitwise and is compared against 0, but nevertheless, we should not assume to have probability information. llvm-svn: 234898	2015-04-14 15:20:37 +00:00
Adam Nemet	26da8e9800	[LoopAccesses] Properly print whether memchecks are needed Fix oversight in -analyze output. PtrRtCheck contains the pointers that need to be checked against each other and not whether memchecks are necessary. For instance in the testcase PtrRtCheck has four elements but all no-alias so no checking is necessary. llvm-svn: 234833	2015-04-14 01:12:55 +00:00
Jingyue Wu	5da831cc31	Divergence analysis for GPU programs Summary: Some optimizations such as jump threading and loop unswitching can negatively affect performance when applied to divergent branches. The divergence analysis added in this patch conservatively estimates which branches in a GPU program can diverge. This information can then help LLVM to run certain optimizations selectively. Test Plan: test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll Reviewers: resistor, hfinkel, eliben, meheff, jholewinski Subscribers: broune, bjarke.roune, madhur13490, tstellarAMD, dberlin, echristo, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8576 llvm-svn: 234567	2015-04-10 05:03:50 +00:00
Adam Nemet	ce48250f11	[LoopAccesses] Allow analysis to complete in the presence of uniform stores (Re-apply r234361 with a fix and a testcase for PR23157) Both run-time pointer checking and the dependence analysis are capable of dealing with uniform addresses. I.e. it's really just an orthogonal property of the loop that the analysis computes. Run-time pointer checking will only try to reason about SCEVAddRec pointers or else gives up. If the uniform pointer turns out the be a SCEVAddRec in an outer loop, the run-time checks generated will be correct (start and end bounds would be equal). In case of the dependence analysis, we work again with SCEVs. When compared against a loop-dependent address of the same underlying object, the difference of the two SCEVs won't be constant. This will result in returning an Unknown dependence for the pair. When compared against another uniform access, the difference would be constant and we should return the right type of dependence (forward/backward/etc). The changes also adds support to query this property of the loop and modify the vectorizer to use this. Patch by Ashutosh Nema! llvm-svn: 234424	2015-04-08 17:48:40 +00:00
Adam Nemet	e09a928c80	Revert "[LoopAccesses] Allow analysis to complete in the presence of uniform stores" This reverts commit r234361. It caused PR23157. llvm-svn: 234387	2015-04-08 04:16:55 +00:00
Adam Nemet	0515c33b70	[LoopAccesses] Allow analysis to complete in the presence of uniform stores Both run-time pointer checking and the dependence analysis are capable of dealing with uniform addresses. I.e. it's really just an orthogonal property of the loop that the analysis computes. Run-time pointer checking will only try to reason about SCEVAddRec pointers or else gives up. If the uniform pointer turns out the be a SCEVAddRec in an outer loop, the run-time checks generated will be correct (start and end bounds would be equal). In case of the dependence analysis, we work again with SCEVs. When compared against a loop-dependent address of the same underlying object, the difference of the two SCEVs won't be constant. This will result in returning an Unknown dependence for the pair. When compared against another uniform access, the difference would be constant and we should return the right type of dependence (forward/backward/etc). The changes also adds support to query this property of the loop and modify the vectorizer to use this. Patch by Ashutosh Nema! llvm-svn: 234361	2015-04-07 21:46:16 +00:00
Adam Nemet	4ff3ed5960	[LoopAccesses] Remove unused global variables in tests llvm-svn: 233887	2015-04-02 04:42:51 +00:00
Sanjoy Das	b864c1f76f	[SCEV] Look at backedge dominating conditions (re-land r233447). Summary: This change teaches ScalarEvolution::isLoopBackedgeGuardedByCond to look at edges within the loop body that dominate the latch. We don't do an exhaustive search for all possible edges, but only a quick walk up the dom tree. This re-lands r233447. r233447 was reverted because it caused massive compile-time regressions. This change has a fix for the same issue. llvm-svn: 233829	2015-04-01 18:24:06 +00:00
Diego Novillo	a354f48891	Remove 4,096 loop scale limitation. Summary: This is part 1 of fixes to address the problems described in https://llvm.org/bugs/show_bug.cgi?id=22719. The restriction to limit loop scales to 4,096 does not really prevent overflows anymore, as the underlying algorithm has changed and does not seem to suffer from this problem. Additionally, artificially restricting loop scales to such a low number skews frequency information, making loops of equal hotness appear to have very different hotness properties. The only loops that are artificially restricted to a scale of 4096 are infinite loops (those loops with an exit mass of 0). This prevents infinite loops from skewing the frequencies of other regions in the CFG. At the end of propagation, frequencies are scaled to values that take no more than 64 bits to represent. When the range of frequencies to be represented fits within 61 bits, it pushes up the scaling factor to a minimum of 8 to better distinguish small frequency values. Otherwise, small frequency values are all saturated down at 1. Tested on x86_64. Reviewers: dexonsmith Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8718 llvm-svn: 233826	2015-04-01 17:42:27 +00:00
Daniel Jasper	87e848c7dc	Revert "[SCEV] Look at backedge dominating conditions." This leads to terribly slow compile times under MSAN. More discussion on the commit thread of r233447. llvm-svn: 233529	2015-03-30 09:30:02 +00:00
Sanjoy Das	fe0e0fff92	[SCEV] Look at backedge dominating conditions. Summary: This change teaches ScalarEvolution::isLoopBackedgeGuardedByCond to look at edges within the loop body that dominate the latch. We don't do an exhaustive search for all possible edges, but only a quick walk up the dom tree. Reviewers: atrick, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8627 llvm-svn: 233447	2015-03-27 23:18:08 +00:00
Philip Reames	e1bf27045d	Require a GC strategy be specified for functions which use gc.statepoint This was discussed a while back and I left it optional for migration. Since it's been far more than the 'week or two' that was discussed, time to actually make this manditory. llvm-svn: 233357	2015-03-27 05:09:33 +00:00
Sanjoy Das	14598830fe	[SCEV] Revert bailout added in r75511. Summary: With the introduction of MarkPendingLoopPredicates in r157092, I don't think the bailout is needed anymore. Reviewers: atrick, nicholas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8624 llvm-svn: 233296	2015-03-26 17:28:26 +00:00
Sanjoy Das	e561fee2a4	[ValueTracking] Fix PR23011. Summary: `ComputeNumSignBits` returns incorrect results for `srem` instructions. This change fixes the issue and adds a test case. Reviewers: nadav, nicholas, atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8600 llvm-svn: 233225	2015-03-25 22:33:53 +00:00
Nick Lewycky	be8af48824	When simplifying a SCEV truncate by distributing, consider it a simplification to replace a cast, even if we end up with a trunc around the term. Fixes PR22960! llvm-svn: 232794	2015-03-20 02:25:00 +00:00
Simon Pilgrim	5ec5c9cafe	[X86][SSE] Avoid scalarization of v2i64 vector shifts (REAPPLIED) Fixed broken tests. Differential Revision: http://reviews.llvm.org/D8416 llvm-svn: 232682	2015-03-18 22:18:51 +00:00
Sanjoy Das	cb8bca1777	[SCEV] Make isImpliedCond smarter. Summary: This change teaches isImpliedCond to infer things like "X sgt 0" => "X - 1 sgt -1". The `ConstantRange` class has the logic to do the heavy lifting, this change simply gets ScalarEvolution to exploit that when reasonable. Depends on D8345 Reviewers: atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8346 llvm-svn: 232576	2015-03-18 00:41:29 +00:00
Michael Zolotukhin	c3d60efb1d	TTI: Honour cost model for estimating cost of vector-intrinsic and calls. Review: http://reviews.llvm.org/D8096 llvm-svn: 232528	2015-03-17 19:37:28 +00:00
Sanjoy Das	f1e9e1df25	[SCEV] Fix PR22856. Summary: ScalarEvolutionExpander assumes that the header block of a loop is a legal place to have a use for a phi node. This is true only for phis that are either in the header or dominate the header block, but it is not true for phi nodes that are strictly internal to the loop body. This change teaches ScalarEvolutionExpander to place uses of PHI nodes in the basic block the PHI nodes belong to. This is always legal, and `hoistIVInc` ensures that the said position dominates `IsomorphicInc`. Reviewers: atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8311 llvm-svn: 232189	2015-03-13 18:31:19 +00:00
David Blaikie	f72d05bc7b	[opaque pointer type] Add textual IR support for explicit type parameter to gep operator Similar to gep (r230786) and load (r230794) changes. Similar migration script can be used to update test cases, which successfully migrated all of LLVM and Polly, but about 4 test cases needed manually changes in Clang. (this script will read the contents of stdin and massage it into stdout - wrap it in the 'apply.sh' script shown in previous commits + xargs to apply it over a large set of test cases) import fileinput import sys import re rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s$)((<\d\s+x\s+)?([^@]?)(\|\saddrspace\(\d+$)\s\(?(3)>)\s*)(?=$\|%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|zeroinitializer\|<\|\[\[[a-zA-Z]\|\{\{)", re.MULTILINE \| re.DOTALL) def conv(match): line = match.group(1) line += match.group(4) line += ", " line += match.group(2) return line line = sys.stdin.read() off = 0 for match in re.finditer(rep, line): sys.stdout.write(line[off:match.start()]) sys.stdout.write(conv(match)) off = match.end() sys.stdout.write(line[off:]) llvm-svn: 232184	2015-03-13 18:20:45 +00:00
Owen Anderson	41a185c521	Teach TBAA analysis to report errors on cyclic TBAA metadata rather than hanging. llvm-svn: 232144	2015-03-13 07:09:33 +00:00
Nick Lewycky	b6ef9a14de	When forming an addrec out of a phi don't just look at the last computation and steal its flags for our own, there may be other computations in the middle. Check whether the LHS of the computation is the phi itself and then we know it's safe to steal the flags. Fixes PR22795. There's a missed optimization opportunity where we could look at the full chain of computation and take the intersection of the flags instead of only looking one instruction deep. llvm-svn: 232134	2015-03-13 01:37:52 +00:00
Adam Nemet	58913d65ad	[LoopAccesses 3/3] Print the dependences with -analyze The dependences are now expose through the new getInterestingDependences API so we can use that with -analyze too and fix the FIXME. This lets us remove the test that relied on -debug to check the dependences. llvm-svn: 231807	2015-03-10 17:40:43 +00:00
Karthik Bhat	8d7f7eda14	Fix a memory corruption in Dependency Analysis. This crash occurs due to memory corruption when trying to update dependency direction based on Constraints. This crash was observed during lnt regression of Polybench benchmark test case dynprog. Review: http://reviews.llvm.org/D8059 llvm-svn: 231788	2015-03-10 14:32:02 +00:00
Karthik Bhat	8d0099bdab	Fix a crash in Dependency Analysis. This crash in Dependency analysis is because we assume here that in case of UsefulGEP both source and destination have the same number of operands which may not be true. This incorrect assumption results in crash while populating Pairs. Fix the same. This crash was observed during lnt regression for code such as- struct s{ int A[10][10]; int C[10][10][10]; } S; void dep_constraint_crash_test(int k,int N) { for( int i=0;i<N;i++) for( int j=0;j<N;j++) S.A[0][0] = S.C[0][0][k]; } Review: http://reviews.llvm.org/D8162 llvm-svn: 231784	2015-03-10 13:31:03 +00:00
George Burgess IV	ab03af277b	Added ConstantExpr support to CFLAA. CFLAA didn't know how to properly handle ConstantExprs; it would silently ignore them. This was a problem if the ConstantExpr is, say, a GEP of a global, because CFLAA wouldn't realize that there's a global there. :) llvm-svn: 231743	2015-03-10 02:58:15 +00:00
George Burgess IV	b54a8d62a4	Added special handling for inttoptr in CFLAA. We now treat pointers given to ptrtoint and pointers retrieved from inttoptr as similar to arguments or globals (can alias anything, etc.) This solves some of the problems we were having with giving incorrect results. llvm-svn: 231741	2015-03-10 02:40:06 +00:00
Sanjoy Das	91b5477aad	[SCEV] Unify getUnsignedRange and getSignedRange Summary: This removes some duplicated code, and also helps optimization: e.g. in the test case added, `%idx ULT 128` in `@x` is not currently optimized to `true` by `-indvars` but will be, after this change. The only functional change in ths commit is that for add recurrences, ScalarEvolution::getRange will be more aggressive -- computing the unsigned (resp. signed) range for a SCEVAddRecExpr will now look at the NSW (resp. NUW) bits and check for signed (resp. unsigned) overflow. This can be a strict improvement in some cases (such as the attached test case), and should be no worse in other cases. Reviewers: atrick, nlewycky Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8142 llvm-svn: 231709	2015-03-09 21:43:43 +00:00
Sanjoy Das	f257452986	[SCEV] Add a `scalar-evolution-print-constant-ranges' option Summary: Unused in this commit, but will be used in a subsequent change (D8142) by a FileCheck test. Reviewers: atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8143 llvm-svn: 231708	2015-03-09 21:43:39 +00:00
Sanjoy Das	9e2c5010f6	[SCEV] make SCEV smarter about proving no-wrap. Summary: Teach SCEV to prove no overflow for an add recurrence by proving something about the range of another add recurrence a loop-invariant distance away from it. Reviewers: atrick, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7980 llvm-svn: 231305	2015-03-04 22:24:17 +00:00
Mehdi Amini	46a43556db	Make DataLayout Non-Optional in the Module Summary: DataLayout keeps the string used for its creation. As a side effect it is no longer needed in the Module. This is "almost" NFC, the string is no longer canonicalized, you can't rely on two "equals" DataLayout having the same string returned by getStringRepresentation(). Get rid of DataLayoutPass: the DataLayout is in the Module The DataLayout is "per-module", let's enforce this by not duplicating it more than necessary. One more step toward non-optionality of the DataLayout in the module. Make DataLayout Non-Optional in the Module Module->getDataLayout() will never returns nullptr anymore. Reviewers: echristo Subscribers: resistor, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D7992 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231270	2015-03-04 18:43:29 +00:00
Reid Kleckner	2f05d4c91f	Make llvm.eh.begincatch use an outparam Ultimately, __CxxFrameHandler3 needs us to put a stack offset in a table, and it will take responsibility for copying the exception object into that slot. Modelling the exception object as an SSA value returned by begincatch isn't going to work in general, so make it use an output parameter. Reviewers: andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D7920 llvm-svn: 231086	2015-03-03 17:41:09 +00:00
David Blaikie	a79ac14fa6	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794	2015-02-27 21:17:42 +00:00
David Blaikie	79e6c74981	[opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type. This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions. * This doesn't modify gep operators, only instructions (operators will be handled separately) * Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes. * geps of vectors are transformed as: getelementptr <4 x float> %x, ... ->getelementptr float, <4 x float> %x, ... Then, once the opaque pointer type is introduced, this will ultimately look like: getelementptr float, <4 x ptr> %x with the unambiguous interpretation that it is a vector of pointers to float. * address spaces remain on the pointer, not the type: getelementptr float addrspace(1)* %x ->getelementptr float, float addrspace(1)* %x Then, eventually: getelementptr float, ptr addrspace(1) %x Importantly, the massive amount of test case churn has been automated by same crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files. update.py: import fileinput import sys import re ibrep = re.compile(r"(^.?[^%\w]getelementptr inbounds )(((?:<\d x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") normrep = re.compile( r"(^.?[^%\w]getelementptr )(((?:<\d* x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") def conv(match, line): if not match: return line line = match.groups()[0] if len(match.groups()[5]) == 0: line += match.groups()[2] line += match.groups()[3] line += ", " line += match.groups()[1] line += "\n" return line for line in sys.stdin: if line.find("getelementptr ") == line.find("getelementptr inbounds"): if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("): line = conv(re.match(ibrep, line), line) elif line.find("getelementptr ") != line.find("getelementptr ("): line = conv(re.match(normrep, line), line) sys.stdout.write(line) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name .ll \| xargs ./apply.sh From llvm/src/tools/clang: find test/ -name .mm -o -name .m -o -name .cpp -o -name .c \| xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name *.ll \| xargs ./apply.sh After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out). The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases. Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7636 llvm-svn: 230786	2015-02-27 19:29:02 +00:00
Adam Nemet	9cc0c3999d	[LV/LoopAccesses] Backward dependences are not safe just because the accesses are via different types Noticed this while generalizing the code for loop distribution. I confirmed with Arnold that this was indeed a bug and managed to create a testcase. llvm-svn: 230647	2015-02-26 17:58:48 +00:00
Sanjoy Das	dcc84db264	Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap (The change was landed in r230280 and caused the regression PR22674. This version contains a fix and a test-case for PR22674). When emitting the increment operation, SCEVExpander marks the operation as nuw or nsw based on the flags on the preincrement SCEV. This is incorrect because, for instance, it is possible that {-6,+,1} is <nuw> while {-6,+,1}+1 = {-5,+,1} is not. This change teaches SCEV to mark the increment as nuw/nsw only if it can explicitly prove that the increment operation won't overflow. Apart from the attached test case, another (more realistic) manifestation of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll. Differential Revision: http://reviews.llvm.org/D7778 llvm-svn: 230533	2015-02-25 20:02:59 +00:00
Hans Wennborg	953d6fb84e	Revert r230280: "Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap" This caused PR22674, failing this assert: Instructions.h:2281: llvm::Value* llvm::PHINode::getOperand(unsigned int) const: Assertion `i_nocapture < OperandTraits<PHINode>::operands(this) && "getOperand() out of range!"' failed. llvm-svn: 230341	2015-02-24 16:19:29 +00:00
Sanjoy Das	b14010d28b	Fix bug 22641 The bug was a result of getPreStartForExtend interpreting nsw/nuw flags on an add recurrence more strongly than is legal. {S,+,X}<nsw> implies S+X is nsw only if the backedge of the loop is taken at least once. NOTE: I had accidentally committed an unrelated change with the commit message of this change in r230275 (r230275 was reverted in r230279). This is the correct change for this commit message. Differential Revision: http://reviews.llvm.org/D7808 llvm-svn: 230291	2015-02-24 01:02:42 +00:00
Sanjoy Das	18c243b933	Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap When emitting the increment operation, SCEVExpander marks the operation as nuw or nsw based on the flags on the preincrement SCEV. This is incorrect because, for instance, it is possible that {-6,+,1} is <nuw> while {-6,+,1}+1 = {-5,+,1} is not. This change teaches SCEV to mark the increment as nuw/nsw only if it can explicitly prove that the increment operation won't overflow. Apart from the attached test case, another (more realistic) manifestation of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll. NOTE: this change was landed with an incorrect commit message in rL230275 and was reverted for that reason in rL230279. This commit message is the correct one. Differential Revision: http://reviews.llvm.org/D7778 llvm-svn: 230280	2015-02-23 23:22:58 +00:00
Sanjoy Das	c9cf0151cf	Revert 230275. 230275 got committed with an incorrect commit message due to a mixup on my side. Will re-land in a few moments with the correct commit message. llvm-svn: 230279	2015-02-23 23:13:22 +00:00
Sanjoy Das	913dfd8f7f	Fix bug 22641 The bug was a result of getPreStartForExtend interpreting nsw/nuw flags on an add recurrence more strongly than is legal. {S,+,X}<nsw> implies S+X is nsw only if the backedge of the loop is taken at least once. Differential Revision: http://reviews.llvm.org/D7808 llvm-svn: 230275	2015-02-23 22:55:13 +00:00
Adam Nemet	e91cc6ef93	[LoopAccesses] Add -analyze support The LoopInfo in combination with depth_first is used to enumerate the loops. Right now -analyze is not yet complete. It only prints the result of the analysis, the report and the run-time checks. Printing the unsafe depedences will require a bit more reshuffling which I'd like to do in a follow-on to this patchset. Unsafe dependences are currently checked via -debug-only=loop-accesses in the new test. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229898	2015-02-19 19:15:19 +00:00
Chandler Carruth	b89464a9b6	[x86,sdag] Two interrelated changes to the x86 and sdag code. First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true) but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization. However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing. Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as legal so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, i know, this is confusing. but that's how the patterns are written). This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the hilarious deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win! There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct if horribly naive cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own. Note the removal of 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise. llvm-svn: 229835	2015-02-19 10:36:19 +00:00
NAKAMURA Takumi	fa520c5f49	Revert r229622: "[LoopAccesses] Make VectorizerParams global" and others. r229622 brought cyclic dependencies between Analysis and Vector. r229622: "[LoopAccesses] Make VectorizerParams global" r229623: "[LoopAccesses] Stash the report from the analysis rather than emitting it" r229624: "[LoopAccesses] Cache the result of canVectorizeMemory" r229626: "[LoopAccesses] Create the analysis pass" r229628: "[LoopAccesses] Change debug messages from LV to LAA" r229630: "[LoopAccesses] Add canAnalyzeLoop" r229631: "[LoopAccesses] Add missing const to APIs in VectorizationReport" r229632: "[LoopAccesses] Split out LoopAccessReport from VectorizerReport" r229633: "[LoopAccesses] Add -analyze support" r229634: "[LoopAccesses] Change LAA:getInfo to return a constant reference" r229638: "Analysis: fix buildbots" llvm-svn: 229650	2015-02-18 08:34:47 +00:00
Adam Nemet	75bc2d111f	[LoopAccesses] Add -analyze support The LoopInfo in combination with depth_first is used to enumerate the loops. Right now -analyze is not yet complete. It only prints the result of the analysis, the report and the run-time checks. Printing the unsafe depedences will require a bit more reshuffling which I'd like to do in a follow-on to this patchset. Unsafe dependences are currently checked via -debug-only=loop-accesses in the new test. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229633	2015-02-18 03:44:30 +00:00
Sanjoy Das	4153f47026	Generalize getExtendAddRecStart to work with both sign and zero extensions. This change also removes `DEBUG(dbgs() << "SCEV: untested prestart overflow check\n");` because that case has a unit test now. Differential Revision: http://reviews.llvm.org/D7645 llvm-svn: 229600	2015-02-18 01:47:07 +00:00
George Burgess IV	33305e7280	Fixed a bug where CFLAA would crash the compiler. We would crash if we couldn't locate a Function that either Location's Value belonged to. Now we just print out a debug message and return conservatively. llvm-svn: 228901	2015-02-12 03:07:07 +00:00
Andrew Kaylor	78b53dbcc1	Adding support for llvm.eh.begincatch and llvm.eh.endcatch intrinsics and beginning the documentation of native Windows exception handling. Differential Revision: http://reviews.llvm.org/D7398 llvm-svn: 228733	2015-02-10 19:52:43 +00:00
Ramkumar Ramachandra	82ab65c7cd	MemDerefPrinter: Require DataLayoutPass for higher accuracy Without a valid data layout, deferenceable(N) doesn't get parsed or propagated. Since this is the key item we are testing, add a dependency on the pass. Differential Revision: http://reviews.llvm.org/D7508 llvm-svn: 228611	2015-02-09 21:50:03 +00:00
Ramkumar Ramachandra	a7343d65f4	isDereferenceablePointer: look through gc.relocate calls While a theoretical GC might change dereferenceability on collection, there is no such known collector and no need to account for the case with a flag yet. Differential Revision: http://reviews.llvm.org/D7454 llvm-svn: 228606	2015-02-09 21:08:03 +00:00
Sanjoy Das	bf5d870dfa	Bugfix: SCEV incorrectly marks certain add recurrences as nsw When creating a scev for sext({X,+,Y}), scev checks if the expression is equivalent to {sext X,+,zext Y}. If it can prove that, it also tags the original {X,+,Y} as <nsw>, which is not correct. In the test case I run `-scalar-evolution` twice because the bug manifests only once SCEV has run through and seen the `sext` expressions (and then does a in-place mutation on {X,+,Y}). Differential Revision: http://reviews.llvm.org/D7495 llvm-svn: 228586	2015-02-09 18:34:55 +00:00
Johannes Doerfert	2683e5676c	Allow ScalarEvolution to catch more min/max cases For the attached test case different types are used in the ICmpInst and SelectInst that represent the min/max expressions. However, if the ICmpInst type is smaller a comparison with the sign/zero extended operands would have yielded the same result. This situation might arise after the instruction combination pass was applied. Differential Revision: http://reviews.llvm.org/D7338 llvm-svn: 228572	2015-02-09 12:34:23 +00:00
Sanjoy Das	f2e931cae9	Bugfix: ScalarEvolution incorrectly assumes that the start of certain add recurrences don't overflow. This change makes the optimization more restrictive. It still assumes that an overflowing `add nsw` is undefined behavior; and this change will need revisiting once we have a consistent semantics for poison values. Differential Revision: http://reviews.llvm.org/D7331 llvm-svn: 228552	2015-02-08 22:52:17 +00:00
Ahmed Bougacha	29efe3b287	[BasicAA] Try to disambiguate GEPs through arrays of structs into different fields. We can show that two GEPs off of the same (possibly multidimensional) array of structs, into different fields, can't alias. Quoting: For two GEPOperators GEP1 and GEP2, if we find that: - both GEPs begin indexing from the exact same pointer; - the last indices in both GEPs are constants, indexing into a struct; - said indices are different, hence,the pointed-to fields are different; - and both GEPs only index through arrays prior to that; this lets us determine that the struct that GEP1 indexes into and the struct that GEP2 indexes into must either precisely overlap or be completely disjoint. Because they cannot partially overlap, indexing into different non-overlapping fields of the struct will never alias. The other BasicAA::aliasGEP rules worked in some cases, but not all (for example, the i32x3 struct in the testcase). We can add this simple ad-hoc rule to complement them. rdar://19717375 Differential Revision: http://reviews.llvm.org/D7453 llvm-svn: 228498	2015-02-07 17:04:29 +00:00

... 2 3 4 5 6 ...

985 Commits