llvm-project

Commit Graph

Author	SHA1	Message	Date
George Burgess IV	33305e7280	Fixed a bug where CFLAA would crash the compiler. We would crash if we couldn't locate a Function that either Location's Value belonged to. Now we just print out a debug message and return conservatively. llvm-svn: 228901	2015-02-12 03:07:07 +00:00
Andrew Kaylor	78b53dbcc1	Adding support for llvm.eh.begincatch and llvm.eh.endcatch intrinsics and beginning the documentation of native Windows exception handling. Differential Revision: http://reviews.llvm.org/D7398 llvm-svn: 228733	2015-02-10 19:52:43 +00:00
Ramkumar Ramachandra	82ab65c7cd	MemDerefPrinter: Require DataLayoutPass for higher accuracy Without a valid data layout, deferenceable(N) doesn't get parsed or propagated. Since this is the key item we are testing, add a dependency on the pass. Differential Revision: http://reviews.llvm.org/D7508 llvm-svn: 228611	2015-02-09 21:50:03 +00:00
Ramkumar Ramachandra	a7343d65f4	isDereferenceablePointer: look through gc.relocate calls While a theoretical GC might change dereferenceability on collection, there is no such known collector and no need to account for the case with a flag yet. Differential Revision: http://reviews.llvm.org/D7454 llvm-svn: 228606	2015-02-09 21:08:03 +00:00
Sanjoy Das	bf5d870dfa	Bugfix: SCEV incorrectly marks certain add recurrences as nsw When creating a scev for sext({X,+,Y}), scev checks if the expression is equivalent to {sext X,+,zext Y}. If it can prove that, it also tags the original {X,+,Y} as <nsw>, which is not correct. In the test case I run `-scalar-evolution` twice because the bug manifests only once SCEV has run through and seen the `sext` expressions (and then does a in-place mutation on {X,+,Y}). Differential Revision: http://reviews.llvm.org/D7495 llvm-svn: 228586	2015-02-09 18:34:55 +00:00
Johannes Doerfert	2683e5676c	Allow ScalarEvolution to catch more min/max cases For the attached test case different types are used in the ICmpInst and SelectInst that represent the min/max expressions. However, if the ICmpInst type is smaller a comparison with the sign/zero extended operands would have yielded the same result. This situation might arise after the instruction combination pass was applied. Differential Revision: http://reviews.llvm.org/D7338 llvm-svn: 228572	2015-02-09 12:34:23 +00:00
Sanjoy Das	f2e931cae9	Bugfix: ScalarEvolution incorrectly assumes that the start of certain add recurrences don't overflow. This change makes the optimization more restrictive. It still assumes that an overflowing `add nsw` is undefined behavior; and this change will need revisiting once we have a consistent semantics for poison values. Differential Revision: http://reviews.llvm.org/D7331 llvm-svn: 228552	2015-02-08 22:52:17 +00:00
Ahmed Bougacha	29efe3b287	[BasicAA] Try to disambiguate GEPs through arrays of structs into different fields. We can show that two GEPs off of the same (possibly multidimensional) array of structs, into different fields, can't alias. Quoting: For two GEPOperators GEP1 and GEP2, if we find that: - both GEPs begin indexing from the exact same pointer; - the last indices in both GEPs are constants, indexing into a struct; - said indices are different, hence,the pointed-to fields are different; - and both GEPs only index through arrays prior to that; this lets us determine that the struct that GEP1 indexes into and the struct that GEP2 indexes into must either precisely overlap or be completely disjoint. Because they cannot partially overlap, indexing into different non-overlapping fields of the struct will never alias. The other BasicAA::aliasGEP rules worked in some cases, but not all (for example, the i32x3 struct in the testcase). We can add this simple ad-hoc rule to complement them. rdar://19717375 Differential Revision: http://reviews.llvm.org/D7453 llvm-svn: 228498	2015-02-07 17:04:29 +00:00
Ramkumar Ramachandra	8378ac3684	Introduce print-memderefs to test isDereferenceablePointer Since testing the function indirectly is tricky, introduce a direct print-memderefs pass, in the same spirit as print-memdeps, which prints dereferenceability information matched by FileCheck. Differential Revision: http://reviews.llvm.org/D7075 llvm-svn: 228369	2015-02-06 01:46:42 +00:00
Ahmed Bougacha	a7e33112f0	[BasicAA] Add datalayouts to make some tests more useful. NFC. Fixes PR22462: two of the tests have regressed for a while, but were using CHECK-NOT to match "May:". The actual output was changed to "MayAlias:" at some point, which made the tests useless. Two others return MayAlias only because of a lack of analysis; BasicAA returns PartialAlias in those cases, when a datalayout is present. llvm-svn: 228346	2015-02-05 21:10:14 +00:00
Chandler Carruth	705b185f90	[PM] Change the core design of the TTI analysis to use a polymorphic type erased interface and a single analysis pass rather than an extremely complex analysis group. The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR. I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting into the right form. There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here. The other reason of course is that this is much more natural fit for the new pass manager. This will lay the ground work for a type-erased per-function info object that can look up the correct subtarget and even cache it. Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below. The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;] Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least: 1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to just do what we need: build and return a TTI object that we can then insert into the pass pipeline. 2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager. 3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2. 4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase. 5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward. 6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target. Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly. Differential Revision: http://reviews.llvm.org/D7293 llvm-svn: 227669	2015-01-31 03:43:40 +00:00
Daniel Berlin	16f7a52628	Fix incorrect partial aliasing Update testcases llvm-svn: 227099	2015-01-26 17:31:17 +00:00
Elena Demikhovsky	a3232f764e	Implemented cost model for masked load/store operations. llvm-svn: 227035	2015-01-25 08:44:46 +00:00
Chandler Carruth	df8b223dea	[PM] Actually add the new pass manager support for the assumption cache. I had already factored this analysis specifically to enable doing this, but hadn't actually committed the necessary wiring to get at this from the new pass manager. This also nicely shows how the separate cache object can be directly managed by the new pass manager. This analysis didn't have any direct tests and so I've added a printer pass and a boring test case. I chose to print the i1 value which is being assumed rather than the call to llvm.assume as that seems much more useful for testing... but suggestions on an even better printing strategy welcome. My main goal was to make sure things actually work. =] llvm-svn: 226868	2015-01-22 21:53:09 +00:00
Sanjoy Das	cb47366366	Make ScalarEvolution less aggressive with respect to no-wrap flags. ScalarEvolution currently lowers a subtraction recurrence to an add recurrence with the same no-wrap flags as the subtraction. This is incorrect because `sub nsw X, Y` is not the same as `add nsw X, -Y` and `sub nuw X, Y` is not the same as `add nuw X, -Y`. This patch fixes the issue, and adds two test cases demonstrating the bug. Differential Revision: http://reviews.llvm.org/D7081 llvm-svn: 226755	2015-01-22 00:48:47 +00:00
George Burgess IV	a1255d3a74	Added test to cover the CFLAA bitset indexing bug. llvm-svn: 226710	2015-01-21 22:39:35 +00:00
Chandler Carruth	aaf0b4cd57	[PM] Port LoopInfo to the new pass manager, adding both a LoopAnalysis pass and a LoopPrinterPass with the expected associated wiring. I've added a RUN line to the only test case (!!!) we have that actually prints loops. Everything seems to be working. This is somewhat exciting as this is the first analysis using another analysis to go in for the new pass manager. =D I also believe it is the last analysis necessary for porting instcombine, but of course I may yet discover more. llvm-svn: 226560	2015-01-20 10:58:50 +00:00
Chandler Carruth	64764b446b	[PM] Port domtree to the new pass manager (at last). This adds the domtree analysis to the new pass manager. The analysis returns the same DominatorTree result entity used by the old pass manager and essentially all of the code is shared. We just have different boilerplate for running and printing the analysis. I've converted one test to run in both modes just to make sure this is exercised while both are live in the tree. llvm-svn: 225969	2015-01-14 10:19:28 +00:00
Chandler Carruth	ef7a9fb63b	[dom] Add a basic dominator tree test. Correct, we have zero basic testing of the dominator tree in the regression test suite. There is a single test that even prints it out, and that test only checks a single line of the output. There are a handful of tests that check post dominators, but all of those are looking for bugs rather than just exercising the basic machinery. This test is super boring and unexciting. But hey, it's something. I needed there to be something so I could switch the basic test to run with both the old and new pass manager. llvm-svn: 225936	2015-01-14 03:34:55 +00:00
Sanjoy Das	81401d4b19	Fix PR22179. We were incorrectly inferring nsw for certain SCEVs. We can be more aggressive here (see Richard Smith's comment on http://llvm.org/bugs/show_bug.cgi?id=22179) but this change just focuses on correctness. Differential Revision: http://reviews.llvm.org/D6914 llvm-svn: 225591	2015-01-10 23:41:24 +00:00
Tim Northover	eb16112e97	Re-reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE" It's not really expected to stick around, last time it provoked a weird LTO build failure that I can't reproduce now, and the bot logs are long gone. I'll re-revert it if the failures recur. Original description: Perform Scalar PRE on gep indices that feed loads before doing Load PRE. llvm-svn: 225536	2015-01-09 19:19:56 +00:00
Duncan P. N. Exon Smith	be7ea19b58	IR: Make metadata typeless in assembly Now that `Metadata` is typeless, reflect that in the assembly. These are the matching assembly changes for the metadata/value split in r223802. - Only use the `metadata` type when referencing metadata from a call intrinsic -- i.e., only when it's used as a `Value`. - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode` when referencing it from call intrinsics. So, assembly like this: define @foo(i32 %v) { call void @llvm.foo(metadata !{i32 %v}, metadata !0) call void @llvm.foo(metadata !{i32 7}, metadata !0) call void @llvm.foo(metadata !1, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{metadata !3}, metadata !0) ret void, !bar !2 } !0 = metadata !{metadata !2} !1 = metadata !{i32* @global} !2 = metadata !{metadata !3} !3 = metadata !{} turns into this: define @foo(i32 %v) { call void @llvm.foo(metadata i32 %v, metadata !0) call void @llvm.foo(metadata i32 7, metadata !0) call void @llvm.foo(metadata i32* @global, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{!3}, metadata !0) ret void, !bar !2 } !0 = !{!2} !1 = !{i32* @global} !2 = !{!3} !3 = !{} I wrote an upgrade script that handled almost all of the tests in llvm and many of the tests in cfe (even handling many `CHECK` lines). I've attached it (or will attach it in a moment if you're speedy) to PR21532 to help everyone update their out-of-tree testcases. This is part of PR21532. llvm-svn: 224257	2014-12-15 19:07:53 +00:00
Duncan P. N. Exon Smith	57cbdfc99a	BFI: Saturate when combining edges to a successor When a loop gets bundled up, its outgoing edges are quite large, and can just barely overflow 64-bits. If one successor has multiple incoming edges -- and that successor is getting all the incoming mass -- combining just its edges can overflow. Handle that by saturating rather than asserting. This fixes PR21622. llvm-svn: 223500	2014-12-05 19:13:42 +00:00
Manman Ren	c67109313c	Revert r222039 because of bot failure. http://lab.llvm.org:8080/green/job/clang-Rlto_master/298/ Hopefully, bot will be green. If not, we will re-submit the commit. llvm-svn: 222287	2014-11-19 00:13:26 +00:00
Jingyue Wu	0fa125a77d	[DependenceAnalysis] Allow subscripts of different types Summary: Several places in DependenceAnalysis assumes both SCEVs in a subscript pair share the same integer type. For instance, isKnownPredicate calls SE->getMinusSCEV(X, Y) which asserts X and Y share the same type. However, DependenceAnalysis fails to ensure this assumption when producing a subscript pair, causing tests such as NonCanonicalizedSubscript to crash. With this patch, DependenceAnalysis runs unifySubscriptType before producing any subscript pair, ensuring the assumption. Test Plan: Added NonCanonicalizedSubscript.ll on which DependenceAnalysis before the fix crashed because subscripts have different types. Reviewers: spop, sebpop, jingyue Reviewed By: jingyue Subscribers: eliben, meheff, llvm-commits Differential Revision: http://reviews.llvm.org/D6289 llvm-svn: 222100	2014-11-16 16:52:44 +00:00
David Majnemer	0df1d12476	ScalarEvolution: HowFarToZero was wrongly using signed division HowFarToZero was supposed to use unsigned division in order to calculate the backedge taken count. However, SCEVDivision::divide performs signed division. Unless I am mistaken, no users of SCEVDivision actually want signed arithmetic: switch to udiv and urem. This fixes PR21578. llvm-svn: 222093	2014-11-16 07:30:35 +00:00
Chad Rosier	1ff4c0bf0b	Reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE" This commit updates the failing test in Analysis/TypeBasedAliasAnalysis/gvn-nonlocal-type-mismatch.ll The failing test is sensitive to the order in which we process loads. This version turns on the RPO traversal instead of the while DT traversal in GVN. The new test code is functionally same just the order of loads that are eliminated is swapped. This new version also fixes an issue where GVN splits a critical edge and potentially invalidate the RPO/DT iterator. llvm-svn: 222039	2014-11-14 21:09:13 +00:00
Elena Demikhovsky	d5e95b57e0	AVX-512: SINT_TO_FP cost model and some bugfixes Checked some corner cases, for example translation of <8 x i1> to <8 x double> llvm-svn: 221883	2014-11-13 11:46:16 +00:00
Hal Finkel	45ba2c10e4	Revert r219432 - "Revert "[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information.""" Let's try this again... This reverts r219432, plus a bug fix. Description of the bug in r219432 (by Nick): The bug was using AllPositive to break out of the loop; if the loop break condition i != e is changed to i != e && AllPositive then the test_modulo_analysis_with_global test I've added will fail as the Modulo will be calculated incorrectly (as the last loop iteration is skipped, so Modulo isn't updated with its Scale). Nick also adds this comment: ComputeSignBit is safe to use in loops as it takes into account phi nodes, and the == EK_ZeroEx check is safe in loops as, no matter how the variable changes between iterations, zero-extensions will always guarantee a zero sign bit. The isValueEqualInPotentialCycles check is therefore definitely not needed as all the variable analysis holds no matter how the variables change between loop iterations. And this patch also adds another enhancement to GetLinearExpression - basically to convert ConstantInts to Offsets (see test_const_eval and test_const_eval_scaled for the situations this improves). Original commit message: This reverts r218944, which reverted r218714, plus a bug fix. Description of the bug in r218714 (by Nick): The original patch forgot to check if the Scale in VariableGEPIndex flipped the sign of the variable. The BasicAA pass iterates over the instructions in the order they appear in the function, and so BasicAliasAnalysis::aliasGEP is called with the variable it first comes across as parameter GEP1. Adding a %reorder label puts the definition of %a after %b so aliasGEP is called with %b as the first parameter and %a as the second. aliasGEP later calculates that %a == %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) - ignoring that %idxprom is scaled by -1 here lead the patch to incorrectly conclude that %a > %b. Revised patch by Nick White, thanks! Thanks to Lang to isolating the bug. Slightly modified by me to add an early exit from the loop and avoid unnecessary, but expensive, function calls. Original commit message: Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! llvm-svn: 221876	2014-11-13 09:16:54 +00:00
Quentin Colombet	360460ba64	[X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if AVX2 is available. According to IACA, the new lowering has a throughput of 8 cycles instead of 13 with the previous one. Althought this lowering kicks in some SPECs benchmarks, the performance improvement was within the noise. Correctness testing has been done for the whole range of uint32_t with the following program: uint4 v = (uint4) {0,1,2,3}; uint32_t i; //Check correctness over entire range for uint4 -> float4 conversion for( i = 0; i < 1U << (32-2); i++ ) { float4 t = test(v); float4 c = correct(v); if( 0xf != _mm_movemask_ps( t == c )) { printf( "Error @ %vx: %vf vs. %vf\n", v, c, t); return -1; } v += 4; } Where "correct" is the old lowering and "test" the new one. The patch adds a test case for the two custom lowering instruction. It also modifies the vector cost model, which is why cast.ll and uitofp.ll are modified. 2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7 instructions instead of 4 (3 more constant loads). rdar://problem/18153096> llvm-svn: 221657	2014-11-11 02:23:47 +00:00
Rafael Espindola	1c99344156	Use FileCheck in a few tests. llvm-svn: 221459	2014-11-06 15:05:51 +00:00
Bradley Smith	9992b167ae	[SCEV] Improve Scalar Evolution's use of no {un,}signed wrap flags In a case where we have a no {un,}signed wrap flag on the increment, if RHS - Start is constant then we can avoid inserting a max operation bewteen the two, since we can statically determine which is greater. This allows us to unroll loops such as: void testcase3(int v) { for (int i=v; i<=v+1; ++i) f(i); } llvm-svn: 220960	2014-10-31 11:40:32 +00:00
Hal Finkel	dd38c0b876	[DSE] Remove no-data-layout-only type-based overlap checking DSE's overlap checking contained special logic, used only when no DataLayout was available, which inferred a complete overwrite when the pointee types were equal. This logic seems fine for regular loads/stores, but does not work for memcpy and friends. Instead of fixing this, I'm just removing it. Philosophically, transformations should not contain enhanced behavior used only when data layout is lacking (data layout should be strictly additive), and maintaining these rarely-tested code paths seems not worthwhile at this stage. Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case (slightly reduced from that provided by Aliaksei) replaces the original contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few other tests have been updated to have a data layout. llvm-svn: 220035	2014-10-17 11:56:00 +00:00
Rafael Espindola	11aaaeebe0	Delete -std-compile-opts. These days -std-compile-opts was just a silly alias for -O3. llvm-svn: 219951	2014-10-16 20:00:02 +00:00
Hal Finkel	db5f86a9bf	[CFL-AA] CFL-AA should not assert on an va_arg instruction The CFL-AA implementation was missing a visit* routine for va_arg instructions, causing it to assert when run on a function that had one. For now, handle these in a conservative way. Fixes PR20954. llvm-svn: 219718	2014-10-14 20:51:26 +00:00
Sanjoy Das	1f05c51e5e	This patch teaches ScalarEvolution to pick and use !range metadata. It also makes it more aggressive in querying range information by adding a call to isKnownPredicateWithRanges to isLoopBackedgeGuardedByCond and isLoopEntryGuardedByCond. phabricator: http://reviews.llvm.org/D5638 Reviewed by: atrick, hfinkel llvm-svn: 219532	2014-10-10 21:22:34 +00:00
Mark Heffernan	2beab5f0b4	This patch de-pessimizes the calculation of loop trip counts in ScalarEvolution in the presence of multiple exits. Previously all loops exits had to have identical counts for a loop trip count to be considered computable. This pessimization was implemented by calling getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock) inside of ScalarEvolution::getSmallConstantTripCount() (see the FIXME in the comments of that function). The pessimization was added to fix a corner case involving undefined behavior (pr/16130). This patch more precisely handles the undefined behavior case allowing the pessimization to be removed. ControlsExit replaces IsSubExpr to more precisely track the case where undefined behavior is expected to occur. Because undefined behavior is tracked more precisely we can remove MustExit from ExitLimit. MustExit was used to track the case where the limit was computed potentially assuming undefined behavior even if undefined behavior didn't necessarily occur. llvm-svn: 219517	2014-10-10 17:39:11 +00:00
Hal Finkel	cbbd3df836	Revert "[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information."" This reverts commit r219135 -- still causing miscompiles in SPEC it seems... llvm-svn: 219432	2014-10-09 19:48:12 +00:00
Hal Finkel	43ce71f1b1	[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information." This reverts r218944, which reverted r218714, plus a bug fix. Description of the bug in r218714 (by Nick) The original patch forgot to check if the Scale in VariableGEPIndex flipped the sign of the variable. The BasicAA pass iterates over the instructions in the order they appear in the function, and so BasicAliasAnalysis::aliasGEP is called with the variable it first comes across as parameter GEP1. Adding a %reorder label puts the definition of %a after %b so aliasGEP is called with %b as the first parameter and %a as the second. aliasGEP later calculates that %a == %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) - ignoring that %idxprom is scaled by -1 here lead the patch to incorrectly conclude that %a > %b. Revised patch by Nick White, thanks! Thanks to Lang to isolating the bug. Slightly modified by me to add an early exit from the loop and avoid unnecessary, but expensive, function calls. Original commit message: Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! llvm-svn: 219135	2014-10-06 18:37:59 +00:00
Hal Finkel	8eae3ad2ff	[CFL-AA] Update for handling of globals and more tests We used to return PartialAlias if either variable being queried interacted with arguments or globals. AFAICT, we can change this to only returning MayAlias iff both variables being queried interacted with arguments or globals. Also, adding some basic functionality tests: some basic IPA tests, checking that we give conservative responses with arguments/globals thrown in the mix, and ensuring that we trace values through stores and loads. Note that saying that 'x' interacted with arguments or globals means that the Attributes of the StratifiedSet that 'x' belongs to has any bits set. Patch by George Burgess IV, thanks! llvm-svn: 219122	2014-10-06 14:42:56 +00:00
Lang Hames	89e9c17235	[BasicAA] Revert r218714 - Make better use of zext and sign information. This patch broke 447.dealII on Darwin. I'm currently working on a reduced test-case, but reverting for now to keep the bots happy. <rdar://problem/18530107> llvm-svn: 218944	2014-10-03 01:33:47 +00:00
Adrian Prantl	87b7eb9d0f	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787	2014-10-01 18:55:02 +00:00
Adrian Prantl	b458dc2eee	Revert r218778 while investigating buldbot breakage. "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782	2014-10-01 18:10:54 +00:00
Adrian Prantl	25a7174e7a	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778	2014-10-01 17:55:39 +00:00
Hal Finkel	fd86317989	[BasicAA] Make better use of zext and sign information Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! llvm-svn: 218714	2014-09-30 22:43:40 +00:00
Elena Demikhovsky	27012478d2	AVX-512: added cost for some AVX-512 instructions llvm-svn: 217863	2014-09-16 07:57:37 +00:00
Hal Finkel	cc4f31d3d7	Fix BasicTTI::getCmpSelInstrCost to deal with illegal vector types The default implementation of getCmpSelInstrCost, which provides the cost of icmp/fcmp/select instructions, did not deal sensibly with illegal vector types that were scalarized. We'd ask for the legalization cost of the vector type, which would return something like (4, f64) given an input of <4 x double>, and we'd then check the TLI status of the ISD opcode on that scalar type. This would result in querying (ISD::VSELECT, f64), for example. Amusingly enough, ISD::VSELECT on scalar types is marked as Legal by default (as with most other operations), and most backends never change this because VSELECT is never generated on scalars. However, seeing the resulting operation as Legal, we'd neglect to add the scalarization cost before returning. The result is that we'd grossly under-estimate the cost of cmps/selects on illegal vector types. Now, if type legalization clearly results in scalarization, we skip the early return and add the scalarization cost. llvm-svn: 217859	2014-09-16 04:35:50 +00:00
Matt Arsenault	f090bda1d5	CHECK-LABELize test llvm-svn: 217797	2014-09-15 17:56:56 +00:00
James Molloy	a9f47b6bae	[ARM] Teach the cost model that cross-class copies are costly. Cross-class copies being expensive is actually a trait of the microarchitecture, but as I haven't yet seen an example of a microarchitecture where they're cheap it seems best to just enable this by default, covering the non-mcpu build case. llvm-svn: 217674	2014-09-12 13:29:40 +00:00
Hal Finkel	cebf0cc210	Make use @llvm.assume for loop guards in ScalarEvolution This adds a basic (but important) use of @llvm.assume calls in ScalarEvolution. When SE is attempting to validate a condition guarding a loop (such as whether or not the loop count can be zero), this check should also include dominating assumptions. llvm-svn: 217348	2014-09-07 21:37:59 +00:00
Hal Finkel	7529c55c02	Add a CFL Alias Analysis implementation This provides an implementation of CFL alias analysis (including some supporting data structures). Currently, we don't have any extremely fancy features, sans some interprocedural analysis (i.e. no field sensitivity, etc.), and we do best sitting behind BasicAA + TBAA. In such a configuration, we take ~0.6-0.8% of total compile time, and give ~7-8% NoAlias responses to queries TBAA and BasicAA couldn't answer when bootstrapping LLVM. In testing this on other projects, we've seen up to 10.5% of queries dropped by BasicAA+TBAA answered with NoAlias by this algorithm. Patch by George Burgess IV (with minor modifications by me -- mostly adapting some BasicAA tests), thanks! llvm-svn: 216970	2014-09-02 21:43:13 +00:00
Hal Finkel	930469107d	Add @llvm.assume, lowering, and some basic properties This is the first commit in a series that add an @llvm.assume intrinsic which can be used to provide the optimizer with a condition it may assume to be true (when the control flow would hit the intrinsic call). Some basic properties are added here: - llvm.invariant(true) is dead. - llvm.invariant(false) is unreachable (this directly corresponds to the documented behavior of MSVC's __assume(0)), so is llvm.invariant(undef). The intrinsic is tagged as writing arbitrarily, in order to maintain control dependencies. BasicAA has been updated, however, to return NoModRef for any particular location-based query so that we don't unnecessarily block code motion. llvm-svn: 213973	2014-07-25 21:13:35 +00:00
Hal Finkel	ff0bcb60c9	Convert noalias parameter attributes into noalias metadata during inlining This functionality is currently turned off by default. Part of the motivation for introducing scoped-noalias metadata is to enable the preservation of noalias parameter attribute information after inlining. Sometimes this can be inferred from the code in the caller after inlining, but often we simply lose valuable information. The overall process if fairly simple: 1. Create a new unqiue scope domain. 2. For each (used) noalias parameter, create a new alias scope. 3. For each pointer, collect the underlying objects. Add a noalias scope for each noalias parameter from which we're not derived (and has not been captured prior to that point). 4. Add an alias.scope for each noalias parameter from which we might be derived (or has been captured before that point). Note that the capture checks apply only if one of the underlying objects is not an identified function-local object. llvm-svn: 213949	2014-07-25 15:50:08 +00:00
Hal Finkel	029cde639c	Simplify and improve scoped-noalias metadata semantics In the process of fixing the noalias parameter -> metadata conversion process that will take place during inlining (which will be committed soon, but not turned on by default), I have come to realize that the semantics provided by yesterday's commit are not really what we want. Here's why: void foo(noalias a, noalias b, noalias c, bool x) { q = x ? a : b; c = q; } Generically, we know that c does not alias with a and with b (so there is an 'and' in what we know we're not), and we know that q might be derived from a or from *b (so there is an 'or' in what we know that we are). So we do not want the semantics currently, where any noalias scope matching any alias.scope causes a NoAlias return. What we want to know is that the noalias scopes form a superset of the alias.scope list (meaning that all the things we know we're not is a superset of all of things the other instruction might be). Making that change, however, introduces a composibility problem. If we inline once, adding the noalias metadata, and then inline again adding more, and we append new scopes onto the noalias and alias.scope lists each time. But, this means that we could change what was a NoAlias result previously into a MayAlias result because we appended an additional scope onto one of the alias.scope lists. So, instead of giving scopes the ability to have parents (which I had borrowed from the TBAA implementation, but seems increasingly unlikely to be useful in practice), I've given them domains. The subset/superset condition now applies within each domain independently, and we only need it to hold in one domain. Each time we inline, we add the new scopes in a new scope domain, and everything now composes nicely. In addition, this simplifies the implementation. llvm-svn: 213948	2014-07-25 15:50:02 +00:00
Hal Finkel	9414665a3b	Add scoped-noalias metadata This commit adds scoped noalias metadata. The primary motivations for this feature are: 1. To preserve noalias function attribute information when inlining 2. To provide the ability to model block-scope C99 restrict pointers Neither of these two abilities are added here, only the necessary infrastructure. In fact, there should be no change to existing functionality, only the addition of new features. The logic that converts noalias function parameters into this metadata during inlining will come in a follow-up commit. What is added here is the ability to generally specify noalias memory-access sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA nodes: !scope0 = metadata !{ metadata !"scope of foo()" } !scope1 = metadata !{ metadata !"scope 1", metadata !scope0 } !scope2 = metadata !{ metadata !"scope 2", metadata !scope0 } !scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 } !scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 } Loads and stores can be tagged with an alias-analysis scope, and also, with a noalias tag for a specific scope: ... = load %ptr1, !alias.scope !{ !scope1 } ... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 } When evaluating an aliasing query, if one of the instructions is associated with an alias.scope id that is identical to the noalias scope associated with the other instruction, or is a descendant (in the scope hierarchy) of the noalias scope associated with the other instruction, then the two memory accesses are assumed not to alias. Note that is the first element of the scope metadata is a string, then it can be combined accross functions and translation units. The string can be replaced by a self-reference to create globally unqiue scope identifiers. [Note: This overview is slightly stylized, since the metadata nodes really need to just be numbers (!0 instead of !scope0), and the scope lists are also global unnamed metadata.] Existing noalias metadata in a callee is "cloned" for use by the inlined code. This is necessary because the aliasing scopes are unique to each call site (because of possible control dependencies on the aliasing properties). For example, consider a function: foo(noalias a, noalias b) { a = b; } that gets inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } -- now just because we know that a1 does not alias with b1 at the first call site, and a2 does not alias with b2 at the second call site, we cannot let inlining these functons have the metadata imply that a1 does not alias with b2. llvm-svn: 213864	2014-07-24 14:25:39 +00:00
Hal Finkel	cc39b67530	AA metadata refactoring (introduce AAMDNodes) In order to enable the preservation of noalias function parameter information after inlining, and the representation of block-level __restrict__ pointer information (etc.), additional kinds of aliasing metadata will be introduced. This metadata needs to be carried around in AliasAnalysis::Location objects (and MMOs at the SDAG level), and so we need to generalize the current scheme (which is hard-coded to just one TBAA MDNode). This commit introduces only the necessary refactoring to allow for the introduction of other aliasing metadata types, but does not actually introduce any (that will come in a follow-up commit). What it does introduce is a new AAMDNodes structure to hold all of the aliasing metadata nodes associated with a particular memory-accessing instruction, and uses that structure instead of the raw MDNode in AliasAnalysis::Location, etc. No functionality change intended. llvm-svn: 213859	2014-07-24 12:16:19 +00:00
Hal Finkel	354e23b029	Improve BasicAA CS-CS queries (redux) This reverts, "r213024 - Revert r212572 "improve BasicAA CS-CS queries", it causes PR20303." with a fix for the bug in pr20303. As it turned out, the relevant code was both wrong and over-conservative (because, as with the code it replaced, it would return the overall ModRef mask even if just Ref had been implied by the argument aliasing results). Hopefully, this correctly fixes both problems. Thanks to Nick Lewycky for reducing the test case for pr20303 (which I've cleaned up a little and added in DSE's test directory). The BasicAA test has also been updated to check for this error. Original commit message: BasicAA contains knowledge of certain intrinsics, such as memcpy and memset, and uses that information to form more-accurate answers to CallSite vs. Loc ModRef queries. Unfortunately, it did not use this information when answering CallSite vs. CallSite queries. Generically, when an intrinsic takes one or more pointers and the intrinsic is marked only to read/write from its arguments, the offset/size is unknown. As a result, the generic code that answers CallSite vs. CallSite (and CallSite vs. Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's arguments. While BasicAA's CallSite vs. Loc override could use more-accurate size information for some intrinsics, it did not do the same for CallSite vs. CallSite queries. This change refactors the intrinsic-specific logic in BasicAA into a generic AA query function: getArgLocation, which is overridden by BasicAA to supply the intrinsic-specific knowledge, and used by AA's generic implementation. This allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and CallSite vs. CallSite queries, and simplifies the BasicAA implementation. Currently, only one function, Mac's memset_pattern16, is handled by BasicAA (all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's getModRefBehavior override now also returns OnlyAccessesArgumentPointees for this function (which is an improvement). llvm-svn: 213219	2014-07-17 01:28:25 +00:00
Nick Lewycky	7a63c3b389	Revert r212572 "improve BasicAA CS-CS queries", it causes PR20303. llvm-svn: 213024	2014-07-15 00:53:38 +00:00
Hal Finkel	8ae0f8d618	Improve BasicAA CS-CS queries BasicAA contains knowledge of certain intrinsics, such as memcpy and memset, and uses that information to form more-accurate answers to CallSite vs. Loc ModRef queries. Unfortunately, it did not use this information when answering CallSite vs. CallSite queries. Generically, when an intrinsic takes one or more pointers and the intrinsic is marked only to read/write from its arguments, the offset/size is unknown. As a result, the generic code that answers CallSite vs. CallSite (and CallSite vs. Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's arguments. While BasicAA's CallSite vs. Loc override could use more-accurate size information for some intrinsics, it did not do the same for CallSite vs. CallSite queries. This change refactors the intrinsic-specific logic in BasicAA into a generic AA query function: getArgLocation, which is overridden by BasicAA to supply the intrinsic-specific knowledge, and used by AA's generic implementation. This allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and CallSite vs. CallSite queries, and simplifies the BasicAA implementation. Currently, only one function, Mac's memset_pattern16, is handled by BasicAA (all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's getModRefBehavior override now also returns OnlyAccessesArgumentPointees for this function (which is an improvement). llvm-svn: 212572	2014-07-08 23:16:49 +00:00
Andrea Di Biagio	c8e8bda58f	[CostModel][x86] Improved cost model for alternate shuffles. This patch: 1) Improves the cost model for x86 alternate shuffles (originally added at revision 211339); 2) Teaches the Cost Model Analysis pass how to analyze alternate shuffles. Alternate shuffles are a special kind of blend; on x86, we can often easily lowered alternate shuffled into single blend instruction (depending on the subtarget features). The existing cost model didn't take into account subtarget features. Also, it had a couple of "dead" entries for vector types that are never legal (example: on x86 types v2i32 and v2f32 are not legal; those are always either promoted or widened to 128-bit vector types). The new x86 cost model takes into account what target features we have before returning the shuffle cost (i.e. the number of instructions after the blend is lowered/expanded). This patch also teaches the Cost Model Analysis how to identify and analyze alternate shuffles (i.e. 'SK_Alternate' shufflevector instructions): - added function 'isAlternateVectorMask'; - added some logic to check if an instruction is a alternate shuffle and, in case, call the target specific TTI to get the corresponding shuffle cost; - added a test to verify the cost model analysis on alternate shuffles. llvm-svn: 212296	2014-07-03 22:24:18 +00:00
Alp Toker	d3d017cf00	Reduce verbiage of lit.local.cfg files We can just split targets_to_build in one place and make it immutable. llvm-svn: 210496	2014-06-09 22:42:55 +00:00
Tobias Grosser	40ac10085a	ScalarEvolution: Derive element size from the type of the loaded element Before, we where looking at the size of the pointer type that specifies the location from which to load the element. This did not make any sense at all. This change fixes a bug in the delinearization where we failed to delinerize certain load instructions. llvm-svn: 210435	2014-06-08 19:21:20 +00:00
Sebastian Pop	a6e5860513	remove constant terms The delinearization is needed only to remove the non linearity induced by expressions involving multiplications of parameters and induction variables. There is no problem in dealing with constant times parameters, or constant times an induction variable. For this reason, the current patch discards all constant terms and multipliers before running the delinearization algorithm on the terms. The only thing remaining in the term expressions are parameters and multiply expressions of parameters: these simplified term expressions are passed to the array shape recognizer that will not recognize constant dimensions anymore: these will be recognized as different strides in parametric subscripts. The only important special case of a constant dimension is the size of elements. Instead of relying on the delinearization to infer the size of an element, compute the element size from the base address type. This is a much more precise way of computing the element size than before, as we would have mixed together the size of an element with the strides of the innermost dimension. llvm-svn: 209691	2014-05-27 22:41:45 +00:00
Dinesh Dwivedi	c0e6703360	Adding testcase for PR18886. Differential Revision: http://reviews.llvm.org/D3837 llvm-svn: 209645	2014-05-27 06:44:25 +00:00
Tim Northover	3b0846e8f7	AArch64/ARM64: move ARM64 into AArch64's place This commit starts with a "git mv ARM64 AArch64" and continues out from there, renaming the C++ classes, intrinsics, and other target-local objects for consistency. "ARM64" test directories are also moved, and tests that began their life in ARM64 use an arm64 triple, those from AArch64 use an aarch64 triple. Both should be equivalent though. This finishes the AArch64 merge, and everyone should feel free to continue committing as normal now. llvm-svn: 209577	2014-05-24 12:50:23 +00:00
Andrew Trick	b429083aff	Test case comments. Fix sloppiness. llvm-svn: 209551	2014-05-23 20:46:21 +00:00
Andrew Trick	839e30b2c0	Fix and improve SCEV ComputeBackedgeTankCount. This is a follow-up to r209358: PR19799: Indvars miscompile due to an incorrect max backedge taken count from SCEV. That fix was incomplete as pointed out by Arnold and Michael Z. The code was also too confusing. It needed a careful rewrite with more unit tests. This version will also happen to optimize more cases. <rdar://17005101> PR19799: Indvars miscompile... llvm-svn: 209545	2014-05-23 19:47:13 +00:00
Andrew Trick	e255359b57	Fix a bug in SCEV's backedge taken count computation from my prior fix in Jan. This has to do with the trip count computation for loops with multiple exits, which is quite subtle. Most passes just ask for a single trip count number, so we must be conservative assuming any exit could be taken. Normally, we rely on the "exact" trip count, which was correctly given as "unknown". However, SCEV also gives a "max" back-edge taken count. The loops max BE taken count is conservatively a maximum over the max of each exit's non-exiting iterations count. Note that some exit tests can be skipped so the max loop back-edge taken count can actually exceed the max non-exiting iterations for some exits. However, when we know the loop latch cannot be skipped, we can directly use its max taken count disregarding other exits. I previously took the minimum here without checking whether the other exit could be skipped. The correct, and simpler thing to do here is just to directly use the loop latch's max non-exiting iterations as the loops max back-edge count. In the problematic test case, the first loop exit had a max of zero non-exiting iterations, but could be skipped. The loop latch was known not to be skipped but had max of one non-exiting iteration. We incorrectly claimed the loop back-edge could be taken zero times, when it is actually taken one time. Fixes Loop %for.body.i: <multiple exits> Unpredictable backedge-taken count. Loop %for.body.i: max backedge-taken count is 1. llvm-svn: 209358	2014-05-22 00:37:03 +00:00
Filipe Cabecinhas	7b12d773e3	Added tests for the cost of lowering VSELECT instructions. llvm-svn: 209045	2014-05-16 22:47:58 +00:00
Alp Toker	beaca19c7c	Fix typos llvm-svn: 208839	2014-05-15 01:52:21 +00:00
Adam Nemet	63e4b30f79	[Test] Trim unnecessary .c and .cpp from config.suffix in lit.local.cfg Tested by comparing make check VERBOSE=1 before and after to make sure no tests are missed. (VERBOSE=1 prints the list of tests.) Only one test :( remains where .cpp is required: tools/llvm-cov/range_based_for.cpp:// RUN: llvm-cov range_based_for.cpp \| FileCheck %s --check-prefix=STDOUT The topic was discussed in this thread: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140428/214905.html llvm-svn: 208621	2014-05-12 19:57:31 +00:00
Sebastian Pop	b1a548f72d	do not assert when delinearization fails llvm-svn: 208615	2014-05-12 19:01:53 +00:00
Sebastian Pop	45dd14bac2	add testcase for r208237: do not collect undef terms llvm-svn: 208347	2014-05-08 18:38:58 +00:00
Sebastian Pop	448712b1a6	split delinearization pass in 3 steps To compute the dimensions of the array in a unique way, we split the delinearization analysis in three steps: - find parametric terms in all memory access functions - compute the array dimensions from the set of terms - compute the delinearized access functions for each dimension The first step is executed on all the memory access functions such that we gather all the patterns in which an array is accessed. The second step reduces all this information in a unique description of the sizes of the array. The third step is delinearizing each memory access function following the common description of the shape of the array computed in step 2. This rewrite of the delinearization pass also solves a problem we had with the previous implementation: because the previous algorithm was by induction on the structure of the SCEV, it would not correctly recognize the shape of the array when the memory access was not following the nesting of the loops: for example, see polly/test/ScopInfo/multidim_only_ivs_3d_reverse.ll ; void foo(long n, long m, long o, double A[n][m][o]) { ; ; for (long i = 0; i < n; i++) ; for (long j = 0; j < m; j++) ; for (long k = 0; k < o; k++) ; A[i][k][j] = 1.0; Starting with this patch we no longer delinearize access functions that do not contain parameters, for example in test/Analysis/DependenceAnalysis/GCD.ll ;; for (long int i = 0; i < 100; i++) ;; for (long int j = 0; j < 100; j++) { ;; A[2i - 4j] = i; ;; B++ = A[6i + 8*j]; these accesses will not be delinearized as the upper bound of the loops are constants, and their access functions do not contain SCEVUnknown parameters. llvm-svn: 208232	2014-05-07 18:01:20 +00:00
Benjamin Kramer	1625bfccbe	TTI: Estimate @llvm.fmuladd cost as fmul + fadd when FMA's aren't legal on the target. llvm-svn: 208115	2014-05-06 18:36:23 +00:00
Duncan P. N. Exon Smith	c5a3139ebd	Reapply "blockfreq: Approximate irreducible control flow" This reverts commit r207287, reapplying r207286. I'm hoping that declaring an explicit struct and instantiating `addBlockEdges()` directly works around the GCC crash from r207286. This is a lot more boilerplate, though. llvm-svn: 207438	2014-04-28 20:02:29 +00:00
Benjamin Kramer	ce4b3fee72	X86TTI: Adjust sdiv cost now that we can lower it on plain SSE2. Includes a fix for a horrible typo that caused all SDIV costs to be slightly off :) llvm-svn: 207371	2014-04-27 18:47:54 +00:00
Benjamin Kramer	7c3722724b	X86TTI: i16/i32 vector div with a constant (splat) divisor are reasonably cheap now. Turn vectorization back on. llvm-svn: 207320	2014-04-26 14:53:05 +00:00
Duncan P. N. Exon Smith	42292ceaa9	Revert "blockfreq: Approximate irreducible control flow" This reverts commit r207286. It causes an ICE on the cmake-llvm-x86_64-linux buildbot [1]: llvm/lib/Analysis/BlockFrequencyInfo.cpp: In lambda function: llvm/lib/Analysis/BlockFrequencyInfo.cpp:182:1: internal compiler error: in get_expr_operands, at tree-ssa-operands.c:1035 [1]: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/12093/steps/build_llvm/logs/stdio llvm-svn: 207287	2014-04-25 23:16:58 +00:00
Duncan P. N. Exon Smith	384d0e8ad4	blockfreq: Approximate irreducible control flow Previously, irreducible backedges were ignored. With this commit, irreducible SCCs are discovered on the fly, and modelled as loops with multiple headers. This approximation specifies the headers of irreducible sub-SCCs as its entry blocks and all nodes that are targets of a backedge within it (excluding backedges within true sub-loops). Block frequency calculations act as if we insert a new block that intercepts all the edges to the headers. All backedges and entries to the irreducible SCC point to this imaginary block. This imaginary block has an edge (with even probability) to each header block. The result is now reasonable enough that I've added a number of testcases for irreducible control flow. I've outlined in `BlockFrequencyInfoImpl.h` ways to improve the approximation. <rdar://problem/14292693> llvm-svn: 207286	2014-04-25 23:08:57 +00:00
Duncan P. N. Exon Smith	cb7d29d30c	blockfreq: Only one mass distribution per node Remove the concepts of "forward" and "general" mass distributions, which was wrong. The split might have made sense in an early version of the algorithm, but it's definitely wrong now. <rdar://problem/14292693> llvm-svn: 207195	2014-04-25 04:38:43 +00:00
Duncan P. N. Exon Smith	84408d1fda	blockfreq: Use better branch weights in multiexit test The branch weights were even before. Make them different. <rdar://problem/14292693> llvm-svn: 207193	2014-04-25 04:38:37 +00:00
Duncan P. N. Exon Smith	58c8948a0c	blockfreq: Clean up irreducible testcases Strip irreducible testcases to pure control flow. The function calls made the branch weights more believable but cluttered it up a lot. There isn't going to be any constant analysis here, so just use dumb branch logic to clarify the important parts. <rdar://problem/14292693> llvm-svn: 207192	2014-04-25 04:38:35 +00:00
Duncan P. N. Exon Smith	b3380ea60a	blockfreq: Skip irreducible backedges inside functions The branch that skips irreducible backedges was only active when propagating mass at the top-level. In particular, when propagating mass through a loop recognized by `LoopInfo` with irreducible control flow inside, irreducible backedges would not be skipped. Not sure where that idea came from, but the result was that mass was lost until after loop exit. Added a testcase that covers this case. llvm-svn: 206860	2014-04-22 03:31:53 +00:00
Duncan P. N. Exon Smith	10be9a8868	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206707, reapplying r206704. The preceding commit to CalcSpillWeights should have sorted out the failing buildbots. <rdar://problem/14292693> llvm-svn: 206766	2014-04-21 17:57:07 +00:00
Duncan P. N. Exon Smith	e63327e967	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206704, as expected. llvm-svn: 206707	2014-04-19 22:46:00 +00:00
Duncan P. N. Exon Smith	875ddfac75	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite. I've done a careful audit, added some asserts, and fixed a couple of bugs (unfortunately, they were in unlikely code paths). There's a small chance that this will appease the failing bots [1][2]. (If so, great!) If not, I have a follow-up commit ready that will temporarily add -debug-only=block-freq to the two failing tests, allowing me to compare the code path between what the failing bots and what my machines (and the rest of the bots) are doing. Once I've triggered those builds, I'll revert both commits so the bots go green again. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445 <rdar://problem/14292693> llvm-svn: 206704	2014-04-19 22:34:26 +00:00
Duncan P. N. Exon Smith	76b813619a	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2 ) This reverts commit r206666, as planned. Still stumped on why the bots are failing. Sanitizer bots haven't turned anything up. If anyone can help me debug either of the failures (referenced in r206666) I'll owe them a beer. (In the meantime, I'll be auditing my patch for undefined behaviour.) llvm-svn: 206677	2014-04-19 00:42:46 +00:00
Duncan P. N. Exon Smith	b3caf3646f	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2 ) This reverts commit r206628, reapplying r206622 (and r206626). Two tests are failing only on buildbots [1][2]: i.e., I can't reproduce on Darwin, and Chandler can't reproduce on Linux. Asan and valgrind don't tell us anything, but we're hoping the msan bot will catch it. So, I'm applying this again to get more feedback from the bots. I'll leave it in long enough to trigger builds in at least the sanitizer buildbots (it was failing for reasons unrelated to my commit last time it was in), and hopefully a few others.... and then I expect to revert a third time. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445 llvm-svn: 206666	2014-04-18 22:30:03 +00:00
Duncan P. N. Exon Smith	0842ff36a6	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2 ) This reverts commit r206622 and the MSVC fixup in r206626. Apparently the remotely failing tests are still failing, despite my attempt to fix the nondeterminism in r206621. llvm-svn: 206628	2014-04-18 17:56:08 +00:00
Duncan P. N. Exon Smith	f8361d127a	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206556, effectively reapplying commit r206548 and its fixups in r206549 and r206550. In an intervening commit I've added target triples to the tests that were failing remotely [1] (but passing locally). I'm hoping the mystery is solved? I'll revert this again if the tests are still failing remotely. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206622	2014-04-18 17:22:25 +00:00
Chandler Carruth	18eadd9260	[LCG] Add support for building persistent and connected SCCs to the LazyCallGraph. This is the start of the whole point of this different abstraction, but it is just the initial bits. Here is a run-down of what's going on here. I'm planning to incorporate some (or all) of this into comments going forward, hopefully with better editing and wording. =] The crux of the problem with the traditional way of building SCCs is that they are ephemeral. The new pass manager however really needs the ability to associate analysis passes and results of analysis passes with SCCs in order to expose these analysis passes to the SCC passes. Making this work is kind-of the whole point of the new pass manager. =] So, when we're building SCCs for the call graph, we actually want to build persistent nodes that stick around and can be reasoned about later. We'd also like the ability to walk the SCC graph in more complex ways than just the traditional postorder traversal of the current CGSCC walk. That means that in addition to being persistent, the SCCs need to be connected into a useful graph structure. However, we still want the SCCs to be formed lazily where possible. These constraints are quite hard to satisfy with the SCC iterator. Also, using that would bypass our ability to actually add data to the nodes of the call graph to facilite implementing the Tarjan walk. So I've re-implemented things in a more direct and embedded way. This immediately makes it easy to get the persistence and connectivity correct, and it also allows leveraging the existing nodes to simplify the algorithm. I've worked somewhat to make this implementation more closely follow the traditional paper's nomenclature and strategy, although it is still a bit obtuse because it isn't recursive, using an explicit stack and a tail call instead, and it is interruptable, resuming each time we need another SCC. The other tricky bit here, and what actually took almost all the time and trials and errors I spent building this, is exactly what graph structure to build for the SCCs. The naive thing to build is the call graph in its newly acyclic form. I wrote about 4 versions of this which did precisely this. Inevitably, when I experimented with them across various use cases, they became incredibly awkward. It was all implementable, but it felt like a complete wrong fit. Square peg, round hole. There were two overriding aspects that pushed me in a different direction: 1) We want to discover the SCC graph in a postorder fashion. That means the root node will be the last node we find. Using the call-SCC DAG as the graph structure of the SCCs results in an orphaned graph until we discover a root. 2) We will eventually want to walk the SCC graph in parallel, exploring distinct sub-graphs independently, and synchronizing at merge points. This again is not helped by the call-SCC DAG structure. The structure which, quite surprisingly, ended up being completely natural to use is the inverse of the call-SCC DAG. We add the leaf SCCs to the graph as "roots", and have edges to the caller SCCs. Once I switched to building this structure, everything just fell into place elegantly. Aside from general cleanups (there are FIXMEs and too few comments overall) that are still needed, the other missing piece of this is support for iterating across levels of the SCC graph. These will become useful for implementing #2, but they aren't an immediate priority. Once SCCs are in good shape, I'll be working on adding mutation support for incremental updates and adding the pass manager that this analysis enables. llvm-svn: 206581	2014-04-18 10:50:32 +00:00
Duncan P. N. Exon Smith	e576167df8	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commits r206548, r206549 and r206549. There are some unit tests failing that aren't failing locally [1], so reverting until I have time to investigate. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206556	2014-04-18 02:17:43 +00:00
Duncan P. N. Exon Smith	12e68e1733	blockfreq: Rewrite BlockFrequencyInfoImpl Rewrite the shared implementation of BlockFrequencyInfo and MachineBlockFrequencyInfo entirely. The old implementation had a fundamental flaw: precision losses from nested loops (or very wide branches) compounded past loop exits (and convergence points). The @nested_loops testcase at the end of test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating. This function has three nested loops, with branch weights in the loop headers of 1:4000 (exit:continue). The old analysis gives non-sensical results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': ---- Block Freqs ---- entry = 1.0 for.cond1.preheader = 1.00103 for.cond4.preheader = 5.5222 for.body6 = 18095.19995 for.inc8 = 4.52264 for.inc11 = 0.00109 for.end13 = 0.0 The new analysis gives correct results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': block-frequency-info: nested_loops - entry: float = 1.0, int = 8 - for.cond1.preheader: float = 4001.0, int = 32007 - for.cond4.preheader: float = 16008001.0, int = 128064007 - for.body6: float = 64048012001.0, int = 512384096007 - for.inc8: float = 16008001.0, int = 128064007 - for.inc11: float = 4001.0, int = 32007 - for.end13: float = 1.0, int = 8 Most importantly, the frequency leaving each loop matches the frequency entering it. The new algorithm leverages BlockMass and PositiveFloat to maintain precision, separates "probability mass distribution" from "loop scaling", and uses dithering to eliminate probability mass loss. I have unit tests for these types out of tree, but it was decided in the review to make the classes private to BlockFrequencyInfoImpl, and try to shrink them (or remove them entirely) in follow-up commits. The new algorithm should generally have a complexity advantage over the old. The previous algorithm was quadratic in the worst case. The new algorithm is still worst-case quadratic in the presence of irreducible control flow, but it's linear without it. The key difference between the old algorithm and the new is that control flow within a loop is evaluated separately from control flow outside, limiting propagation of precision problems and allowing loop scale to be calculated independently of mass distribution. Loops are visited bottom-up, their loop scales are calculated, and they are replaced by pseudo-nodes. Mass is then distributed through the function, which is now a DAG. Finally, loops are revisited top-down to multiply through the loop scales and the masses distributed to pseudo nodes. There are some remaining flaws. - Irreducible control flow isn't modelled correctly. LoopInfo and MachineLoopInfo ignore irreducible edges, so this algorithm will fail to scale accordingly. There's a note in the class documentation about how to get closer. See also the comments in test/Analysis/BlockFrequencyInfo/irreducible.ll. - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting the 64-bit integer precision used downstream. - The "bias" calculation proposed on llvmdev is not incorporated here. This will be added in a follow-up commit, once comments from this review have been handled. llvm-svn: 206548	2014-04-18 01:57:45 +00:00
Akira Hatanaka	5638b89944	Fix a bug in which BranchProbabilityInfo wasn't setting branch weights of basic blocks inside loops correctly. Previously, BranchProbabilityInfo::calcLoopBranchHeuristics would determine the weights of basic blocks inside loops even when it didn't have enough information to estimate the branch probabilities correctly. This patch fixes the function to exit early if it doesn't see any exit edges or back edges and let the later heuristics determine the weights. This fixes PR18705 and <rdar://problem/15991090>. Differential Revision: http://reviews.llvm.org/D3363 llvm-svn: 206194	2014-04-14 16:56:19 +00:00
Hal Finkel	56bf297e3a	Don't assert in BasicTTI::getMemoryOpCost for non-simple types BasicTTI::getMemoryOpCost must explicitly check for non-simple types; setting AllowUnknown=true with TLI->getSimpleValueType is not sufficient because, for example, non-power-of-two vector types return non-simple EVTs (not MVT::Other). llvm-svn: 206150	2014-04-14 05:59:09 +00:00
Sebastian Pop	b5b84e0963	in findGCD of multiply expr return the gcd we used to return 1 instead of the gcd llvm-svn: 205800	2014-04-08 21:21:05 +00:00
Hal Finkel	de0b413ec0	[PowerPC] Adjust load/store costs in PPCTTI This provides more realistic costs for the insert/extractelement instructions (which are load/store pairs), accounts for the cheap unaligned Altivec load sequence, and for unaligned VSX load/stores. Bad news: MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation) SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized) MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown Good news: SingleSource/Benchmarks/Shootout/ary3 - 54% speedup SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup Unfortunately, estimating the costs of the stack-based scalarization sequences is hard, and adjusting these costs is like a game of whac-a-mole :( I'll revisit this again after we have better codegen for vector extloads and truncstores and unaligned load/stores. llvm-svn: 205658	2014-04-04 23:51:18 +00:00
Hal Finkel	6fd19ab35e	Account for scalarization costs in BasicTTI::getMemoryOpCost for extending vector loads When a vector type legalizes to a larger vector type, and the target does not support the associated extending load (or truncating store), then legalization will scalarize the load (or store) resulting in an associated scalarization cost. BasicTTI::getMemoryOpCost needs to account for this. Between this, and r205487, PowerPC on the P7 with VSX enabled shows: MultiSource/Benchmarks/PAQ8p/paq8p: 43% speedup SingleSource/Benchmarks/BenchmarkGame/puzzle: 51% speedup SingleSource/UnitTests/Vectorizer/gcc-loops 28% speedup (some of these are new; some of these, such as PAQ8p, just reverse regressions that VSX support would trigger) llvm-svn: 205495	2014-04-03 00:53:59 +00:00
Hal Finkel	55312debee	Fix multi-register costs in BasicTTI::getCastInstrCost For an cast (extension, etc.), the currently logic predicts a low cost if the associated operation (keyed on the destination type) is legal (or promoted). This is not true when the number of values required to legalize the type is changing. For example, <8 x i16> being sign extended by <8 x i32> is not generically cheap on PPC with VSX, even though sign extension to v4i32 is legal, because two output v4i32 values are required compared to the single v8i16 input value, and without custom logic in the target, this conversion will scalarize. llvm-svn: 205487	2014-04-02 23:18:54 +00:00

1 2 3 4 5 ...

693 Commits