llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjoy Das	b864c1f76f	[SCEV] Look at backedge dominating conditions (re-land r233447). Summary: This change teaches ScalarEvolution::isLoopBackedgeGuardedByCond to look at edges within the loop body that dominate the latch. We don't do an exhaustive search for all possible edges, but only a quick walk up the dom tree. This re-lands r233447. r233447 was reverted because it caused massive compile-time regressions. This change has a fix for the same issue. llvm-svn: 233829	2015-04-01 18:24:06 +00:00
Diego Novillo	a354f48891	Remove 4,096 loop scale limitation. Summary: This is part 1 of fixes to address the problems described in https://llvm.org/bugs/show_bug.cgi?id=22719. The restriction to limit loop scales to 4,096 does not really prevent overflows anymore, as the underlying algorithm has changed and does not seem to suffer from this problem. Additionally, artificially restricting loop scales to such a low number skews frequency information, making loops of equal hotness appear to have very different hotness properties. The only loops that are artificially restricted to a scale of 4096 are infinite loops (those loops with an exit mass of 0). This prevents infinite loops from skewing the frequencies of other regions in the CFG. At the end of propagation, frequencies are scaled to values that take no more than 64 bits to represent. When the range of frequencies to be represented fits within 61 bits, it pushes up the scaling factor to a minimum of 8 to better distinguish small frequency values. Otherwise, small frequency values are all saturated down at 1. Tested on x86_64. Reviewers: dexonsmith Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8718 llvm-svn: 233826	2015-04-01 17:42:27 +00:00
Daniel Jasper	87e848c7dc	Revert "[SCEV] Look at backedge dominating conditions." This leads to terribly slow compile times under MSAN. More discussion on the commit thread of r233447. llvm-svn: 233529	2015-03-30 09:30:02 +00:00
Sanjoy Das	fe0e0fff92	[SCEV] Look at backedge dominating conditions. Summary: This change teaches ScalarEvolution::isLoopBackedgeGuardedByCond to look at edges within the loop body that dominate the latch. We don't do an exhaustive search for all possible edges, but only a quick walk up the dom tree. Reviewers: atrick, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8627 llvm-svn: 233447	2015-03-27 23:18:08 +00:00
Philip Reames	e1bf27045d	Require a GC strategy be specified for functions which use gc.statepoint This was discussed a while back and I left it optional for migration. Since it's been far more than the 'week or two' that was discussed, time to actually make this manditory. llvm-svn: 233357	2015-03-27 05:09:33 +00:00
Sanjoy Das	14598830fe	[SCEV] Revert bailout added in r75511. Summary: With the introduction of MarkPendingLoopPredicates in r157092, I don't think the bailout is needed anymore. Reviewers: atrick, nicholas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8624 llvm-svn: 233296	2015-03-26 17:28:26 +00:00
Sanjoy Das	e561fee2a4	[ValueTracking] Fix PR23011. Summary: `ComputeNumSignBits` returns incorrect results for `srem` instructions. This change fixes the issue and adds a test case. Reviewers: nadav, nicholas, atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8600 llvm-svn: 233225	2015-03-25 22:33:53 +00:00
Nick Lewycky	be8af48824	When simplifying a SCEV truncate by distributing, consider it a simplification to replace a cast, even if we end up with a trunc around the term. Fixes PR22960! llvm-svn: 232794	2015-03-20 02:25:00 +00:00
Simon Pilgrim	5ec5c9cafe	[X86][SSE] Avoid scalarization of v2i64 vector shifts (REAPPLIED) Fixed broken tests. Differential Revision: http://reviews.llvm.org/D8416 llvm-svn: 232682	2015-03-18 22:18:51 +00:00
Sanjoy Das	cb8bca1777	[SCEV] Make isImpliedCond smarter. Summary: This change teaches isImpliedCond to infer things like "X sgt 0" => "X - 1 sgt -1". The `ConstantRange` class has the logic to do the heavy lifting, this change simply gets ScalarEvolution to exploit that when reasonable. Depends on D8345 Reviewers: atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8346 llvm-svn: 232576	2015-03-18 00:41:29 +00:00
Michael Zolotukhin	c3d60efb1d	TTI: Honour cost model for estimating cost of vector-intrinsic and calls. Review: http://reviews.llvm.org/D8096 llvm-svn: 232528	2015-03-17 19:37:28 +00:00
Sanjoy Das	f1e9e1df25	[SCEV] Fix PR22856. Summary: ScalarEvolutionExpander assumes that the header block of a loop is a legal place to have a use for a phi node. This is true only for phis that are either in the header or dominate the header block, but it is not true for phi nodes that are strictly internal to the loop body. This change teaches ScalarEvolutionExpander to place uses of PHI nodes in the basic block the PHI nodes belong to. This is always legal, and `hoistIVInc` ensures that the said position dominates `IsomorphicInc`. Reviewers: atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8311 llvm-svn: 232189	2015-03-13 18:31:19 +00:00
David Blaikie	f72d05bc7b	[opaque pointer type] Add textual IR support for explicit type parameter to gep operator Similar to gep (r230786) and load (r230794) changes. Similar migration script can be used to update test cases, which successfully migrated all of LLVM and Polly, but about 4 test cases needed manually changes in Clang. (this script will read the contents of stdin and massage it into stdout - wrap it in the 'apply.sh' script shown in previous commits + xargs to apply it over a large set of test cases) import fileinput import sys import re rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s$)((<\d\s+x\s+)?([^@]?)(\|\saddrspace\(\d+$)\s\(?(3)>)\s*)(?=$\|%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|zeroinitializer\|<\|\[\[[a-zA-Z]\|\{\{)", re.MULTILINE \| re.DOTALL) def conv(match): line = match.group(1) line += match.group(4) line += ", " line += match.group(2) return line line = sys.stdin.read() off = 0 for match in re.finditer(rep, line): sys.stdout.write(line[off:match.start()]) sys.stdout.write(conv(match)) off = match.end() sys.stdout.write(line[off:]) llvm-svn: 232184	2015-03-13 18:20:45 +00:00
Owen Anderson	41a185c521	Teach TBAA analysis to report errors on cyclic TBAA metadata rather than hanging. llvm-svn: 232144	2015-03-13 07:09:33 +00:00
Nick Lewycky	b6ef9a14de	When forming an addrec out of a phi don't just look at the last computation and steal its flags for our own, there may be other computations in the middle. Check whether the LHS of the computation is the phi itself and then we know it's safe to steal the flags. Fixes PR22795. There's a missed optimization opportunity where we could look at the full chain of computation and take the intersection of the flags instead of only looking one instruction deep. llvm-svn: 232134	2015-03-13 01:37:52 +00:00
Adam Nemet	58913d65ad	[LoopAccesses 3/3] Print the dependences with -analyze The dependences are now expose through the new getInterestingDependences API so we can use that with -analyze too and fix the FIXME. This lets us remove the test that relied on -debug to check the dependences. llvm-svn: 231807	2015-03-10 17:40:43 +00:00
Karthik Bhat	8d7f7eda14	Fix a memory corruption in Dependency Analysis. This crash occurs due to memory corruption when trying to update dependency direction based on Constraints. This crash was observed during lnt regression of Polybench benchmark test case dynprog. Review: http://reviews.llvm.org/D8059 llvm-svn: 231788	2015-03-10 14:32:02 +00:00
Karthik Bhat	8d0099bdab	Fix a crash in Dependency Analysis. This crash in Dependency analysis is because we assume here that in case of UsefulGEP both source and destination have the same number of operands which may not be true. This incorrect assumption results in crash while populating Pairs. Fix the same. This crash was observed during lnt regression for code such as- struct s{ int A[10][10]; int C[10][10][10]; } S; void dep_constraint_crash_test(int k,int N) { for( int i=0;i<N;i++) for( int j=0;j<N;j++) S.A[0][0] = S.C[0][0][k]; } Review: http://reviews.llvm.org/D8162 llvm-svn: 231784	2015-03-10 13:31:03 +00:00
George Burgess IV	ab03af277b	Added ConstantExpr support to CFLAA. CFLAA didn't know how to properly handle ConstantExprs; it would silently ignore them. This was a problem if the ConstantExpr is, say, a GEP of a global, because CFLAA wouldn't realize that there's a global there. :) llvm-svn: 231743	2015-03-10 02:58:15 +00:00
George Burgess IV	b54a8d62a4	Added special handling for inttoptr in CFLAA. We now treat pointers given to ptrtoint and pointers retrieved from inttoptr as similar to arguments or globals (can alias anything, etc.) This solves some of the problems we were having with giving incorrect results. llvm-svn: 231741	2015-03-10 02:40:06 +00:00
Sanjoy Das	91b5477aad	[SCEV] Unify getUnsignedRange and getSignedRange Summary: This removes some duplicated code, and also helps optimization: e.g. in the test case added, `%idx ULT 128` in `@x` is not currently optimized to `true` by `-indvars` but will be, after this change. The only functional change in ths commit is that for add recurrences, ScalarEvolution::getRange will be more aggressive -- computing the unsigned (resp. signed) range for a SCEVAddRecExpr will now look at the NSW (resp. NUW) bits and check for signed (resp. unsigned) overflow. This can be a strict improvement in some cases (such as the attached test case), and should be no worse in other cases. Reviewers: atrick, nlewycky Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8142 llvm-svn: 231709	2015-03-09 21:43:43 +00:00
Sanjoy Das	f257452986	[SCEV] Add a `scalar-evolution-print-constant-ranges' option Summary: Unused in this commit, but will be used in a subsequent change (D8142) by a FileCheck test. Reviewers: atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8143 llvm-svn: 231708	2015-03-09 21:43:39 +00:00
Sanjoy Das	9e2c5010f6	[SCEV] make SCEV smarter about proving no-wrap. Summary: Teach SCEV to prove no overflow for an add recurrence by proving something about the range of another add recurrence a loop-invariant distance away from it. Reviewers: atrick, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7980 llvm-svn: 231305	2015-03-04 22:24:17 +00:00
Mehdi Amini	46a43556db	Make DataLayout Non-Optional in the Module Summary: DataLayout keeps the string used for its creation. As a side effect it is no longer needed in the Module. This is "almost" NFC, the string is no longer canonicalized, you can't rely on two "equals" DataLayout having the same string returned by getStringRepresentation(). Get rid of DataLayoutPass: the DataLayout is in the Module The DataLayout is "per-module", let's enforce this by not duplicating it more than necessary. One more step toward non-optionality of the DataLayout in the module. Make DataLayout Non-Optional in the Module Module->getDataLayout() will never returns nullptr anymore. Reviewers: echristo Subscribers: resistor, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D7992 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231270	2015-03-04 18:43:29 +00:00
Reid Kleckner	2f05d4c91f	Make llvm.eh.begincatch use an outparam Ultimately, __CxxFrameHandler3 needs us to put a stack offset in a table, and it will take responsibility for copying the exception object into that slot. Modelling the exception object as an SSA value returned by begincatch isn't going to work in general, so make it use an output parameter. Reviewers: andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D7920 llvm-svn: 231086	2015-03-03 17:41:09 +00:00
David Blaikie	a79ac14fa6	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794	2015-02-27 21:17:42 +00:00
David Blaikie	79e6c74981	[opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type. This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions. * This doesn't modify gep operators, only instructions (operators will be handled separately) * Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes. * geps of vectors are transformed as: getelementptr <4 x float> %x, ... ->getelementptr float, <4 x float> %x, ... Then, once the opaque pointer type is introduced, this will ultimately look like: getelementptr float, <4 x ptr> %x with the unambiguous interpretation that it is a vector of pointers to float. * address spaces remain on the pointer, not the type: getelementptr float addrspace(1)* %x ->getelementptr float, float addrspace(1)* %x Then, eventually: getelementptr float, ptr addrspace(1) %x Importantly, the massive amount of test case churn has been automated by same crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files. update.py: import fileinput import sys import re ibrep = re.compile(r"(^.?[^%\w]getelementptr inbounds )(((?:<\d x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") normrep = re.compile( r"(^.?[^%\w]getelementptr )(((?:<\d* x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") def conv(match, line): if not match: return line line = match.groups()[0] if len(match.groups()[5]) == 0: line += match.groups()[2] line += match.groups()[3] line += ", " line += match.groups()[1] line += "\n" return line for line in sys.stdin: if line.find("getelementptr ") == line.find("getelementptr inbounds"): if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("): line = conv(re.match(ibrep, line), line) elif line.find("getelementptr ") != line.find("getelementptr ("): line = conv(re.match(normrep, line), line) sys.stdout.write(line) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name .ll \| xargs ./apply.sh From llvm/src/tools/clang: find test/ -name .mm -o -name .m -o -name .cpp -o -name .c \| xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name *.ll \| xargs ./apply.sh After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out). The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases. Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7636 llvm-svn: 230786	2015-02-27 19:29:02 +00:00
Adam Nemet	9cc0c3999d	[LV/LoopAccesses] Backward dependences are not safe just because the accesses are via different types Noticed this while generalizing the code for loop distribution. I confirmed with Arnold that this was indeed a bug and managed to create a testcase. llvm-svn: 230647	2015-02-26 17:58:48 +00:00
Sanjoy Das	dcc84db264	Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap (The change was landed in r230280 and caused the regression PR22674. This version contains a fix and a test-case for PR22674). When emitting the increment operation, SCEVExpander marks the operation as nuw or nsw based on the flags on the preincrement SCEV. This is incorrect because, for instance, it is possible that {-6,+,1} is <nuw> while {-6,+,1}+1 = {-5,+,1} is not. This change teaches SCEV to mark the increment as nuw/nsw only if it can explicitly prove that the increment operation won't overflow. Apart from the attached test case, another (more realistic) manifestation of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll. Differential Revision: http://reviews.llvm.org/D7778 llvm-svn: 230533	2015-02-25 20:02:59 +00:00
Hans Wennborg	953d6fb84e	Revert r230280: "Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap" This caused PR22674, failing this assert: Instructions.h:2281: llvm::Value* llvm::PHINode::getOperand(unsigned int) const: Assertion `i_nocapture < OperandTraits<PHINode>::operands(this) && "getOperand() out of range!"' failed. llvm-svn: 230341	2015-02-24 16:19:29 +00:00
Sanjoy Das	b14010d28b	Fix bug 22641 The bug was a result of getPreStartForExtend interpreting nsw/nuw flags on an add recurrence more strongly than is legal. {S,+,X}<nsw> implies S+X is nsw only if the backedge of the loop is taken at least once. NOTE: I had accidentally committed an unrelated change with the commit message of this change in r230275 (r230275 was reverted in r230279). This is the correct change for this commit message. Differential Revision: http://reviews.llvm.org/D7808 llvm-svn: 230291	2015-02-24 01:02:42 +00:00
Sanjoy Das	18c243b933	Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap When emitting the increment operation, SCEVExpander marks the operation as nuw or nsw based on the flags on the preincrement SCEV. This is incorrect because, for instance, it is possible that {-6,+,1} is <nuw> while {-6,+,1}+1 = {-5,+,1} is not. This change teaches SCEV to mark the increment as nuw/nsw only if it can explicitly prove that the increment operation won't overflow. Apart from the attached test case, another (more realistic) manifestation of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll. NOTE: this change was landed with an incorrect commit message in rL230275 and was reverted for that reason in rL230279. This commit message is the correct one. Differential Revision: http://reviews.llvm.org/D7778 llvm-svn: 230280	2015-02-23 23:22:58 +00:00
Sanjoy Das	c9cf0151cf	Revert 230275. 230275 got committed with an incorrect commit message due to a mixup on my side. Will re-land in a few moments with the correct commit message. llvm-svn: 230279	2015-02-23 23:13:22 +00:00
Sanjoy Das	913dfd8f7f	Fix bug 22641 The bug was a result of getPreStartForExtend interpreting nsw/nuw flags on an add recurrence more strongly than is legal. {S,+,X}<nsw> implies S+X is nsw only if the backedge of the loop is taken at least once. Differential Revision: http://reviews.llvm.org/D7808 llvm-svn: 230275	2015-02-23 22:55:13 +00:00
Adam Nemet	e91cc6ef93	[LoopAccesses] Add -analyze support The LoopInfo in combination with depth_first is used to enumerate the loops. Right now -analyze is not yet complete. It only prints the result of the analysis, the report and the run-time checks. Printing the unsafe depedences will require a bit more reshuffling which I'd like to do in a follow-on to this patchset. Unsafe dependences are currently checked via -debug-only=loop-accesses in the new test. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229898	2015-02-19 19:15:19 +00:00
Chandler Carruth	b89464a9b6	[x86,sdag] Two interrelated changes to the x86 and sdag code. First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true) but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization. However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing. Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as legal so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, i know, this is confusing. but that's how the patterns are written). This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the hilarious deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win! There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct if horribly naive cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own. Note the removal of 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise. llvm-svn: 229835	2015-02-19 10:36:19 +00:00
NAKAMURA Takumi	fa520c5f49	Revert r229622: "[LoopAccesses] Make VectorizerParams global" and others. r229622 brought cyclic dependencies between Analysis and Vector. r229622: "[LoopAccesses] Make VectorizerParams global" r229623: "[LoopAccesses] Stash the report from the analysis rather than emitting it" r229624: "[LoopAccesses] Cache the result of canVectorizeMemory" r229626: "[LoopAccesses] Create the analysis pass" r229628: "[LoopAccesses] Change debug messages from LV to LAA" r229630: "[LoopAccesses] Add canAnalyzeLoop" r229631: "[LoopAccesses] Add missing const to APIs in VectorizationReport" r229632: "[LoopAccesses] Split out LoopAccessReport from VectorizerReport" r229633: "[LoopAccesses] Add -analyze support" r229634: "[LoopAccesses] Change LAA:getInfo to return a constant reference" r229638: "Analysis: fix buildbots" llvm-svn: 229650	2015-02-18 08:34:47 +00:00
Adam Nemet	75bc2d111f	[LoopAccesses] Add -analyze support The LoopInfo in combination with depth_first is used to enumerate the loops. Right now -analyze is not yet complete. It only prints the result of the analysis, the report and the run-time checks. Printing the unsafe depedences will require a bit more reshuffling which I'd like to do in a follow-on to this patchset. Unsafe dependences are currently checked via -debug-only=loop-accesses in the new test. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229633	2015-02-18 03:44:30 +00:00
Sanjoy Das	4153f47026	Generalize getExtendAddRecStart to work with both sign and zero extensions. This change also removes `DEBUG(dbgs() << "SCEV: untested prestart overflow check\n");` because that case has a unit test now. Differential Revision: http://reviews.llvm.org/D7645 llvm-svn: 229600	2015-02-18 01:47:07 +00:00
George Burgess IV	33305e7280	Fixed a bug where CFLAA would crash the compiler. We would crash if we couldn't locate a Function that either Location's Value belonged to. Now we just print out a debug message and return conservatively. llvm-svn: 228901	2015-02-12 03:07:07 +00:00
Andrew Kaylor	78b53dbcc1	Adding support for llvm.eh.begincatch and llvm.eh.endcatch intrinsics and beginning the documentation of native Windows exception handling. Differential Revision: http://reviews.llvm.org/D7398 llvm-svn: 228733	2015-02-10 19:52:43 +00:00
Ramkumar Ramachandra	82ab65c7cd	MemDerefPrinter: Require DataLayoutPass for higher accuracy Without a valid data layout, deferenceable(N) doesn't get parsed or propagated. Since this is the key item we are testing, add a dependency on the pass. Differential Revision: http://reviews.llvm.org/D7508 llvm-svn: 228611	2015-02-09 21:50:03 +00:00
Ramkumar Ramachandra	a7343d65f4	isDereferenceablePointer: look through gc.relocate calls While a theoretical GC might change dereferenceability on collection, there is no such known collector and no need to account for the case with a flag yet. Differential Revision: http://reviews.llvm.org/D7454 llvm-svn: 228606	2015-02-09 21:08:03 +00:00
Sanjoy Das	bf5d870dfa	Bugfix: SCEV incorrectly marks certain add recurrences as nsw When creating a scev for sext({X,+,Y}), scev checks if the expression is equivalent to {sext X,+,zext Y}. If it can prove that, it also tags the original {X,+,Y} as <nsw>, which is not correct. In the test case I run `-scalar-evolution` twice because the bug manifests only once SCEV has run through and seen the `sext` expressions (and then does a in-place mutation on {X,+,Y}). Differential Revision: http://reviews.llvm.org/D7495 llvm-svn: 228586	2015-02-09 18:34:55 +00:00
Johannes Doerfert	2683e5676c	Allow ScalarEvolution to catch more min/max cases For the attached test case different types are used in the ICmpInst and SelectInst that represent the min/max expressions. However, if the ICmpInst type is smaller a comparison with the sign/zero extended operands would have yielded the same result. This situation might arise after the instruction combination pass was applied. Differential Revision: http://reviews.llvm.org/D7338 llvm-svn: 228572	2015-02-09 12:34:23 +00:00
Sanjoy Das	f2e931cae9	Bugfix: ScalarEvolution incorrectly assumes that the start of certain add recurrences don't overflow. This change makes the optimization more restrictive. It still assumes that an overflowing `add nsw` is undefined behavior; and this change will need revisiting once we have a consistent semantics for poison values. Differential Revision: http://reviews.llvm.org/D7331 llvm-svn: 228552	2015-02-08 22:52:17 +00:00
Ahmed Bougacha	29efe3b287	[BasicAA] Try to disambiguate GEPs through arrays of structs into different fields. We can show that two GEPs off of the same (possibly multidimensional) array of structs, into different fields, can't alias. Quoting: For two GEPOperators GEP1 and GEP2, if we find that: - both GEPs begin indexing from the exact same pointer; - the last indices in both GEPs are constants, indexing into a struct; - said indices are different, hence,the pointed-to fields are different; - and both GEPs only index through arrays prior to that; this lets us determine that the struct that GEP1 indexes into and the struct that GEP2 indexes into must either precisely overlap or be completely disjoint. Because they cannot partially overlap, indexing into different non-overlapping fields of the struct will never alias. The other BasicAA::aliasGEP rules worked in some cases, but not all (for example, the i32x3 struct in the testcase). We can add this simple ad-hoc rule to complement them. rdar://19717375 Differential Revision: http://reviews.llvm.org/D7453 llvm-svn: 228498	2015-02-07 17:04:29 +00:00
Ramkumar Ramachandra	8378ac3684	Introduce print-memderefs to test isDereferenceablePointer Since testing the function indirectly is tricky, introduce a direct print-memderefs pass, in the same spirit as print-memdeps, which prints dereferenceability information matched by FileCheck. Differential Revision: http://reviews.llvm.org/D7075 llvm-svn: 228369	2015-02-06 01:46:42 +00:00
Ahmed Bougacha	a7e33112f0	[BasicAA] Add datalayouts to make some tests more useful. NFC. Fixes PR22462: two of the tests have regressed for a while, but were using CHECK-NOT to match "May:". The actual output was changed to "MayAlias:" at some point, which made the tests useless. Two others return MayAlias only because of a lack of analysis; BasicAA returns PartialAlias in those cases, when a datalayout is present. llvm-svn: 228346	2015-02-05 21:10:14 +00:00
Chandler Carruth	705b185f90	[PM] Change the core design of the TTI analysis to use a polymorphic type erased interface and a single analysis pass rather than an extremely complex analysis group. The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR. I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting into the right form. There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here. The other reason of course is that this is much more natural fit for the new pass manager. This will lay the ground work for a type-erased per-function info object that can look up the correct subtarget and even cache it. Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below. The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;] Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least: 1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to just do what we need: build and return a TTI object that we can then insert into the pass pipeline. 2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager. 3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2. 4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase. 5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward. 6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target. Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly. Differential Revision: http://reviews.llvm.org/D7293 llvm-svn: 227669	2015-01-31 03:43:40 +00:00
Daniel Berlin	16f7a52628	Fix incorrect partial aliasing Update testcases llvm-svn: 227099	2015-01-26 17:31:17 +00:00
Elena Demikhovsky	a3232f764e	Implemented cost model for masked load/store operations. llvm-svn: 227035	2015-01-25 08:44:46 +00:00
Chandler Carruth	df8b223dea	[PM] Actually add the new pass manager support for the assumption cache. I had already factored this analysis specifically to enable doing this, but hadn't actually committed the necessary wiring to get at this from the new pass manager. This also nicely shows how the separate cache object can be directly managed by the new pass manager. This analysis didn't have any direct tests and so I've added a printer pass and a boring test case. I chose to print the i1 value which is being assumed rather than the call to llvm.assume as that seems much more useful for testing... but suggestions on an even better printing strategy welcome. My main goal was to make sure things actually work. =] llvm-svn: 226868	2015-01-22 21:53:09 +00:00
Sanjoy Das	cb47366366	Make ScalarEvolution less aggressive with respect to no-wrap flags. ScalarEvolution currently lowers a subtraction recurrence to an add recurrence with the same no-wrap flags as the subtraction. This is incorrect because `sub nsw X, Y` is not the same as `add nsw X, -Y` and `sub nuw X, Y` is not the same as `add nuw X, -Y`. This patch fixes the issue, and adds two test cases demonstrating the bug. Differential Revision: http://reviews.llvm.org/D7081 llvm-svn: 226755	2015-01-22 00:48:47 +00:00
George Burgess IV	a1255d3a74	Added test to cover the CFLAA bitset indexing bug. llvm-svn: 226710	2015-01-21 22:39:35 +00:00
Chandler Carruth	aaf0b4cd57	[PM] Port LoopInfo to the new pass manager, adding both a LoopAnalysis pass and a LoopPrinterPass with the expected associated wiring. I've added a RUN line to the only test case (!!!) we have that actually prints loops. Everything seems to be working. This is somewhat exciting as this is the first analysis using another analysis to go in for the new pass manager. =D I also believe it is the last analysis necessary for porting instcombine, but of course I may yet discover more. llvm-svn: 226560	2015-01-20 10:58:50 +00:00
Chandler Carruth	64764b446b	[PM] Port domtree to the new pass manager (at last). This adds the domtree analysis to the new pass manager. The analysis returns the same DominatorTree result entity used by the old pass manager and essentially all of the code is shared. We just have different boilerplate for running and printing the analysis. I've converted one test to run in both modes just to make sure this is exercised while both are live in the tree. llvm-svn: 225969	2015-01-14 10:19:28 +00:00
Chandler Carruth	ef7a9fb63b	[dom] Add a basic dominator tree test. Correct, we have zero basic testing of the dominator tree in the regression test suite. There is a single test that even prints it out, and that test only checks a single line of the output. There are a handful of tests that check post dominators, but all of those are looking for bugs rather than just exercising the basic machinery. This test is super boring and unexciting. But hey, it's something. I needed there to be something so I could switch the basic test to run with both the old and new pass manager. llvm-svn: 225936	2015-01-14 03:34:55 +00:00
Sanjoy Das	81401d4b19	Fix PR22179. We were incorrectly inferring nsw for certain SCEVs. We can be more aggressive here (see Richard Smith's comment on http://llvm.org/bugs/show_bug.cgi?id=22179) but this change just focuses on correctness. Differential Revision: http://reviews.llvm.org/D6914 llvm-svn: 225591	2015-01-10 23:41:24 +00:00
Tim Northover	eb16112e97	Re-reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE" It's not really expected to stick around, last time it provoked a weird LTO build failure that I can't reproduce now, and the bot logs are long gone. I'll re-revert it if the failures recur. Original description: Perform Scalar PRE on gep indices that feed loads before doing Load PRE. llvm-svn: 225536	2015-01-09 19:19:56 +00:00
Duncan P. N. Exon Smith	be7ea19b58	IR: Make metadata typeless in assembly Now that `Metadata` is typeless, reflect that in the assembly. These are the matching assembly changes for the metadata/value split in r223802. - Only use the `metadata` type when referencing metadata from a call intrinsic -- i.e., only when it's used as a `Value`. - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode` when referencing it from call intrinsics. So, assembly like this: define @foo(i32 %v) { call void @llvm.foo(metadata !{i32 %v}, metadata !0) call void @llvm.foo(metadata !{i32 7}, metadata !0) call void @llvm.foo(metadata !1, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{metadata !3}, metadata !0) ret void, !bar !2 } !0 = metadata !{metadata !2} !1 = metadata !{i32* @global} !2 = metadata !{metadata !3} !3 = metadata !{} turns into this: define @foo(i32 %v) { call void @llvm.foo(metadata i32 %v, metadata !0) call void @llvm.foo(metadata i32 7, metadata !0) call void @llvm.foo(metadata i32* @global, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{!3}, metadata !0) ret void, !bar !2 } !0 = !{!2} !1 = !{i32* @global} !2 = !{!3} !3 = !{} I wrote an upgrade script that handled almost all of the tests in llvm and many of the tests in cfe (even handling many `CHECK` lines). I've attached it (or will attach it in a moment if you're speedy) to PR21532 to help everyone update their out-of-tree testcases. This is part of PR21532. llvm-svn: 224257	2014-12-15 19:07:53 +00:00
Duncan P. N. Exon Smith	57cbdfc99a	BFI: Saturate when combining edges to a successor When a loop gets bundled up, its outgoing edges are quite large, and can just barely overflow 64-bits. If one successor has multiple incoming edges -- and that successor is getting all the incoming mass -- combining just its edges can overflow. Handle that by saturating rather than asserting. This fixes PR21622. llvm-svn: 223500	2014-12-05 19:13:42 +00:00
Manman Ren	c67109313c	Revert r222039 because of bot failure. http://lab.llvm.org:8080/green/job/clang-Rlto_master/298/ Hopefully, bot will be green. If not, we will re-submit the commit. llvm-svn: 222287	2014-11-19 00:13:26 +00:00
Jingyue Wu	0fa125a77d	[DependenceAnalysis] Allow subscripts of different types Summary: Several places in DependenceAnalysis assumes both SCEVs in a subscript pair share the same integer type. For instance, isKnownPredicate calls SE->getMinusSCEV(X, Y) which asserts X and Y share the same type. However, DependenceAnalysis fails to ensure this assumption when producing a subscript pair, causing tests such as NonCanonicalizedSubscript to crash. With this patch, DependenceAnalysis runs unifySubscriptType before producing any subscript pair, ensuring the assumption. Test Plan: Added NonCanonicalizedSubscript.ll on which DependenceAnalysis before the fix crashed because subscripts have different types. Reviewers: spop, sebpop, jingyue Reviewed By: jingyue Subscribers: eliben, meheff, llvm-commits Differential Revision: http://reviews.llvm.org/D6289 llvm-svn: 222100	2014-11-16 16:52:44 +00:00
David Majnemer	0df1d12476	ScalarEvolution: HowFarToZero was wrongly using signed division HowFarToZero was supposed to use unsigned division in order to calculate the backedge taken count. However, SCEVDivision::divide performs signed division. Unless I am mistaken, no users of SCEVDivision actually want signed arithmetic: switch to udiv and urem. This fixes PR21578. llvm-svn: 222093	2014-11-16 07:30:35 +00:00
Chad Rosier	1ff4c0bf0b	Reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE" This commit updates the failing test in Analysis/TypeBasedAliasAnalysis/gvn-nonlocal-type-mismatch.ll The failing test is sensitive to the order in which we process loads. This version turns on the RPO traversal instead of the while DT traversal in GVN. The new test code is functionally same just the order of loads that are eliminated is swapped. This new version also fixes an issue where GVN splits a critical edge and potentially invalidate the RPO/DT iterator. llvm-svn: 222039	2014-11-14 21:09:13 +00:00
Elena Demikhovsky	d5e95b57e0	AVX-512: SINT_TO_FP cost model and some bugfixes Checked some corner cases, for example translation of <8 x i1> to <8 x double> llvm-svn: 221883	2014-11-13 11:46:16 +00:00
Hal Finkel	45ba2c10e4	Revert r219432 - "Revert "[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information.""" Let's try this again... This reverts r219432, plus a bug fix. Description of the bug in r219432 (by Nick): The bug was using AllPositive to break out of the loop; if the loop break condition i != e is changed to i != e && AllPositive then the test_modulo_analysis_with_global test I've added will fail as the Modulo will be calculated incorrectly (as the last loop iteration is skipped, so Modulo isn't updated with its Scale). Nick also adds this comment: ComputeSignBit is safe to use in loops as it takes into account phi nodes, and the == EK_ZeroEx check is safe in loops as, no matter how the variable changes between iterations, zero-extensions will always guarantee a zero sign bit. The isValueEqualInPotentialCycles check is therefore definitely not needed as all the variable analysis holds no matter how the variables change between loop iterations. And this patch also adds another enhancement to GetLinearExpression - basically to convert ConstantInts to Offsets (see test_const_eval and test_const_eval_scaled for the situations this improves). Original commit message: This reverts r218944, which reverted r218714, plus a bug fix. Description of the bug in r218714 (by Nick): The original patch forgot to check if the Scale in VariableGEPIndex flipped the sign of the variable. The BasicAA pass iterates over the instructions in the order they appear in the function, and so BasicAliasAnalysis::aliasGEP is called with the variable it first comes across as parameter GEP1. Adding a %reorder label puts the definition of %a after %b so aliasGEP is called with %b as the first parameter and %a as the second. aliasGEP later calculates that %a == %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) - ignoring that %idxprom is scaled by -1 here lead the patch to incorrectly conclude that %a > %b. Revised patch by Nick White, thanks! Thanks to Lang to isolating the bug. Slightly modified by me to add an early exit from the loop and avoid unnecessary, but expensive, function calls. Original commit message: Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! llvm-svn: 221876	2014-11-13 09:16:54 +00:00
Quentin Colombet	360460ba64	[X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if AVX2 is available. According to IACA, the new lowering has a throughput of 8 cycles instead of 13 with the previous one. Althought this lowering kicks in some SPECs benchmarks, the performance improvement was within the noise. Correctness testing has been done for the whole range of uint32_t with the following program: uint4 v = (uint4) {0,1,2,3}; uint32_t i; //Check correctness over entire range for uint4 -> float4 conversion for( i = 0; i < 1U << (32-2); i++ ) { float4 t = test(v); float4 c = correct(v); if( 0xf != _mm_movemask_ps( t == c )) { printf( "Error @ %vx: %vf vs. %vf\n", v, c, t); return -1; } v += 4; } Where "correct" is the old lowering and "test" the new one. The patch adds a test case for the two custom lowering instruction. It also modifies the vector cost model, which is why cast.ll and uitofp.ll are modified. 2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7 instructions instead of 4 (3 more constant loads). rdar://problem/18153096> llvm-svn: 221657	2014-11-11 02:23:47 +00:00
Rafael Espindola	1c99344156	Use FileCheck in a few tests. llvm-svn: 221459	2014-11-06 15:05:51 +00:00
Bradley Smith	9992b167ae	[SCEV] Improve Scalar Evolution's use of no {un,}signed wrap flags In a case where we have a no {un,}signed wrap flag on the increment, if RHS - Start is constant then we can avoid inserting a max operation bewteen the two, since we can statically determine which is greater. This allows us to unroll loops such as: void testcase3(int v) { for (int i=v; i<=v+1; ++i) f(i); } llvm-svn: 220960	2014-10-31 11:40:32 +00:00
Hal Finkel	dd38c0b876	[DSE] Remove no-data-layout-only type-based overlap checking DSE's overlap checking contained special logic, used only when no DataLayout was available, which inferred a complete overwrite when the pointee types were equal. This logic seems fine for regular loads/stores, but does not work for memcpy and friends. Instead of fixing this, I'm just removing it. Philosophically, transformations should not contain enhanced behavior used only when data layout is lacking (data layout should be strictly additive), and maintaining these rarely-tested code paths seems not worthwhile at this stage. Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case (slightly reduced from that provided by Aliaksei) replaces the original contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few other tests have been updated to have a data layout. llvm-svn: 220035	2014-10-17 11:56:00 +00:00
Rafael Espindola	11aaaeebe0	Delete -std-compile-opts. These days -std-compile-opts was just a silly alias for -O3. llvm-svn: 219951	2014-10-16 20:00:02 +00:00
Hal Finkel	db5f86a9bf	[CFL-AA] CFL-AA should not assert on an va_arg instruction The CFL-AA implementation was missing a visit* routine for va_arg instructions, causing it to assert when run on a function that had one. For now, handle these in a conservative way. Fixes PR20954. llvm-svn: 219718	2014-10-14 20:51:26 +00:00
Sanjoy Das	1f05c51e5e	This patch teaches ScalarEvolution to pick and use !range metadata. It also makes it more aggressive in querying range information by adding a call to isKnownPredicateWithRanges to isLoopBackedgeGuardedByCond and isLoopEntryGuardedByCond. phabricator: http://reviews.llvm.org/D5638 Reviewed by: atrick, hfinkel llvm-svn: 219532	2014-10-10 21:22:34 +00:00
Mark Heffernan	2beab5f0b4	This patch de-pessimizes the calculation of loop trip counts in ScalarEvolution in the presence of multiple exits. Previously all loops exits had to have identical counts for a loop trip count to be considered computable. This pessimization was implemented by calling getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock) inside of ScalarEvolution::getSmallConstantTripCount() (see the FIXME in the comments of that function). The pessimization was added to fix a corner case involving undefined behavior (pr/16130). This patch more precisely handles the undefined behavior case allowing the pessimization to be removed. ControlsExit replaces IsSubExpr to more precisely track the case where undefined behavior is expected to occur. Because undefined behavior is tracked more precisely we can remove MustExit from ExitLimit. MustExit was used to track the case where the limit was computed potentially assuming undefined behavior even if undefined behavior didn't necessarily occur. llvm-svn: 219517	2014-10-10 17:39:11 +00:00
Hal Finkel	cbbd3df836	Revert "[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information."" This reverts commit r219135 -- still causing miscompiles in SPEC it seems... llvm-svn: 219432	2014-10-09 19:48:12 +00:00
Hal Finkel	43ce71f1b1	[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information." This reverts r218944, which reverted r218714, plus a bug fix. Description of the bug in r218714 (by Nick) The original patch forgot to check if the Scale in VariableGEPIndex flipped the sign of the variable. The BasicAA pass iterates over the instructions in the order they appear in the function, and so BasicAliasAnalysis::aliasGEP is called with the variable it first comes across as parameter GEP1. Adding a %reorder label puts the definition of %a after %b so aliasGEP is called with %b as the first parameter and %a as the second. aliasGEP later calculates that %a == %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) - ignoring that %idxprom is scaled by -1 here lead the patch to incorrectly conclude that %a > %b. Revised patch by Nick White, thanks! Thanks to Lang to isolating the bug. Slightly modified by me to add an early exit from the loop and avoid unnecessary, but expensive, function calls. Original commit message: Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! llvm-svn: 219135	2014-10-06 18:37:59 +00:00
Hal Finkel	8eae3ad2ff	[CFL-AA] Update for handling of globals and more tests We used to return PartialAlias if either variable being queried interacted with arguments or globals. AFAICT, we can change this to only returning MayAlias iff both variables being queried interacted with arguments or globals. Also, adding some basic functionality tests: some basic IPA tests, checking that we give conservative responses with arguments/globals thrown in the mix, and ensuring that we trace values through stores and loads. Note that saying that 'x' interacted with arguments or globals means that the Attributes of the StratifiedSet that 'x' belongs to has any bits set. Patch by George Burgess IV, thanks! llvm-svn: 219122	2014-10-06 14:42:56 +00:00
Lang Hames	89e9c17235	[BasicAA] Revert r218714 - Make better use of zext and sign information. This patch broke 447.dealII on Darwin. I'm currently working on a reduced test-case, but reverting for now to keep the bots happy. <rdar://problem/18530107> llvm-svn: 218944	2014-10-03 01:33:47 +00:00
Adrian Prantl	87b7eb9d0f	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787	2014-10-01 18:55:02 +00:00
Adrian Prantl	b458dc2eee	Revert r218778 while investigating buldbot breakage. "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782	2014-10-01 18:10:54 +00:00
Adrian Prantl	25a7174e7a	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778	2014-10-01 17:55:39 +00:00
Hal Finkel	fd86317989	[BasicAA] Make better use of zext and sign information Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! llvm-svn: 218714	2014-09-30 22:43:40 +00:00
Elena Demikhovsky	27012478d2	AVX-512: added cost for some AVX-512 instructions llvm-svn: 217863	2014-09-16 07:57:37 +00:00
Hal Finkel	cc4f31d3d7	Fix BasicTTI::getCmpSelInstrCost to deal with illegal vector types The default implementation of getCmpSelInstrCost, which provides the cost of icmp/fcmp/select instructions, did not deal sensibly with illegal vector types that were scalarized. We'd ask for the legalization cost of the vector type, which would return something like (4, f64) given an input of <4 x double>, and we'd then check the TLI status of the ISD opcode on that scalar type. This would result in querying (ISD::VSELECT, f64), for example. Amusingly enough, ISD::VSELECT on scalar types is marked as Legal by default (as with most other operations), and most backends never change this because VSELECT is never generated on scalars. However, seeing the resulting operation as Legal, we'd neglect to add the scalarization cost before returning. The result is that we'd grossly under-estimate the cost of cmps/selects on illegal vector types. Now, if type legalization clearly results in scalarization, we skip the early return and add the scalarization cost. llvm-svn: 217859	2014-09-16 04:35:50 +00:00
Matt Arsenault	f090bda1d5	CHECK-LABELize test llvm-svn: 217797	2014-09-15 17:56:56 +00:00
James Molloy	a9f47b6bae	[ARM] Teach the cost model that cross-class copies are costly. Cross-class copies being expensive is actually a trait of the microarchitecture, but as I haven't yet seen an example of a microarchitecture where they're cheap it seems best to just enable this by default, covering the non-mcpu build case. llvm-svn: 217674	2014-09-12 13:29:40 +00:00
Hal Finkel	cebf0cc210	Make use @llvm.assume for loop guards in ScalarEvolution This adds a basic (but important) use of @llvm.assume calls in ScalarEvolution. When SE is attempting to validate a condition guarding a loop (such as whether or not the loop count can be zero), this check should also include dominating assumptions. llvm-svn: 217348	2014-09-07 21:37:59 +00:00
Hal Finkel	7529c55c02	Add a CFL Alias Analysis implementation This provides an implementation of CFL alias analysis (including some supporting data structures). Currently, we don't have any extremely fancy features, sans some interprocedural analysis (i.e. no field sensitivity, etc.), and we do best sitting behind BasicAA + TBAA. In such a configuration, we take ~0.6-0.8% of total compile time, and give ~7-8% NoAlias responses to queries TBAA and BasicAA couldn't answer when bootstrapping LLVM. In testing this on other projects, we've seen up to 10.5% of queries dropped by BasicAA+TBAA answered with NoAlias by this algorithm. Patch by George Burgess IV (with minor modifications by me -- mostly adapting some BasicAA tests), thanks! llvm-svn: 216970	2014-09-02 21:43:13 +00:00
Hal Finkel	930469107d	Add @llvm.assume, lowering, and some basic properties This is the first commit in a series that add an @llvm.assume intrinsic which can be used to provide the optimizer with a condition it may assume to be true (when the control flow would hit the intrinsic call). Some basic properties are added here: - llvm.invariant(true) is dead. - llvm.invariant(false) is unreachable (this directly corresponds to the documented behavior of MSVC's __assume(0)), so is llvm.invariant(undef). The intrinsic is tagged as writing arbitrarily, in order to maintain control dependencies. BasicAA has been updated, however, to return NoModRef for any particular location-based query so that we don't unnecessarily block code motion. llvm-svn: 213973	2014-07-25 21:13:35 +00:00
Hal Finkel	ff0bcb60c9	Convert noalias parameter attributes into noalias metadata during inlining This functionality is currently turned off by default. Part of the motivation for introducing scoped-noalias metadata is to enable the preservation of noalias parameter attribute information after inlining. Sometimes this can be inferred from the code in the caller after inlining, but often we simply lose valuable information. The overall process if fairly simple: 1. Create a new unqiue scope domain. 2. For each (used) noalias parameter, create a new alias scope. 3. For each pointer, collect the underlying objects. Add a noalias scope for each noalias parameter from which we're not derived (and has not been captured prior to that point). 4. Add an alias.scope for each noalias parameter from which we might be derived (or has been captured before that point). Note that the capture checks apply only if one of the underlying objects is not an identified function-local object. llvm-svn: 213949	2014-07-25 15:50:08 +00:00
Hal Finkel	029cde639c	Simplify and improve scoped-noalias metadata semantics In the process of fixing the noalias parameter -> metadata conversion process that will take place during inlining (which will be committed soon, but not turned on by default), I have come to realize that the semantics provided by yesterday's commit are not really what we want. Here's why: void foo(noalias a, noalias b, noalias c, bool x) { q = x ? a : b; c = q; } Generically, we know that c does not alias with a and with b (so there is an 'and' in what we know we're not), and we know that q might be derived from a or from *b (so there is an 'or' in what we know that we are). So we do not want the semantics currently, where any noalias scope matching any alias.scope causes a NoAlias return. What we want to know is that the noalias scopes form a superset of the alias.scope list (meaning that all the things we know we're not is a superset of all of things the other instruction might be). Making that change, however, introduces a composibility problem. If we inline once, adding the noalias metadata, and then inline again adding more, and we append new scopes onto the noalias and alias.scope lists each time. But, this means that we could change what was a NoAlias result previously into a MayAlias result because we appended an additional scope onto one of the alias.scope lists. So, instead of giving scopes the ability to have parents (which I had borrowed from the TBAA implementation, but seems increasingly unlikely to be useful in practice), I've given them domains. The subset/superset condition now applies within each domain independently, and we only need it to hold in one domain. Each time we inline, we add the new scopes in a new scope domain, and everything now composes nicely. In addition, this simplifies the implementation. llvm-svn: 213948	2014-07-25 15:50:02 +00:00
Hal Finkel	9414665a3b	Add scoped-noalias metadata This commit adds scoped noalias metadata. The primary motivations for this feature are: 1. To preserve noalias function attribute information when inlining 2. To provide the ability to model block-scope C99 restrict pointers Neither of these two abilities are added here, only the necessary infrastructure. In fact, there should be no change to existing functionality, only the addition of new features. The logic that converts noalias function parameters into this metadata during inlining will come in a follow-up commit. What is added here is the ability to generally specify noalias memory-access sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA nodes: !scope0 = metadata !{ metadata !"scope of foo()" } !scope1 = metadata !{ metadata !"scope 1", metadata !scope0 } !scope2 = metadata !{ metadata !"scope 2", metadata !scope0 } !scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 } !scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 } Loads and stores can be tagged with an alias-analysis scope, and also, with a noalias tag for a specific scope: ... = load %ptr1, !alias.scope !{ !scope1 } ... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 } When evaluating an aliasing query, if one of the instructions is associated with an alias.scope id that is identical to the noalias scope associated with the other instruction, or is a descendant (in the scope hierarchy) of the noalias scope associated with the other instruction, then the two memory accesses are assumed not to alias. Note that is the first element of the scope metadata is a string, then it can be combined accross functions and translation units. The string can be replaced by a self-reference to create globally unqiue scope identifiers. [Note: This overview is slightly stylized, since the metadata nodes really need to just be numbers (!0 instead of !scope0), and the scope lists are also global unnamed metadata.] Existing noalias metadata in a callee is "cloned" for use by the inlined code. This is necessary because the aliasing scopes are unique to each call site (because of possible control dependencies on the aliasing properties). For example, consider a function: foo(noalias a, noalias b) { a = b; } that gets inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } -- now just because we know that a1 does not alias with b1 at the first call site, and a2 does not alias with b2 at the second call site, we cannot let inlining these functons have the metadata imply that a1 does not alias with b2. llvm-svn: 213864	2014-07-24 14:25:39 +00:00
Hal Finkel	cc39b67530	AA metadata refactoring (introduce AAMDNodes) In order to enable the preservation of noalias function parameter information after inlining, and the representation of block-level __restrict__ pointer information (etc.), additional kinds of aliasing metadata will be introduced. This metadata needs to be carried around in AliasAnalysis::Location objects (and MMOs at the SDAG level), and so we need to generalize the current scheme (which is hard-coded to just one TBAA MDNode). This commit introduces only the necessary refactoring to allow for the introduction of other aliasing metadata types, but does not actually introduce any (that will come in a follow-up commit). What it does introduce is a new AAMDNodes structure to hold all of the aliasing metadata nodes associated with a particular memory-accessing instruction, and uses that structure instead of the raw MDNode in AliasAnalysis::Location, etc. No functionality change intended. llvm-svn: 213859	2014-07-24 12:16:19 +00:00
Hal Finkel	354e23b029	Improve BasicAA CS-CS queries (redux) This reverts, "r213024 - Revert r212572 "improve BasicAA CS-CS queries", it causes PR20303." with a fix for the bug in pr20303. As it turned out, the relevant code was both wrong and over-conservative (because, as with the code it replaced, it would return the overall ModRef mask even if just Ref had been implied by the argument aliasing results). Hopefully, this correctly fixes both problems. Thanks to Nick Lewycky for reducing the test case for pr20303 (which I've cleaned up a little and added in DSE's test directory). The BasicAA test has also been updated to check for this error. Original commit message: BasicAA contains knowledge of certain intrinsics, such as memcpy and memset, and uses that information to form more-accurate answers to CallSite vs. Loc ModRef queries. Unfortunately, it did not use this information when answering CallSite vs. CallSite queries. Generically, when an intrinsic takes one or more pointers and the intrinsic is marked only to read/write from its arguments, the offset/size is unknown. As a result, the generic code that answers CallSite vs. CallSite (and CallSite vs. Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's arguments. While BasicAA's CallSite vs. Loc override could use more-accurate size information for some intrinsics, it did not do the same for CallSite vs. CallSite queries. This change refactors the intrinsic-specific logic in BasicAA into a generic AA query function: getArgLocation, which is overridden by BasicAA to supply the intrinsic-specific knowledge, and used by AA's generic implementation. This allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and CallSite vs. CallSite queries, and simplifies the BasicAA implementation. Currently, only one function, Mac's memset_pattern16, is handled by BasicAA (all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's getModRefBehavior override now also returns OnlyAccessesArgumentPointees for this function (which is an improvement). llvm-svn: 213219	2014-07-17 01:28:25 +00:00
Nick Lewycky	7a63c3b389	Revert r212572 "improve BasicAA CS-CS queries", it causes PR20303. llvm-svn: 213024	2014-07-15 00:53:38 +00:00
Hal Finkel	8ae0f8d618	Improve BasicAA CS-CS queries BasicAA contains knowledge of certain intrinsics, such as memcpy and memset, and uses that information to form more-accurate answers to CallSite vs. Loc ModRef queries. Unfortunately, it did not use this information when answering CallSite vs. CallSite queries. Generically, when an intrinsic takes one or more pointers and the intrinsic is marked only to read/write from its arguments, the offset/size is unknown. As a result, the generic code that answers CallSite vs. CallSite (and CallSite vs. Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's arguments. While BasicAA's CallSite vs. Loc override could use more-accurate size information for some intrinsics, it did not do the same for CallSite vs. CallSite queries. This change refactors the intrinsic-specific logic in BasicAA into a generic AA query function: getArgLocation, which is overridden by BasicAA to supply the intrinsic-specific knowledge, and used by AA's generic implementation. This allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and CallSite vs. CallSite queries, and simplifies the BasicAA implementation. Currently, only one function, Mac's memset_pattern16, is handled by BasicAA (all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's getModRefBehavior override now also returns OnlyAccessesArgumentPointees for this function (which is an improvement). llvm-svn: 212572	2014-07-08 23:16:49 +00:00
Andrea Di Biagio	c8e8bda58f	[CostModel][x86] Improved cost model for alternate shuffles. This patch: 1) Improves the cost model for x86 alternate shuffles (originally added at revision 211339); 2) Teaches the Cost Model Analysis pass how to analyze alternate shuffles. Alternate shuffles are a special kind of blend; on x86, we can often easily lowered alternate shuffled into single blend instruction (depending on the subtarget features). The existing cost model didn't take into account subtarget features. Also, it had a couple of "dead" entries for vector types that are never legal (example: on x86 types v2i32 and v2f32 are not legal; those are always either promoted or widened to 128-bit vector types). The new x86 cost model takes into account what target features we have before returning the shuffle cost (i.e. the number of instructions after the blend is lowered/expanded). This patch also teaches the Cost Model Analysis how to identify and analyze alternate shuffles (i.e. 'SK_Alternate' shufflevector instructions): - added function 'isAlternateVectorMask'; - added some logic to check if an instruction is a alternate shuffle and, in case, call the target specific TTI to get the corresponding shuffle cost; - added a test to verify the cost model analysis on alternate shuffles. llvm-svn: 212296	2014-07-03 22:24:18 +00:00
Alp Toker	d3d017cf00	Reduce verbiage of lit.local.cfg files We can just split targets_to_build in one place and make it immutable. llvm-svn: 210496	2014-06-09 22:42:55 +00:00
Tobias Grosser	40ac10085a	ScalarEvolution: Derive element size from the type of the loaded element Before, we where looking at the size of the pointer type that specifies the location from which to load the element. This did not make any sense at all. This change fixes a bug in the delinearization where we failed to delinerize certain load instructions. llvm-svn: 210435	2014-06-08 19:21:20 +00:00
Sebastian Pop	a6e5860513	remove constant terms The delinearization is needed only to remove the non linearity induced by expressions involving multiplications of parameters and induction variables. There is no problem in dealing with constant times parameters, or constant times an induction variable. For this reason, the current patch discards all constant terms and multipliers before running the delinearization algorithm on the terms. The only thing remaining in the term expressions are parameters and multiply expressions of parameters: these simplified term expressions are passed to the array shape recognizer that will not recognize constant dimensions anymore: these will be recognized as different strides in parametric subscripts. The only important special case of a constant dimension is the size of elements. Instead of relying on the delinearization to infer the size of an element, compute the element size from the base address type. This is a much more precise way of computing the element size than before, as we would have mixed together the size of an element with the strides of the innermost dimension. llvm-svn: 209691	2014-05-27 22:41:45 +00:00
Dinesh Dwivedi	c0e6703360	Adding testcase for PR18886. Differential Revision: http://reviews.llvm.org/D3837 llvm-svn: 209645	2014-05-27 06:44:25 +00:00
Tim Northover	3b0846e8f7	AArch64/ARM64: move ARM64 into AArch64's place This commit starts with a "git mv ARM64 AArch64" and continues out from there, renaming the C++ classes, intrinsics, and other target-local objects for consistency. "ARM64" test directories are also moved, and tests that began their life in ARM64 use an arm64 triple, those from AArch64 use an aarch64 triple. Both should be equivalent though. This finishes the AArch64 merge, and everyone should feel free to continue committing as normal now. llvm-svn: 209577	2014-05-24 12:50:23 +00:00
Andrew Trick	b429083aff	Test case comments. Fix sloppiness. llvm-svn: 209551	2014-05-23 20:46:21 +00:00
Andrew Trick	839e30b2c0	Fix and improve SCEV ComputeBackedgeTankCount. This is a follow-up to r209358: PR19799: Indvars miscompile due to an incorrect max backedge taken count from SCEV. That fix was incomplete as pointed out by Arnold and Michael Z. The code was also too confusing. It needed a careful rewrite with more unit tests. This version will also happen to optimize more cases. <rdar://17005101> PR19799: Indvars miscompile... llvm-svn: 209545	2014-05-23 19:47:13 +00:00
Andrew Trick	e255359b57	Fix a bug in SCEV's backedge taken count computation from my prior fix in Jan. This has to do with the trip count computation for loops with multiple exits, which is quite subtle. Most passes just ask for a single trip count number, so we must be conservative assuming any exit could be taken. Normally, we rely on the "exact" trip count, which was correctly given as "unknown". However, SCEV also gives a "max" back-edge taken count. The loops max BE taken count is conservatively a maximum over the max of each exit's non-exiting iterations count. Note that some exit tests can be skipped so the max loop back-edge taken count can actually exceed the max non-exiting iterations for some exits. However, when we know the loop latch cannot be skipped, we can directly use its max taken count disregarding other exits. I previously took the minimum here without checking whether the other exit could be skipped. The correct, and simpler thing to do here is just to directly use the loop latch's max non-exiting iterations as the loops max back-edge count. In the problematic test case, the first loop exit had a max of zero non-exiting iterations, but could be skipped. The loop latch was known not to be skipped but had max of one non-exiting iteration. We incorrectly claimed the loop back-edge could be taken zero times, when it is actually taken one time. Fixes Loop %for.body.i: <multiple exits> Unpredictable backedge-taken count. Loop %for.body.i: max backedge-taken count is 1. llvm-svn: 209358	2014-05-22 00:37:03 +00:00
Filipe Cabecinhas	7b12d773e3	Added tests for the cost of lowering VSELECT instructions. llvm-svn: 209045	2014-05-16 22:47:58 +00:00
Alp Toker	beaca19c7c	Fix typos llvm-svn: 208839	2014-05-15 01:52:21 +00:00
Adam Nemet	63e4b30f79	[Test] Trim unnecessary .c and .cpp from config.suffix in lit.local.cfg Tested by comparing make check VERBOSE=1 before and after to make sure no tests are missed. (VERBOSE=1 prints the list of tests.) Only one test :( remains where .cpp is required: tools/llvm-cov/range_based_for.cpp:// RUN: llvm-cov range_based_for.cpp \| FileCheck %s --check-prefix=STDOUT The topic was discussed in this thread: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140428/214905.html llvm-svn: 208621	2014-05-12 19:57:31 +00:00
Sebastian Pop	b1a548f72d	do not assert when delinearization fails llvm-svn: 208615	2014-05-12 19:01:53 +00:00
Sebastian Pop	45dd14bac2	add testcase for r208237: do not collect undef terms llvm-svn: 208347	2014-05-08 18:38:58 +00:00
Sebastian Pop	448712b1a6	split delinearization pass in 3 steps To compute the dimensions of the array in a unique way, we split the delinearization analysis in three steps: - find parametric terms in all memory access functions - compute the array dimensions from the set of terms - compute the delinearized access functions for each dimension The first step is executed on all the memory access functions such that we gather all the patterns in which an array is accessed. The second step reduces all this information in a unique description of the sizes of the array. The third step is delinearizing each memory access function following the common description of the shape of the array computed in step 2. This rewrite of the delinearization pass also solves a problem we had with the previous implementation: because the previous algorithm was by induction on the structure of the SCEV, it would not correctly recognize the shape of the array when the memory access was not following the nesting of the loops: for example, see polly/test/ScopInfo/multidim_only_ivs_3d_reverse.ll ; void foo(long n, long m, long o, double A[n][m][o]) { ; ; for (long i = 0; i < n; i++) ; for (long j = 0; j < m; j++) ; for (long k = 0; k < o; k++) ; A[i][k][j] = 1.0; Starting with this patch we no longer delinearize access functions that do not contain parameters, for example in test/Analysis/DependenceAnalysis/GCD.ll ;; for (long int i = 0; i < 100; i++) ;; for (long int j = 0; j < 100; j++) { ;; A[2i - 4j] = i; ;; B++ = A[6i + 8*j]; these accesses will not be delinearized as the upper bound of the loops are constants, and their access functions do not contain SCEVUnknown parameters. llvm-svn: 208232	2014-05-07 18:01:20 +00:00
Benjamin Kramer	1625bfccbe	TTI: Estimate @llvm.fmuladd cost as fmul + fadd when FMA's aren't legal on the target. llvm-svn: 208115	2014-05-06 18:36:23 +00:00
Duncan P. N. Exon Smith	c5a3139ebd	Reapply "blockfreq: Approximate irreducible control flow" This reverts commit r207287, reapplying r207286. I'm hoping that declaring an explicit struct and instantiating `addBlockEdges()` directly works around the GCC crash from r207286. This is a lot more boilerplate, though. llvm-svn: 207438	2014-04-28 20:02:29 +00:00
Benjamin Kramer	ce4b3fee72	X86TTI: Adjust sdiv cost now that we can lower it on plain SSE2. Includes a fix for a horrible typo that caused all SDIV costs to be slightly off :) llvm-svn: 207371	2014-04-27 18:47:54 +00:00
Benjamin Kramer	7c3722724b	X86TTI: i16/i32 vector div with a constant (splat) divisor are reasonably cheap now. Turn vectorization back on. llvm-svn: 207320	2014-04-26 14:53:05 +00:00
Duncan P. N. Exon Smith	42292ceaa9	Revert "blockfreq: Approximate irreducible control flow" This reverts commit r207286. It causes an ICE on the cmake-llvm-x86_64-linux buildbot [1]: llvm/lib/Analysis/BlockFrequencyInfo.cpp: In lambda function: llvm/lib/Analysis/BlockFrequencyInfo.cpp:182:1: internal compiler error: in get_expr_operands, at tree-ssa-operands.c:1035 [1]: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/12093/steps/build_llvm/logs/stdio llvm-svn: 207287	2014-04-25 23:16:58 +00:00
Duncan P. N. Exon Smith	384d0e8ad4	blockfreq: Approximate irreducible control flow Previously, irreducible backedges were ignored. With this commit, irreducible SCCs are discovered on the fly, and modelled as loops with multiple headers. This approximation specifies the headers of irreducible sub-SCCs as its entry blocks and all nodes that are targets of a backedge within it (excluding backedges within true sub-loops). Block frequency calculations act as if we insert a new block that intercepts all the edges to the headers. All backedges and entries to the irreducible SCC point to this imaginary block. This imaginary block has an edge (with even probability) to each header block. The result is now reasonable enough that I've added a number of testcases for irreducible control flow. I've outlined in `BlockFrequencyInfoImpl.h` ways to improve the approximation. <rdar://problem/14292693> llvm-svn: 207286	2014-04-25 23:08:57 +00:00
Duncan P. N. Exon Smith	cb7d29d30c	blockfreq: Only one mass distribution per node Remove the concepts of "forward" and "general" mass distributions, which was wrong. The split might have made sense in an early version of the algorithm, but it's definitely wrong now. <rdar://problem/14292693> llvm-svn: 207195	2014-04-25 04:38:43 +00:00
Duncan P. N. Exon Smith	84408d1fda	blockfreq: Use better branch weights in multiexit test The branch weights were even before. Make them different. <rdar://problem/14292693> llvm-svn: 207193	2014-04-25 04:38:37 +00:00
Duncan P. N. Exon Smith	58c8948a0c	blockfreq: Clean up irreducible testcases Strip irreducible testcases to pure control flow. The function calls made the branch weights more believable but cluttered it up a lot. There isn't going to be any constant analysis here, so just use dumb branch logic to clarify the important parts. <rdar://problem/14292693> llvm-svn: 207192	2014-04-25 04:38:35 +00:00
Duncan P. N. Exon Smith	b3380ea60a	blockfreq: Skip irreducible backedges inside functions The branch that skips irreducible backedges was only active when propagating mass at the top-level. In particular, when propagating mass through a loop recognized by `LoopInfo` with irreducible control flow inside, irreducible backedges would not be skipped. Not sure where that idea came from, but the result was that mass was lost until after loop exit. Added a testcase that covers this case. llvm-svn: 206860	2014-04-22 03:31:53 +00:00
Duncan P. N. Exon Smith	10be9a8868	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206707, reapplying r206704. The preceding commit to CalcSpillWeights should have sorted out the failing buildbots. <rdar://problem/14292693> llvm-svn: 206766	2014-04-21 17:57:07 +00:00
Duncan P. N. Exon Smith	e63327e967	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206704, as expected. llvm-svn: 206707	2014-04-19 22:46:00 +00:00
Duncan P. N. Exon Smith	875ddfac75	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite. I've done a careful audit, added some asserts, and fixed a couple of bugs (unfortunately, they were in unlikely code paths). There's a small chance that this will appease the failing bots [1][2]. (If so, great!) If not, I have a follow-up commit ready that will temporarily add -debug-only=block-freq to the two failing tests, allowing me to compare the code path between what the failing bots and what my machines (and the rest of the bots) are doing. Once I've triggered those builds, I'll revert both commits so the bots go green again. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445 <rdar://problem/14292693> llvm-svn: 206704	2014-04-19 22:34:26 +00:00
Duncan P. N. Exon Smith	76b813619a	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2 ) This reverts commit r206666, as planned. Still stumped on why the bots are failing. Sanitizer bots haven't turned anything up. If anyone can help me debug either of the failures (referenced in r206666) I'll owe them a beer. (In the meantime, I'll be auditing my patch for undefined behaviour.) llvm-svn: 206677	2014-04-19 00:42:46 +00:00
Duncan P. N. Exon Smith	b3caf3646f	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2 ) This reverts commit r206628, reapplying r206622 (and r206626). Two tests are failing only on buildbots [1][2]: i.e., I can't reproduce on Darwin, and Chandler can't reproduce on Linux. Asan and valgrind don't tell us anything, but we're hoping the msan bot will catch it. So, I'm applying this again to get more feedback from the bots. I'll leave it in long enough to trigger builds in at least the sanitizer buildbots (it was failing for reasons unrelated to my commit last time it was in), and hopefully a few others.... and then I expect to revert a third time. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445 llvm-svn: 206666	2014-04-18 22:30:03 +00:00
Duncan P. N. Exon Smith	0842ff36a6	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2 ) This reverts commit r206622 and the MSVC fixup in r206626. Apparently the remotely failing tests are still failing, despite my attempt to fix the nondeterminism in r206621. llvm-svn: 206628	2014-04-18 17:56:08 +00:00
Duncan P. N. Exon Smith	f8361d127a	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206556, effectively reapplying commit r206548 and its fixups in r206549 and r206550. In an intervening commit I've added target triples to the tests that were failing remotely [1] (but passing locally). I'm hoping the mystery is solved? I'll revert this again if the tests are still failing remotely. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206622	2014-04-18 17:22:25 +00:00
Chandler Carruth	18eadd9260	[LCG] Add support for building persistent and connected SCCs to the LazyCallGraph. This is the start of the whole point of this different abstraction, but it is just the initial bits. Here is a run-down of what's going on here. I'm planning to incorporate some (or all) of this into comments going forward, hopefully with better editing and wording. =] The crux of the problem with the traditional way of building SCCs is that they are ephemeral. The new pass manager however really needs the ability to associate analysis passes and results of analysis passes with SCCs in order to expose these analysis passes to the SCC passes. Making this work is kind-of the whole point of the new pass manager. =] So, when we're building SCCs for the call graph, we actually want to build persistent nodes that stick around and can be reasoned about later. We'd also like the ability to walk the SCC graph in more complex ways than just the traditional postorder traversal of the current CGSCC walk. That means that in addition to being persistent, the SCCs need to be connected into a useful graph structure. However, we still want the SCCs to be formed lazily where possible. These constraints are quite hard to satisfy with the SCC iterator. Also, using that would bypass our ability to actually add data to the nodes of the call graph to facilite implementing the Tarjan walk. So I've re-implemented things in a more direct and embedded way. This immediately makes it easy to get the persistence and connectivity correct, and it also allows leveraging the existing nodes to simplify the algorithm. I've worked somewhat to make this implementation more closely follow the traditional paper's nomenclature and strategy, although it is still a bit obtuse because it isn't recursive, using an explicit stack and a tail call instead, and it is interruptable, resuming each time we need another SCC. The other tricky bit here, and what actually took almost all the time and trials and errors I spent building this, is exactly what graph structure to build for the SCCs. The naive thing to build is the call graph in its newly acyclic form. I wrote about 4 versions of this which did precisely this. Inevitably, when I experimented with them across various use cases, they became incredibly awkward. It was all implementable, but it felt like a complete wrong fit. Square peg, round hole. There were two overriding aspects that pushed me in a different direction: 1) We want to discover the SCC graph in a postorder fashion. That means the root node will be the last node we find. Using the call-SCC DAG as the graph structure of the SCCs results in an orphaned graph until we discover a root. 2) We will eventually want to walk the SCC graph in parallel, exploring distinct sub-graphs independently, and synchronizing at merge points. This again is not helped by the call-SCC DAG structure. The structure which, quite surprisingly, ended up being completely natural to use is the inverse of the call-SCC DAG. We add the leaf SCCs to the graph as "roots", and have edges to the caller SCCs. Once I switched to building this structure, everything just fell into place elegantly. Aside from general cleanups (there are FIXMEs and too few comments overall) that are still needed, the other missing piece of this is support for iterating across levels of the SCC graph. These will become useful for implementing #2, but they aren't an immediate priority. Once SCCs are in good shape, I'll be working on adding mutation support for incremental updates and adding the pass manager that this analysis enables. llvm-svn: 206581	2014-04-18 10:50:32 +00:00
Duncan P. N. Exon Smith	e576167df8	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commits r206548, r206549 and r206549. There are some unit tests failing that aren't failing locally [1], so reverting until I have time to investigate. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206556	2014-04-18 02:17:43 +00:00
Duncan P. N. Exon Smith	12e68e1733	blockfreq: Rewrite BlockFrequencyInfoImpl Rewrite the shared implementation of BlockFrequencyInfo and MachineBlockFrequencyInfo entirely. The old implementation had a fundamental flaw: precision losses from nested loops (or very wide branches) compounded past loop exits (and convergence points). The @nested_loops testcase at the end of test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating. This function has three nested loops, with branch weights in the loop headers of 1:4000 (exit:continue). The old analysis gives non-sensical results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': ---- Block Freqs ---- entry = 1.0 for.cond1.preheader = 1.00103 for.cond4.preheader = 5.5222 for.body6 = 18095.19995 for.inc8 = 4.52264 for.inc11 = 0.00109 for.end13 = 0.0 The new analysis gives correct results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': block-frequency-info: nested_loops - entry: float = 1.0, int = 8 - for.cond1.preheader: float = 4001.0, int = 32007 - for.cond4.preheader: float = 16008001.0, int = 128064007 - for.body6: float = 64048012001.0, int = 512384096007 - for.inc8: float = 16008001.0, int = 128064007 - for.inc11: float = 4001.0, int = 32007 - for.end13: float = 1.0, int = 8 Most importantly, the frequency leaving each loop matches the frequency entering it. The new algorithm leverages BlockMass and PositiveFloat to maintain precision, separates "probability mass distribution" from "loop scaling", and uses dithering to eliminate probability mass loss. I have unit tests for these types out of tree, but it was decided in the review to make the classes private to BlockFrequencyInfoImpl, and try to shrink them (or remove them entirely) in follow-up commits. The new algorithm should generally have a complexity advantage over the old. The previous algorithm was quadratic in the worst case. The new algorithm is still worst-case quadratic in the presence of irreducible control flow, but it's linear without it. The key difference between the old algorithm and the new is that control flow within a loop is evaluated separately from control flow outside, limiting propagation of precision problems and allowing loop scale to be calculated independently of mass distribution. Loops are visited bottom-up, their loop scales are calculated, and they are replaced by pseudo-nodes. Mass is then distributed through the function, which is now a DAG. Finally, loops are revisited top-down to multiply through the loop scales and the masses distributed to pseudo nodes. There are some remaining flaws. - Irreducible control flow isn't modelled correctly. LoopInfo and MachineLoopInfo ignore irreducible edges, so this algorithm will fail to scale accordingly. There's a note in the class documentation about how to get closer. See also the comments in test/Analysis/BlockFrequencyInfo/irreducible.ll. - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting the 64-bit integer precision used downstream. - The "bias" calculation proposed on llvmdev is not incorporated here. This will be added in a follow-up commit, once comments from this review have been handled. llvm-svn: 206548	2014-04-18 01:57:45 +00:00
Akira Hatanaka	5638b89944	Fix a bug in which BranchProbabilityInfo wasn't setting branch weights of basic blocks inside loops correctly. Previously, BranchProbabilityInfo::calcLoopBranchHeuristics would determine the weights of basic blocks inside loops even when it didn't have enough information to estimate the branch probabilities correctly. This patch fixes the function to exit early if it doesn't see any exit edges or back edges and let the later heuristics determine the weights. This fixes PR18705 and <rdar://problem/15991090>. Differential Revision: http://reviews.llvm.org/D3363 llvm-svn: 206194	2014-04-14 16:56:19 +00:00
Hal Finkel	56bf297e3a	Don't assert in BasicTTI::getMemoryOpCost for non-simple types BasicTTI::getMemoryOpCost must explicitly check for non-simple types; setting AllowUnknown=true with TLI->getSimpleValueType is not sufficient because, for example, non-power-of-two vector types return non-simple EVTs (not MVT::Other). llvm-svn: 206150	2014-04-14 05:59:09 +00:00
Sebastian Pop	b5b84e0963	in findGCD of multiply expr return the gcd we used to return 1 instead of the gcd llvm-svn: 205800	2014-04-08 21:21:05 +00:00
Hal Finkel	de0b413ec0	[PowerPC] Adjust load/store costs in PPCTTI This provides more realistic costs for the insert/extractelement instructions (which are load/store pairs), accounts for the cheap unaligned Altivec load sequence, and for unaligned VSX load/stores. Bad news: MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation) SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized) MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown Good news: SingleSource/Benchmarks/Shootout/ary3 - 54% speedup SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup Unfortunately, estimating the costs of the stack-based scalarization sequences is hard, and adjusting these costs is like a game of whac-a-mole :( I'll revisit this again after we have better codegen for vector extloads and truncstores and unaligned load/stores. llvm-svn: 205658	2014-04-04 23:51:18 +00:00
Hal Finkel	6fd19ab35e	Account for scalarization costs in BasicTTI::getMemoryOpCost for extending vector loads When a vector type legalizes to a larger vector type, and the target does not support the associated extending load (or truncating store), then legalization will scalarize the load (or store) resulting in an associated scalarization cost. BasicTTI::getMemoryOpCost needs to account for this. Between this, and r205487, PowerPC on the P7 with VSX enabled shows: MultiSource/Benchmarks/PAQ8p/paq8p: 43% speedup SingleSource/Benchmarks/BenchmarkGame/puzzle: 51% speedup SingleSource/UnitTests/Vectorizer/gcc-loops 28% speedup (some of these are new; some of these, such as PAQ8p, just reverse regressions that VSX support would trigger) llvm-svn: 205495	2014-04-03 00:53:59 +00:00
Hal Finkel	55312debee	Fix multi-register costs in BasicTTI::getCastInstrCost For an cast (extension, etc.), the currently logic predicts a low cost if the associated operation (keyed on the destination type) is legal (or promoted). This is not true when the number of values required to legalize the type is changing. For example, <8 x i16> being sign extended by <8 x i32> is not generically cheap on PPC with VSX, even though sign extension to v4i32 is legal, because two output v4i32 values are required compared to the single v8i16 input value, and without custom logic in the target, this conversion will scalarize. llvm-svn: 205487	2014-04-02 23:18:54 +00:00
Tim Northover	00ed9964c6	ARM64: initial backend import This adds a second implementation of the AArch64 architecture to LLVM, accessible in parallel via the "arm64" triple. The plan over the coming weeks & months is to merge the two into a single backend, during which time thorough code review should naturally occur. Everything will be easier with the target in-tree though, hence this commit. llvm-svn: 205090	2014-03-29 10:18:08 +00:00
Arnold Schwaighofer	1a444489e9	PR15967 Fix in basicaa for faulty returning no alias. This commit consist of two parts. The first part fix the PR15967. The wrong conclusion was made when the MaxLookup limit was reached. The fix introduce a out parameter (MaxLookupReached) to DecomposeGEPExpression that the function aliasGEP can act upon. The second part is introducing the constant MaxLookupSearchDepth to make sure that DecomposeGEPExpression and GetUnderlyingObject use the same search depth. This is a small cleanup to clarify the original algorithm. Patch by Karl-Johan Karlsson! llvm-svn: 204859	2014-03-26 21:30:19 +00:00
Benjamin Kramer	e75eaca32f	ScalarEvolution: Compute exit counts for loops with a power-of-2 step. If we have a loop of the form for (unsigned n = 0; n != (k & -32); n += 32) {} then we know that n is always divisible by 32 and the loop must terminate. Even if we have a condition where the loop counter will overflow it'll always hold this invariant. PR19183. Our loop vectorizer creates this pattern and it's also occasionally formed by loop counters derived from pointers. llvm-svn: 204728	2014-03-25 16:25:12 +00:00
Rafael Espindola	f3336bc1d5	Reject alias to undefined symbols in the verifier. On ELF and COFF an alias is just another name for a position in the file. There is no way to refer to a position in another file, so an alias to undefined is meaningless. MachO currently doesn't support aliases. The spec has a N_INDR, which when implemented will have a different set of restrictions. Adding support for it shouldn't be harder than any other IR extension. For now, having the IR represent what is actually possible with current tools makes it easier to fix the design of GlobalAlias. llvm-svn: 203705	2014-03-12 20:15:49 +00:00
Raul E. Silvera	ce376c0fcb	When analyzing vectors of element type that require legalization, the legalization cost must be included to get an accurate estimation of the total cost of the scalarized vector. The inaccurate cost triggered unprofitable SLP vectorization on 32-bit X86. Summary: Include legalization overhead when computing scalarization cost Reviewers: hfinkel, nadav CC: chandlerc, rnk, llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2992 llvm-svn: 203509	2014-03-10 22:59:13 +00:00
Matt Arsenault	a236ea551c	Teach lint about address spaces llvm-svn: 203132	2014-03-06 17:33:55 +00:00
Sebastian Pop	f05ba89bd3	add -da-delinearize runs and checks to MIV testcases llvm-svn: 201869	2014-02-21 18:15:18 +00:00
Nico Rieck	5ba5226ab9	Add extra CHECK prefix to tests with explicit prefix These tests mistakenly assume that CHECK is still available even if an explicit prefix is specified. llvm-svn: 201492	2014-02-16 13:28:15 +00:00
Nico Rieck	35a237d4ed	Actually call FileCheck in tests llvm-svn: 201491	2014-02-16 13:27:39 +00:00
Nico Rieck	7647178738	Fix broken CHECK lines llvm-svn: 201479	2014-02-16 07:31:05 +00:00
Andrea Di Biagio	b7882b3bd1	[Vectorizer] Add a new 'OperandValueKind' in TargetTransformInfo called 'OK_NonUniformConstValue' to identify operands which are constants but not constant splats. The cost model now allows returning 'OK_NonUniformConstValue' for non splat operands that are instances of ConstantVector or ConstantDataVector. With this change, targets are now able to compute different costs for instructions with non-uniform constant operands. For example, On X86 the cost of a vector shift may vary depending on whether the second operand is a uniform or non-uniform constant. This patch applies the following changes: - The cost model computation now takes into account non-uniform constants; - The cost of vector shift instructions has been improved in X86TargetTransformInfo analysis pass; - BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish between non-uniform and uniform constant operands. Added a new test to verify that the output of opt '-cost-model -analyze' is valid in the following configurations: SSE2, SSE4.1, AVX, AVX2. llvm-svn: 201272	2014-02-12 23:43:47 +00:00
Craig Topper	522af29465	Test case I forgot to 'add' for r201126. llvm-svn: 201207	2014-02-12 03:58:47 +00:00
Benjamin Kramer	5a188549ad	ScalarEvolution: Analyze trip count of loops with a switch guarding the exit. llvm-svn: 201159	2014-02-11 15:44:32 +00:00
Tim Northover	f0e21616f3	X86: add costs for 64-bit vector ext/trunc & rebalance The most important part of this is probably adding any cost at all for operations like zext <8 x i8> to <8 x i32>. Before they were being recorded as extremely costly (24, I believe) which made LLVM fall back on a 4-wide vectorisation of a loop. It also rebalances the values for sext, zext and trunc. Lacking any other sane metric that might work across CPU microarchitectures I went for instructions. This seems to be in reasonable accord with the rest of the table (sitofp, ...) though no doubt at least one value is sub-optimal for some bizarre reason. Finally, separate AVX and AVX2 values are provided where appropriate. The CodeGen is quite different in many cases. rdar://problem/15981990 llvm-svn: 200928	2014-02-06 18:18:36 +00:00
Chandler Carruth	bf71a34eb9	[PM] Add a new "lazy" call graph analysis pass for the new pass manager. The primary motivation for this pass is to separate the call graph analysis used by the new pass manager's CGSCC pass management from the existing call graph analysis pass. That analysis pass is (somewhat unfortunately) over-constrained by the existing CallGraphSCCPassManager requirements. Those requirements make it really hard to cleanly layer the needed functionality for the new pass manager on top of the existing analysis. However, there are also a bunch of things that the pass manager would specifically benefit from doing differently from the existing call graph analysis, and this new implementation tries to address several of them: - Be lazy about scanning function definitions. The existing pass eagerly scans the entire module to build the initial graph. This new pass is significantly more lazy, and I plan to push this even further to maximize locality during CGSCC walks. - Don't use a single synthetic node to partition functions with an indirect call from functions whose address is taken. This node creates a huge choke-point which would preclude good parallelization across the fanout of the SCC graph when we got to the point of looking at such changes to LLVM. - Use a memory dense and lightweight representation of the call graph rather than value handles and tracking call instructions. This will require explicit update calls instead of some updates working transparently, but should end up being significantly more efficient. The explicit update calls ended up being needed in many cases for the existing call graph so we don't really lose anything. - Doesn't explicitly model SCCs and thus doesn't provide an "identity" for an SCC which is stable across updates. This is essential for the new pass manager to work correctly. - Only form the graph necessary for traversing all of the functions in an SCC friendly order. This is a much simpler graph structure and should be more memory dense. It does limit the ways in which it is appropriate to use this analysis. I wish I had a better name than "call graph". I've commented extensively this aspect. This is still very much a WIP, in fact it is really just the initial bits. But it is about the fourth version of the initial bits that I've implemented with each of the others running into really frustrating problms. This looks like it will actually work and I'd like to split the actual complexity across commits for the sake of my reviewers. =] The rest of the implementation along with lots of wiring will follow somewhat more rapidly now that there is a good path forward. Naturally, this doesn't impact any of the existing optimizer. This code is specific to the new pass manager. A bunch of thanks are deserved for the various folks that have helped with the design of this, especially Nick Lewycky who actually sat with me to go through the fundamentals of the final version here. llvm-svn: 200903	2014-02-06 04:37:03 +00:00
Nick Lewycky	629199ccb3	Fix crasher introduced in r200203 and caught by a libc++ buildbot. Don't assume that getMulExpr returns a SCEVMulExpr, it may have simplified it to something else! llvm-svn: 200210	2014-01-27 10:47:44 +00:00
Nick Lewycky	31eaca5513	Teach SCEV to handle more cases of 'and X, CST', specifically where CST is any number of contiguous 1 bits in a row, with any number of leading and trailing 0 bits. Unfortunately, this in turn led to some lower quality SCEVs due to some different paths through expression simplification, so add getUDivExactExpr and use it. This fixes all instances of the problems that I found, but we can make that function smarter as necessary. Merge test "xor-and.ll" into "and-xor.ll" since I needed to update it anyways. Test 'nsw-offset.ll' analyzes a little deeper, %n now gets a scev in terms of %no instead of a SCEVUnknown. llvm-svn: 200203	2014-01-27 10:04:03 +00:00
Alp Toker	cb40291100	Fix known typos Sweep the codebase for common typos. Includes some changes to visible function names that were misspelt. llvm-svn: 200018	2014-01-24 17:20:08 +00:00
Arnold Schwaighofer	e3ac099726	BasicAA: We need to check both access sizes when comparing a gep and an underlying object of unknown size. Fixes PR18460. llvm-svn: 199351	2014-01-16 04:53:18 +00:00
Benjamin Kramer	c10563d14e	Fix broken CHECK lines. llvm-svn: 199016	2014-01-11 21:06:00 +00:00
Stepan Dyatkovskiy	431993b57b	Fixed old typo in ScalarEvolution, that caused wrong SCEVs zext operation. Detailed description is here: http://llvm.org/bugs/show_bug.cgi?id=18000#c16 For participation in bugfix process special thanks to David Wiberg. llvm-svn: 198863	2014-01-09 12:26:12 +00:00
Arnold Schwaighofer	833a82ecde	BasicAA: Use reachabilty instead of dominance for checking value equality in phi cycles This allows the value equality check to work even if we don't have a dominator tree. Also add some more comments. I was worried about compile time impacts and did not implement reachability but used the dominance check in the initial patch. The trade-off was that the dominator tree was required. The llvm utility function isPotentiallyReachable cuts off the recursive search after 32 visits. Testing did not show any compile time regressions showing my worries unjustfied. No compile time or performance regressions at O3 -flto -mavx on test-suite + externals. Addresses review comments from r198290. llvm-svn: 198400	2014-01-03 05:47:03 +00:00
Arnold Schwaighofer	0d10a9d579	BasicAA: Fix value equality and phi cycles When there are cycles in the value graph we have to be careful interpreting "Value" identity as "value" equivalence. We interpret the value of a phi node as the value of its operands. When we check for value equivalence now we make sure that the "Value" dominates all cycles (phis). %0 = phi [%noaliasval, %addr2] %l = load %ptr %addr1 = gep @a, 0, %l %addr2 = gep @a, 0, (%l + 1) store %ptr ... Before this patch we would return NoAlias for (%0, %addr1) which is wrong because the value of the load is from different iterations of the loop. Tested on x86_64 -mavx at O3 and O3 -flto with no performance or compile time regressions. PR18068 radar://15653794 llvm-svn: 198290	2014-01-02 03:31:36 +00:00
Matt Arsenault	a8fe22baba	Use correct size for address space in BasicAA. The tests just hit this with a different sized address space since I haven't figured out how to use this to break it. I thought I committed this a long time ago, and I'm not sure why missing this hasn't caused any problems. llvm-svn: 194903	2013-11-16 00:36:43 +00:00
Sebastian Pop	a1cc34b981	improve dependence analysis testcases print the name of the function on which the dependence analysis is performed such that changes to the testcase are easier to review. llvm-svn: 194528	2013-11-12 22:47:30 +00:00
Sebastian Pop	c62c679c1b	delinearization of arrays llvm-svn: 194527	2013-11-12 22:47:20 +00:00
Andrew Trick	34e2f0c4ea	Rewrite SCEV's backedge taken count computation. Patch by Michele Scandale! Rewrite of the functions used to compute the backedge taken count of a loop on LT and GT comparisons. I decided to split the handling of LT and GT cases becasue the trick "a > b == -a < -b" in some cases prevents the trip count computation due to the multiplication by -1 on the two operands of the comparison. This issue comes from the conservative computation of value range of SCEVs: taking the negative SCEV of an expression that have a small positive range (e.g. [0,31]), we would have a SCEV with a fullset as value range. Indeed, in the new rewritten function I tried to better handle the maximum backedge taken count computation when MAX/MIN expression are used to handle the cases where no entry guard is found. Some test have been modified in order to check the new value correctly (I manually check them and reasoning on possible overflow the new values seem correct). I finally added a new test case related to the multiplication by -1 issue on GT comparisons. llvm-svn: 194116	2013-11-06 02:08:26 +00:00
Hal Finkel	4d94930bcb	Consider (x == -1) unlikely in BranchProbabilityInfo This adds another heuristic to BPI, similar to the existing heuristic that considers (x == 0) unlikely to be true. As suggested in the PACT'98 paper by Deitrich, Cheng, and Hwu, -1 is often used to indicate an invalid index, and equality comparisons with -1 are also unlikely to succeed. Local experimentation supports this hypothesis: This yields a 1-2% speedup in the test-suite sqlite benchmark on the PPC A2 core, with no significant regressions. llvm-svn: 193855	2013-11-01 10:58:22 +00:00
Benjamin Kramer	6094f30da2	SCEV: Make the final add of an inbounds GEP nuw if we know that the index is positive. We can't do this for the general case as saying a GEP with a negative index doesn't have unsigned wrap isn't valid for negative indices. %gep = getelementptr inbounds i32* %p, i64 -1 But an inbounds GEP cannot run past the end of address space. So we check for the very common case of a positive index and make GEPs derived from that NUW. Together with Andy's recent non-unit stride work this lets us analyze loops like void foo3(int a, int b) { for (; a < b; a++) {} } PR12375, PR12376. Differential Revision: http://llvm-reviews.chandlerc.com/D2033 llvm-svn: 193514	2013-10-28 07:30:06 +00:00
Shuxin Yang	2e1890e18b	Revert r193251 : Use address-taken to disambiguate global variable and indirect memops. llvm-svn: 193489	2013-10-27 03:08:44 +00:00
Benjamin Kramer	0ccab2d66c	X86: Custom lower sext v16i8 to v16i16, and the corresponding truncate. Also update the cost model. llvm-svn: 193270	2013-10-23 21:06:07 +00:00
Shuxin Yang	e4fb375995	Use address-taken to disambiguate global variable and indirect memops. Major steps include: 1). introduces a not-addr-taken bit-field in GlobalVariable 2). GlobalOpt pass sets "not-address-taken" if it proves a global varirable dosen't have its address taken. 3). AA use this info for disambiguation. llvm-svn: 193251	2013-10-23 17:28:19 +00:00
Manman Ren	a1ce9891ac	Simplify testing case (Thanks Rafael for the testing case). llvm-svn: 193177	2013-10-22 18:15:50 +00:00
Manman Ren	f3850c0807	TBAA: fix PR17620. We can have a struct type with a single field and the field does not start with 0. In that case, we should correctly update the offset. llvm-svn: 193137	2013-10-22 01:40:25 +00:00
Matt Arsenault	be18b8a3ca	Fix creating bitcasts between address spaces in SCEV. The test before wasn't successfully testing this since it was missing the datalayout piece to change the size of the second address space. llvm-svn: 193102	2013-10-21 18:41:10 +00:00
Andrew Trick	768b917dc8	SCEV should use NSW to get trip count for positive nonunit stride loops. SCEV currently fails to compute loop counts for nonunit stride loops. This comes up frequently. It prevents loop optimization and forces vectorization to insert extra loop checks. For example: void foo(int n, int *x) { for (int i = 0; i < n; i += 3) { x[i] = i; x[i+1] = i+1; x[i+2] = i+2; } } We need to properly handle the case in which limit > INT_MAX-stride. In the above case: n > INT_MAX-3. In this case the loop counter will step beyond the limit and overflow at the same time. However, knowing that signed integer overlow in undefined, we can assume the loop test behavior is arbitrary after overflow. This obeys both C undefined behavior rules, and the more strict LLVM poison value rules. I'm finally fixing this in response to Hal Finkel's persistence. The most probable reason that we never optimized this before is that we were being careful to handle case where the developer expected a side-effect free infinite loop relying on overflow: for (int i = 0; i < n; i += s) { ++j; } return j; If INT_MAX+1 is a multiple of s and n > INT_MAX-s, then we might expect an infinite loop. However there are plenty of ways to achieve this effect without relying on undefined behavior of signed overflow. llvm-svn: 193015	2013-10-18 23:43:53 +00:00
Chandler Carruth	ea56494625	Remove the very substantial, largely unmaintained legacy PGO infrastructure. This was essentially work toward PGO based on a design that had several flaws, partially dating from a time when LLVM had a different architecture, and with an effort to modernize it abandoned without being completed. Since then, it has bitrotted for several years further. The result is nearly unusable, and isn't helping any of the modern PGO efforts. Instead, it is getting in the way, adding confusion about PGO in LLVM and distracting everyone with maintenance on essentially dead code. Removing it paves the way for modern efforts around PGO. Among other effects, this removes the last of the runtime libraries from LLVM. Those are being developed in the separate 'compiler-rt' project now, with somewhat different licensing specifically more approriate for runtimes. llvm-svn: 191835	2013-10-02 15:42:23 +00:00
Matt Arsenault	21981a1a0d	Use CHECK-LABEL llvm-svn: 191713	2013-09-30 23:31:55 +00:00
Manman Ren	0ed04fc9ab	TBAA: handle scalar TBAA format and struct-path aware TBAA format. Remove the command line argument "struct-path-tbaa" since we should not depend on command line argument to decide which format the IR file is using. Instead, we check the first operand of the tbaa tag node, if it is a MDNode, we treat it as struct-path aware TBAA format, otherwise, we treat it as scalar TBAA format. When clang starts to use struct-path aware TBAA format no matter whether struct-path-tbaa is no, and we can auto-upgrade existing bc files, the support for scalar TBAA format can be dropped. Existing testing cases are updated to use the struct-path aware TBAA format. llvm-svn: 191538	2013-09-27 18:34:27 +00:00
Yi Jiang	5c343de8d3	X86 horizontal vector reduction cost model llvm-svn: 191021	2013-09-19 17:48:48 +00:00
Arnold Schwaighofer	cae8735a54	Costmodel: Add support for horizontal vector reductions Upcoming SLP vectorization improvements will want to be able to estimate costs of horizontal reductions. Add infrastructure to support this. We model reductions as a series of (shufflevector,add) tuples ultimately followed by an extractelement. For example, for an add-reduction of <4 x float> we could generate the following sequence: (v0, v1, v2, v3) \ \ / / \ \ / + + (v0+v2, v1+v3, undef, undef) \ / ((v0+v2) + (v1+v3), undef, undef) %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef> %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef> %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7 %r = extractelement <4 x float> %bin.rdx8, i32 0 This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)" that will allow clients to ask for the cost of such a reduction (as backends might generate more efficient code than the cost of the individual instructions summed up). This interface is excercised by the CostModel analysis pass which looks for reduction patterns like the one above - starting at extractelements - and if it sees a matching sequence will call the cost model interface. We will also support a second form of pairwise reduction that is well supported on common architectures (haddps, vpadd, faddp). (v0, v1, v2, v3) \ / \ / (v0+v1, v2+v3, undef, undef) \ / ((v0+v1)+(v2+v3), undef, undef, undef) %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef, <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef> %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef> %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1 %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef> %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef> %bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1 %r = extractelement <4 x float> %bin.rdx.1, i32 0 llvm-svn: 190876	2013-09-17 18:06:50 +00:00
Matt Arsenault	a90a18e0ea	Teach ScalarEvolution about pointer address spaces llvm-svn: 190425	2013-09-10 19:55:24 +00:00
Matt Arsenault	5faa669b66	Fix lint assert on integer vector division llvm-svn: 189290	2013-08-26 23:29:33 +00:00
Bill Wendling	a08bb49c61	FileCheck-ize tests. llvm-svn: 188971	2013-08-22 00:51:19 +00:00
Daniel Dunbar	9efbedfd35	[tests] Cleanup initialization of test suffixes. - Instead of setting the suffixes in a bunch of places, just set one master list in the top-level config. We now only modify the suffix list in a few suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py). - Aside from removing the need for a bunch of lit.local.cfg files, this enables 4 tests that were inadvertently being skipped (one in Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been XFAILED). - This commit also fixes a bunch of config files to use config.root instead of older copy-pasted code. llvm-svn: 188513	2013-08-16 00:37:11 +00:00
Bill Wendling	dc17270968	FileCheckize some of the testcases. llvm-svn: 187756	2013-08-05 23:43:18 +00:00
Renato Golin	0178a25fc5	Fixes ARM LNT bot from SLP change in O3 This patch fixes the multiple breakages on ARM test-suite after the SLP vectorizer was introduced by default on O3. The problem was an illegal vector type on ARMTTI::getCmpSelInstrCost() <3 x i1> which is not simple. The guard protects this code from breaking (cause of the problems) but doesn't fix the issue that is generating the odd vector in the first place, which also needs to be investigated. llvm-svn: 187658	2013-08-02 17:10:04 +00:00
Stephen Lin	6dd347b39f	Add newlines at end of test files, no functionality change llvm-svn: 186263	2013-07-13 22:00:58 +00:00
Hal Finkel	ec474f28e3	Add the nearbyint -> FNEARBYINT mapping to BasicTargetTransformInfo This fixes an oversight that Intrinsic::nearbyint was not being mapped to ISD::FNEARBYINT (thus fixing the over-optimistic cost we were assigning to nearbyint calls for some targets). llvm-svn: 185783	2013-07-08 03:24:07 +00:00
Nick Lewycky	c2ec0725ce	Extend 'readonly' and 'readnone' to work on function arguments as well as functions. Make the function attributes pass add it to known library functions and when it can deduce it. llvm-svn: 185735	2013-07-06 00:29:58 +00:00
Jakob Stoklund Olesen	0b075103cd	Minimize precision loss when computing cyclic probabilities. Allow block frequencies to exceed 32 bits by using the new BlockFrequency division function. llvm-svn: 185236	2013-06-28 22:40:43 +00:00
Preston Briggs	6c286b6029	(no commit message) llvm-svn: 185187	2013-06-28 18:44:48 +00:00
Nadav Rotem	f9ecbcb835	CostModel: improve the cost model for load/store of non power-of-two types such as <3 x float>, which are popular in graphics. llvm-svn: 185085	2013-06-27 17:52:04 +00:00
Jakob Stoklund Olesen	6e630d46d2	Print block frequencies in decimal form. This is easier to read than the internal fixed-point representation. If anybody knows the correct algorithm for converting fixed-point numbers to base 10, feel free to fix it. llvm-svn: 184881	2013-06-25 21:57:38 +00:00
Arnold Schwaighofer	a04b9ef1e8	X86 cost model: Vectorizing integer division is a bad idea radar://14057959 llvm-svn: 184872	2013-06-25 19:14:09 +00:00
Benjamin Kramer	866793109e	BlockFrequency: Bump up the entry frequency a bit. This is a band-aid to fix the most severe regressions we're seeing from basing spill decisions on block frequencies, until we have a better solution. llvm-svn: 184835	2013-06-25 13:34:40 +00:00
Benjamin Kramer	bfb84d0bd6	Revert "BlockFrequency: Saturate at 1 instead of 0 when multiplying a frequency with a branch probability." This reverts commit r184584. Breaks PPC selfhost. llvm-svn: 184590	2013-06-21 20:20:27 +00:00
Benjamin Kramer	bd0f107929	BlockFrequency: Saturate at 1 instead of 0 when multiplying a frequency with a branch probability. Zero is used by BlockFrequencyInfo as a special "don't know" value. It also causes a sink for frequencies as you can't ever get off a zero frequency with more multiplies. This recovers a 10% regression on MultiSource/Benchmarks/7zip. A zero frequency was propagated into an inner loop causing excessive spilling. PR16402. llvm-svn: 184584	2013-06-21 19:30:05 +00:00
Andrew Trick	3c944ba2f0	Unit test for SCEV fix r182989, PR16130. llvm-svn: 183017	2013-05-31 16:42:41 +00:00
Michael Kuperstein	f3e663af39	Make BasicAliasAnalysis recognize the fact a noalias argument cannot alias another argument, even if the other argument is not itself marked noalias. llvm-svn: 182755	2013-05-28 08:17:48 +00:00
Diego Novillo	c63995394d	Add a new function attribute 'cold' to functions. Other than recognizing the attribute, the patch does little else. It changes the branch probability analyzer so that edges into blocks postdominated by a cold function are given low weight. Added analysis and code generation tests. Added documentation for the new attribute. llvm-svn: 182638	2013-05-24 12:26:52 +00:00

... 2 3 4 5 6 ...

832 Commits