llvm-project

Commit Graph

Author	SHA1	Message	Date
Ahmed Bougacha	29efe3b287	[BasicAA] Try to disambiguate GEPs through arrays of structs into different fields. We can show that two GEPs off of the same (possibly multidimensional) array of structs, into different fields, can't alias. Quoting: For two GEPOperators GEP1 and GEP2, if we find that: - both GEPs begin indexing from the exact same pointer; - the last indices in both GEPs are constants, indexing into a struct; - said indices are different, hence,the pointed-to fields are different; - and both GEPs only index through arrays prior to that; this lets us determine that the struct that GEP1 indexes into and the struct that GEP2 indexes into must either precisely overlap or be completely disjoint. Because they cannot partially overlap, indexing into different non-overlapping fields of the struct will never alias. The other BasicAA::aliasGEP rules worked in some cases, but not all (for example, the i32x3 struct in the testcase). We can add this simple ad-hoc rule to complement them. rdar://19717375 Differential Revision: http://reviews.llvm.org/D7453 llvm-svn: 228498	2015-02-07 17:04:29 +00:00
Benjamin Kramer	d7e331e0f9	SCEV: Compress disposition pairs. Composing DenseMaps and SmallVectors is still somewhat suboptimal, but this at least halves the size of the vector elements. NFC. llvm-svn: 228497	2015-02-07 16:41:12 +00:00
Michael Zolotukhin	4e8598eee3	[InstSimplify] Add SimplifyFPBinOp function. It is a variation of SimplifyBinOp, but it takes into account FastMathFlags. It is needed in inliner and loop-unroller to accurately predict the transformation's outcome (previously we dropped the flags and were too conservative in some cases). Example: float foo(float a, float b) { float r; if (a[1] b) r = /* a lot of expensive computations /; else r = 1; return r; } float boo(float a) { return foo(a, 0.0); } Without this patch, we don't inline 'foo' into 'boo'. llvm-svn: 228432	2015-02-06 20:02:51 +00:00
Adam Nemet	7206d7a5d2	[LV] Move addRuntimeCheck to LoopAccessAnalysis This will allow it to be shared with the new Loop Distribution pass. getFirstInst is currently duplicated across LoopVectorize.cpp and LoopAccessAnalysis.cpp. This is a short-term work-around until we figure out a better solution. NFC. (The code moved is adjusted a bit for the name of the Loop member and that PtrRtCheck is now a reference rather than a pointer.) llvm-svn: 228418	2015-02-06 18:31:04 +00:00
Chad Rosier	92c1f363f4	Whitespace. llvm-svn: 228397	2015-02-06 14:14:41 +00:00
Ramkumar Ramachandra	8378ac3684	Introduce print-memderefs to test isDereferenceablePointer Since testing the function indirectly is tricky, introduce a direct print-memderefs pass, in the same spirit as print-memdeps, which prints dereferenceability information matched by FileCheck. Differential Revision: http://reviews.llvm.org/D7075 llvm-svn: 228369	2015-02-06 01:46:42 +00:00
Cameron Esfahani	17177d1e84	Value soft float calls as more expensive in the inliner. Summary: When evaluating floating point instructions in the inliner, ask the TTI whether it is an expensive operation. By default, it's not an expensive operation. This keeps the default behavior the same as before. The ARM TTI has been updated to return back TCC_Expensive for targets which don't have hardware floating point. Reviewers: chandlerc, echristo Reviewed By: echristo Subscribers: t.p.northover, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D6936 llvm-svn: 228263	2015-02-05 02:09:33 +00:00
David Majnemer	8a6578a0e7	ValueTracking: Make isSafeToSpeculativelyExecute a little cleaner No functional change intended. llvm-svn: 227760	2015-02-01 19:10:19 +00:00
Adam Nemet	0456327cfb	[LoopVectorize] Move LoopAccessAnalysis to its own module Other than moving code and adding the boilerplate for the new files, the code being moved is unchanged. There are a few global functions that are shared with the rest of the LoopVectorizer. I moved these to the new module as well (emitLoopAnalysis, stripIntegerCast, replaceSymbolicStrideSCEV) along with the Report class used by emitLoopAnalysis. There is probably room for further improvement in this area. I kept DEBUG_TYPE "loop-vectorize" because it's used as the PassName with emitOptimizationRemarkAnalysis. This will obviously have to change. NFC. This is part of the patchset that splits out the memory dependence logic from LoopVectorizationLegality into a new class LoopAccessAnalysis. LoopAccessAnalysis will be used by the new Loop Distribution pass. llvm-svn: 227756	2015-02-01 16:56:15 +00:00
Chandler Carruth	21fc195c13	[multiversion] Kill FunctionTargetTransformInfo, TTI itself is now per-function and supports the exact desired interface. llvm-svn: 227743	2015-02-01 14:37:03 +00:00
Chandler Carruth	ab5cb36c40	[multiversion] Remove the function parameter from the unrolling preferences interface on TTI now that all of TTI is per-function. llvm-svn: 227741	2015-02-01 14:31:23 +00:00
Chandler Carruth	5ec2b1d11a	[multiversion] Implement the old pass manager's TTI wrapper pass in terms of the new pass manager's TargetIRAnalysis. Yep, this is one of the nicer bits of the new pass manager's design. Passes can in many cases operate in a vacuum and so we can just nest things when convenient. This is particularly convenient here as I can now consolidate all of the TargetMachine logic on this analysis. The most important change here is that this pushes the function we need TTI for all the way into the TargetMachine, and re-creates the TTI object for each function rather than re-using it for each function. We're now prepared to teach the targets to produce function-specific TTI objects with specific subtargets cached, etc. One piece of feedback I'd love here is whether its worth renaming any of this stuff. None of the names really seem that awesome to me at this point, but TargetTransformInfoWrapperPass is particularly ... odd. TargetIRAnalysisWrapper might make more sense. I would want to do that rename separately anyways, but let me know what you think. llvm-svn: 227731	2015-02-01 12:26:09 +00:00
Chandler Carruth	fdb9c573f7	[multiversion] Thread a function argument through all the callers of the getTTI method used to get an actual TTI object. No functionality changed. This just threads the argument and ensures code like the inliner can correctly look up the callee's TTI rather than using a fixed one. The next change will use this to implement per-function subtarget usage by TTI. The changes after that should eliminate the need for FTTI as that will have become the default. llvm-svn: 227730	2015-02-01 12:01:35 +00:00
Chandler Carruth	e038552c8a	[PM] Port TTI to the new pass manager, introducing a TargetIRAnalysis to produce it. This adds a function to the TargetMachine that produces this analysis via a callback for each function. This in turn faves the way to produce a different TTI per-function with the correct subtarget cached. I've also done the necessary wiring in the opt tool to thread the target machine down and make it available to the pass registry so that we can construct this analysis from a target machine when available. llvm-svn: 227721	2015-02-01 10:11:22 +00:00
Chandler Carruth	93dcdc47db	[PM] Switch the TargetMachine interface from accepting a pass manager base which it adds a single analysis pass to, to instead return the type erased TargetTransformInfo object constructed for that TargetMachine. This removes all of the pass variants for TTI. There is now a single TTI pass in the Analysis layer. All of the Analysis <-> Target communication is through the TTI's type erased interface itself. While the diff is large here, it is nothing more that code motion to make types available in a header file for use in a different source file within each target. I've tried to keep all the doxygen comments and file boilerplate in line with this move, but let me know if I missed anything. With this in place, the next step to making TTI work with the new pass manager is to introduce a really simple new-style analysis that produces a TTI object via a callback into this routine on the target machine. Once we have that, we'll have the building blocks necessary to accept a function argument as well. llvm-svn: 227685	2015-01-31 11:17:59 +00:00
Chandler Carruth	705b185f90	[PM] Change the core design of the TTI analysis to use a polymorphic type erased interface and a single analysis pass rather than an extremely complex analysis group. The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR. I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting into the right form. There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here. The other reason of course is that this is much more natural fit for the new pass manager. This will lay the ground work for a type-erased per-function info object that can look up the correct subtarget and even cache it. Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below. The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;] Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least: 1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to just do what we need: build and return a TTI object that we can then insert into the pass pipeline. 2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager. 3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2. 4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase. 5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward. 6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target. Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly. Differential Revision: http://reviews.llvm.org/D7293 llvm-svn: 227669	2015-01-31 03:43:40 +00:00
Elena Demikhovsky	45f0448081	Fold fcmp in cases where value is provably non-negative. By Arch Robison. This patch folds fcmp in some cases of interest in Julia. The patch adds a function CannotBeOrderedLessThanZero that returns true if a value is provably not less than zero. I.e. the function returns true if the value is provably -0, +0, positive, or a NaN. The patch extends InstructionSimplify.cpp to fold instances of fcmp where: - the predicate is olt or uge - the first operand is provably not less than zero - the second operand is zero The motivation for handling these cases optimizing away domain checks for sqrt in Julia for common idioms such as sqrt(xx+yy).. http://reviews.llvm.org/D6972 llvm-svn: 227298	2015-01-28 08:03:58 +00:00
Reid Kleckner	4af6415237	Move EH personality type classification to Analysis/LibCallSemantics.h Summary: Also add enum types for __C_specific_handler and _CxxFrameHandler3 for which we know a few things. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7214 llvm-svn: 227284	2015-01-28 01:17:38 +00:00
Chad Rosier	f9327d6fe9	Commoning of target specific load/store intrinsics in Early CSE. Phabricator revision: http://reviews.llvm.org/D7121 Patch by Sanjin Sijaric <ssijaric@codeaurora.org>! llvm-svn: 227149	2015-01-26 22:51:15 +00:00
Philip Reames	a7ad6a589c	Refine memory dependence's notion of volatile semantics According to my reading of the LangRef, volatiles are only ordered with respect to other volatiles. It is entirely legal and profitable to forward unrelated loads over the volatile load. This patch implements this for GVN by refining the transition rules MemoryDependenceAnalysis uses when encountering a volatile. The added test cases show where the extra flexibility is profitable for local dependence optimizations. I have a related change (227110) which will extend this to non-local dependence (i.e. PRE), but that's essentially orthogonal to the semantic change in this patch. I have tested the two together and can confirm that PRE works over a volatile load with both changes. I will be submitting a PRE w/volatiles test case seperately in the near future. Differential Revision: http://reviews.llvm.org/D6901 llvm-svn: 227112	2015-01-26 18:54:27 +00:00
Philip Reames	32351455f6	Pass QueryInst down through non-local dependency calculation This change is mostly motivated by exposing information about the original query instruction to the actual scanning work in getPointerDependencyFrom when used by GVN PRE. In a follow up change, I will use this to be more precise with regards to the semantics of volatile instructions encountered in the scan of a basic block. Worth noting, is that this change (despite appearing quite simple) is not semantically preserving. By providing more information to the helper routine, we allow some optimizations to kick in that weren't previously able to (when called from this code path.) In particular, we see that treatment of !invariant.load becomes more precise. In theory, we might see a difference with an ordered/atomic instruction as well, but I'm having a hard time actually finding a test case which shows that. Test wise, I've included new tests for !invariant.load which illustrate this difference. I've also included some updated TBAA tests which highlight that this change isn't needed for that optimization to kick in - it's handled inside alias analysis itself. Eventually, it would be nice to factor the !invariant.load handling inside alias analysis as well. Differential Revision: http://reviews.llvm.org/D6895 llvm-svn: 227110	2015-01-26 18:39:52 +00:00
Daniel Berlin	16f7a52628	Fix incorrect partial aliasing Update testcases llvm-svn: 227099	2015-01-26 17:31:17 +00:00
Daniel Berlin	8f10e387bb	Fix delegation llvm-svn: 227098	2015-01-26 17:30:39 +00:00
Elena Demikhovsky	a3232f764e	Implemented cost model for masked load/store operations. llvm-svn: 227035	2015-01-25 08:44:46 +00:00
Chandler Carruth	c0291865ed	[PM] Rework how the TargetLibraryInfo pass integrates with the new pass manager to support the actual uses of it. =] When I ported instcombine to the new pass manager I discover that it didn't work because TLI wasn't available in the right places. This is a somewhat surprising and/or subtle aspect of the new pass manager design that came up before but I think is useful to be reminded of: While the new pass manager allows a function pass to query a module analysis, it requires that the module analysis is already run and cached prior to the function pass manager starting up, possibly with a 'require<foo>' style utility in the pass pipeline. This is an intentional hurdle because using a module analysis from a function pass requires that the module analysis is run prior to entering the function pass manager. Otherwise the other functions in the module could be in who-knows-what state, etc. A somewhat surprising consequence of this design decision (at least to me) is that you have to design a function pass that leverages a module analysis to do so as an optional feature. Even if that means your function pass does no work in the absence of the module analysis, you have to handle that possibility and remain conservatively correct. This is a natural consequence of things being able to invalidate the module analysis and us being unable to re-run it. And it's a generally good thing because it lets us reorder passes arbitrarily without breaking correctness, etc. This ends up causing problems in one case. What if we have a module analysis that is definitionally impossible to invalidate. In the places this might come up, the analysis is usually also definitionally trivial to run even while other transformation passes run on the module, regardless of the state of anything. And so, it follows that it is natural to have a hard requirement on such analyses from a function pass. It turns out, that TargetLibraryInfo is just such an analysis, and InstCombine has a hard requirement on it. The approach I've taken here is to produce an analysis that models this flexibility by making it both a module and a function analysis. This exposes the fact that it is in fact safe to compute at any point. We can even make it a valid CGSCC analysis at some point if that is useful. However, we don't want to have a copy of the actual target library info state for each function! This state is specific to the triple. The somewhat direct and blunt approach here is to turn TLI into a pimpl, with the state and mutators in the implementation class and the query routines primarily in the wrapper. Then the analysis can lazily construct and cache the implementations, keyed on the triple, and on-demand produce wrappers of them for each function. One minor annoyance is that we will end up with a wrapper for each function in the module. While this is a bit wasteful (one pointer per function) it seems tolerable. And it has the advantage of ensuring that we pay the absolute minimum synchronization cost to access this information should we end up with a nice parallel function pass manager in the future. We could look into trying to mark when analysis results are especially cheap to recompute and more eagerly GC-ing the cached results, or we could look at supporting a variant of analyses whose results are specifically not cached and expected to just be used and discarded by the consumer. Either way, these seem like incremental enhancements that should happen when we start profiling the memory and CPU usage of the new pass manager and not before. The other minor annoyance is that if we end up using the TLI in both a module pass and a function pass, those will be produced by two separate analyses, and thus will point to separate copies of the implementation state. While a minor issue, I dislike this and would like to find a way to cleanly allow a single analysis instance to be used across multiple IR unit managers. But I don't have a good solution to this today, and I don't want to hold up all of the work waiting to come up with one. This too seems like a reasonable thing to incrementally improve later. llvm-svn: 226981	2015-01-24 02:06:09 +00:00
Chandler Carruth	df8b223dea	[PM] Actually add the new pass manager support for the assumption cache. I had already factored this analysis specifically to enable doing this, but hadn't actually committed the necessary wiring to get at this from the new pass manager. This also nicely shows how the separate cache object can be directly managed by the new pass manager. This analysis didn't have any direct tests and so I've added a printer pass and a boring test case. I chose to print the i1 value which is being assumed rather than the call to llvm.assume as that seems much more useful for testing... but suggestions on an even better printing strategy welcome. My main goal was to make sure things actually work. =] llvm-svn: 226868	2015-01-22 21:53:09 +00:00
Ramkumar Ramachandra	75a4f35b26	Intrinsics: introduce llvm_any_ty aka ValueType Any Specifically, gc.result benefits from this greatly. Instead of: gc.result.int.* gc.result.float.* gc.result.ptr.* ... We now have a gc.result.* that can specialize to literally any type. Differential Revision: http://reviews.llvm.org/D7020 llvm-svn: 226857	2015-01-22 20:14:38 +00:00
Sanjoy Das	cb47366366	Make ScalarEvolution less aggressive with respect to no-wrap flags. ScalarEvolution currently lowers a subtraction recurrence to an add recurrence with the same no-wrap flags as the subtraction. This is incorrect because `sub nsw X, Y` is not the same as `add nsw X, -Y` and `sub nuw X, Y` is not the same as `add nuw X, -Y`. This patch fixes the issue, and adds two test cases demonstrating the bug. Differential Revision: http://reviews.llvm.org/D7081 llvm-svn: 226755	2015-01-22 00:48:47 +00:00
George Burgess IV	3c898c2119	Fixed a bug with how we determine bitset indices. llvm-svn: 226671	2015-01-21 16:37:21 +00:00
Chandler Carruth	aaf0b4cd57	[PM] Port LoopInfo to the new pass manager, adding both a LoopAnalysis pass and a LoopPrinterPass with the expected associated wiring. I've added a RUN line to the only test case (!!!) we have that actually prints loops. Everything seems to be working. This is somewhat exciting as this is the first analysis using another analysis to go in for the new pass manager. =D I also believe it is the last analysis necessary for porting instcombine, but of course I may yet discover more. llvm-svn: 226560	2015-01-20 10:58:50 +00:00
Chandler Carruth	691addc25f	[PM] Now that LoopInfo isn't in the Pass type hierarchy, it is much cleaner to derive from the generic base. Thise removes a ton of boiler plate code and somewhat strange and pointless indirections. It also remove a bunch of the previously needed friend declarations. To fully remove these, I also lifted the verify logic into the generic LoopInfoBase, which seems good anyways -- it is generic and useful logic even for the machine side. llvm-svn: 226385	2015-01-18 01:25:51 +00:00
Chandler Carruth	bc045a5a33	[PM] Cleanup more warnings my refactoring exposed where now we have unused variables in a no-asserts build. I've fixed this by putting the entire loop behind an #ifndef as it contains nothing other than asserts. llvm-svn: 226377	2015-01-17 14:49:23 +00:00
Chandler Carruth	4f8f307c77	[PM] Split the LoopInfo object apart from the legacy pass, creating a LoopInfoWrapperPass to wire the object up to the legacy pass manager. This switches all the clients of LoopInfo over and paves the way to port LoopInfo to the new pass manager. No functionality change is intended with this iteration. llvm-svn: 226373	2015-01-17 14:16:18 +00:00
Chandler Carruth	8ca43224db	[PM] Port TargetLibraryInfo to the new pass manager, provided by the TargetLibraryAnalysis pass. There are actually no direct tests of this already in the tree. I've added the most basic test that the pass manager bits themselves work, and the TLI object produced will be tested by an upcoming patches as they port passes which rely on TLI. This is starting to point out the awkwardness of the invalidate API -- it seems poorly fitting on the result object. I suspect I will change it to live on the analysis instead, but that's not for this change, and I'd rather have a few more passes ported in order to have more experience with how this plays out. I believe there is only one more analysis required in order to start porting instcombine. =] llvm-svn: 226160	2015-01-15 11:39:46 +00:00
Chandler Carruth	b98f63dbdb	[PM] Separate the TargetLibraryInfo object from the immutable pass. The pass is really just a means of accessing a cached instance of the TargetLibraryInfo object, and this way we can re-use that object for the new pass manager as its result. Lots of delta, but nothing interesting happening here. This is the common pattern that is developing to allow analyses to live in both the old and new pass manager -- a wrapper pass in the old pass manager emulates the separation intrinsic to the new pass manager between the result and pass for analyses. llvm-svn: 226157	2015-01-15 10:41:28 +00:00
NAKAMURA Takumi	24ebfcb619	Update libdeps since TLI was moved from Target to Analysis in r226078. llvm-svn: 226126	2015-01-15 05:21:00 +00:00
Chandler Carruth	62d4215baa	[PM] Move TargetLibraryInfo into the Analysis library. While the term "Target" is in the name, it doesn't really have to do with the LLVM Target library -- this isn't an abstraction which LLVM targets generally need to implement or extend. It has much more to do with modeling the various runtime libraries on different OSes and with different runtime environments. The "target" in this sense is the more general sense of a target of cross compilation. This is in preparation for porting this analysis to the new pass manager. No functionality changed, and updates inbound for Clang and Polly. llvm-svn: 226078	2015-01-15 02:16:27 +00:00
Richard Smith	e78bb1249e	For PR21145: recognise a builtin call to a known deallocation function even if it's defined in the current module. Clang generates this situation for the C++14 sized deallocation functions, because it generates a weak definition in case one isn't provided by the C++ runtime library. llvm-svn: 226069	2015-01-15 01:00:33 +00:00
Chandler Carruth	d9903888d9	[cleanup] Re-sort all the #include lines in LLVM using utils/sort_includes.py. I clearly haven't done this in a while, so more changed than usual. This even uncovered a missing include from the InstrProf library that I've added. No functionality changed here, just mechanical cleanup of the include order. llvm-svn: 225974	2015-01-14 11:23:27 +00:00
Chandler Carruth	11f5032368	Revert r225854: [PM] Move the LazyCallGraph printing functionality to a print method. This was formulated on a bad idea, but sadly I didn't uncover how bad this was until I got further down the path. I had hoped that we could provide a low boilerplate way of printing analyses, but it just doesn't seem like this really fits the needs of the analyses. Not all analyses really want to do printing, and those that do don't all use the same interface. Instead, with the new pass manager let's just take advantage of the fact that creating an explicit printer pass like the LCG has is pretty low boilerplate already and rely on that for testing. llvm-svn: 225861	2015-01-14 00:27:45 +00:00
Chandler Carruth	76890d82c0	[PM] Move the LazyCallGraph printing functionality to a print method. I'm adding generic analysis printing utility pass support which will require such a method (or a specialization) so this will let the existing printing logic satisfy that. llvm-svn: 225854	2015-01-13 23:53:50 +00:00
Chandler Carruth	703378f156	[PM] Remove the defunt CGSCC-specific debug flag. Even before I sunk the debug flag into the opt tool this had been made obsolete by factoring the pass and analysis managers into a single set of templates that all used the core flag. No functionality changed here. llvm-svn: 225842	2015-01-13 22:45:13 +00:00
Chandler Carruth	816702ffe0	[PM] Refactor the new pass manager to use a single template to implement the generic functionality of the pass managers themselves. In the new infrastructure, the pass "manager" isn't actually interesting at all. It just pipelines a single chunk of IR through N passes. We don't need to know anything about the IR or the passes to do this really and we can replace the 3 implementations of the exact same functionality with a single generic PassManager template, complementing the single generic AnalysisManager template. I've left typedefs in place to give convenient names to the various obvious instantiations of the template. With this, I think I've nuked almost all of the redundant logic in the managers, and I think the overall design is actually simpler for having single templates that clearly indicate there is no special logic here. The logging is made somewhat more annoying by this change, but I don't think the difference is worth having heavy-weight traits to help log things. llvm-svn: 225783	2015-01-13 11:13:56 +00:00
Ramkumar Ramachandra	40c3e03e27	Standardize {pred,succ,use,user}_empty() The functions {pred,succ,use,user}_{begin,end} exist, but many users have to check _begin() with _end() by hand to determine if the BasicBlock or User is empty. Fix this with a standard *_empty(), demonstrating a few usecases. llvm-svn: 225760	2015-01-13 03:46:47 +00:00
Chandler Carruth	7ad6d620b7	[PM] Fold all three analysis managers into a single AnalysisManager template. This consolidates three copies of nearly the same core logic. It adds "complexity" to the ModuleAnalysisManager in that it makes it possible to share a ModuleAnalysisManager across multiple modules... But it does so by deleting all of the code, so I'm OK with that. This will naturally make fixing bugs in this code much simpler, etc. The only down side here is that we have to use 'typename' and 'this->' in various places, and the implementation is lifted into the header. I'll take that for the code size reduction. The convenient names are still typedef-ed and used throughout so that users can largely ignore this aspect of the implementation. The follow-up change to this will do the exact same refactoring for the PassManagers. =D It turns out that the interesting different code is almost entirely in the adaptors. At the end, that should be essentially all that is left. llvm-svn: 225757	2015-01-13 02:51:47 +00:00
Chandler Carruth	2e7522e9ce	[PM] Re-clang-format much of this code as the code has changed some and so has clang-format. Notably, this fixes a bunch of formatting in the CGSCC pass manager side of things that has been improved in clang-format recently. llvm-svn: 225743	2015-01-13 00:36:47 +00:00
Sanjoy Das	81401d4b19	Fix PR22179. We were incorrectly inferring nsw for certain SCEVs. We can be more aggressive here (see Richard Smith's comment on http://llvm.org/bugs/show_bug.cgi?id=22179) but this change just focuses on correctness. Differential Revision: http://reviews.llvm.org/D6914 llvm-svn: 225591	2015-01-10 23:41:24 +00:00
Sanjay Patel	2a385e2494	remove names from comments; NFC llvm-svn: 225526	2015-01-09 16:47:20 +00:00
Sanjay Patel	938e279082	fix typos; NFC llvm-svn: 225525	2015-01-09 16:35:37 +00:00
Sanjay Patel	e6e58c1a9e	fix typo; NFC llvm-svn: 225524	2015-01-09 16:29:50 +00:00
Sanjay Patel	d729115fa7	more efficient use of a dyn_cast; no functional change intended llvm-svn: 225523	2015-01-09 16:28:15 +00:00
Philip Reames	33d7f9de33	[REFACTOR] Push logic from MemDepPrinter into getNonLocalPointerDependency Previously, MemDepPrinter handled volatile and unordered accesses without involving MemoryDependencyAnalysis. By making a slight tweak to the documented interface - which is respected by both callers - we can move this responsibility to MDA for the benefit of any future callers. This is basically just cleanup. In the future, we may decide to extend MDA's non local dependency analysis to return useful results for ordered or volatile loads. I believe (but have not really checked in detail) that local dependency analyis does get useful results for ordered, but not volatile, loads. llvm-svn: 225483	2015-01-09 00:26:45 +00:00
Philip Reames	567feb98f0	[Refactor] Have getNonLocalPointerDependency take the query instruction Previously, MemoryDependenceAnalysis::getNonLocalPointerDependency was taking a list of properties about the instruction being queried. Since I'm about to need one more property to be passed down through the infrastructure - I need to know a query instruction is non-volatile in an inner helper - fix the interface once and for all. I also added some assertions and behaviour clarifications around volatile and ordered field accesses. At the moment, this is mostly to document expected behaviour. The only non-standard instructions which can currently reach this are atomic, but unordered, loads and stores. Neither ordered or volatile accesses can reach here. The call in GVN is protected by an isSimple check when it first considers the load. The calls in MemDepPrinter are protected by isUnordered checks. Both utilities also check isVolatile for loads and stores. llvm-svn: 225481	2015-01-09 00:04:22 +00:00
Nick Lewycky	c99cc19650	Remove empty statement. No functionality change. llvm-svn: 225420	2015-01-08 00:47:03 +00:00
Chandler Carruth	fdb4180514	[PM] Fix a pretty nasty bug where the new pass manager would invalidate passes too many time. I think this is actually the issue that someone raised with me at the developer's meeting and in an email, but that we never really got to the bottom of. Having all the testing utilities made it much easier to dig down and uncover the core issue. When a pass manager is running many passes over a single function, we need it to invalidate the analyses between each run so that they can be re-computed as needed. We also need to track the intersection of preserved higher-level analyses across all the passes that we run (for example, if there is one module analysis which all the function analyses preserve, we want to track that and propagate it). Unfortunately, this interacted poorly with any enclosing pass adaptor between two IR units. It would see the intersection of preserved analyses, and need to invalidate any other analyses, but some of the un-preserved analyses might have already been invalidated and recomputed! We would fail to propagate the fact that the analysis had already been invalidated. The solution to this struck me as really strange at first, but the more I thought about it, the more natural it seemed. After a nice discussion with Duncan about it on IRC, it seemed even nicer. The idea is that invalidating an analysis causes it to be preserved! Preserving the lack of result is trivial. If it is recomputed, great. Until something else invalidates it again, we're good. The consequence of this is that the invalidate methods on the analysis manager which operate over many passes now consume their PreservedAnalyses object, update it to "preserve" every analysis pass to which it delivers an invalidation (regardless of whether the pass chooses to be removed, or handles the invalidation itself by updating itself). Then we return this augmented set from the invalidate routine, letting the pass manager take the result and use the intersection of that across each pass run to compute the final preserved set. This accounts for all the places where the early invalidation of an analysis has already "preserved" it for a future run. I've beefed up the testing and adjusted the assertions to show that we no longer repeatedly invalidate or compute the analyses across nested pass managers. llvm-svn: 225333	2015-01-07 01:58:35 +00:00
David Majnemer	5310c1e954	Analysis: Reformulate WillNotOverflowUnsignedAdd for reusability WillNotOverflowUnsignedAdd's smarts will live in ValueTracking as computeOverflowForUnsignedAdd. It now returns a tri-state result: never overflows, always overflows and sometimes overflows. llvm-svn: 225329	2015-01-07 00:39:50 +00:00
Chandler Carruth	3472ffb37e	[PM] Add a utility pass template that synthesizes the invalidation of a specific analysis result. This is quite handy to test things, and will also likely be very useful for debugging issues. You could narrow down pass validation failures by walking these invalidate pass runs up and down the pass pipeline, etc. I've added support to the pass pipeline parsing to be able to create one of these for any analysis pass desired. Just adding this class uncovered one latent bug where the AnalysisManager CRTP base class had a hard-coded Module type rather than using IRUnitT. I've also added tests for invalidation and caching of analyses in a basic way across all the pass managers. These in turn uncovered two more bugs where we failed to correctly invalidate an analysis -- its results were invalidated but the key for re-running the pass was never cleared and so it was never re-run. Quite nasty. I'm very glad to debug this here rather than with a full system. Also, yes, the naming here is horrid. I'm going to update some of the names to be slightly less awful shortly. But really, I've no "good" ideas for naming. I'll be satisfied if I can get it to "not bad". llvm-svn: 225246	2015-01-06 04:49:44 +00:00
Chandler Carruth	0b576b377f	[PM] Add a collection of no-op analysis passes and switch the new pass manager tests to use them and be significantly more comprehensive. This, naturally, uncovered a bug where the CGSCC pass manager wasn't printing analyses when they were run. The only remaining core manipulator is I think an invalidate pass similar to the require pass. That'll be next. =] llvm-svn: 225240	2015-01-06 02:50:06 +00:00
Chandler Carruth	539dc4b9d5	[PM] Don't run the machinery of invalidating all the analysis passes when all are being preserved. We want to short-circuit this for a couple of reasons. One, I don't really want passes to grow a dependency on actually receiving their invalidate call when they've been preserved. I'm thinking about removing this entirely. But more importantly, preserving everything is likely to be the common case in a lot of scenarios, and it would be really good to bypass all of the invalidation and preservation machinery there. Avoiding calling N opaque functions to try to invalidate things that are by definition still valid seems important. =] This wasn't really inpsired by much other than seeing the spam in the logging for analyses, but it seems better ot get it checked in rather than forgetting about it. llvm-svn: 225163	2015-01-05 12:32:11 +00:00
Chandler Carruth	e5e8fb3bf6	[PM] Add names and debug logging for analysis passes to the new pass manager. This starts to allow us to test analyses more easily, but it's really only the beginning. Some of the code here is still untestable without manual changes to create analysis passes, but I wanted to factor it into a small of chunks as possible. Next up in order to be able to test things are, in no particular order: - No-op analyses passes so we don't have to use real ones to exercise the pass maneger itself. - Automatic way of generating dummy passes that require an analysis be run, including a variant that calls a 'print' method on a pass to make it even easier to print out the results of an analysis. - Dummy passes that invalidate all analyses for their IR unit so we can test invalidation and re-runs. - Automatic way to print each analysis pass as it is re-run. - Automatic but optional verification of analysis passes everywhere possible. I'm not claiming I'll get to all of these immediately, but that's what is in the pipeline at some stage. I'm fleshing out exactly what I need and what to prioritize by working on converting analyses and then trying to test the conversion. =] llvm-svn: 225162	2015-01-05 12:21:44 +00:00
Chandler Carruth	d174ce4ad1	[PM] Switch the new pass manager to use a reference-based API for IR units. This was debated back and forth a bunch, but using references is now clearly cleaner. Of all the code written using pointers thus far, in only one place did it really make more sense to have a pointer. In most cases, this just removes immediate dereferencing from the code. I think it is much better to get errors on null IR units earlier, potentially at compile time, than to delay it. Most notably, the legacy pass manager uses references for its routines and so as more and more code works with both, the use of pointers was likely to become really annoying. I noticed this when I ported the domtree analysis over and wrote the entire thing with references only to have it fail to compile. =/ It seemed better to switch now than to delay. We can, of course, revisit this is we learn that references are really problematic in the API. llvm-svn: 225145	2015-01-05 02:47:05 +00:00
Chandler Carruth	75c11b88af	[PM] Cleanup a const_cast and other machinery left over in this code from before I removed thet non-const use of the function. The unused variable that held the const_cast was already kindly removed by Michael. llvm-svn: 225143	2015-01-04 23:13:57 +00:00
Michael Kuperstein	58c3f6cc31	Fix unused variable warning for non-asserts builds. NFC. llvm-svn: 225133	2015-01-04 13:35:44 +00:00
Chandler Carruth	66b3130cda	[PM] Split the AssumptionTracker immutable pass into two separate APIs: a cache of assumptions for a single function, and an immutable pass that manages those caches. The motivation for this change is two fold. Immutable analyses are really hacks around the current pass manager design and don't exist in the new design. This is usually OK, but it requires that the core logic of an immutable pass be reasonably partitioned off from the pass logic. This change does precisely that. As a consequence it also paves the way for the many utility functions that deal in the assumptions to live in both pass manager worlds by creating an separate non-pass object with its own independent API that they all rely on. Now, the only bits of the system that deal with the actual pass mechanics are those that actually need to deal with the pass mechanics. Once this separation is made, several simplifications become pretty obvious in the assumption cache itself. Rather than using a set and callback value handles, it can just be a vector of weak value handles. The callers can easily skip the handles that are null, and eventually we can wrap all of this up behind a filter iterator. For now, this adds boiler plate to the various passes, but this kind of boiler plate will end up making it possible to port these passes to the new pass manager, and so it will end up factored away pretty reasonably. llvm-svn: 225131	2015-01-04 12:03:27 +00:00
David Majnemer	6ee8d17bc6	ValueTracking: ComputeNumSignBits should tolerate misshapen phi nodes PHI nodes can have zero operands in the middle of a transform. It is expected that utilities in Analysis don't freak out when this happens. Note that it is considered invalid to allow these misshapen phi nodes to make it to another pass. This fixes PR22086. llvm-svn: 225126	2015-01-04 07:06:53 +00:00
David Majnemer	8df46c9289	ValueTracking: Make computeKnownBits for Arguments a little more clear We would sometimes leave the out-param APInts untouched while going through computeKnownBits. While I don't know of a way to trigger a bug involving this in practice, it goes against the overall design of computeKnownBits. Found via code inspection. llvm-svn: 225109	2015-01-03 02:33:25 +00:00
David Majnemer	c8a576b5c0	InstCombine: Detect when llvm.umul.with.overflow always overflows We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero and LHSKnownOne * RHSKnownOne overflow. llvm-svn: 225077	2015-01-02 07:29:47 +00:00
David Majnemer	491331aca8	Analysis: Reformulate WillNotOverflowUnsignedMul for reusability WillNotOverflowUnsignedMul's smarts will live in ValueTracking as computeOverflowForUnsignedMul. It now returns a tri-state result: never overflows, always overflows and sometimes overflows. llvm-svn: 225076	2015-01-02 07:29:43 +00:00
David Majnemer	a55027f68a	ValueTracking: Small cleanup in ComputeNumSignBits Constant contains the isAllOnesValue and isNullValue predicates, not ConstantInt. llvm-svn: 224848	2014-12-26 09:20:17 +00:00
Michael Kuperstein	be8032c875	[ValueTracking] Move GlobalAlias handling to be after the max depth check in computeKnownBits() GlobalAlias handling used to be after GlobalValue handling, which meant it was, in practice, dead code. r220165 moved GlobalAlias handling to be before GlobalValue handling, but also moved it to be before the max depth check, causing an assert due to a recursion depth limit violation. This moves GlobalAlias handling forward to where it's safe, and changes the GlobalValue handling to only look at GlobalObjects. Differential Revision: http://reviews.llvm.org/D6758 llvm-svn: 224765	2014-12-23 11:33:41 +00:00
David Majnemer	147f8586be	InstSimplify: Don't bother if getScalarSizeInBits returns zero getScalarSizeInBits returns zero when the comparison operands are not integral. No functionality change intended. llvm-svn: 224675	2014-12-20 04:45:33 +00:00
David Majnemer	7bd7144e44	Simplify the code No functionality change intended. llvm-svn: 224673	2014-12-20 03:29:59 +00:00
David Majnemer	0b6a0b0257	InstSimplify: Optimize away pointless comparisons (X & INT_MIN) ? X & INT_MAX : X into X & INT_MAX (X & INT_MIN) ? X : X & INT_MAX into X (X & INT_MIN) ? X \| INT_MIN : X into X (X & INT_MIN) ? X : X \| INT_MIN into X \| INT_MIN llvm-svn: 224669	2014-12-20 03:04:38 +00:00
Tilmann Scheller	3f7f3341b9	Remove redundant assignment. Found with the Clang static analyzer. llvm-svn: 224570	2014-12-19 11:29:34 +00:00
David Majnemer	65c52ae8ca	InstSimplify: shl nsw/nuw undef, %V -> undef We can always choose an value for undef which might cause %V to shift out an important bit except for one case, when %V is zero. However, shl behaves like an identity function when the right hand side is zero. llvm-svn: 224405	2014-12-17 01:54:33 +00:00
Sanjoy Das	4555b6d870	Teach ScalarEvolution to exploit min and max expressions when proving isKnownPredicate. The motivation for this change is to optimize away checks in loops like this: limit = min(t, len) for (i = 0 to limit) if (i >= len \|\| i < 0) throw_array_of_of_bounds(); a[i] = ... Differential Revision: http://reviews.llvm.org/D6635 llvm-svn: 224285	2014-12-15 22:50:15 +00:00
Mark Heffernan	acbed5e530	Clarify HowFarToZero computation when the step is a positive power of two. Functionally this should be identical to the existing code except for the case where Step is maximally negative (eg, INT_MIN). We now punt in that one corner case to make reasoning about the code easier. llvm-svn: 224274	2014-12-15 21:19:53 +00:00
Elena Demikhovsky	a5599bfd72	Sink store based on alias analysis - by Ella Bolshinsky The alias analysis is used define whether the given instruction is a barrier for store sinking. For 2 identical stores, following instructions are checked in the both basic blocks, to determine whether they are sinking barriers. http://reviews.llvm.org/D6420 llvm-svn: 224247	2014-12-15 14:09:53 +00:00
Elena Demikhovsky	3fcafa2cdb	Loop Vectorizer minor changes in the code - some comments, function names, identation. Reviewed here: http://reviews.llvm.org/D6527 llvm-svn: 224218	2014-12-14 09:43:50 +00:00
David Majnemer	4e87936d2f	ScalarEvolution: Remove SCEVUDivision, it's unused This is just a code simplification, no functionality change is intended. llvm-svn: 224216	2014-12-14 09:12:33 +00:00
David Majnemer	9b6097586c	ValueTracking: Don't recurse too deeply in computeKnownBitsFromAssume Respect the MaxDepth recursion limit, doing otherwise will trigger an assert in computeKnownBits. This fixes PR21891. llvm-svn: 224168	2014-12-12 23:59:29 +00:00
Mark Heffernan	41d7656d5a	Fix PR21694. r219517 added a use of SCEV divide in HowFarToZero computation. This divide can produce incorrect results as we are using an unsigned divide for what should be a modular divide. This change reverts back to a more conservative computation using trailing zeros. llvm-svn: 223974	2014-12-10 22:53:52 +00:00
David Majnemer	5a7717e498	ConstantFold, InstSimplify: undef >>a x can be either -1 or 0, choose 0 Zero is usually a nicer constant to have than -1. llvm-svn: 223969	2014-12-10 21:58:15 +00:00
David Majnemer	ae707582c0	InstSimplify: [al]shr exact undef, %X -> undef Exact shifts always keep the non-zero bits of their input. This means it keeps it's undef bits. llvm-svn: 223923	2014-12-10 09:14:52 +00:00
David Majnemer	71dc8fb867	InstSimplify: div %X, 0 -> undef We already optimized rem %X, 0 to undef, we should do the same for div. llvm-svn: 223919	2014-12-10 07:52:18 +00:00
Duncan P. N. Exon Smith	5bf8fef580	IR: Split Metadata from Value Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API. I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself. This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems. Here's a quick guide for updating your code: - `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do not have a `Type`. - `MDNode`'s operands are all `Metadata ` (instead of `Value `). - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively. If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode`. - `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`. As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.) If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. Metadata cycles (and the RAUW machinery needed to construct them) are expensive. - An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`). As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`. The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s). In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. If your old code was: MDNode N = foo(); bar(isa <ConstantInt>(N->getOperand(0))); baz(cast <ConstantInt>(N->getOperand(1))); bak(cast_or_null <ConstantInt>(N->getOperand(2))); bat(dyn_cast <ConstantInt>(N->getOperand(3))); bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4))); you can trivially match its semantics with: MDNode N = foo(); bar(mdconst::hasa <ConstantInt>(N->getOperand(0))); baz(mdconst::extract <ConstantInt>(N->getOperand(1))); bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2))); bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3))); bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4))); and when you transition your metadata schema to `MDInt`: MDNode N = foo(); bar(isa <MDInt>(N->getOperand(0))); baz(cast <MDInt>(N->getOperand(1))); bak(cast_or_null <MDInt>(N->getOperand(2))); bat(dyn_cast <MDInt>(N->getOperand(3))); bay(dyn_cast_or_null<MDInt>(N->getOperand(4))); - A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`. `MetadataAsValue` is the only class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass. (I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.) llvm-svn: 223802	2014-12-09 18:38:53 +00:00
David Majnemer	d5b3aa49ac	InstSimplify: Try to bring back the rest of r223583 This reverts r223624 with a small tweak, hopefully this will make stage3 equivalent. llvm-svn: 223679	2014-12-08 18:30:43 +00:00
NAKAMURA Takumi	2b6e662672	Revert a part of r223583, for now. It seems causing different emission between stage2(gcc-clang) and stage3 clang. Investigating. llvm-svn: 223624	2014-12-08 02:07:22 +00:00
David Majnemer	1af36e5baf	InstSimplify: Optimize away useless unsigned comparisons Code like X < Y && Y == 0 should always be folded away to false. llvm-svn: 223583	2014-12-06 10:51:40 +00:00
Nick Lewycky	05044c248e	Canonicalize multiplies by looking at whether the operands have any constants themselves. Patch by Tim Murray! llvm-svn: 223554	2014-12-06 00:45:50 +00:00
Duncan P. N. Exon Smith	57cbdfc99a	BFI: Saturate when combining edges to a successor When a loop gets bundled up, its outgoing edges are quite large, and can just barely overflow 64-bits. If one successor has multiple incoming edges -- and that successor is getting all the incoming mass -- combining just its edges can overflow. Handle that by saturating rather than asserting. This fixes PR21622. llvm-svn: 223500	2014-12-05 19:13:42 +00:00
Hal Finkel	aa19bafc9c	Revert "r223364 - Revert r223347 which has caused crashes on bootstrap bots." Reapply r223347, with a fix to not crash on uninserted instructions (or more precisely, instructions in uninserted blocks). bugpoint was able to reduce the test case somewhat, but it is still somewhat large (and relies on setting things up to be simplified during inlining), so I've not included it here. Nevertheless, it is clear what is going on and why. Original commit message: Restrict somewhat the memory-allocation pointer cmp opt from r223093 Based on review comments from Richard Smith, restrict this optimization from applying to globals that might resolve lazily to other dynamically-loaded modules, and also from dynamic allocas (which might be transformed into malloc calls). In short, take extra care that the compared-to pointer is really simultaneously live with the memory allocation. llvm-svn: 223371	2014-12-04 17:45:19 +00:00
Alexander Potapenko	76770e4930	Revert r223347 which has caused crashes on bootstrap bots. llvm-svn: 223364	2014-12-04 14:22:27 +00:00
Elena Demikhovsky	f1de34b84d	Masked Load / Store Intrinsics - the CodeGen part. I'm recommiting the codegen part of the patch. The vectorizer part will be send to review again. Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 223348	2014-12-04 09:40:44 +00:00
Hal Finkel	8b24b32c44	Restrict somewhat the memory-allocation pointer cmp opt from r223093 Based on review comments from Richard Smith, restrict this optimization from applying to globals that might resolve lazily to other dynamically-loaded modules, and also from dynamic allocas (which might be transformed into malloc calls). In short, take extra care that the compared-to pointer is really simultaneously live with the memory allocation. llvm-svn: 223347	2014-12-04 09:22:28 +00:00
Hal Finkel	afcd8dbbcf	Simplify pointer comparisons involving memory allocation functions System memory allocation functions, which are identified at the IR level by the noalias attribute on the return value, must return a pointer into a memory region disjoint from any other memory accessible to the caller. We can use this property to simplify pointer comparisons between allocated memory and local stack addresses and the addresses of global variables. Neither the stack nor global variables can overlap with the region used by the memory allocator. Fixes PR21556. llvm-svn: 223093	2014-12-01 23:38:06 +00:00
Philip Reames	337c4bd4ab	[Statepoints 1/4] Statepoint infrastructure for garbage collection: IR Intrinsics The statepoint intrinsics are intended to enable precise root tracking through the compiler as to support garbage collectors of all types. The addition of the statepoint intrinsics to LLVM should have no impact on the compilation of any program which does not contain them. There are no side tables created, no extra metadata, and no inhibited optimizations. A statepoint works by transforming a call site (or safepoint poll site) into an explicit relocation operation. It is the frontend's responsibility (or eventually the safepoint insertion pass we've developed, but that's not part of this patch series) to ensure that any live pointer to a GC object is correctly added to the statepoint and explicitly relocated. The relocated value is just a normal SSA value (as seen by the optimizer), so merges of relocated and unrelocated values are just normal phis. The explicit relocation operation, the fact the statepoint is assumed to clobber all memory, and the optimizers standard semantics ensure that the relocations flow through IR optimizations correctly. This is the first patch in a small series. This patch contains only the IR parts; the documentation and backend support will be following separately. The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683. Reviewed by: atrick, ributzka llvm-svn: 223078	2014-12-01 21:18:12 +00:00
Rafael Espindola	a4b2ee4548	Relax an assert a bit to avoid a crash on unreachable code. Patch by Duncan Exon Smith with a small tweak by me. llvm-svn: 222984	2014-12-01 02:55:24 +00:00
Duncan P. N. Exon Smith	9bc81fbe92	Revert "Masked Vector Load and Store Intrinsics." This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936	2014-11-28 21:29:14 +00:00
David Majnemer	c6a5e1dd4f	InstSimplify: Restore optimizations lost in r210006 This restores our ability to optimize: (X & C) ? X & ~C : X into X & ~C (X & C) ? X : X & ~C into X (X & C) ? X \| C : X into X (X & C) ? X : X \| C into X \| C llvm-svn: 222868	2014-11-27 06:32:46 +00:00
Hans Wennborg	45172aceb3	LazyValueInfo: Actually re-visit partially solved block-values in solveBlockValue() If solveBlockValue() needs results from predecessors that are not already computed, it returns false with the intention of resuming when the dependencies have been resolved. However, the computation would never be resumed since an 'overdefined' result had been placed in the cache, preventing any further computation. The point of placing the 'overdefined' result in the cache seems to have been to break cycles, but we can check for that when inserting work items in the BlockValue stack instead. This makes the "stop and resume" mechanism of solveBlockValue() work as intended, unlocking more analysis. Using this patch shaves 120 KB off a 64-bit Chromium build on Linux. I benchmarked compiling bzip2.c at -O2 but couldn't measure any difference in compile time. Tests by Jiangning Liu from r215343 / PR21238, Pete Cooper, and me. Differential Revision: http://reviews.llvm.org/D6397 llvm-svn: 222768	2014-11-25 17:23:05 +00:00
Chandler Carruth	1a3c2c414c	Revert r220349 to re-instate r220277 with a fix for PR21330 -- quite clearly only exactly equal width ptrtoint and inttoptr casts are no-op casts, it says so right there in the langref. Make the code agree. Original log from r220277: Teach the load analysis to allow finding available values which require inttoptr or ptrtoint cast provided there is datalayout available. Eventually, the datalayout can just be required but in practice it will always be there today. To go with the ability to expose available values requiring a ptrtoint or inttoptr cast, helpers are added to perform one of these three casts. These smarts are necessary to finish canonicalizing loads and stores to the operational type requirements without regressing fundamental combines. I've added some test cases. These should actually improve as the load combining and store combining improves, but they may fundamentally be highlighting some missing combines for select in addition to exercising the specific added logic to load analysis. llvm-svn: 222739	2014-11-25 08:20:27 +00:00
David Majnemer	bd9ce4ea51	InstSimplify: Handle some simple tautological comparisons This handles cases where we are comparing a masked value against itself. The analysis could be further improved by making it recursive but such expense is not currently justified. llvm-svn: 222716	2014-11-25 02:55:48 +00:00
Philip Reames	00d3b279cb	Factor check for the assume intrinsic out of checks in computeKnownBitsFromAssume We were matching against the assume intrinsic in every check. Since we know that it must be an assume, this is just wasted work. Somewhat surprisingly, matching an intrinsic id is actually relatively expensive. It devolves to a string construction and comparison in Function::isIntrinsic. I originally spotted this because it showed up in a performance profile of my compiler. I've since discovered a separate issue which seems to be the actual root cause, but this is minor perf goodness regardless. I'm likely to follow up with another change to factor out the comparison matching. There's no need to match the compare instruction in every single one of the tests. Differential Revision: http://reviews.llvm.org/D6312 llvm-svn: 222709	2014-11-24 23:44:28 +00:00
Rafael Espindola	64ac12d626	Remove the unused FindUsedTypes pass. It was dead since r134829. llvm-svn: 222684	2014-11-24 20:53:26 +00:00
Rafael Espindola	ffbfcf29f2	Add and use Type::subtypes. NFC. llvm-svn: 222682	2014-11-24 20:44:36 +00:00
Elena Demikhovsky	9e5089a938	Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632	2014-11-23 08:07:43 +00:00
David Majnemer	4efa9ff8ca	InstSimplify: Simplify (sub 0, X) -> X if it's NUW This is a generalization of the X - (0 - Y) -> X transform. llvm-svn: 222611	2014-11-22 07:15:16 +00:00
Hans Wennborg	cbb18e342f	LazyValueInfo: range'ify some for-loops. No functional change. llvm-svn: 222557	2014-11-21 19:07:46 +00:00
Hans Wennborg	c5ec73d801	LazyValueInfo: fix some typos and indentation, etc. NFC. llvm-svn: 222554	2014-11-21 18:58:23 +00:00
David Majnemer	3563938ee4	AliasSet: Simplify mergeSetIn No functional change intended. llvm-svn: 222376	2014-11-19 19:36:18 +00:00
David Majnemer	b7adf34ee0	AliasSetTracker: UnknownInsts should contribute to the refcount AliasSetTracker::addUnknown may create an AliasSet devoid of pointers just to contain an instruction if no suitable AliasSet already exists. It will then AliasSet::addUnknownInst and we will be done. However, it's possible for addUnknown to choose an existing AliasSet to addUnknownInst. If this were to occur, we are in a bit of a pickle: removing pointers from the AliasSet can cause the entire AliasSet to become destroyed, taking our unknown instructions out with them. Instead, keep track whether or not our AliasSet has any unknown instructions. This fixes PR21582. llvm-svn: 222338	2014-11-19 09:41:05 +00:00
David Blaikie	70573dcd9f	Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334	2014-11-19 07:49:26 +00:00
David Majnemer	5d2670c52a	ScalarEvolution: Construct SCEVDivision's Derived type instead of itself SCEVDivision::divide constructed an object of SCEVDivision<Derived> instead of Derived. divide would call visit which would cast the SCEVDivision<Derived> to type Derived. As it happens, SCEVDivision<Derived> and Derived currently have the same layout but this is fragile and grounds for UB. Instead, just construct Derived. No functional change intended. llvm-svn: 222126	2014-11-17 11:27:45 +00:00
David Majnemer	32b8ccf480	ScalarEvolution: Introduce SCEVSDivision and SCEVUDivision It turns out that not all users of SCEVDivision want the same signedness. Let the users determine which operation they'd like by explicitly choosing SCEVUDivision or SCEVSDivision. findArrayDimensions and computeAccessFunctions will use SCEVSDivision while HowFarToZero will use SCEVUDivision. llvm-svn: 222104	2014-11-16 20:35:19 +00:00
Jingyue Wu	0fa125a77d	[DependenceAnalysis] Allow subscripts of different types Summary: Several places in DependenceAnalysis assumes both SCEVs in a subscript pair share the same integer type. For instance, isKnownPredicate calls SE->getMinusSCEV(X, Y) which asserts X and Y share the same type. However, DependenceAnalysis fails to ensure this assumption when producing a subscript pair, causing tests such as NonCanonicalizedSubscript to crash. With this patch, DependenceAnalysis runs unifySubscriptType before producing any subscript pair, ensuring the assumption. Test Plan: Added NonCanonicalizedSubscript.ll on which DependenceAnalysis before the fix crashed because subscripts have different types. Reviewers: spop, sebpop, jingyue Reviewed By: jingyue Subscribers: eliben, meheff, llvm-commits Differential Revision: http://reviews.llvm.org/D6289 llvm-svn: 222100	2014-11-16 16:52:44 +00:00
David Majnemer	0df1d12476	ScalarEvolution: HowFarToZero was wrongly using signed division HowFarToZero was supposed to use unsigned division in order to calculate the backedge taken count. However, SCEVDivision::divide performs signed division. Unless I am mistaken, no users of SCEVDivision actually want signed arithmetic: switch to udiv and urem. This fixes PR21578. llvm-svn: 222093	2014-11-16 07:30:35 +00:00
David Majnemer	5854e9fae8	InstSimplify: Optimize ICmpInst xform that uses computeKnownBits A few things: - computeKnownBits is relatively expensive, let's delay its use as long as we can. - Don't create two APInt values just to run computeKnownBits on a ConstantInt, we already know the exact value! - Avoid creating a temporary APInt value in order to calculate unary negation. llvm-svn: 222092	2014-11-16 02:20:08 +00:00
Reid Kleckner	007239863e	Revert "Don't make assumptions about the name of private global variables." This reverts commit r222061. It's causing linker errors. llvm-svn: 222077	2014-11-15 02:03:53 +00:00
Rafael Espindola	2fc723099f	Don't make assumptions about the name of private global variables. Private variables are can be renamed, so it is not reliable to make decisions on the name. The name is also dropped by the assembler before getting to the linker, so using the name causes a disconnect between how llvm makes a decision (var name) and how the linker makes a decision (section it is in). This patch changes one case where we were looking at the variable name to use the section instead. Test tuning by Michael Gottesman. llvm-svn: 222061	2014-11-14 23:17:47 +00:00
David Blaikie	711cd9c53c	Remove redundant virtual on overriden functions. llvm-svn: 222023	2014-11-14 19:06:36 +00:00
Hal Finkel	45ba2c10e4	Revert r219432 - "Revert "[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information.""" Let's try this again... This reverts r219432, plus a bug fix. Description of the bug in r219432 (by Nick): The bug was using AllPositive to break out of the loop; if the loop break condition i != e is changed to i != e && AllPositive then the test_modulo_analysis_with_global test I've added will fail as the Modulo will be calculated incorrectly (as the last loop iteration is skipped, so Modulo isn't updated with its Scale). Nick also adds this comment: ComputeSignBit is safe to use in loops as it takes into account phi nodes, and the == EK_ZeroEx check is safe in loops as, no matter how the variable changes between iterations, zero-extensions will always guarantee a zero sign bit. The isValueEqualInPotentialCycles check is therefore definitely not needed as all the variable analysis holds no matter how the variables change between loop iterations. And this patch also adds another enhancement to GetLinearExpression - basically to convert ConstantInts to Offsets (see test_const_eval and test_const_eval_scaled for the situations this improves). Original commit message: This reverts r218944, which reverted r218714, plus a bug fix. Description of the bug in r218714 (by Nick): The original patch forgot to check if the Scale in VariableGEPIndex flipped the sign of the variable. The BasicAA pass iterates over the instructions in the order they appear in the function, and so BasicAliasAnalysis::aliasGEP is called with the variable it first comes across as parameter GEP1. Adding a %reorder label puts the definition of %a after %b so aliasGEP is called with %b as the first parameter and %a as the second. aliasGEP later calculates that %a == %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) - ignoring that %idxprom is scaled by -1 here lead the patch to incorrectly conclude that %a > %b. Revised patch by Nick White, thanks! Thanks to Lang to isolating the bug. Slightly modified by me to add an early exit from the loop and avoid unnecessary, but expensive, function calls. Original commit message: Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! llvm-svn: 221876	2014-11-13 09:16:54 +00:00
Sanjoy Das	c5676df3ec	Teach ScalarEvolution to sharpen range information. If x is known to have the range [a, b), in a loop predicated by (icmp ne x, a) its range can be sharpened to [a + 1, b). Get ScalarEvolution and hence IndVars to exploit this fact. This change triggers an optimization to widen-loop-comp.ll, so it had to be edited to get it to pass. This change was originally landed in r219834 but had a bug and broke ASan. It was reverted in r219878, and is now being re-landed after fixing the original bug. phabricator: http://reviews.llvm.org/D5639 reviewed by: atrick llvm-svn: 221839	2014-11-13 00:00:58 +00:00
Sanjay Patel	4c219fd248	CGSCC should not treat intrinsic calls like function calls (PR21403) Make the handling of calls to intrinsics in CGSCC consistent: they are not treated like regular function calls because they are never lowered to function calls. Without this patch, we can get dangling pointer asserts from the subsequent loop that processes callsites because it already ignores intrinsics. See http://llvm.org/bugs/show_bug.cgi?id=21403 for more details / discussion. Differential Revision: http://reviews.llvm.org/D6124 llvm-svn: 221802	2014-11-12 18:25:47 +00:00
Duncan P. N. Exon Smith	de36e8040f	Revert "IR: MDNode => Value" Instead, we're going to separate metadata from the Value hierarchy. See PR21532. This reverts commit r221375. This reverts commit r221373. This reverts commit r221359. This reverts commit r221167. This reverts commit r221027. This reverts commit r221024. This reverts commit r221023. This reverts commit r220995. This reverts commit r220994. llvm-svn: 221711	2014-11-11 21:30:22 +00:00
Tom Roeder	eb7a303d1b	Add Forward Control-Flow Integrity. This commit adds a new pass that can inject checks before indirect calls to make sure that these calls target known locations. It supports three types of checks and, at compile time, it can take the name of a custom function to call when an indirect call check fails. The default failure function ignores the error and continues. This pass incidentally moves the function JumpInstrTables::transformType from private to public and makes it static (with a new argument that specifies the table type to use); this is so that the CFI code can transform function types at call sites to determine which jump-instruction table to use for the check at that site. Also, this removes support for jumptables in ARM, pending further performance analysis and discussion. Review: http://reviews.llvm.org/D4167 llvm-svn: 221708	2014-11-11 21:08:02 +00:00
Michael Liao	736bac6482	Indentation fixes llvm-svn: 221472	2014-11-06 19:05:57 +00:00
Sanjay Patel	8f093f4138	remove extra breaks; NFC llvm-svn: 221374	2014-11-05 18:00:07 +00:00
David Majnemer	bf7550e7ec	InstSimplify: Exact shifts of X by Y are X if X has the lsb set Exact shifts may not shift out any non-zero bits. Use computeKnownBits to determine when this occurs and just return the left hand side. This fixes PR21477. llvm-svn: 221325	2014-11-05 00:59:59 +00:00
David Majnemer	f20d7c4c61	Analysis: Make isSafeToSpeculativelyExecute fire less for divides Divides and remainder operations do not behave like other operations when they are given poison: they turn into undefined behavior. It's really hard to know if the operands going into a div are or are not poison. Because of this, we should only choose to speculate if there are constant operands which we can easily reason about. This fixes PR21412. llvm-svn: 221318	2014-11-04 23:49:08 +00:00
David Majnemer	2de97fcd9a	InstSimplify: Fold a hasNoSignedWrap() call into a match() expression No functionality change intended, it's just a little more concise. llvm-svn: 221281	2014-11-04 17:47:13 +00:00
David Majnemer	4f438377fb	InstSimplify: Fold a hasNoUnsignedWrap() call into a match() expression No functionality change intended, it's just a little more concise. llvm-svn: 221280	2014-11-04 17:38:50 +00:00
Sanjay Patel	aee8421088	remove function names from comments; NFC llvm-svn: 221274	2014-11-04 16:27:42 +00:00
Sanjay Patel	547e9752ff	fix typo in comment llvm-svn: 221273	2014-11-04 16:09:50 +00:00
Hal Finkel	840257a49c	Use AA in LoadCombine LoadCombine can be smarter about aborting when a writing instruction is encountered, instead of aborting upon encountering any writing instruction, use an AliasSetTracker, and only abort when encountering some write that might alias with the loads that could potentially be combined. This was originally motivated by comments made (and a test case provided) by David Majnemer in response to PR21448. It turned out that LoadCombine was not responsible for that PR, but LoadCombine should also be improved so that unrelated stores (and @llvm.assume) don't interrupt load combining. llvm-svn: 221203	2014-11-03 23:19:16 +00:00
Duncan P. N. Exon Smith	3872d0084c	IR: MDNode => Value: Instruction::getMetadata() Change `Instruction::getMetadata()` to return `Value` as part of PR21433. Update most callers to use `Instruction::getMDNode()`, which wraps the result in a `cast_or_null<MDNode>`. llvm-svn: 221024	2014-11-01 00:10:31 +00:00
Bradley Smith	9992b167ae	[SCEV] Improve Scalar Evolution's use of no {un,}signed wrap flags In a case where we have a no {un,}signed wrap flag on the increment, if RHS - Start is constant then we can avoid inserting a max operation bewteen the two, since we can statically determine which is greater. This allows us to unroll loops such as: void testcase3(int v) { for (int i=v; i<=v+1; ++i) f(i); } llvm-svn: 220960	2014-10-31 11:40:32 +00:00
Philip Reames	4cb4d3e048	Add handling for range metadata in ValueTracking isKnownNonZero If we load from a location with range metadata, we can use information about the ranges of the loaded value for optimization purposes. This helps to remove redundant checks and canonicalize checks for other optimization passes. This particular patch checks whether a value is known to be non-zero from the range metadata. Currently, these tests are against InstCombine. In theory, all of these should be InstSimplify since we're not inserting any new instructions. Moving the code may follow in a separate change. Reviewed by: Hal Differential Revision: http://reviews.llvm.org/D5947 llvm-svn: 220925	2014-10-30 20:25:19 +00:00
NAKAMURA Takumi	d0e13af22c	Reformat partially, where I touched for whitespace changes. llvm-svn: 220773	2014-10-28 11:54:52 +00:00
NAKAMURA Takumi	335a7bcf1e	Untabify and whitespace cleanups. llvm-svn: 220771	2014-10-28 11:53:30 +00:00
Benjamin Kramer	63207bc9c3	Clean up assume intrinsic pattern matching, no need to check that the argument is a value. Also make it const safe and remove superfluous casting. NFC. llvm-svn: 220616	2014-10-25 18:09:01 +00:00
Bruno Cardoso Lopes	c29520c5b3	[InstSimplify] Support constant folding to vector of pointers ConstantFolding crashes when trying to InstSimplify the following load: @a = private unnamed_addr constant %mst { i8* inttoptr (i64 -1 to i8), i8 inttoptr (i64 -1 to i8) }, align 8 %x = load <2 x i8>* bitcast (%mst* @a to <2 x i8>), align 8 This patch fix this by adding support to this type of folding: %x = load <2 x i8> bitcast (%mst* @a to <2 x i8>), align 8 ==> gets folded to: %x = <2 x i8> <i8 inttoptr (i64 -1 to i8), i8 inttoptr (i64 -1 to i8*)> llvm-svn: 220380	2014-10-22 12:18:48 +00:00
Hans Wennborg	0b39fc0d16	Revert "Teach the load analysis to allow finding available values which require" (r220277) This seems to have caused PR21330. llvm-svn: 220349	2014-10-21 23:49:52 +00:00
Matt Arsenault	d6511b49ac	Add minnum / maxnum intrinsics These are named following the IEEE-754 names for these functions, rather than the libm fmin / fmax to avoid possible ambiguities. Some languages may implement something resembling fmin / fmax which return NaN if either operand is to propagate errors. These implement the IEEE-754 semantics of returning the other operand if either is a NaN representing missing data. llvm-svn: 220341	2014-10-21 23:00:20 +00:00
Sanjay Patel	d5aa255146	remove function names from comments; NFC llvm-svn: 220309	2014-10-21 18:26:57 +00:00
Chandler Carruth	aa72a6dd3b	Teach the load analysis to allow finding available values which require inttoptr or ptrtoint cast provided there is datalayout available. Eventually, the datalayout can just be required but in practice it will always be there today. To go with the ability to expose available values requiring a ptrtoint or inttoptr cast, helpers are added to perform one of these three casts. These smarts are necessary to finish canonicalizing loads and stores to the operational type requirements without regressing fundamental combines. I've added some test cases. These should actually improve as the load combining and store combining improves, but they may fundamentally be highlighting some missing combines for select in addition to exercising the specific added logic to load analysis. llvm-svn: 220277	2014-10-21 09:00:40 +00:00
Philip Reames	5a3f5f751b	Introduce enum values for previously defined metadata types. (NFC) Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well. Using a preassigned ID is helpful since we get compile time type checking, and avoid some (minimal) string construction and comparison. This change adds enum value for three existing metadata types: + MD_nontemporal = 9, // "nontemporal" + MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access" + MD_nonnull = 11 // "nonnull" I went through an updated various uses as well. I made no attempt to get all uses; I focused on the ones which were easily grepable and easily to translate. For example, there were several items in LoopInfo.cpp I chose not to update. llvm-svn: 220248	2014-10-21 00:13:20 +00:00
Philip Reames	cdb72f369f	Introduce a 'nonnull' metadata on Load instructions. The newly introduced 'nonnull' metadata is analogous to existing 'nonnull' attributes, but applies to load instructions rather than call arguments or returns. Long term, it would be nice to combine these into a single construct. The value of the load is allowed to vary between successive loads, but null is not a valid value to be loaded by any load marked nonnull. Reviewed by: Hal Finkel Differential Revision: http://reviews.llvm.org/D5220 llvm-svn: 220240	2014-10-20 22:40:55 +00:00
Chandler Carruth	a32038b006	Fix a miscompile introduced in r220178. The original code had an implicit assumption that if the test for allocas or globals was reached, the two pointers were not equal. With my changes to make the pointer analysis more powerful here, I also had to guard against circumstances where the results weren't useful. That in turn violated the assumption and gave rise to a circumstance in which we could have a store with both the queried pointer and stored pointer rooted at the same alloca. Clearly, we cannot ignore such a store. There are other things we might do in this code to better handle the case of both pointers ending up at the same alloca or global, but it seems best to at least make the test explicit in what it intends to check. I've added tests for both the alloca and global case here. llvm-svn: 220190	2014-10-20 10:03:01 +00:00
Chandler Carruth	eeec35ae1c	Teach the load analysis driving core instcombine logic and other bits of logic to look through pointer casts, making them trivially stronger in the face of loads and stores with intervening pointer casts. I've included a few test cases that demonstrate the kind of folding instcombine can do without pointer casts and then variations which obfuscate the logic through bitcasts. Without this patch, the variations all fail to optimize fully. This is more important now than it has been in the past as I've started moving the load canonicialization to more closely follow the value type requirements rather than the pointer type requirements and thus this needs to be prepared for more pointer casts. When I made the same change to stores several test cases regressed without logic along these lines so I wanted to systematically improve matters first. llvm-svn: 220178	2014-10-20 00:24:14 +00:00
Chandler Carruth	5b8cd2f73c	Move previously dead code to handle computing the known bits of an alias up to where it actually works as intended. The problem is that a GlobalAlias isa GlobalValue and so the prior block handled all of the cases. This allows us to constant fold based on the actual constant expression in the global alias. As an example, see the last function in the newly added test case which explicitly aligns an unaligned pointer using constant expression math. Without this change, we fail to see that and fold an alignment test to zero. llvm-svn: 220164	2014-10-19 09:06:56 +00:00
Chandler Carruth	a801dd5799	Fix a long-standing miscompile in the load analysis that was uncovered by my refactoring of this code. The method isSafeToLoadUnconditionally assumes that the load will proceed with the preferred type alignment. Given that, it has to ensure that the alloca or global is at least that aligned. It has always done this historically when a datalayout is present, but has never checked it when the datalayout is absent. When I refactored the code in r220156, I exposed this path when datalayout was present and that turned the latent bug into a patent bug. This fixes the issue by just removing the special case which allows folding things without datalayout. This isn't worth the complexity of trying to tease apart when it is or isn't safe without actually knowing the preferred alignment. llvm-svn: 220161	2014-10-19 08:17:50 +00:00
Chandler Carruth	8a99373812	Switch how the datalayout availability test is handled in this code to make much more sense and in theory be more correct. If you trace the code alllll the way back to when it was first introduced, the comments make it slightly more clear what was going on here. At that time, the only way Base != V was if DL (then TD) was non-null. As a consequence, if DL was null, that meant we were loading directly from the alloca or global found above the test. After refactoring, this has become at least terribly subtle and potentially incorrect. There are many forms of pointer manipulation that can be traversed without DataLayout, and some of them would in fact change the size of object being loaded vs. allocated. Rather than this subtlety, I've hoisted the actual 'return true' bits into the code which actually found an alloca or global and based them on the loaded pointer being that alloca or global. This is both more clear and safer. I've also added comments about exactly why this set of predicates is used. I've also corrected a misleading comment about globals -- if overridden they may not just have a different size, they may be null and completely unsafe to load from! Hopefully this confuses the next reader a bit less. I don't have any test cases or anything, the patch is motivated strictly to improve the readability of the code. llvm-svn: 220156	2014-10-19 00:42:16 +00:00
Chandler Carruth	38e98d5782	Rename 'TD' to 'DL' in this function as the argument is now a DataLayout argument. llvm-svn: 220151	2014-10-18 23:47:22 +00:00
Chandler Carruth	1f27f03849	Fix the other comment to use modern doxygen style and be a bit more direct. Notably, comment on the fact that the loaded type is significant in that it determines how wide of an access must be safe. llvm-svn: 220150	2014-10-18 23:46:17 +00:00
Chandler Carruth	be49df3d2c	More formatting cleanup brought to you by clang-format. llvm-svn: 220149	2014-10-18 23:41:25 +00:00
Chandler Carruth	b56052f44d	Clean up doxygen syntax and reword comments to flow better, have a brief section, and not have unfinished sentence fragments. llvm-svn: 220147	2014-10-18 23:31:55 +00:00
Chandler Carruth	d67244df4e	Clean up the formatting and trailing whitespace of a routine before editting it. llvm-svn: 220146	2014-10-18 23:19:03 +00:00
Hal Finkel	2400c96cc3	[LVI] Add some additional comments about caching and context instructions Philip Reames and I had a long conversation about this, mostly because it is not obvious why the current logic is correct. Hopefully, these comments will prevent such confusion in the future. llvm-svn: 219882	2014-10-16 00:40:05 +00:00
Sanjoy Das	360b1ed5f2	Revert "r219834 - Teach ScalarEvolution to sharpen range information" This change breaks the asan buildbots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13468 llvm-svn: 219878	2014-10-15 23:46:04 +00:00
Sanjoy Das	90c2f1455a	Teach ScalarEvolution to sharpen range information. If x is known to have the range [a, b) in a loop predicated by (icmp ne x, a), its range can be sharpened to [a + 1, b). Get ScalarEvolution and hence IndVars to exploit this fact. This change triggers an optimization to widen-loop-comp.ll, so it had to be edited to get it to pass. phabricator: http://reviews.llvm.org/D5639 llvm-svn: 219834	2014-10-15 19:25:28 +00:00
Hal Finkel	8683d2b0d2	Treat the WorkSet used to find ephemeral values as double-ended We need to make sure that we visit all operands of an instruction before moving deeper in the operand graph. We had been pushing operands onto the back of the work set, and popping them off the back as well, meaning that we might visit an instruction before visiting all of its uses that sit in between it and the call to @llvm.assume. To provide an explicit example, given the following: %q0 = extractelement <4 x float> %rd, i32 0 %q1 = extractelement <4 x float> %rd, i32 1 %q2 = extractelement <4 x float> %rd, i32 2 %q3 = extractelement <4 x float> %rd, i32 3 %q4 = fadd float %q0, %q1 %q5 = fadd float %q2, %q3 %q6 = fadd float %q4, %q5 %qi = fcmp olt float %q6, %q5 call void @llvm.assume(i1 %qi) %q5 is used by both %qi and %q6. When we visit %qi, it will be marked as ephemeral, and we'll queue %q6 and %q5. %q6 will be marked as ephemeral and we'll queue %q4 and %q5. Under the old system, we'd then visit %q4, which would become ephemeral, %q1 and then %q0, which would become ephemeral as well, and now we have a problem. We'd visit %rd, but it would not be marked as ephemeral because we've not yet visited %q2 and %q3 (because we've not yet visited %q5). This will be covered by a test case in a follow-up commit that enables ephemeral-value awareness in the SLP vectorizer. llvm-svn: 219815	2014-10-15 17:34:48 +00:00
Hal Finkel	db5f86a9bf	[CFL-AA] CFL-AA should not assert on an va_arg instruction The CFL-AA implementation was missing a visit* routine for va_arg instructions, causing it to assert when run on a function that had one. For now, handle these in a conservative way. Fixes PR20954. llvm-svn: 219718	2014-10-14 20:51:26 +00:00
Hal Finkel	a3f23e3725	[LVI] Check for @llvm.assume dominating the edge branch When LazyValueInfo uses @llvm.assume intrinsics to provide edge-value constraints, we should check for intrinsics that dominate the edge's branch, not just any potential context instructions. An assumption that dominates the edge's branch represents a truth on that edge. This is specifically useful, for example, if multiple predecessors assume a pointer to be nonnull, allowing us to simplify a later null comparison. The test case, and an initial patch, were provided by Philip Reames. Thanks! llvm-svn: 219688	2014-10-14 16:04:49 +00:00
Richard Smith	dc69ce32ef	[modules] Stop excluding Support/Debug.h from the Support module. This header has been modular since r206822, and excluding it was leading to workarounds such as the one in r219592, which this change removes. llvm-svn: 219593	2014-10-13 00:41:03 +00:00
Benjamin Kramer	24165219b1	[Modules] Add some missing includes to make files compile stand-alone. llvm-svn: 219592	2014-10-12 22:49:26 +00:00
Benjamin Kramer	603c2c79ed	AssumptionTracker: Don't create temporary CallbackVHs. Those are expensive to create in cold cache scenarios. NFC. llvm-svn: 219575	2014-10-11 19:13:01 +00:00
David Majnemer	cb9d596655	InstCombine, InstSimplify: (%X /s C1) /s C2 isn't always 0 when C1 * C2 overflow consider: C1 = INT_MIN C2 = -1 C1 * C2 overflows without a doubt but consider the following: %x = i32 INT_MIN This means that (%X /s C1) is 1 and (%X /s C1) /s C2 is -1. N. B. Move the unsigned version of this transform to InstSimplify, it doesn't create any new instructions. This fixes PR21243. llvm-svn: 219567	2014-10-11 10:20:01 +00:00
Chandler Carruth	6666c27e99	[SCEV] Add some asserts to the recently improved trip count computation routines and fix all of the bugs they expose. I hit a test case that crashed even without these asserts due to passing a non-exiting latch to the ExitingBlock parameter of the trip count computation machinery. However, when I add the nice asserts, it turns out we have plenty of coverage of these bugs, they just didn't manifest in crashers. The core problem seems to stem from an assumption that the latch is the exiting block. While this is often true, and somewhat the "normal" way to think about loops, it isn't necessarily true. The correct way to call the trip count routines in a generic fashion (that is, without a particular exit in mind) is to just use the loop's single exiting block if it has one. The trip count can't be computed generically unless it does. This works great for the loop vectorizer. The loop unroller actually wants to select the latch when it has to chose between multiple exits because for unrolling it is the latch trips that matter. But if this is the desire, it needs to explicitly guard for non-exiting latches and check for the generic trip count in that case. I've added the asserts, and added convenience APIs for querying the trip count generically that check for a single exit block. I've kept the APIs consistent between computing trip count and trip multiples. Thansk to Mark for the help debugging and tracking down the right fix here! llvm-svn: 219550	2014-10-11 00:12:11 +00:00
Sanjoy Das	1f05c51e5e	This patch teaches ScalarEvolution to pick and use !range metadata. It also makes it more aggressive in querying range information by adding a call to isKnownPredicateWithRanges to isLoopBackedgeGuardedByCond and isLoopEntryGuardedByCond. phabricator: http://reviews.llvm.org/D5638 Reviewed by: atrick, hfinkel llvm-svn: 219532	2014-10-10 21:22:34 +00:00
Mark Heffernan	2beab5f0b4	This patch de-pessimizes the calculation of loop trip counts in ScalarEvolution in the presence of multiple exits. Previously all loops exits had to have identical counts for a loop trip count to be considered computable. This pessimization was implemented by calling getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock) inside of ScalarEvolution::getSmallConstantTripCount() (see the FIXME in the comments of that function). The pessimization was added to fix a corner case involving undefined behavior (pr/16130). This patch more precisely handles the undefined behavior case allowing the pessimization to be removed. ControlsExit replaces IsSubExpr to more precisely track the case where undefined behavior is expected to occur. Because undefined behavior is tracked more precisely we can remove MustExit from ExitLimit. MustExit was used to track the case where the limit was computed potentially assuming undefined behavior even if undefined behavior didn't necessarily occur. llvm-svn: 219517	2014-10-10 17:39:11 +00:00
Hal Finkel	49dadc0bc3	[LVI] Revert the remainder of "r218231 - Add two thresholds lvi-overdefined-BB-threshold and lvi-overdefined-threshold" Some of r218231 was reverted with the code that used it in r218971, but not all of it. This removes the rest (which is now dead). llvm-svn: 219469	2014-10-10 03:56:24 +00:00
Hal Finkel	cbbd3df836	Revert "[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information."" This reverts commit r219135 -- still causing miscompiles in SPEC it seems... llvm-svn: 219432	2014-10-09 19:48:12 +00:00
Hal Finkel	43ce71f1b1	[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information." This reverts r218944, which reverted r218714, plus a bug fix. Description of the bug in r218714 (by Nick) The original patch forgot to check if the Scale in VariableGEPIndex flipped the sign of the variable. The BasicAA pass iterates over the instructions in the order they appear in the function, and so BasicAliasAnalysis::aliasGEP is called with the variable it first comes across as parameter GEP1. Adding a %reorder label puts the definition of %a after %b so aliasGEP is called with %b as the first parameter and %a as the second. aliasGEP later calculates that %a == %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) - ignoring that %idxprom is scaled by -1 here lead the patch to incorrectly conclude that %a > %b. Revised patch by Nick White, thanks! Thanks to Lang to isolating the bug. Slightly modified by me to add an early exit from the loop and avoid unnecessary, but expensive, function calls. Original commit message: Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! llvm-svn: 219135	2014-10-06 18:37:59 +00:00
Duncan P. N. Exon Smith	a7a90a2f19	BFI: Improve assertion message, since it's actually firing This assertion is firing because -loop-unroll is failing to preserve -loop-info (see PR20987). Improve it. llvm-svn: 219130	2014-10-06 17:42:00 +00:00
Hal Finkel	8eae3ad2ff	[CFL-AA] Update for handling of globals and more tests We used to return PartialAlias if either variable being queried interacted with arguments or globals. AFAICT, we can change this to only returning MayAlias iff both variables being queried interacted with arguments or globals. Also, adding some basic functionality tests: some basic IPA tests, checking that we give conservative responses with arguments/globals thrown in the mix, and ensuring that we trace values through stores and loads. Note that saying that 'x' interacted with arguments or globals means that the Attributes of the StratifiedSet that 'x' belongs to has any bits set. Patch by George Burgess IV, thanks! llvm-svn: 219122	2014-10-06 14:42:56 +00:00
Benjamin Kramer	12a2d10769	Simplify code. No functionality change. llvm-svn: 219082	2014-10-05 12:21:57 +00:00
Benjamin Kramer	2e52f02864	Make AAMDNodes ctor and operator bool (!!!) explicit, mop up bugs and weirdness exposed by it. llvm-svn: 219068	2014-10-04 22:44:29 +00:00
Richard Smith	1ed4229f6f	PR21145: Teach LLVM about C++14 sized deallocation functions. C++14 adds new builtin signatures for 'operator delete'. This change allows new/delete pairs to be removed in C++14 onwards, as they were in C++11 and before. llvm-svn: 219014	2014-10-03 20:17:06 +00:00
James Molloy	cb7449d058	Revert r215343. This was contentious and needs invesigation. llvm-svn: 218971	2014-10-03 09:29:24 +00:00
Lang Hames	89e9c17235	[BasicAA] Revert r218714 - Make better use of zext and sign information. This patch broke 447.dealII on Darwin. I'm currently working on a reduced test-case, but reverting for now to keep the bots happy. <rdar://problem/18530107> llvm-svn: 218944	2014-10-03 01:33:47 +00:00
Sanjay Patel	0d7dee654d	Remove duplicate function names from comments. NFC. llvm-svn: 218875	2014-10-02 15:13:22 +00:00
Aaron Ballman	254dd7e439	Silence a -Wsign-compare warning. NFC. llvm-svn: 218868	2014-10-02 13:17:11 +00:00
Argyrios Kyrtzidis	0b9f5507c8	Adds 'override' to overriding methods. NFC. llvm-svn: 218815	2014-10-01 21:00:44 +00:00
Sanjay Patel	7b2cd9ad86	Make the sqrt intrinsic return undef for a negative input. As discussed here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html And again here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html The sqrt of a negative number when using the llvm intrinsic is undefined. We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref. This change should not affect any code that isn't using "no-nans-fp-math"; ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call. Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either. We knowingly approve of this difference with the other compilers in an attempt to flag code that is invoking undefined behavior. A front-end warning should also try to convince the user that the program will fail: http://llvm.org/bugs/show_bug.cgi?id=21093 Differential Revision: http://reviews.llvm.org/D5527 llvm-svn: 218803	2014-10-01 20:36:33 +00:00
Bruno Cardoso Lopes	e3c513a965	[MemoryDepAnalysis] Fix compile time slowdown - Problem One program takes ~3min to compile under -O2. This happens after a certain function A is inlined ~700 times in a function B, inserting thousands of new BBs. This leads to 80% of the compilation time spent in GVN::processNonLocalLoad and MemoryDependenceAnalysis::getNonLocalPointerDependency, while searching for nonlocal information for basic blocks. Usually, to avoid spending a long time to process nonlocal loads, GVN bails out if it gets more than 100 deps as a result from MD->getNonLocalPointerDependency. However this only happens after all nonlocal information for BBs have been computed, which is the bottleneck in this scenario. For instance, there are 8280 times where getNonLocalPointerDependency returns deps with more than 100 bbs and from those, 600 times it returns more than 1000 blocks. - Solution Bail out early during the nonlocal info computation whenever we reach a specified threshold. This patch proposes a 100 BBs threshold, it also reduces the compile time from 3min to 23s. - Testing The test-suite presented no compile nor execution time regressions. Some numbers from my machine (x86_64 darwin): - 17s under -Oz (which avoids inlining). - 1.3s under -O1. - 2m51s under -O2 ToT *** 23s under -O2 w/ Result.size() > 100 - 1m54s under -O2 w/ Result.size() > 500 With NumResultsLimit = 100, GVN yields the same outcome as in the unlimited 3min version. http://reviews.llvm.org/D5532 rdar://problem/18188041 llvm-svn: 218792	2014-10-01 20:07:13 +00:00
Hal Finkel	fd86317989	[BasicAA] Make better use of zext and sign information Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! llvm-svn: 218714	2014-09-30 22:43:40 +00:00
David Peixotto	472b05b36c	Ignore annotation function calls in cost computation The annotation instructions are dropped during codegen and have no impact on size. In some cases, the annotations were preventing the unroller from unrolling a loop because the annotation calls were pushing the cost over the unrolling threshold. Differential Revision: http://reviews.llvm.org/D5335 llvm-svn: 218525	2014-09-26 17:48:40 +00:00
David Peixotto	0d4d5e64ec	Fix assertion in LICM doFinalization() The doFinalization method checks that the LoopToAliasSetMap is empty. LICM populates that map as it runs through the loop nest, deleting the entries for child loops as it goes. However, if a child loop is deleted by another pass (e.g. unrolling) then the loop will never be deleted from the map because LICM walks the loop nest to find entries it can delete. The fix is to delete the loop from the map and free the alias set when the loop is deleted from the loop nest. Differential Revision: http://reviews.llvm.org/D5305 llvm-svn: 218387	2014-09-24 16:48:31 +00:00
Jiangning Liu	cd1d79e77c	Add two thresholds lvi-overdefined-BB-threshold and lvi-overdefined-threshold for LVI algorithm. For a specific value to be lowered, when the number of basic blocks being checked for overdefined lattice value is larger than lvi-overdefined-BB-threshold, or the times of encountering overdefined value for a single basic block is larger than lvi-overdefined-threshold, the LVI algorithm will stop further lowering the lattice value. llvm-svn: 218231	2014-09-22 02:23:05 +00:00
Eric Christopher	d4838554ac	Add file to CMake build as well. llvm-svn: 218005	2014-09-18 00:39:20 +00:00
Eric Christopher	d85ffb1fc0	Add a new pass FunctionTargetTransformInfo. This pass serves as a shim between the TargetTransformInfo immutable pass and the Subtarget via the TargetMachine and Function. Migrate a single call from BasicTargetTransformInfo as an example and provide shims where TargetMachine begins taking a Function to determine the subtarget. No functional change. llvm-svn: 218004	2014-09-18 00:34:14 +00:00
David Majnemer	b435a4214e	InstSimplify: Don't allow (x srem y) urem y -> x srem y Let's consider the case where: %x i16 = 32768 %y i16 = 384 %x srem %y = 65408 (%x srem %y) urem %y = 128 llvm-svn: 217939	2014-09-17 04:16:35 +00:00
David Majnemer	ac717f0972	InstSimplify: ((X % Y) % Y) -> (X % Y) Patch by Sonam Kumari! Differential Revision: http://reviews.llvm.org/D5350 llvm-svn: 217937	2014-09-17 03:34:34 +00:00
David Majnemer	a315bd80c2	InstSimplify: Simplify trivial and/or of icmps Some ICmpInsts when anded/ored with another ICmpInst trivially reduces to true or false depending on whether or not all integers or no integers satisfy the intersected/unioned range. This sort of trivial looking code can come about when InstCombine performs a range reduction-type operation on sdiv and the like. This fixes PR20916. llvm-svn: 217750	2014-09-15 08:15:28 +00:00
Benjamin Kramer	cfd8d90969	Fix an ODR violation consisting of two 'struct Query' in the global namespace. Put them in their own anonymous namespaces. Found by GCC's new -Wodr (PR20915). llvm-svn: 217662	2014-09-12 08:56:53 +00:00
Sanjay Patel	b653de1ada	Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. "Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 llvm-svn: 217528	2014-09-10 17:58:16 +00:00
Hal Finkel	cebf0cc210	Make use @llvm.assume for loop guards in ScalarEvolution This adds a basic (but important) use of @llvm.assume calls in ScalarEvolution. When SE is attempting to validate a condition guarding a loop (such as whether or not the loop count can be zero), this check should also include dominating assumptions. llvm-svn: 217348	2014-09-07 21:37:59 +00:00
Hal Finkel	7e1844940e	Make use of @llvm.assume from LazyValueInfo This change teaches LazyValueInfo to use the @llvm.assume intrinsic. Like with the known-bits change (r217342), this requires feeding a "context" instruction pointer through many functions. Aside from a little refactoring to reuse the logic that turns predicates into constant ranges in LVI, the only new code is that which can 'merge' the range from an assumption into that otherwise computed. There is also a small addition to JumpThreading so that it can have LVI use assumptions in the same block as the comparison feeding a conditional branch. With this patch, we can now simplify this as expected: int foo(int a) { __builtin_assume(a > 5); if (a > 3) { bar(); return 1; } return 0; } llvm-svn: 217345	2014-09-07 20:29:59 +00:00
Hal Finkel	15aeaaf24a	Add additional patterns for @llvm.assume in ValueTracking This builds on r217342, which added the infrastructure to compute known bits using assumptions (@llvm.assume calls). That original commit added only a few patterns (to catch common cases related to determining pointer alignment); this change adds several other patterns for simple cases. r217342 contained that, for assume(v & b = a), bits in the mask that are known to be one, we can propagate known bits from the a to v. It also had a known-bits transfer for assume(a = b). This patch adds: assume(~(v & b) = a) : For those bits in the mask that are known to be one, we can propagate inverted known bits from the a to v. assume(v \| b = a) : For those bits in b that are known to be zero, we can propagate known bits from the a to v. assume(~(v \| b) = a): For those bits in b that are known to be zero, we can propagate inverted known bits from the a to v. assume(v ^ b = a) : For those bits in b that are known to be zero, we can propagate known bits from the a to v. For those bits in b that are known to be one, we can propagate inverted known bits from the a to v. assume(~(v ^ b) = a) : For those bits in b that are known to be zero, we can propagate inverted known bits from the a to v. For those bits in b that are known to be one, we can propagate known bits from the a to v. assume(v << c = a) : For those bits in a that are known, we can propagate them to known bits in v shifted to the right by c. assume(~(v << c) = a) : For those bits in a that are known, we can propagate them inverted to known bits in v shifted to the right by c. assume(v >> c = a) : For those bits in a that are known, we can propagate them to known bits in v shifted to the right by c. assume(~(v >> c) = a) : For those bits in a that are known, we can propagate them inverted to known bits in v shifted to the right by c. assume(v >=_s c) where c is non-negative: The sign bit of v is zero assume(v >_s c) where c is at least -1: The sign bit of v is zero assume(v <=_s c) where c is negative: The sign bit of v is one assume(v <_s c) where c is non-positive: The sign bit of v is one assume(v <=_u c): Transfer the known high zero bits assume(v <_u c): Transfer the known high zero bits (if c is know to be a power of 2, transfer one more) A small addition to InstCombine was necessary for some of the test cases. The problem is that when InstCombine was simplifying and, or, etc. it would fail to check the 'do I know all of the bits' condition before checking less specific conditions and would not fully constant-fold the result. I'm not sure how to trigger this aside from using assumptions, so I've just included the change here. llvm-svn: 217343	2014-09-07 19:21:07 +00:00
Hal Finkel	60db05896a	Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.) This change, which allows @llvm.assume to be used from within computeKnownBits (and other associated functions in ValueTracking), adds some (optional) parameters to computeKnownBits and friends. These functions now (optionally) take a "context" instruction pointer, an AssumptionTracker pointer, and also a DomTree pointer, and most of the changes are just to pass this new information when it is easily available from InstSimplify, InstCombine, etc. As explained below, the significant conceptual change is that known properties of a value might depend on the control-flow location of the use (because we care that the @llvm.assume dominates the use because assumptions have control-flow dependencies). This means that, when we ask if bits are known in a value, we might get different answers for different uses. The significant changes are all in ValueTracking. Two main changes: First, as with the rest of the code, new parameters need to be passed around. To make this easier, I grouped them into a structure, and I made internal static versions of the relevant functions that take this structure as a parameter. The new code does as you might expect, it looks for @llvm.assume calls that make use of the value we're trying to learn something about (often indirectly), attempts to pattern match that expression, and uses the result if successful. By making use of the AssumptionTracker, the process of finding @llvm.assume calls is not expensive. Part of the structure being passed around inside ValueTracking is a set of already-considered @llvm.assume calls. This is to prevent a query using, for example, the assume(a == b), to recurse on itself. The context and DT params are used to find applicable assumptions. An assumption needs to dominate the context instruction, or come after it deterministically. In this latter case we only handle the specific case where both the assumption and the context instruction are in the same block, and we need to exclude assumptions from being used to simplify their own ephemeral values (those which contribute only to the assumption) because otherwise the assumption would prove its feeding comparison trivial and would be removed. This commit adds the plumbing and the logic for a simple masked-bit propagation (just enough to write a regression test). Future commits add more patterns (and, correspondingly, more regression tests). llvm-svn: 217342	2014-09-07 18:57:58 +00:00
Hal Finkel	57f03dda49	Add functions for finding ephemeral values This adds a set of utility functions for collecting 'ephemeral' values. These are LLVM IR values that are used only by @llvm.assume intrinsics (directly or indirectly), and thus will be removed prior to code generation, implying that they should be considered free for certain purposes (like inlining). The inliner's cost analysis, and a few other passes, have been updated to account for ephemeral values using the provided functionality. This functionality is important for the usability of @llvm.assume, because it limits the "non-local" side-effects of adding llvm.assume on inlining, loop unrolling, etc. (these are hints, and do not generate code, so they should not directly contribute to estimates of execution cost). llvm-svn: 217335	2014-09-07 13:49:57 +00:00
Hal Finkel	74c2f355d2	Add an Assumption-Tracking Pass This adds an immutable pass, AssumptionTracker, which keeps a cache of @llvm.assume call instructions within a module. It uses callback value handles to keep stale functions and intrinsics out of the map, and it relies on any code that creates new @llvm.assume calls to notify it of the new instructions. The benefit is that code needing to find @llvm.assume intrinsics can do so directly, without scanning the function, thus allowing the cost of @llvm.assume handling to be negligible when none are present. The current design is intended to be lightweight. We don't keep track of anything until we need a list of assumptions in some function. The first time this happens, we scan the function. After that, we add/remove @llvm.assume calls from the cache in response to registration calls and ValueHandle callbacks. There are no new direct test cases for this pass, but because it calls it validation function upon module finalization, we'll pick up detectable inconsistencies from the other tests that touch @llvm.assume calls. This pass will be used by follow-up commits that make use of @llvm.assume. llvm-svn: 217334	2014-09-07 12:44:26 +00:00
Benjamin Kramer	8c90fd71f7	Add override to overriden virtual methods, remove virtual keywords. No functionality change. Changes made by clang-tidy + some manual cleanup. llvm-svn: 217028	2014-09-03 11:41:21 +00:00
Hal Finkel	85f2692d2f	[CFLAA] Remove one final initializer list Maybe MSVC will be happy now... llvm-svn: 217000	2014-09-03 00:06:47 +00:00
Hal Finkel	1ae325f53d	[CFLAA] And even more MSVC fixes Remove a couple more initializer lists and constexpr dependencies. llvm-svn: 216998	2014-09-02 23:50:01 +00:00
Hal Finkel	ca616acd73	[CFLAA] More cleanup for MSVC Remove more initializer lists, etc. llvm-svn: 216994	2014-09-02 23:29:48 +00:00
Hal Finkel	8d1590dc4b	[CFLAA] No initializer lists for MSVC MSVC 2012 does not understand initializer lists; remove them. llvm-svn: 216991	2014-09-02 22:52:30 +00:00
Hal Finkel	42b7e01f7c	[CFLAA] Remove tautological comparison Fixes this (the warning is right, the unsigned value is not negative): lib/Analysis/StratifiedSets.h:689:53: warning: comparison of unsigned expression >= 0 is always true [-Wtautological-compare] bool inbounds(StratifiedIndex N) const { return N >= 0 && N < Links.size(); } llvm-svn: 216987	2014-09-02 22:36:58 +00:00
Hal Finkel	981602a84c	[CFLAA] LLVM_CONSTEXPR -> const The number is just a constant, and this should make MSVC happy (or at least happier). llvm-svn: 216981	2014-09-02 22:26:06 +00:00
Hal Finkel	7d7087c124	[CFLAA] constexpr -> LLVM_CONSTEXPR Attempt to fix the MSVC build by not using constexpr. llvm-svn: 216979	2014-09-02 22:13:00 +00:00
Hal Finkel	7529c55c02	Add a CFL Alias Analysis implementation This provides an implementation of CFL alias analysis (including some supporting data structures). Currently, we don't have any extremely fancy features, sans some interprocedural analysis (i.e. no field sensitivity, etc.), and we do best sitting behind BasicAA + TBAA. In such a configuration, we take ~0.6-0.8% of total compile time, and give ~7-8% NoAlias responses to queries TBAA and BasicAA couldn't answer when bootstrapping LLVM. In testing this on other projects, we've seen up to 10.5% of queries dropped by BasicAA+TBAA answered with NoAlias by this algorithm. Patch by George Burgess IV (with minor modifications by me -- mostly adapting some BasicAA tests), thanks! llvm-svn: 216970	2014-09-02 21:43:13 +00:00
Robin Morisset	4f6b93b1a8	Fix MemoryDependenceAnalysis in cases where QueryInstr is a CmpXchg or a AtomicRMW Summary: MemoryDependenceAnalysis is currently cautious when the QueryInstr is an atomic load or store, but I forgot to check for atomic cmpxchg/atomicrmw. This patch is a way of fixing that, and making it less brittle (i.e. no risk that I forget another possible kind of atomic, even if the IR ends up changing in the future), by adding a fallback checking mayReadOrWriteFromMemory. Thanks to Philip Reames for finding this bug and suggesting this solution in http://reviews.llvm.org/D4845 Sadly, I don't see how to add a test for this, since the passes depending on MemoryDependenceAnalysis won't trigger for an atomic rmw anyway. Does anyone see a way for testing it? Test Plan: none possible at first sight Reviewers: jfb, reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5019 llvm-svn: 216940	2014-09-02 20:17:52 +00:00
Nick Lewycky	97756409ea	Remove an errant outer loop that contains nothing but an inner loop over exactly the same elements. While no functionality is change intended (and hence there are no changes to tests), you don't want to skip this revision if bisecting for errors. llvm-svn: 216864	2014-09-01 05:17:15 +00:00
Craig Topper	fd38cbebda	Remove 'virtual' keyword from methods markedwith 'override' keyword. llvm-svn: 216823	2014-08-30 16:48:34 +00:00
Robin Morisset	039781ef26	Fix typos in comments, NFC Summary: Just fixing comments, no functional change. Test Plan: N/A Reviewers: jfb Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D5130 llvm-svn: 216784	2014-08-29 21:53:01 +00:00
Robin Morisset	163ef0402a	Relax the constraint more in MemoryDependencyAnalysis.cpp Even loads/stores that have a stronger ordering than monotonic can be safe. The rule is no release-acquire pair on the path from the QueryInst, assuming that the QueryInst is not atomic itself. llvm-svn: 216771	2014-08-29 20:32:58 +00:00
Matt Arsenault	85cbc7e371	Make fabs safe to speculatively execute llvm-svn: 216736	2014-08-29 16:01:17 +00:00
David Majnemer	76d06bc613	InstSimplify: Move a transform from InstCombine to InstSimplify Several combines involving icmp (shl C2, %X) C1 can be simplified without introducing any new instructions. Move them to InstSimplify; while we are at it, make them more powerful. llvm-svn: 216642	2014-08-28 03:34:28 +00:00
David Majnemer	11ca2971e8	InstSimplify: Don't simplify gep X, (Y-X) to Y if types differ It's incorrect to perform this simplification if the types differ. A bitcast would need to be inserted for this to work. This fixes PR20771. llvm-svn: 216597	2014-08-27 20:08:34 +00:00
Nico Weber	48c82400ed	Reland r216439 215441, majnemer has a real fix for PR20771. llvm-svn: 216586	2014-08-27 20:06:19 +00:00
Nico Weber	7b343e3cc6	Revert r216439 (and r216441, else the former doesn't revert cleanly). It caused PR 20771. I'll land a test on the clang side. llvm-svn: 216582	2014-08-27 20:00:13 +00:00
David Majnemer	d6d1671c1e	InstSimplify: Compute comparison ranges for left shift instructions 'shl nuw CI, x' produces [CI, CI << CLZ(CI)] 'shl nsw CI, x' produces [CI << CLO(CI)-1, CI] if CI is negative 'shl nsw CI, x' produces [CI, CI << CLZ(CI)-1] if CI is non-negative llvm-svn: 216570	2014-08-27 18:03:46 +00:00
David Majnemer	788d0ab8c8	InstSimplify: Fold gep X, (sub 0, ptrtoint(X)) to null Save InstCombine some work if we can perform this fold during InstSimplify. llvm-svn: 216441	2014-08-26 07:08:03 +00:00
David Majnemer	bc4981323f	InstSimplify: Simplify trivial pointer expressions like b + (e - b) consider: long long f(long long b, long long e) { return b + (e - b); } we would lower this to something like: define i64 @f(i64* %b, i64* %e) { %1 = ptrtoint i64* %e to i64 %2 = ptrtoint i64* %b to i64 %3 = sub i64 %1, %2 %4 = ashr exact i64 %3, 3 %5 = getelementptr inbounds i64* %b, i64 %4 ret i64* %5 } This should fold away to just 'e'. N.B. This adds m_SpecificInt as a convenient way to match against a particular 64-bit integer when using LLVM's match interface. llvm-svn: 216439	2014-08-26 05:55:16 +00:00
Dylan Noblesmith	43f49cad78	Analysis: cleanup Address review comments. llvm-svn: 216432	2014-08-26 02:03:40 +00:00
Dylan Noblesmith	4ffafefdaa	Revert "Analysis: unique_ptr-ify DependenceAnalysis::collectCoeffInfo" This reverts commit r216358. llvm-svn: 216431	2014-08-26 02:03:38 +00:00
Rafael Espindola	3fd1e9933f	Modernize raw_fd_ostream's constructor a bit. Take a StringRef instead of a "const char *". Take a "std::error_code &" instead of a "std::string &" for error. A create static method would be even better, but this patch is already a bit too big. llvm-svn: 216393	2014-08-25 18:16:47 +00:00
Karthik Bhat	7f33ff7dea	Allow vectorization of division by uniform power of 2. This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible. Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend. Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971) llvm-svn: 216371	2014-08-25 04:56:54 +00:00
Dylan Noblesmith	3ecd22fcf5	Analysis: unique_ptr-ify DependenceAnalysis::collectCoeffInfo llvm-svn: 216358	2014-08-25 00:28:43 +00:00
Dylan Noblesmith	2cae60e730	Analysis: unique_ptr-ify DependenceAnalysis::depends llvm-svn: 216357	2014-08-25 00:28:39 +00:00
Dylan Noblesmith	d96ce66cb1	Analysis: take a reference instead of pointer This parameter is never null. llvm-svn: 216356	2014-08-25 00:28:35 +00:00
Craig Topper	4627679cec	Use range based for loops to avoid needing to re-mention SmallPtrSet size. llvm-svn: 216351	2014-08-24 23:23:06 +00:00
David Majnemer	97ddca3224	ValueTracking: Figure out more bits when looking at add/sub Given something like X01XX + X01XX, we know that the result must look like X1XXX. Adapted from a patch by Richard Smith, test-case written by me. llvm-svn: 216250	2014-08-22 00:40:43 +00:00
Craig Topper	71b7b68b74	Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. llvm-svn: 216158	2014-08-21 05:55:13 +00:00
Robin Morisset	9e98e7f7fc	Answer to Philip Reames comments - add check for volatile (probably unneeded, but I agree that we should be conservative about it). - strengthen condition from isUnordered() to isSimple(), as I don't understand well enough Unordered semantics (and it also matches the comment better this way) to be confident in the previous behaviour (thanks for catching that one, I had missed the case Monotonic/Unordered). - separate a condition in two. - lengthen comment about aliasing and loads - add tests in GVN/atomic.ll llvm-svn: 215943	2014-08-18 22:18:14 +00:00
Robin Morisset	4ffe8aaa69	Weak relaxing of the constraints on atomics in MemoryDependencyAnalysis Monotonic accesses do not have to kill the analysis, as long as the QueryInstr is not itself atomic. llvm-svn: 215942	2014-08-18 22:18:11 +00:00
Craig Topper	6230691c91	Revert "Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size." Getting a weird buildbot failure that I need to investigate. llvm-svn: 215870	2014-08-18 00:24:38 +00:00
Craig Topper	5229cfd163	Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. llvm-svn: 215868	2014-08-17 23:47:00 +00:00
Jiangning Liu	40b04fd994	In LVI(Lazy Value Info), originally value on a BB can only be caculated once, and the lattice will be updated to be a state other than "undefined". This limiation could miss some opportunities of lowering "overdefined" to be an even accurate value. So this patch ask the algorithm to try to lower the lattice value again even if the value has been lowered to be "overdefined". llvm-svn: 215343	2014-08-11 05:02:04 +00:00
Richard Smith	56579b6324	Remove Support/IncludeFile.h and its only user. This is actively harmful, since it breaks the modules builds (where CallGraph.h can be quite reasonably transitively included by an unimported portion of a module, and CallGraph.cpp not linked in), and appears to have been entirely redundant since PR780 was fixed back in 2008. If this breaks anything, please revert; I have only tested this with a single configuration, and it's possible that this is still somehow fixing something (though I doubt it, since no other similar file uses this mechanism any more). llvm-svn: 215142	2014-08-07 20:41:17 +00:00
James Molloy	2b8933c354	Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost. Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account. llvm-svn: 214859	2014-08-05 12:30:34 +00:00
Hal Finkel	36eff0f854	Fix ScalarEvolutionExpander when creating a PHI in a block with duplicate predecessors It seems that when I fixed this, almost exactly a year ago, I did not quite do it correctly. When we have duplicate block predecessors, we can indeed not have different incoming values for the same block, but we must have duplicate entries. So, instead of skipping the duplicates, we explicitly add the duplicate incoming values. Fixes PR20442. llvm-svn: 214423	2014-07-31 19:13:38 +00:00
David Majnemer	cd4fbcd1bb	InstSimplify: Simplify (X - (0 - Y)) if the second sub is NUW If the NUW bit is set for 0 - Y, we know that all values for Y other than 0 would produce a poison value. This allows us to replace (0 - Y) with 0 in the expression (X - (0 - Y)) which will ultimately leave us with X. This partially fixes PR20189. llvm-svn: 214384	2014-07-31 04:49:18 +00:00
Hal Finkel	930469107d	Add @llvm.assume, lowering, and some basic properties This is the first commit in a series that add an @llvm.assume intrinsic which can be used to provide the optimizer with a condition it may assume to be true (when the control flow would hit the intrinsic call). Some basic properties are added here: - llvm.invariant(true) is dead. - llvm.invariant(false) is unreachable (this directly corresponds to the documented behavior of MSVC's __assume(0)), so is llvm.invariant(undef). The intrinsic is tagged as writing arbitrarily, in order to maintain control dependencies. BasicAA has been updated, however, to return NoModRef for any particular location-based query so that we don't unnecessarily block code motion. llvm-svn: 213973	2014-07-25 21:13:35 +00:00
Hal Finkel	029cde639c	Simplify and improve scoped-noalias metadata semantics In the process of fixing the noalias parameter -> metadata conversion process that will take place during inlining (which will be committed soon, but not turned on by default), I have come to realize that the semantics provided by yesterday's commit are not really what we want. Here's why: void foo(noalias a, noalias b, noalias c, bool x) { q = x ? a : b; c = q; } Generically, we know that c does not alias with a and with b (so there is an 'and' in what we know we're not), and we know that q might be derived from a or from *b (so there is an 'or' in what we know that we are). So we do not want the semantics currently, where any noalias scope matching any alias.scope causes a NoAlias return. What we want to know is that the noalias scopes form a superset of the alias.scope list (meaning that all the things we know we're not is a superset of all of things the other instruction might be). Making that change, however, introduces a composibility problem. If we inline once, adding the noalias metadata, and then inline again adding more, and we append new scopes onto the noalias and alias.scope lists each time. But, this means that we could change what was a NoAlias result previously into a MayAlias result because we appended an additional scope onto one of the alias.scope lists. So, instead of giving scopes the ability to have parents (which I had borrowed from the TBAA implementation, but seems increasingly unlikely to be useful in practice), I've given them domains. The subset/superset condition now applies within each domain independently, and we only need it to hold in one domain. Each time we inline, we add the new scopes in a new scope domain, and everything now composes nicely. In addition, this simplifies the implementation. llvm-svn: 213948	2014-07-25 15:50:02 +00:00
Hal Finkel	9414665a3b	Add scoped-noalias metadata This commit adds scoped noalias metadata. The primary motivations for this feature are: 1. To preserve noalias function attribute information when inlining 2. To provide the ability to model block-scope C99 restrict pointers Neither of these two abilities are added here, only the necessary infrastructure. In fact, there should be no change to existing functionality, only the addition of new features. The logic that converts noalias function parameters into this metadata during inlining will come in a follow-up commit. What is added here is the ability to generally specify noalias memory-access sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA nodes: !scope0 = metadata !{ metadata !"scope of foo()" } !scope1 = metadata !{ metadata !"scope 1", metadata !scope0 } !scope2 = metadata !{ metadata !"scope 2", metadata !scope0 } !scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 } !scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 } Loads and stores can be tagged with an alias-analysis scope, and also, with a noalias tag for a specific scope: ... = load %ptr1, !alias.scope !{ !scope1 } ... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 } When evaluating an aliasing query, if one of the instructions is associated with an alias.scope id that is identical to the noalias scope associated with the other instruction, or is a descendant (in the scope hierarchy) of the noalias scope associated with the other instruction, then the two memory accesses are assumed not to alias. Note that is the first element of the scope metadata is a string, then it can be combined accross functions and translation units. The string can be replaced by a self-reference to create globally unqiue scope identifiers. [Note: This overview is slightly stylized, since the metadata nodes really need to just be numbers (!0 instead of !scope0), and the scope lists are also global unnamed metadata.] Existing noalias metadata in a callee is "cloned" for use by the inlined code. This is necessary because the aliasing scopes are unique to each call site (because of possible control dependencies on the aliasing properties). For example, consider a function: foo(noalias a, noalias b) { a = b; } that gets inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } -- now just because we know that a1 does not alias with b1 at the first call site, and a2 does not alias with b2 at the second call site, we cannot let inlining these functons have the metadata imply that a1 does not alias with b2. llvm-svn: 213864	2014-07-24 14:25:39 +00:00
Hal Finkel	cc39b67530	AA metadata refactoring (introduce AAMDNodes) In order to enable the preservation of noalias function parameter information after inlining, and the representation of block-level __restrict__ pointer information (etc.), additional kinds of aliasing metadata will be introduced. This metadata needs to be carried around in AliasAnalysis::Location objects (and MMOs at the SDAG level), and so we need to generalize the current scheme (which is hard-coded to just one TBAA MDNode). This commit introduces only the necessary refactoring to allow for the introduction of other aliasing metadata types, but does not actually introduce any (that will come in a follow-up commit). What it does introduce is a new AAMDNodes structure to hold all of the aliasing metadata nodes associated with a particular memory-accessing instruction, and uses that structure instead of the raw MDNode in AliasAnalysis::Location, etc. No functionality change intended. llvm-svn: 213859	2014-07-24 12:16:19 +00:00
Hal Finkel	ccc7090671	Make use of the align parameter attribute for all pointer arguments We previously supported the align attribute on all (pointer) parameters, but we only used it for byval parameters. However, it is completely consistent at the IR level to treat 'align n' on all pointer parameters as an alignment assumption on the pointer, and now we wll. Specifically, this causes computeKnownBits to use the align attribute on all pointer parameters, not just byval parameters. I've also added an explicit parameter attribute test for this to test/Bitcode/attributes.ll. And I've updated the LangRef to document the align parameter attribute (as it turns out, it was not documented at all previously, although the byval documentation mentioned that it could be used). There are (at least) two benefits to doing this: - It allows enhancing alignment based on the pointer alignment after inlining callees. - It allows simplification of pointer arithmetic. llvm-svn: 213670	2014-07-22 16:58:55 +00:00
Hal Finkel	d32803b669	Match semantics of PointerMayBeCapturedBefore to its name by default As it turns out, the capture tracker named CaptureBefore used by AA, and now available via the PointerMayBeCapturedBefore function, would have been more-aptly named CapturedBeforeOrAt, because it considers captures at the instruction provided. This is not always what one wants, and it is difficult to get the strictly-before behavior given only the current interface. This adds an additional parameter which controls whether or not you want to include captures at the provided instruction. The default is not to include the instruction provided, so that 'Before' matches its name. No functionality change intended. llvm-svn: 213582	2014-07-21 21:30:22 +00:00
Duncan P. N. Exon Smith	6c99015fe2	Revert "[C++11] Add predecessors(BasicBlock ) / successors(BasicBlock ) iterator ranges." This reverts commit r213474 (and r213475), which causes a miscompile on a stage2 LTO build. I'll reply on the list in a moment. llvm-svn: 213562	2014-07-21 17:06:51 +00:00
Hal Finkel	b035621720	Move the CapturesBefore tracker from AA into CaptureTracking There were two generally-useful CaptureTracker classes defined in LLVM: the simple tracker defined in CaptureTracking (and made available via the PointerMayBeCaptured utility function), and the CapturesBefore tracker available only inside of AA. This change moves the CapturesBefore tracker into CaptureTracking, generalizes it slightly (by adding a ReturnCaptures parameter), and makes it generally available via a PointerMayBeCapturedBefore utility function. This logic will be needed, for example, to perform noalias function parameter attribute inference. No functionality change intended. llvm-svn: 213519	2014-07-21 13:15:48 +00:00
Hal Finkel	c782aa5a9b	Move isIdentifiedFunctionLocal from BasicAA to AA The ability to identify function locals will exist outside of BasicAA (for example, logic for inferring noalias function arguments will need this), so make this concept generally accessible without code duplication. No functionality change. llvm-svn: 213514	2014-07-21 12:27:23 +00:00
Manuel Jacob	6d5cc8d656	Remove braces around single-statement block and rangify outer loop. This is a follow-up to r213474. llvm-svn: 213475	2014-07-20 09:20:47 +00:00
Manuel Jacob	d11beffef4	[C++11] Add predecessors(BasicBlock ) / successors(BasicBlock ) iterator ranges. Summary: This patch introduces two new iterator ranges and updates existing code to use it. No functional change intended. Test Plan: All tests (make check-all) still pass. Reviewers: dblaikie Reviewed By: dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4481 llvm-svn: 213474	2014-07-20 09:10:11 +00:00
NAKAMURA Takumi	8eb82fc453	Fix msc17 build. RegionInfo::RegionInfo::recalculate() doesn't make sense. llvm-svn: 213466	2014-07-20 03:57:51 +00:00
NAKAMURA Takumi	118b0c789d	Fix -Asserts build introduced since r213456. llvm-svn: 213465	2014-07-20 00:00:42 +00:00
Matt Arsenault	1b8d83796d	Templatify RegionInfo so it works on MachineBasicBlocks llvm-svn: 213456	2014-07-19 18:29:29 +00:00
David Blaikie	b61064ed39	Remove uses of the redundant ".reset(nullptr)" of unique_ptr, in favor of ".reset()" It's also possible to just write "= nullptr", but there's some question of whether that's as readable, so I leave it up to authors to pick which they prefer for now. If we want to discuss standardizing on one or the other, we can do that at some point in the future. llvm-svn: 213438	2014-07-19 01:05:11 +00:00
Hal Finkel	b0407ba071	Add a dereferenceable attribute This attribute indicates that the parameter or return pointer is dereferenceable. Practically speaking, loads from such a pointer within the associated byte range are safe to speculatively execute. Such pointer parameters are common in source languages (C++ references, for example). llvm-svn: 213385	2014-07-18 15:51:28 +00:00
Suyog Sarda	1a212203bc	Rectify r213231. Use proper version of 'ComputeNumSignBits'. Earlier when the code was in InstCombine, we were calling the version of ComputeNumSignBits in InstCombine.h that automatically added the DataLayout* before calling into ValueTracking. When the code moved to InstSimplify, we are calling into ValueTracking directly without passing in the DataLayout*. This patch rectifies the same by passing DataLayout in ComputeNumSignBits. llvm-svn: 213295	2014-07-17 19:07:00 +00:00
Suyog Sarda	68862414b5	Move ashr optimization from InstCombineShift to InstSimplify. Refactor code, no functionality change, test case moved from instcombine to instsimplify. Differential Revision: http://reviews.llvm.org/D4102 llvm-svn: 213231	2014-07-17 06:28:15 +00:00
Hal Finkel	354e23b029	Improve BasicAA CS-CS queries (redux) This reverts, "r213024 - Revert r212572 "improve BasicAA CS-CS queries", it causes PR20303." with a fix for the bug in pr20303. As it turned out, the relevant code was both wrong and over-conservative (because, as with the code it replaced, it would return the overall ModRef mask even if just Ref had been implied by the argument aliasing results). Hopefully, this correctly fixes both problems. Thanks to Nick Lewycky for reducing the test case for pr20303 (which I've cleaned up a little and added in DSE's test directory). The BasicAA test has also been updated to check for this error. Original commit message: BasicAA contains knowledge of certain intrinsics, such as memcpy and memset, and uses that information to form more-accurate answers to CallSite vs. Loc ModRef queries. Unfortunately, it did not use this information when answering CallSite vs. CallSite queries. Generically, when an intrinsic takes one or more pointers and the intrinsic is marked only to read/write from its arguments, the offset/size is unknown. As a result, the generic code that answers CallSite vs. CallSite (and CallSite vs. Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's arguments. While BasicAA's CallSite vs. Loc override could use more-accurate size information for some intrinsics, it did not do the same for CallSite vs. CallSite queries. This change refactors the intrinsic-specific logic in BasicAA into a generic AA query function: getArgLocation, which is overridden by BasicAA to supply the intrinsic-specific knowledge, and used by AA's generic implementation. This allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and CallSite vs. CallSite queries, and simplifies the BasicAA implementation. Currently, only one function, Mac's memset_pattern16, is handled by BasicAA (all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's getModRefBehavior override now also returns OnlyAccessesArgumentPointees for this function (which is an improvement). llvm-svn: 213219	2014-07-17 01:28:25 +00:00
Matt Arsenault	f1a7e62033	Teach computeKnownBits to look through addrspacecast. This fixes inferring alignment through an addrspacecast. llvm-svn: 213030	2014-07-15 01:55:03 +00:00
Matt Arsenault	70f4db88d5	Teach GetUnderlyingObject / BasicAA about addrspacecast llvm-svn: 213025	2014-07-15 00:56:40 +00:00
Nick Lewycky	7a63c3b389	Revert r212572 "improve BasicAA CS-CS queries", it causes PR20303. llvm-svn: 213024	2014-07-15 00:53:38 +00:00
Matt Arsenault	caa9c71f32	Look through addrspacecast in IsConstantOffsetFromGlobal llvm-svn: 213000	2014-07-14 22:39:26 +00:00
Matt Arsenault	fd78d0c934	Look through addrspacecast in GetPointerBaseWithConstantOffset llvm-svn: 212999	2014-07-14 22:39:22 +00:00
David Majnemer	af9180fd04	InstSimplify: Correct sdiv x / -1 Determining the bounds of x/ -1 would start off with us dividing it by INT_MIN. Suffice to say, this would not work very well. Instead, handle it upfront by checking for -1 and mapping it to the range: [INT_MIN + 1, INT_MAX. This means that the result of our division can be any value other than INT_MIN. llvm-svn: 212981	2014-07-14 20:38:45 +00:00
David Majnemer	5ea4fc0b33	InstSimplify: The upper bound of X / C was missing a rounding step Summary: When calculating the upper bound of X / -8589934592, we would perform the following calculation: Floor[INT_MAX / 8589934592] However, flooring the result would make us wrongly come to the conclusion that 1073741824 was not in the set of possible values. Instead, use the ceiling of the result. Reviewers: nicholas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4502 llvm-svn: 212976	2014-07-14 19:49:57 +00:00
Matt Arsenault	4181ea36a9	Templatify DominanceFrontier. Theoretically this should now work for MachineBasicBlocks. llvm-svn: 212885	2014-07-12 21:59:52 +00:00
Duncan P. N. Exon Smith	6075510839	BFI: Add constructor for Weight llvm-svn: 212868	2014-07-12 00:26:00 +00:00
Duncan P. N. Exon Smith	345c287da9	BFI: Clean up BlockMass Implementation is small now -- the interesting logic was moved to `BranchProbability` a while ago. Move it into `bfi_detail` and get rid of the related TODOs. I was originally planning to define it within `BlockFrequencyInfoImpl` (or `BFIIBase`), but it seems cleaner in a namespace. Besides, `isPodLike` needs to be specialized before `BlockMass` can be used in some of the other data structures, and there isn't a clear way to do that. llvm-svn: 212866	2014-07-12 00:21:30 +00:00
Duncan P. N. Exon Smith	b5650e5eae	BFI: Mark the end of namespaces llvm-svn: 212861	2014-07-11 23:56:50 +00:00
Hal Finkel	2e42c34d05	Allow isDereferenceablePointer to look through some bitcasts isDereferenceablePointer should not give up upon encountering any bitcast. If we're casting from a pointer to a larger type to a pointer to a small type, we can continue by examining the bitcast's operand. This missing capability was noted in a comment in the function. In order for this to work, isDereferenceablePointer now takes an optional DataLayout pointer (essentially all callers already had such a pointer available). Most code uses isDereferenceablePointer though isSafeToSpeculativelyExecute (which already took an optional DataLayout pointer), and to enable the LICM test case, LICM needs to actually provide its DL pointer to isSafeToSpeculativelyExecute (which it was not doing previously). llvm-svn: 212686	2014-07-10 05:27:53 +00:00
Hal Finkel	8ae0f8d618	Improve BasicAA CS-CS queries BasicAA contains knowledge of certain intrinsics, such as memcpy and memset, and uses that information to form more-accurate answers to CallSite vs. Loc ModRef queries. Unfortunately, it did not use this information when answering CallSite vs. CallSite queries. Generically, when an intrinsic takes one or more pointers and the intrinsic is marked only to read/write from its arguments, the offset/size is unknown. As a result, the generic code that answers CallSite vs. CallSite (and CallSite vs. Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's arguments. While BasicAA's CallSite vs. Loc override could use more-accurate size information for some intrinsics, it did not do the same for CallSite vs. CallSite queries. This change refactors the intrinsic-specific logic in BasicAA into a generic AA query function: getArgLocation, which is overridden by BasicAA to supply the intrinsic-specific knowledge, and used by AA's generic implementation. This allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and CallSite vs. CallSite queries, and simplifies the BasicAA implementation. Currently, only one function, Mac's memset_pattern16, is handled by BasicAA (all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's getModRefBehavior override now also returns OnlyAccessesArgumentPointees for this function (which is an improvement). llvm-svn: 212572	2014-07-08 23:16:49 +00:00
Sanjay Patel	784a5a41e7	fixed typos in comments llvm-svn: 212424	2014-07-06 23:24:53 +00:00
David Majnemer	651ed5e8fd	InstSimplify: Fix a bug when INT_MIN is in a sdiv When INT_MIN is the numerator in a sdiv, we would not properly handle overflow when calculating the bounds of possible values; abs(INT_MIN) is not a meaningful number. Instead, check and handle INT_MIN by reasoning that the largest value is INT_MIN/-2 and the smallest value is INT_MIN. This fixes PR20199. llvm-svn: 212307	2014-07-04 00:23:39 +00:00
Andrea Di Biagio	c8e8bda58f	[CostModel][x86] Improved cost model for alternate shuffles. This patch: 1) Improves the cost model for x86 alternate shuffles (originally added at revision 211339); 2) Teaches the Cost Model Analysis pass how to analyze alternate shuffles. Alternate shuffles are a special kind of blend; on x86, we can often easily lowered alternate shuffled into single blend instruction (depending on the subtarget features). The existing cost model didn't take into account subtarget features. Also, it had a couple of "dead" entries for vector types that are never legal (example: on x86 types v2i32 and v2f32 are not legal; those are always either promoted or widened to 128-bit vector types). The new x86 cost model takes into account what target features we have before returning the shuffle cost (i.e. the number of instructions after the blend is lowered/expanded). This patch also teaches the Cost Model Analysis how to identify and analyze alternate shuffles (i.e. 'SK_Alternate' shufflevector instructions): - added function 'isAlternateVectorMask'; - added some logic to check if an instruction is a alternate shuffle and, in case, call the target specific TTI to get the corresponding shuffle cost; - added a test to verify the cost model analysis on alternate shuffles. llvm-svn: 212296	2014-07-03 22:24:18 +00:00
Richard Trieu	f2a795241a	Add new lines to debugging information. Differential Revision: http://reviews.llvm.org/D4262 llvm-svn: 212250	2014-07-03 02:11:49 +00:00
Gerolf Hoflehner	734f4c8984	Suppress inlining when the block address is taken Inlining functions with block addresses can cause many problem and requires a rich infrastructure to support including escape analysis. At this point the safest approach to address these problems is by blocking inlining from happening. Background: There have been reports on Ruby segmentation faults triggered by inlining functions with block addresses like //Ruby code snippet vm_exec_core() { finish_insn_seq_0 = &&INSN_LABEL_finish; INSN_LABEL_finish: ; } This kind of scenario can also happen when LLVM picks a subset of blocks for inlining, which is the case with the actual code in the Ruby environment. LLVM suppresses inlining for such functions when there is an indirect branch. The attached patch does so even when there is no indirect branch. Note that user code like above would not make much sense: using the global for jumping across function boundaries would be illegal. Why was there a segfault: In the snipped above the block with the label is recognized as dead So it is eliminated. Instead of a block address the cloner stores a constant (sic!) into the global resulting in the segfault (when the global is used in a goto). Why had it worked in the past then: By luck. In older versions vm_exec_core was also inlined but the label address used was the block label address in vm_exec_core. So the global jump ended up in the original function rather than in the caller which accidentally happened to work. Test case ./tools/clang/test/CodeGen/indirect-goto.c will fail as a result of this commit. rdar://17245966 llvm-svn: 212077	2014-07-01 00:19:34 +00:00
Alp Toker	e69170a110	Revert "Introduce a string_ostream string builder facilty" Temporarily back out commits r211749, r211752 and r211754. llvm-svn: 211814	2014-06-26 22:52:05 +00:00
Dinesh Dwivedi	99281a0615	This patch removed duplicate code for matching patterns which are now handled in SimplifyUsingDistributiveLaws() (after r211261) Differential Revision: http://reviews.llvm.org/D4253 llvm-svn: 211768	2014-06-26 08:57:33 +00:00
Alp Toker	2251672878	MSVC build fix following r211749 Avoid strndup() llvm-svn: 211752	2014-06-26 00:25:41 +00:00
Alp Toker	614717388c	Introduce a string_ostream string builder facilty string_ostream is a safe and efficient string builder that combines opaque stack storage with a built-in ostream interface. small_string_ostream<bytes> additionally permits an explicit stack storage size other than the default 128 bytes to be provided. Beyond that, storage is transferred to the heap. This convenient class can be used in most places an std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair would previously have been used, in order to guarantee consistent access without byte truncation. The patch also converts much of LLVM to use the new facility. These changes include several probable bug fixes for truncated output, a programming error that's no longer possible with the new interface. llvm-svn: 211749	2014-06-26 00:00:48 +00:00
Duncan P. N. Exon Smith	84553d8f1f	Support: Move class ScaledNumber ScaledNumber has been cleaned up enough to pull out of BFI now. Still work to do there (tests for shifting, bloated printing code, etc.), but it seems clean enough for its new home. llvm-svn: 211562	2014-06-24 00:38:09 +00:00
Duncan P. N. Exon Smith	beaf813dd4	BFI: Un-floatify more language llvm-svn: 211561	2014-06-24 00:26:13 +00:00
Duncan P. N. Exon Smith	e488c4a835	Support: Extract ScaledNumbers::MinScale and MaxScale llvm-svn: 211558	2014-06-24 00:15:19 +00:00
Duncan P. N. Exon Smith	b6bbd3f569	BFI: Change language from "exponent" to "scale" llvm-svn: 211557	2014-06-23 23:57:12 +00:00
Duncan P. N. Exon Smith	c379c87a78	BFI: Rename UnsignedFloat => ScaledNumber A lot of the docs and API are out of date, but I'll leave that for a separate commit. llvm-svn: 211555	2014-06-23 23:36:17 +00:00
Benjamin Kramer	8dd637aa04	SCEVExpander: Fold constant PHIs harder. The logic below only understands proper IVs. PR20093. llvm-svn: 211433	2014-06-21 11:47:18 +00:00
Richard Trieu	c1485223a6	Add back functionality removed in r210497. Instead of asserting, output a message stating that a null pointer was found. llvm-svn: 211430	2014-06-21 02:43:02 +00:00
Duncan P. N. Exon Smith	411840d963	Support: Write ScaledNumber::getQuotient() and getProduct() llvm-svn: 211409	2014-06-20 21:47:47 +00:00
Jingyue Wu	37fcb5919d	[ValueTracking] Extend range metadata to call/invoke Summary: With this patch, range metadata can be added to call/invoke including IntrinsicInst. Previously, it could only be added to load. Rename computeKnownBitsLoad to computeKnownBitsFromRangeMetadata because range metadata is not only used by load. Update the language reference to reflect this change. Test Plan: Add several tests in range-2.ll to confirm the verifier is happy with having range metadata on call/invoke. Add two tests in AddOverFlow.ll to confirm annotating range metadata to call/invoke can benefit InstCombine. Reviewers: meheff, nlewycky, reames, hfinkel, eliben Reviewed By: eliben Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4187 llvm-svn: 211281	2014-06-19 16:50:16 +00:00
Nick Lewycky	8561a49c27	Move optimization of some cases of (A & C1)\|(B & C2) from instcombine to instsimplify. Patch by Rahul Jain, plus some last minute changes by me -- you can blame me for any bugs. llvm-svn: 211252	2014-06-19 03:51:46 +00:00
Nick Lewycky	c961030ac2	Make instsimplify's analysis of icmp eq/ne use computeKnownBits to determine whether the icmp is always true or false. Patch by Suyog Sarda! llvm-svn: 211251	2014-06-19 03:35:49 +00:00
Richard Trieu	a23043cb9c	Removing an "if (!this)" check from two print methods. The condition will never be true in a well-defined context. The checking for null pointers has been moved into the caller logic so it does not rely on undefined behavior. llvm-svn: 210497	2014-06-09 22:53:16 +00:00
Alp Toker	51420a8d62	Remove old fenv.h workaround for a historic clang driver bug Tested and works fine with clang using libstdc++. All indications are that this was fixed some time ago and isn't a problem with any clang version we support. I've added a note in PR6907 which is still open for some reason. llvm-svn: 210485	2014-06-09 19:00:52 +00:00
Alp Toker	c817d6a5b5	Fold FEnv.h into the implementation Support headers shouldn't use config.h definitions, and they should never be undefined like this. ConstantFolding.cpp was the only user of this facility and already includes config.h for other math features, so it makes sense to move the checks there at point of use. (The implicit config.h was also quite dangerous -- removing the FEnv.h include would have silently disabled math constant folding without causing any tests to fail. Need to investigate -Wundef once the cleanup is done.) This eliminates the last config.h include from LLVM headers, paving the way for more consistent configuration checks. llvm-svn: 210483	2014-06-09 18:28:53 +00:00
Tobias Grosser	40ac10085a	ScalarEvolution: Derive element size from the type of the loaded element Before, we where looking at the size of the pointer type that specifies the location from which to load the element. This did not make any sense at all. This change fixes a bug in the delinearization where we failed to delinerize certain load instructions. llvm-svn: 210435	2014-06-08 19:21:20 +00:00
Tom Roeder	44cb65fff1	Add a new attribute called 'jumptable' that creates jump-instruction tables for functions marked with this attribute. It includes a pass that rewrites all indirect calls to jumptable functions to pass through these tables. This also adds backend support for generating the jump-instruction tables on ARM and X86. Note that since the jumptable attribute creates a second function pointer for a function, any function marked with jumptable must also be marked with unnamed_addr. llvm-svn: 210280	2014-06-05 19:29:43 +00:00
Rafael Espindola	78598d9ab5	Add a Constant version of stripPointerCasts. Thanks to rnk for the suggestion. llvm-svn: 210205	2014-06-04 19:01:48 +00:00
Sebastian Pop	20daf3276d	implement missing SCEVDivision case without this case we would end on an infinite recursion: the remainder is zero, so Numerator - Remainder is equal to Numerator and so we would recursively ask for the division of Numerator by Denominator. llvm-svn: 209838	2014-05-29 19:44:09 +00:00
Sebastian Pop	5352408169	fail to find dimensions when ElementSize is nullptr when ScalarEvolution::getElementSize returns nullptr it is safe to early return in ScalarEvolution::findArrayDimensions such that we avoid later problems when we try to divide the terms by ElementSize. llvm-svn: 209837	2014-05-29 19:44:05 +00:00
Sanjay Patel	26b6edcf44	test check-in: added missing parenthesis in comment llvm-svn: 209763	2014-05-28 19:03:33 +00:00
Sebastian Pop	f93ef12330	avoid type mismatch when building SCEVs This is a corner case I have stumbled upon when dealing with ARM64 type conversions. I was not able to extract a testcase for the community codebase to fail on. The patch conservatively discards a division that would have ended up in an ICE due to a type mismatch when building a multiply expression. I have also added code to a place that builds add expressions and in which we should be careful not to pass in operands of different types. llvm-svn: 209694	2014-05-27 22:42:00 +00:00
Sebastian Pop	e30bd351cc	do not use the GCD to compute the delinearization strides We do not need to compute the GCD anymore after we removed the constant coefficients from the terms: the terms are now all parametric expressions and there is no need to recognize constant terms that divide only a subset of the terms. We only rely on the size of the terms, i.e., the number of operands in the multiply expressions, to sort the terms and recognize the parametric dimensions. llvm-svn: 209693	2014-05-27 22:41:56 +00:00
Sebastian Pop	28e6b97b5d	remove BasePointer before delinearizing No functional change is intended: instead of relying on the delinearization to come up with the base pointer as a remainder of the divisions in the delinearization, we just compute it from the array access and use that value. We substract the base pointer from the SCEV to be delinearized and that simplifies the work of the delinearizer. llvm-svn: 209692	2014-05-27 22:41:51 +00:00
Sebastian Pop	a6e5860513	remove constant terms The delinearization is needed only to remove the non linearity induced by expressions involving multiplications of parameters and induction variables. There is no problem in dealing with constant times parameters, or constant times an induction variable. For this reason, the current patch discards all constant terms and multipliers before running the delinearization algorithm on the terms. The only thing remaining in the term expressions are parameters and multiply expressions of parameters: these simplified term expressions are passed to the array shape recognizer that will not recognize constant dimensions anymore: these will be recognized as different strides in parametric subscripts. The only important special case of a constant dimension is the size of elements. Instead of relying on the delinearization to infer the size of an element, compute the element size from the base address type. This is a much more precise way of computing the element size than before, as we would have mixed together the size of an element with the strides of the innermost dimension. llvm-svn: 209691	2014-05-27 22:41:45 +00:00
Michael Zolotukhin	265dfa411c	Some cleanup for r209568. llvm-svn: 209634	2014-05-26 14:49:46 +00:00
Michael Zolotukhin	d4c724625a	Implement sext(C1 + C2X) --> sext(C1) + sext(C2X) and sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformation in Scalar Evolution. That helps SLP-vectorizer to recognize consecutive loads/stores. <rdar://problem/14860614> llvm-svn: 209568	2014-05-24 08:09:57 +00:00
Andrew Trick	839e30b2c0	Fix and improve SCEV ComputeBackedgeTankCount. This is a follow-up to r209358: PR19799: Indvars miscompile due to an incorrect max backedge taken count from SCEV. That fix was incomplete as pointed out by Arnold and Michael Z. The code was also too confusing. It needed a careful rewrite with more unit tests. This version will also happen to optimize more cases. <rdar://17005101> PR19799: Indvars miscompile... llvm-svn: 209545	2014-05-23 19:47:13 +00:00
Justin Bogner	cbb8438bb3	ScalarEvolution: Fix handling of AddRecs in isKnownPredicate ScalarEvolution::isKnownPredicate() can wrongly reduce a comparison when both the LHS and RHS are SCEVAddRecExprs. This checks that both LHS and RHS are guarded in the case when both are SCEVAddRecExprs. The test case is against indvars because I could not find a way to directly test SCEV. Patch by Sanjay Patel! llvm-svn: 209487	2014-05-23 00:06:56 +00:00
Andrew Trick	e255359b57	Fix a bug in SCEV's backedge taken count computation from my prior fix in Jan. This has to do with the trip count computation for loops with multiple exits, which is quite subtle. Most passes just ask for a single trip count number, so we must be conservative assuming any exit could be taken. Normally, we rely on the "exact" trip count, which was correctly given as "unknown". However, SCEV also gives a "max" back-edge taken count. The loops max BE taken count is conservatively a maximum over the max of each exit's non-exiting iterations count. Note that some exit tests can be skipped so the max loop back-edge taken count can actually exceed the max non-exiting iterations for some exits. However, when we know the loop latch cannot be skipped, we can directly use its max taken count disregarding other exits. I previously took the minimum here without checking whether the other exit could be skipped. The correct, and simpler thing to do here is just to directly use the loop latch's max non-exiting iterations as the loops max back-edge count. In the problematic test case, the first loop exit had a max of zero non-exiting iterations, but could be skipped. The loop latch was known not to be skipped but had max of one non-exiting iteration. We incorrectly claimed the loop back-edge could be taken zero times, when it is actually taken one time. Fixes Loop %for.body.i: <multiple exits> Unpredictable backedge-taken count. Loop %for.body.i: max backedge-taken count is 1. llvm-svn: 209358	2014-05-22 00:37:03 +00:00
Eric Christopher	650c8f2a06	Clean up language and grammar. Based on a patch by jfcaron3@gmail.com! PR19806 llvm-svn: 209216	2014-05-20 17:11:11 +00:00
Nick Lewycky	ec373545b8	Teach isKnownNonNull that a nonnull return is not null. Add a test for this case as well as the case of a nonnull attribute (already handled but not tested). llvm-svn: 209193	2014-05-20 05:13:21 +00:00
Nick Lewycky	d52b1528c0	Add 'nonnull', a new parameter and return attribute which indicates that the pointer is not null. Instcombine will elide comparisons between these and null. Patch by Luqman Aden! llvm-svn: 209185	2014-05-20 01:23:40 +00:00
Peter Collingbourne	68a889757d	Check the alwaysinline attribute on the call as well as on the caller. Differential Revision: http://reviews.llvm.org/D3815 llvm-svn: 209150	2014-05-19 18:25:54 +00:00
David Majnemer	78910fc4da	InstSimplify: Improve handling of ashr/lshr Summary: Analyze the range of values produced by ashr/lshr cst, %V when it is being used in an icmp. Reviewers: nicholas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3774 llvm-svn: 209000	2014-05-16 17:14:03 +00:00
David Majnemer	ea8d5dbf24	InstSimplify: Optimize using dividend in sdiv Summary: The dividend in an sdiv tells us the largest and smallest possible results. Use this fact to optimize comparisons against an sdiv with a constant dividend. Reviewers: nicholas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3795 llvm-svn: 208999	2014-05-16 16:57:04 +00:00
Juergen Ributzka	34390c70a5	Add C API for thread yielding callback. Sometimes a LLVM compilation may take more time then a client would like to wait for. The problem is that it is not possible to safely suspend the LLVM thread from the outside. When the timing is bad it might be possible that the LLVM thread holds a global mutex and this would block any progress in any other thread. This commit adds a new yield callback function that can be registered with a context. LLVM will try to yield by calling this callback function, but there is no guaranteed frequency. LLVM will only do so if it can guarantee that suspending the thread won't block any forward progress in other LLVM contexts in the same process. Once the client receives the call back it can suspend the thread safely and resume it at another time. Related to <rdar://problem/16728690> llvm-svn: 208945	2014-05-16 02:33:15 +00:00
Jay Foad	5a29c367f7	Instead of littering asserts throughout the code after every call to computeKnownBits, consolidate them into one assert at the end of computeKnownBits itself. llvm-svn: 208876	2014-05-15 12:12:55 +00:00
Chandler Carruth	a0e5695ad9	Teach the constant folder to look through bitcast constant expressions much more effectively when trying to constant fold a load of a constant. Previously, we only handled bitcasts by trying to find a totally generic byte representation of the constant and use that. Now, we look through the bitcast to see what constant we might fold the load into, and then try to form a constant expression cast of the found value that would be equivalent to loading the value. You might wonder why on earth this actually matters. Well, turns out that the Itanium ABI causes us to create a single array for a vtable where the first elements are virtual base offsets, followed by the virtual function pointers. Because the array is homogenous the element type is consistently i8* and we inttoptr the virtual base offsets into the initial elements. Then constructors bitcast these pointers to i64 pointers prior to loading them. Boom, no more constant folding of virtual base offsets. This is the first fix to LLVM to address the insane performance Eric Niebler discovered with Clang on his range comprehensions[1]. There is more to come though, this doesn't really fix the problem fully. [1]: http://ericniebler.com/2014/04/27/range-comprehensions/ llvm-svn: 208856	2014-05-15 09:56:28 +00:00
Alp Toker	beaca19c7c	Fix typos llvm-svn: 208839	2014-05-15 01:52:21 +00:00
Jay Foad	a0653a3e6c	Rename ComputeMaskedBits to computeKnownBits. "Masked" has been inappropriate since it lost its Mask parameter in r154011. llvm-svn: 208811	2014-05-14 21:14:37 +00:00
David Majnemer	2d6c023576	InstSimplify: Optimize signed icmp of -(zext V) Summary: We know that -(zext V) will always be <= zero, simplify signed icmps that have these. Uncovered using http://www.cs.utah.edu/~regehr/souper/ Reviewers: nicholas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3754 llvm-svn: 208809	2014-05-14 20:16:28 +00:00
Jay Foad	e48d9e8efe	Update the comments for ComputeMaskedBits, which lost its Mask parameter in r154011. llvm-svn: 208757	2014-05-14 08:00:07 +00:00
Sebastian Pop	05719e486f	use nullptr instead of NULL llvm-svn: 208622	2014-05-12 20:11:01 +00:00
Sebastian Pop	b1a548f72d	do not assert when delinearization fails llvm-svn: 208615	2014-05-12 19:01:53 +00:00
Sebastian Pop	0e75c5cb64	use isZero() llvm-svn: 208614	2014-05-12 19:01:49 +00:00
Benjamin Kramer	8cff45aa20	SCEV: Use range-based for loop and fold variable into assert. llvm-svn: 208476	2014-05-10 17:47:18 +00:00
Sebastian Pop	47fe7de1b5	move findArrayDimensions to ScalarEvolution we do not use the information from SCEVAddRecExpr to compute the shape of the array, so a better place for this function is in ScalarEvolution. llvm-svn: 208456	2014-05-09 22:45:07 +00:00
Sebastian Pop	444621abe1	fix typo in debug message llvm-svn: 208455	2014-05-09 22:45:02 +00:00
Tobias Grosser	1e9db7e639	Correct formatting. Sorry for the commit spam. My clang-format crashed on me and the vim plugin did not print an error, but instead just left the formatting untouched. llvm-svn: 208358	2014-05-08 21:43:19 +00:00
Tobias Grosser	b2101c3ca5	Use std::remove_if to remove elements from a vector Suggested-by: Benjamin Kramer <benny.kra@gmail.com> llvm-svn: 208357	2014-05-08 21:32:59 +00:00
Rafael Espindola	0c4eea74d7	Use a range loop. llvm-svn: 208343	2014-05-08 17:57:50 +00:00
Tobias Grosser	3080cf16a5	Revert "SCEV: Use I = vector<>.erase(I) to iterate and delete at the same time" as committed in r208282. The original commit was incorrect. llvm-svn: 208286	2014-05-08 07:55:34 +00:00
Tobias Grosser	ecfe9d06eb	SCEV: Use I = vector<>.erase(I) to iterate and delete at the same time llvm-svn: 208282	2014-05-08 07:12:44 +00:00
Sebastian Pop	b8d56f42b7	avoid segfaulting Quotient and Remainder don't have to be initialized. llvm-svn: 208238	2014-05-07 19:00:37 +00:00
Sebastian Pop	a7d3d6ab9f	do not collect undef terms llvm-svn: 208237	2014-05-07 19:00:32 +00:00
Sebastian Pop	448712b1a6	split delinearization pass in 3 steps To compute the dimensions of the array in a unique way, we split the delinearization analysis in three steps: - find parametric terms in all memory access functions - compute the array dimensions from the set of terms - compute the delinearized access functions for each dimension The first step is executed on all the memory access functions such that we gather all the patterns in which an array is accessed. The second step reduces all this information in a unique description of the sizes of the array. The third step is delinearizing each memory access function following the common description of the shape of the array computed in step 2. This rewrite of the delinearization pass also solves a problem we had with the previous implementation: because the previous algorithm was by induction on the structure of the SCEV, it would not correctly recognize the shape of the array when the memory access was not following the nesting of the loops: for example, see polly/test/ScopInfo/multidim_only_ivs_3d_reverse.ll ; void foo(long n, long m, long o, double A[n][m][o]) { ; ; for (long i = 0; i < n; i++) ; for (long j = 0; j < m; j++) ; for (long k = 0; k < o; k++) ; A[i][k][j] = 1.0; Starting with this patch we no longer delinearize access functions that do not contain parameters, for example in test/Analysis/DependenceAnalysis/GCD.ll ;; for (long int i = 0; i < 100; i++) ;; for (long int j = 0; j < 100; j++) { ;; A[2i - 4j] = i; ;; B++ = A[6i + 8*j]; these accesses will not be delinearized as the upper bound of the loops are constants, and their access functions do not contain SCEVUnknown parameters. llvm-svn: 208232	2014-05-07 18:01:20 +00:00
Tobias Grosser	924221cb37	[C++11] Add NArySCEV->Operands iterator range llvm-svn: 208158	2014-05-07 06:07:47 +00:00
Duncan P. N. Exon Smith	87c40fdfdb	blockfreq: Move include to .cpp llvm-svn: 208035	2014-05-06 01:57:42 +00:00
Chandler Carruth	312dddfb81	[LCG] Add the last (and most complex) of the edge insertion mutation operations on the call graph. This one forms a cycle, and while not as complex as removing an internal edge from an SCC, it involves a reasonable amount of work to find all of the nodes newly connected in a cycle. Also somewhat alarming is the worst case complexity here: it might have to walk roughly the entire SCC inverse DAG to insert a single edge. This is carefully documented in the API (I hope). llvm-svn: 207935	2014-05-04 09:38:32 +00:00
Juergen Ributzka	d35c114d15	[TBAA] Fix handling of mixed TBAA (path-aware and non-path-aware TBAA). This fix simply ensures that both metadata nodes are path-aware before performing path-aware alias analysis. This issue isn't normally triggered in LLVM, because we perform an autoupgrade of the TBAA metadata to the new format when reading in LL or BC files. This issue only appears when a client creates the IR manually and mixes old and new TBAA metadata format. This fixes <rdar://problem/16760860>. llvm-svn: 207923	2014-05-03 22:32:52 +00:00
Chandler Carruth	7cc4ed8202	[LCG] Add the other simple edge insertion API to the call graph. This just connects an SCC to one of its descendants directly. Not much of an impact. The last one is the hard one -- connecting an SCC to one of its ancestors, and thereby forming a cycle such that we have to merge all the SCCs participating in the cycle. llvm-svn: 207751	2014-05-01 12:18:20 +00:00
Chandler Carruth	034d0d6805	[LCG] Don't lookup the child SCC twice. Spotted this by inspection, and no functionality changed. llvm-svn: 207750	2014-05-01 12:16:31 +00:00
Chandler Carruth	4b096741b4	[LCG] Add some basic methods for querying the parent/child relationships of SCCs in the SCC DAG. Exercise them in the big graph test case. These will be especially useful for establishing invariants in insertion logic. llvm-svn: 207749	2014-05-01 12:12:42 +00:00
Chandler Carruth	5217c94522	[LCG] Add the really, really boring edge insertion case: adding an edge entirely within an existing SCC. Shockingly, making the connected component more connected is ... a total snooze fest. =] Anyways, its wired up, and I even added a test case to make sure it pretty much sorta works. =D llvm-svn: 207631	2014-04-30 10:48:36 +00:00
Chandler Carruth	c5026b670e	[LCG] Actually test the basic edge removal bits (IE, the non-SCC bits), and discover that it's totally broken. Yay tests. Boo bug. Fix the basic edge removal so that it works by nulling out the removed edges rather than actually removing them. This leaves the indices valid in the map from callee to index, and preserves some of the locality for iterating over edges. The iterator is made bidirectional to reflect that it now has to skip over null entries, and the skipping logic is layered onto it. As future work, I would like to track essentially the "load factor" of the edge list, and when it falls below a threshold do a compaction. An alternative I considered (and continue to consider) is storing the callees in a doubly linked list where each element of the list is in a set (which is essentially the classical linked-hash-table datastructure). The problem with that approach is that either you need to heap allocate the linked list nodes and use pointers to them, or use a bucket hash table (with even more linked list pointer overhead!), etc. It's pretty easy to get 5x overhead for values that are just pointers. So far, I think punching holes in the vector, and periodic compaction is likely to be much more efficient overall in the space/time tradeoff. llvm-svn: 207619	2014-04-30 07:45:27 +00:00
Benjamin Kramer	d59664f4f7	raw_ostream: Forward declare OpenFlags and include FileSystem.h only where necessary. llvm-svn: 207593	2014-04-29 23:26:49 +00:00
Duncan P. N. Exon Smith	d22bea7dad	blockfreq: Defer to BranchProbability::scale() `BlockMass` can now defer to `BranchProbability::scale()`. llvm-svn: 207547	2014-04-29 16:20:05 +00:00
Duncan P. N. Exon Smith	295b5e7481	blockfreq: Remove more extra typenames from r207438 llvm-svn: 207440	2014-04-28 20:22:29 +00:00
Duncan P. N. Exon Smith	c5a3139ebd	Reapply "blockfreq: Approximate irreducible control flow" This reverts commit r207287, reapplying r207286. I'm hoping that declaring an explicit struct and instantiating `addBlockEdges()` directly works around the GCC crash from r207286. This is a lot more boilerplate, though. llvm-svn: 207438	2014-04-28 20:02:29 +00:00
Chandler Carruth	c00a7ff4b7	[LCG] Add the most basic of edge insertion to the lazy call graph. This just handles the pre-DFS case. Also add some test cases for this case to make sure it works. llvm-svn: 207411	2014-04-28 11:10:23 +00:00
Chandler Carruth	3f5f5fe164	[LCG] Make the return of the IntraSCC removal method actually match its contract (and be much more useful). It now provides exactly the post-order traversal a caller might need to perform on newly formed SCCs. llvm-svn: 207410	2014-04-28 10:49:06 +00:00
Chandler Carruth	e01fd5f63a	[inliner] Significantly improve the compile time in cases like PR19499 by avoiding inlining massive switches merely because they have no instructions in them. These switches still show up where we fail to form lookup tables, and in those cases they are actually going to cause a very significant code size hit anyways, so inlining them is not the right call. The right way to fix any performance regressions stemming from this is to enhance the switch-to-lookup-table logic to fire in more places. This makes PR19499 about 5x less bad. It uncovers a second compile time problem in that test case that is unrelated (surprisingly!). llvm-svn: 207403	2014-04-28 08:52:44 +00:00
Craig Topper	e73658ddbb	[C++] Use 'nullptr'. llvm-svn: 207394	2014-04-28 04:05:08 +00:00
Chandler Carruth	aa839b22c9	[LCG] Re-organize the methods for mutating a call graph to make their API requirements much more obvious. The key here is that there are two totally different use cases for mutating the graph. Prior to doing any SCC formation, it is very easy to mutate the graph. There may be users that want to do small tweaks here, and then use the already-built graph for their SCC-based operations. This method remains on the graph itself and is documented carefully as being cheap but unavailable once SCCs are formed. Once SCCs are formed, and there is some in-flight DFS building them, we have to be much more careful in how we mutate the graph. These mutation operations are sunk onto the SCCs themselves, which both simplifies things (the code was already there!) and helps make it obvious that these interfaces are only applicable within that context. The other primary constraint is that the edge being mutated is actually related to the SCC on which we call the method. This helps make it obvious that you cannot arbitrarily mutate some other SCC. I've tried to write much more complete documentation for the interesting mutation API -- intra-SCC edge removal. Currently one aspect of this documentation is a lie (the result list of SCCs) but we also don't even have tests for that API. =[ I'm going to add tests and fix it to match the documentation next. llvm-svn: 207339	2014-04-27 01:59:50 +00:00
Chandler Carruth	90821c2a93	[LCG] Rather than removing nodes from the SCC entry set when we process them, just skip over any DFS-numbered nodes when finding the next root of a DFS. This allows the entry set to just be a vector as we populate it from a uniqued source. It also removes the possibility for a linear scan of the entry set to actually do the removal which can make things go quadratic if we get unlucky. llvm-svn: 207312	2014-04-26 09:45:55 +00:00
Chandler Carruth	5e2d70b9a3	[LCG] Rotate the full SCC finding algorithm to avoid round-trips through the DFS stack for leaves in the call graph. As mentioned in my previous commit, this is particularly interesting for graphs which have high fan out but low connectivity resulting in many leaves. For such graphs, this can remove a large % of the DFS stack traffic even though it doesn't make the stack much smaller. It's a bit easier to formulate this for the full algorithm because that one stops completely for each SCC. For example, I was able to directly eliminate the "Recurse" boolean used to continue an outer loop from the inner loop. llvm-svn: 207311	2014-04-26 09:28:00 +00:00
Chandler Carruth	aca48d0443	[LCG] Hoist the main DFS loop out of the edge removal function. This makes working through the worklist much cleaner, and makes it possible to avoid the 'bool-to-continue-the-outer-loop' hack. Not a huge difference, but I think this is approaching as polished as I can make it. llvm-svn: 207310	2014-04-26 09:06:53 +00:00
Chandler Carruth	680af7a78c	[LCG] In the incremental SCC re-formation, lift the node currently being processed in the DFS out of the stack completely. Keep it exclusively in a variable. Re-shuffle some code structure to make this easier. This can have a very dramatic effect in some cases because call graphs tend to look like a high fan-out spanning tree. As a consequence, there are a large number of leaf nodes in the graph, and this technique causes leaf nodes to never even go into the stack. While this only reduces the max depth by 1, it may cause the total number of round trips through the stack to drop by a lot. Now, most of this isn't really relevant for the incremental version. =] But I wanted to prototype it first here as this variant is in ways more complex. As long as I can get the code factored well here, I'll next make the primary walk look the same. There are several refactorings this exposes I think. llvm-svn: 207306	2014-04-26 03:36:42 +00:00
Chandler Carruth	a7205b6154	[LCG] Special case the removal of self edges. These don't impact the SCC graph in any way because we don't track edges in the SCC graph, just nodes. This also lets us add a nice assert about the invariant that we're working on at least a certain number of nodes within the SCC. llvm-svn: 207305	2014-04-26 03:36:37 +00:00
Chandler Carruth	8f92d6db22	[LCG] Refactor the duplicated code I added in my last commit here into a helper function. Also factor the other two places where we did the same thing into the helper function. =] Much cleaner this way. NFC. llvm-svn: 207300	2014-04-26 01:03:46 +00:00
Duncan P. N. Exon Smith	42292ceaa9	Revert "blockfreq: Approximate irreducible control flow" This reverts commit r207286. It causes an ICE on the cmake-llvm-x86_64-linux buildbot [1]: llvm/lib/Analysis/BlockFrequencyInfo.cpp: In lambda function: llvm/lib/Analysis/BlockFrequencyInfo.cpp:182:1: internal compiler error: in get_expr_operands, at tree-ssa-operands.c:1035 [1]: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/12093/steps/build_llvm/logs/stdio llvm-svn: 207287	2014-04-25 23:16:58 +00:00
Duncan P. N. Exon Smith	384d0e8ad4	blockfreq: Approximate irreducible control flow Previously, irreducible backedges were ignored. With this commit, irreducible SCCs are discovered on the fly, and modelled as loops with multiple headers. This approximation specifies the headers of irreducible sub-SCCs as its entry blocks and all nodes that are targets of a backedge within it (excluding backedges within true sub-loops). Block frequency calculations act as if we insert a new block that intercepts all the edges to the headers. All backedges and entries to the irreducible SCC point to this imaginary block. This imaginary block has an edge (with even probability) to each header block. The result is now reasonable enough that I've added a number of testcases for irreducible control flow. I've outlined in `BlockFrequencyInfoImpl.h` ways to improve the approximation. <rdar://problem/14292693> llvm-svn: 207286	2014-04-25 23:08:57 +00:00
Duncan P. N. Exon Smith	da5eaeda01	blockfreq: Further shift logic to LoopData Move a lot of the loop-related logic that was sprinkled around the code into `LoopData`. <rdar://problem/14292693> llvm-svn: 207258	2014-04-25 18:47:04 +00:00
Duncan P. N. Exon Smith	d2b2facb07	SCC: Change clients to use const, NFC It's fishy to be changing the `std::vector<>` owned by the iterator, and no one actual does it, so I'm going to remove the ability in a subsequent commit. First, update the users. <rdar://problem/14292693> llvm-svn: 207252	2014-04-25 18:24:50 +00:00
Chandler Carruth	9ba7762d7f	[LCG] During the incremental update of an SCC, switch to using the SCCMap to test for nodes that have been re-added to the root SCC rather than a set vector. We already have done the SCCMap lookup, we juts need to test it in two different ways. In turn, do most of the processing of these nodes as they go into the root SCC rather than lazily. This simplifies the final loop to just stitch the root SCC into its children's parent sets. No functionlatiy changed. However, this makes a few things painfully obvious, which was my intent. =] There is tons of repeated code introduced here and elsewhere. I'm splitting the refactoring of that code into helpers from this change so its clear that this is the change which switches the datastructures used around, and the other is a pure factoring & deduplication of code change. llvm-svn: 207217	2014-04-25 09:52:44 +00:00
Chandler Carruth	2e6ef0e80f	[LCG] During the incremental re-build of an SCC after removing an edge, remove the nodes in the SCC from the SCC map entirely prior to the DFS walk. This allows the SCC map to represent both the state of not-yet-re-added-to-an-SCC and added-back-to-this-SCC independently. The first is being missing from the SCC map, the second is mapping back to 'this'. In a subsequent commit, I'm going to use this property to simplify the new node list for this SCC. In theory, I think this also makes the contract for orphaning a node from the graph slightly less confusing. Now it is also orphaned from the SCC graph. Still, this isn't quite right either, and so I'm not adding test cases here. I'll add test cases for the behavior of orphaning nodes when the code actually supports it. The change here is mostly incidental, my goal is simplifying the algorithm. llvm-svn: 207213	2014-04-25 09:08:10 +00:00
Chandler Carruth	770060ddfa	[LCG] Rather than doing a linear time SmallSetVector removal of each child from the worklist, wait until we actually need to pop another element off of the worklist and skip over any that were already visited by the DFS. This also enables swapping the nodes of the SCC into the worklist. No functionality changed. llvm-svn: 207212	2014-04-25 09:08:05 +00:00
Chandler Carruth	6b88e3a545	[LCG] Remove a completely unnecessary loop. It wasn't even doing any thing, just mucking up the code. I feel bad that I even wrote this loop. Very sorry. The diff is huge because of the indent change, but I promise all this is doing is realizing that the outer two loops were actually the exact same loops, and we didn't need two of them. llvm-svn: 207202	2014-04-25 06:45:06 +00:00
Chandler Carruth	774c9320c0	[LCG] Now that the loop structure of the core SCC finding routine is factored into a more reasonable form, replace the tail call with a simple outer-loop continuation. It's sad that C++ makes this so awkward to write, but it seems more direct and clear than the tail call at this point. llvm-svn: 207201	2014-04-25 06:38:58 +00:00
Duncan P. N. Exon Smith	cb7d29d30c	blockfreq: Only one mass distribution per node Remove the concepts of "forward" and "general" mass distributions, which was wrong. The split might have made sense in an early version of the algorithm, but it's definitely wrong now. <rdar://problem/14292693> llvm-svn: 207195	2014-04-25 04:38:43 +00:00
Duncan P. N. Exon Smith	ebf7626988	blockfreq: Document assertion <rdar://problem/14292693> llvm-svn: 207194	2014-04-25 04:38:40 +00:00
Duncan P. N. Exon Smith	3f086789ff	blockfreq: Document high-level functions <rdar://problem/14292693> llvm-svn: 207191	2014-04-25 04:38:32 +00:00
Duncan P. N. Exon Smith	5291d2a561	blockfreq: Scale LoopData::Scale on the way down Rather than scaling loop headers and then scaling all the loop members by the header frequency, scale `LoopData::Scale` itself, and scale the loop members by it. It's much more obvious what's going on this way, and doesn't cost any extra multiplies. <rdar://problem/14292693> llvm-svn: 207189	2014-04-25 04:38:27 +00:00
Duncan P. N. Exon Smith	0633f0ec29	blockfreq: unwrapLoopPackage() => unwrapLoop() <rdar://problem/14292693> llvm-svn: 207188	2014-04-25 04:38:25 +00:00
Duncan P. N. Exon Smith	da0b21cf96	blockfreq: Pass the Loop directly into unwrapLoopPackage() <rdar://problem/14292693> llvm-svn: 207187	2014-04-25 04:38:23 +00:00
Duncan P. N. Exon Smith	575bd8c81b	blockfreq: Unwrap from Loops When unwrapping loops, just visit the loops rather than all nodes. <rdar://problem/14292693> llvm-svn: 207186	2014-04-25 04:38:20 +00:00
Duncan P. N. Exon Smith	46d9a56ce6	blockfreq: Separate unwrapLoops() from finalizeMetrics() <rdar://problem/14292693> llvm-svn: 207185	2014-04-25 04:38:17 +00:00
Duncan P. N. Exon Smith	c9b7cfea2f	blockfreq: Expose getPackagedNode() Make `getPackagedNode()` a member function of `BlockFrequencyInfoImplBase` so that it's available for templated code. <rdar://problem/14292693> llvm-svn: 207183	2014-04-25 04:38:12 +00:00
Duncan P. N. Exon Smith	1cab8a0708	blockfreq: Store the header with the members <rdar://problem/14292693> llvm-svn: 207182	2014-04-25 04:38:09 +00:00
Duncan P. N. Exon Smith	39cc64827e	blockfreq: Encapsulate LoopData::Header <rdar://problem/14292693> llvm-svn: 207181	2014-04-25 04:38:06 +00:00
Duncan P. N. Exon Smith	d132040ed6	blockfreq: Use LoopData directly Instead of passing around loop headers, pass around `LoopData` directly. <rdar://problem/14292693> llvm-svn: 207179	2014-04-25 04:38:01 +00:00
Duncan P. N. Exon Smith	fc7dc93031	blockfreq: Use a std::list for Loops As pointed out by David Blaikie in code review, a `std::list<T>` is simpler than a `std::vector<std::unique_ptr<T>>`. Another option is a `std::deque<T>` (which allocates in chunks), but I'd like to leave open the option of inserting in the middle of the sequence for handling irreducible control flow on the fly. <rdar://problem/14292693> llvm-svn: 207177	2014-04-25 04:30:06 +00:00
Chandler Carruth	91dcf0f977	[LCG] Switch a weird do/while loop that actually couldn't fail its condition into an obviously infinite loop with an assert about the degenerate condition. No functionality changed. llvm-svn: 207147	2014-04-24 21:19:30 +00:00
Chandler Carruth	24553934f8	[LCG] Incorporate the core trick of improvements on the naive Tarjan's algorithm here: http://dl.acm.org/citation.cfm?id=177301. The idea of isolating the roots has even more relevance when using the stack not just to implement the DFS but also to implement the recursive step. Because we use it for the recursive step, to isolate the roots we need to maintain two stacks: one for our recursive DFS walk, and another of the nodes that have been walked. The nice thing is that the latter will be half the size. It also fixes a complete hack where we scanned backwards over the stack to find the next potential-root to continue processing. Now that is always the top of the DFS stack. While this is a really nice improvement already (IMO) it further opens the door for two important simplifications: 1) De-duplicating some of the code across the two different walks. I've actually made the duplication a bit worse in some senses with this patch because the two are starting to converge. 2) Dramatically simplifying the loop structures of both walks. I wanted to do those separately as they'll be essentially just CFG restructuring. This patch on the other hand actually uses different datastructures to implement the algorithm itself. llvm-svn: 207098	2014-04-24 11:05:20 +00:00
Chandler Carruth	09751bf173	[LCG] Rotate logic applied to the top of the DFSStack to instead be applied prior to pushing a node onto the DFSStack. This is the first step toward avoiding the stack entirely for leaf nodes. It also simplifies things a bit and I think is pointing the way toward factoring some more of the shared logic out of the two implementations. It is also making it more obvious how to restructure the loops themselves to be a bit easier to read (although no different in terms of functionality). llvm-svn: 207095	2014-04-24 09:59:59 +00:00
Chandler Carruth	493e0a6ad0	[LCG] Switch the parent SCC tracking from a SmallSetVector to a SmallPtrSet. Currently, there is no need for stable iteration in this dimension, and I now thing there won't need to be going forward. If this is ever re-introduced in any form, it needs to not be a SetVector based solution because removal cannot be linear. There will be many SCCs with large numbers of parents. When encountering these, the incremental SCC update for intra-SCC edge removal was quadratic due to linear removal (kind of). I'm really hoping we can avoid having an ordering property here at all though... llvm-svn: 207091	2014-04-24 09:22:31 +00:00
Chandler Carruth	d52f8e0e4d	[LCG] We don't actually need a set in each SCC to track the nodes. We can use the node -> SCC mapping in the top-level graph to test this on the rare occasions we need it. llvm-svn: 207090	2014-04-24 08:55:36 +00:00
Craig Topper	353eda484c	[C++] Use 'nullptr'. llvm-svn: 207083	2014-04-24 06:44:33 +00:00
Chandler Carruth	6a4fee87bc	[LCG] Normalize the post-order SCC iterator to just iterate over the SCC values rather than having pointers in weird places. llvm-svn: 207053	2014-04-23 23:51:07 +00:00
Chandler Carruth	bd5d3082c4	[LCG] Switch the primary node iterator to be a much more normal C++ iterator, returning a Node by reference on dereference. llvm-svn: 207048	2014-04-23 23:34:48 +00:00
Chandler Carruth	2a898e0df6	[LCG] Make the insertion and query paths into the LCG which cannot fail return references to better model this property. No functionality changed. llvm-svn: 207047	2014-04-23 23:20:36 +00:00
Chandler Carruth	a10e240377	[LCG] Switch the SCC lookup to be in terms of call graph nodes rather than functions. So far, this access pattern is much more common. It seems likely that any user of this interface is going to have nodes at the point that they are querying the SCCs. No functionality changed. llvm-svn: 207045	2014-04-23 23:12:06 +00:00
Chandler Carruth	b4a04da0b9	[LCG] Switch the primary SCC building code to use the negative low-link values rather than an expensive dense map query to test whether children have already been popped into an SCC. This matches the incremental SCC building code. I've also included the assert that I put there but updated both of their text. No functionality changed here. I still don't have any great ideas for sharing the code between the two implementations, but I may try a brute-force approach to factoring it at some point. llvm-svn: 207042	2014-04-23 22:28:13 +00:00
Chandler Carruth	9302fbf0ae	[LCG] Add the first round of mutation support to the lazy call graph. This implements the core functionality necessary to remove an edge from the call graph and correctly update both the basic graph and the SCC structure. As part of that it has to run a tiny (in number of nodes) Tarjan-style DFS walk of an SCC being mutated to compute newly formed SCCs, etc. This is very rough and a WIP. I have a bunch of FIXMEs for code cleanup that will reduce the boilerplate in this change substantially. I also have a bunch of simplifications to various parts of both algorithms that I want to make, but first I'd like to have a more holistic picture. Ideally, I'd also like more testing. I'll probably add quite a few more unit tests as I go here to cover the various different aspects and corner cases of removing edges from the graph. Still, this is, so far, successfully updating the SCC graph in-place without disrupting the identity established for the existing SCCs even when we do challenging things like delete the critical edge that made an SCC cycle at all and have to reform things as a tree of smaller SCCs. Getting this to work is really critical for the new pass manager as it is going to associate significant state with the SCC instance and needs it to be stable. That is also the motivation behind the return of the newly formed SCCs. Eventually, I'll wire this all the way up to the public API so that the pass manager can use it to correctly re-enqueue newly formed SCCs into a fresh postorder traversal. llvm-svn: 206968	2014-04-23 11:03:03 +00:00
Chandler Carruth	cace6623c4	[LCG] Implement Tarjan's algorithm correctly this time. We have to walk up the stack finishing the exploration of each entries children before we're finished in addition to accounting for their low-links. Added a unittest that really hammers home the need for this with interlocking cycles that would each appear distinct otherwise and crash or compute the wrong result. As part of this, nuke a stale fixme and bring the rest of the implementation still more closely in line with the original algorithm. llvm-svn: 206966	2014-04-23 10:31:17 +00:00
Chandler Carruth	c7bad9a5a0	[LCG] Add a unittest for the LazyCallGraph. I had a weak moment and resisted this for too long. Just with the basic testing here I was able to exercise the analysis in more detail and sift out both type signature bugs in the API and a bug in the DFS numbering. All of these are fixed here as well. The unittests will be much more important for the mutation support where it is necessary to craft minimal mutations and then inspect the state of the graph. There is just no way to do that with a standard FileCheck test. However, unittesting these kinds of analyses is really quite easy, especially as they're designed with the new pass manager where there is essentially no infrastructure required to rig up the core logic and exercise it at an API level. As a minor aside about the DFS numbering bug, the DFS numbering used in LCG is a bit unusual. Rather than numbering from 0, we number from 1, and use 0 as the sentinel "unvisited" state. Other implementations often use '-1' for this, but I find it easier to deal with 0 and it shouldn't make any real difference provided someone doesn't write silly bugs like forgetting to actually initialize the DFS numbering. Oops. ;] llvm-svn: 206954	2014-04-23 08:08:49 +00:00
Chandler Carruth	3f9869a8e2	[LCG] Hoist the logic for forming a new SCC from the top of the DFSStack into a helper function. I plan to re-use it for doing incremental DFS-based updates to the SCCs when we mutate the call graph. llvm-svn: 206948	2014-04-23 06:09:03 +00:00
Chandler Carruth	0b623baeb3	[LCG] Switch the Callee sets to be DenseMaps pointing to the index into the Callee list. This is going to be quite important to prevent removal from going quadratic. No functionality changed at this point, this is one of the refactoring patches I've broken out of my initial work toward mutation updates of the call graph. llvm-svn: 206938	2014-04-23 04:00:17 +00:00
Duncan P. N. Exon Smith	b3380ea60a	blockfreq: Skip irreducible backedges inside functions The branch that skips irreducible backedges was only active when propagating mass at the top-level. In particular, when propagating mass through a loop recognized by `LoopInfo` with irreducible control flow inside, irreducible backedges would not be skipped. Not sure where that idea came from, but the result was that mass was lost until after loop exit. Added a testcase that covers this case. llvm-svn: 206860	2014-04-22 03:31:53 +00:00
Duncan P. N. Exon Smith	d1aec79d7a	blockfreq: Rename PackagedLoops => Loops llvm-svn: 206859	2014-04-22 03:31:50 +00:00
Duncan P. N. Exon Smith	2984a64bae	blockfreq: Use a pointer for ContainingLoop too llvm-svn: 206858	2014-04-22 03:31:44 +00:00
Duncan P. N. Exon Smith	e1423639bb	blockfreq: Use pointers to loops instead of an index Store pointers directly to loops inside the nodes. This could have been done without changing the type stored in `std::vector<>`. However, rather than computing the number of loops before constructing them (which `LoopInfo` doesn't provide directly), I've switched to a `vector<unique_ptr<LoopData>>`. This adds some heap overhead, but the number of loops is typically small. llvm-svn: 206857	2014-04-22 03:31:37 +00:00
Duncan P. N. Exon Smith	dc2d66e7b3	blockfreq: Implement clear() explicitly This was implicitly with copy assignment before, which fails to actually clear `std::vector<>`'s heap storage. Move assignment would work, but since MSVC can't imply those anyway, explicitly `clear()`-ing members makes more sense. llvm-svn: 206856	2014-04-22 03:31:34 +00:00
Duncan P. N. Exon Smith	cc88ebfa5f	blockfreq: Rename PackagedLoopData => LoopData No functionality change. llvm-svn: 206855	2014-04-22 03:31:31 +00:00
Chandler Carruth	f1221bd01b	[Modules] Fix potential ODR violations by sinking the DEBUG_TYPE definition below all the header #include lines, lib/Analysis/... edition. This one has a bit extra as there were other #define's before #include lines in addition to DEBUG_TYPE. I've sunk all of them as a block. llvm-svn: 206843	2014-04-22 02:48:03 +00:00
Chandler Carruth	1b9dde087e	[Modules] Remove potential ODR violations by sinking the DEBUG_TYPE define below all header includes in the lib/CodeGen/... tree. While the current modules implementation doesn't check for this kind of ODR violation yet, it is likely to grow support for it in the future. It also removes one layer of macro pollution across all the included headers. Other sub-trees will follow. llvm-svn: 206837	2014-04-22 02:02:50 +00:00
Chandler Carruth	e96dd8975f	[Modules] Make Support/Debug.h modular. This requires it to not change behavior based on other files defining DEBUG_TYPE, which means it cannot define DEBUG_TYPE at all. This is actually better IMO as it forces folks to define relevant DEBUG_TYPEs for their files. However, it requires all files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't already. I've updated all such files in LLVM and will do the same for other upstream projects. This still leaves one important change in how LLVM uses the DEBUG_TYPE macro going forward: we need to only define the macro after header files have been #include-ed. Previously, this wasn't possible because Debug.h required the macro to be pre-defined. This commit removes that. By defining DEBUG_TYPE after the includes two things are fixed: - Header files that need to provide a DEBUG_TYPE for some inline code can do so by defining the macro before their inline code and undef-ing it afterward so the macro does not escape. - We no longer have rampant ODR violations due to including headers with different DEBUG_TYPE definitions. This may be mostly an academic violation today, but with modules these types of violations are easy to check for and potentially very relevant. Where necessary to suppor headers with DEBUG_TYPE, I have moved the definitions below the includes in this commit. I plan to move the rest of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big enough. The comments in Debug.h, which were hilariously out of date already, have been updated to reflect the recommended practice going forward. llvm-svn: 206822	2014-04-21 22:55:11 +00:00
Duncan P. N. Exon Smith	254689fcf9	blockfreq: Some cleanup of UnsignedFloat Change `PositiveFloat` to `UnsignedFloat`, and fix some of the comments to indicate that it's disappearing eventually. llvm-svn: 206771	2014-04-21 18:31:58 +00:00
Duncan P. N. Exon Smith	10be9a8868	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206707, reapplying r206704. The preceding commit to CalcSpillWeights should have sorted out the failing buildbots. <rdar://problem/14292693> llvm-svn: 206766	2014-04-21 17:57:07 +00:00
Chandler Carruth	572e3407c3	[PM] Add a new-PM-style CGSCC pass manager using the newly added LazyCallGraph analysis framework. Wire it up all the way through the opt driver and add some very basic testing that we can build pass pipelines including these components. Still a lot more to do in terms of testing that all of this works, but the basic pieces are here. There is a lot of boiler plate here. It's something I'm going to actively look at reducing, but I don't have any immediate ideas that don't end up making the code terribly complex in order to fold away the boilerplate. Until I figure out something to minimize the boilerplate, almost all of this is based on the code for the existing pass managers, copied and heavily adjusted to suit the needs of the CGSCC pass management layer. The actual CG management still has a bunch of FIXMEs in it. Notably, we don't do any updating of the CG as it is potentially invalidated. I wanted to get this in place to motivate the new analysis, and add update APIs to the analysis and the pass management layers in concert to make sure that the right APIs are present. llvm-svn: 206745	2014-04-21 11:12:00 +00:00
Chandler Carruth	99b756db04	[LCG] Add some basic debug output to the LCG pass. llvm-svn: 206730	2014-04-21 05:04:24 +00:00
Duncan P. N. Exon Smith	e63327e967	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206704, as expected. llvm-svn: 206707	2014-04-19 22:46:00 +00:00
Duncan P. N. Exon Smith	875ddfac75	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite. I've done a careful audit, added some asserts, and fixed a couple of bugs (unfortunately, they were in unlikely code paths). There's a small chance that this will appease the failing bots [1][2]. (If so, great!) If not, I have a follow-up commit ready that will temporarily add -debug-only=block-freq to the two failing tests, allowing me to compare the code path between what the failing bots and what my machines (and the rest of the bots) are doing. Once I've triggered those builds, I'll revert both commits so the bots go green again. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445 <rdar://problem/14292693> llvm-svn: 206704	2014-04-19 22:34:26 +00:00
Duncan P. N. Exon Smith	76b813619a	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2 ) This reverts commit r206666, as planned. Still stumped on why the bots are failing. Sanitizer bots haven't turned anything up. If anyone can help me debug either of the failures (referenced in r206666) I'll owe them a beer. (In the meantime, I'll be auditing my patch for undefined behaviour.) llvm-svn: 206677	2014-04-19 00:42:46 +00:00
Duncan P. N. Exon Smith	b3caf3646f	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2 ) This reverts commit r206628, reapplying r206622 (and r206626). Two tests are failing only on buildbots [1][2]: i.e., I can't reproduce on Darwin, and Chandler can't reproduce on Linux. Asan and valgrind don't tell us anything, but we're hoping the msan bot will catch it. So, I'm applying this again to get more feedback from the bots. I'll leave it in long enough to trigger builds in at least the sanitizer buildbots (it was failing for reasons unrelated to my commit last time it was in), and hopefully a few others.... and then I expect to revert a third time. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445 llvm-svn: 206666	2014-04-18 22:30:03 +00:00
Chandler Carruth	2174f44f61	[LCG] Fix the bugs that Ben pointed out in code review (and the MSan bot caught). Sad that we don't have warnings for these things, but bleh, no idea how to fix that. llvm-svn: 206646	2014-04-18 20:44:16 +00:00
Benjamin Kramer	147644d400	Remove a couple of redundant copies of SmallVector::operator==. No functionality change. llvm-svn: 206635	2014-04-18 19:48:03 +00:00
Duncan P. N. Exon Smith	0842ff36a6	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2 ) This reverts commit r206622 and the MSVC fixup in r206626. Apparently the remotely failing tests are still failing, despite my attempt to fix the nondeterminism in r206621. llvm-svn: 206628	2014-04-18 17:56:08 +00:00
Duncan P. N. Exon Smith	38fe464df0	Fixing MSVC after r206622? llvm-svn: 206626	2014-04-18 17:38:01 +00:00
Duncan P. N. Exon Smith	f8361d127a	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206556, effectively reapplying commit r206548 and its fixups in r206549 and r206550. In an intervening commit I've added target triples to the tests that were failing remotely [1] (but passing locally). I'm hoping the mystery is solved? I'll revert this again if the tests are still failing remotely. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206622	2014-04-18 17:22:25 +00:00
Chandler Carruth	d8d865e266	[LCG] Remove all of the complexity stemming from supporting copying. Reality is that we're never going to copy one of these. Supporting this was becoming a nightmare because nothing even causes it to compile most of the time. Lots of subtle errors built up that wouldn't have been caught by any "normal" testing. Also, make the move assignment actually work rather than the bogus swap implementation that would just infloop if used. As part of that, factor out the graph pointer updates into a helper to share between move construction and move assignment. llvm-svn: 206583	2014-04-18 11:02:33 +00:00
Chandler Carruth	18eadd9260	[LCG] Add support for building persistent and connected SCCs to the LazyCallGraph. This is the start of the whole point of this different abstraction, but it is just the initial bits. Here is a run-down of what's going on here. I'm planning to incorporate some (or all) of this into comments going forward, hopefully with better editing and wording. =] The crux of the problem with the traditional way of building SCCs is that they are ephemeral. The new pass manager however really needs the ability to associate analysis passes and results of analysis passes with SCCs in order to expose these analysis passes to the SCC passes. Making this work is kind-of the whole point of the new pass manager. =] So, when we're building SCCs for the call graph, we actually want to build persistent nodes that stick around and can be reasoned about later. We'd also like the ability to walk the SCC graph in more complex ways than just the traditional postorder traversal of the current CGSCC walk. That means that in addition to being persistent, the SCCs need to be connected into a useful graph structure. However, we still want the SCCs to be formed lazily where possible. These constraints are quite hard to satisfy with the SCC iterator. Also, using that would bypass our ability to actually add data to the nodes of the call graph to facilite implementing the Tarjan walk. So I've re-implemented things in a more direct and embedded way. This immediately makes it easy to get the persistence and connectivity correct, and it also allows leveraging the existing nodes to simplify the algorithm. I've worked somewhat to make this implementation more closely follow the traditional paper's nomenclature and strategy, although it is still a bit obtuse because it isn't recursive, using an explicit stack and a tail call instead, and it is interruptable, resuming each time we need another SCC. The other tricky bit here, and what actually took almost all the time and trials and errors I spent building this, is exactly what graph structure to build for the SCCs. The naive thing to build is the call graph in its newly acyclic form. I wrote about 4 versions of this which did precisely this. Inevitably, when I experimented with them across various use cases, they became incredibly awkward. It was all implementable, but it felt like a complete wrong fit. Square peg, round hole. There were two overriding aspects that pushed me in a different direction: 1) We want to discover the SCC graph in a postorder fashion. That means the root node will be the last node we find. Using the call-SCC DAG as the graph structure of the SCCs results in an orphaned graph until we discover a root. 2) We will eventually want to walk the SCC graph in parallel, exploring distinct sub-graphs independently, and synchronizing at merge points. This again is not helped by the call-SCC DAG structure. The structure which, quite surprisingly, ended up being completely natural to use is the inverse of the call-SCC DAG. We add the leaf SCCs to the graph as "roots", and have edges to the caller SCCs. Once I switched to building this structure, everything just fell into place elegantly. Aside from general cleanups (there are FIXMEs and too few comments overall) that are still needed, the other missing piece of this is support for iterating across levels of the SCC graph. These will become useful for implementing #2, but they aren't an immediate priority. Once SCCs are in good shape, I'll be working on adding mutation support for incremental updates and adding the pass manager that this analysis enables. llvm-svn: 206581	2014-04-18 10:50:32 +00:00
Duncan P. N. Exon Smith	e576167df8	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commits r206548, r206549 and r206549. There are some unit tests failing that aren't failing locally [1], so reverting until I have time to investigate. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206556	2014-04-18 02:17:43 +00:00
Duncan P. N. Exon Smith	878cf2b804	blockfreq: Really fix r206548 (and r206549) Turns out this code is dead. llvm-svn: 206554	2014-04-18 02:10:09 +00:00
Duncan P. N. Exon Smith	c7abca54cf	blockfreq: Fixing MSVC after r206548? llvm-svn: 206549	2014-04-18 02:06:24 +00:00
Duncan P. N. Exon Smith	12e68e1733	blockfreq: Rewrite BlockFrequencyInfoImpl Rewrite the shared implementation of BlockFrequencyInfo and MachineBlockFrequencyInfo entirely. The old implementation had a fundamental flaw: precision losses from nested loops (or very wide branches) compounded past loop exits (and convergence points). The @nested_loops testcase at the end of test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating. This function has three nested loops, with branch weights in the loop headers of 1:4000 (exit:continue). The old analysis gives non-sensical results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': ---- Block Freqs ---- entry = 1.0 for.cond1.preheader = 1.00103 for.cond4.preheader = 5.5222 for.body6 = 18095.19995 for.inc8 = 4.52264 for.inc11 = 0.00109 for.end13 = 0.0 The new analysis gives correct results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': block-frequency-info: nested_loops - entry: float = 1.0, int = 8 - for.cond1.preheader: float = 4001.0, int = 32007 - for.cond4.preheader: float = 16008001.0, int = 128064007 - for.body6: float = 64048012001.0, int = 512384096007 - for.inc8: float = 16008001.0, int = 128064007 - for.inc11: float = 4001.0, int = 32007 - for.end13: float = 1.0, int = 8 Most importantly, the frequency leaving each loop matches the frequency entering it. The new algorithm leverages BlockMass and PositiveFloat to maintain precision, separates "probability mass distribution" from "loop scaling", and uses dithering to eliminate probability mass loss. I have unit tests for these types out of tree, but it was decided in the review to make the classes private to BlockFrequencyInfoImpl, and try to shrink them (or remove them entirely) in follow-up commits. The new algorithm should generally have a complexity advantage over the old. The previous algorithm was quadratic in the worst case. The new algorithm is still worst-case quadratic in the presence of irreducible control flow, but it's linear without it. The key difference between the old algorithm and the new is that control flow within a loop is evaluated separately from control flow outside, limiting propagation of precision problems and allowing loop scale to be calculated independently of mass distribution. Loops are visited bottom-up, their loop scales are calculated, and they are replaced by pseudo-nodes. Mass is then distributed through the function, which is now a DAG. Finally, loops are revisited top-down to multiply through the loop scales and the masses distributed to pseudo nodes. There are some remaining flaws. - Irreducible control flow isn't modelled correctly. LoopInfo and MachineLoopInfo ignore irreducible edges, so this algorithm will fail to scale accordingly. There's a note in the class documentation about how to get closer. See also the comments in test/Analysis/BlockFrequencyInfo/irreducible.ll. - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting the 64-bit integer precision used downstream. - The "bias" calculation proposed on llvmdev is not incorporated here. This will be added in a follow-up commit, once comments from this review have been handled. llvm-svn: 206548	2014-04-18 01:57:45 +00:00
Nuno Lopes	9ced19abe8	remove some dead code lib/Analysis/IPA/InlineCost.cpp \| 18 ------------------ lib/Analysis/RegionPass.cpp \| 1 - lib/Analysis/TypeBasedAliasAnalysis.cpp \| 1 - lib/Transforms/Scalar/LoopUnswitch.cpp \| 21 --------------------- lib/Transforms/Utils/LCSSA.cpp \| 2 -- lib/Transforms/Utils/LoopSimplify.cpp \| 6 ------ utils/TableGen/AsmWriterEmitter.cpp \| 13 ------------- utils/TableGen/DFAPacketizerEmitter.cpp \| 7 ------- utils/TableGen/IntrinsicEmitter.cpp \| 2 -- 9 files changed, 71 deletions(-) llvm-svn: 206506	2014-04-17 22:26:44 +00:00
Gerolf Hoflehner	ecebc3730e	Reverse 206485. After some discussions the preferred semantics of the always_inline attribute is inline always when the compiler can determine that it it safe to do so. llvm-svn: 206487	2014-04-17 19:14:06 +00:00
Chandler Carruth	b60cb315bc	[LCG] Just move the allocator (now that we can) when moving a call graph. This simplifies the custom move constructor operation to one of walking the graph and updating the 'up' pointers to point to the new location of the graph. Switch the nodes from a reference to a pointer for the 'up' edge to facilitate this. llvm-svn: 206450	2014-04-17 07:25:59 +00:00
Chandler Carruth	81f497d176	[LCG] Remove the Module reference member which we weren't using for anything and doesn't make sense if assigning. llvm-svn: 206449	2014-04-17 07:22:19 +00:00
Gerolf Hoflehner	5f6268a40e	Inline a function when the always_inline attribute is set even when it contains a indirect branch. The attribute overrules correctness concerns like the escape of a local block address. This is for rdar://16501761 llvm-svn: 206429	2014-04-17 00:21:52 +00:00
Tobias Grosser	8d941ef30d	RegionInfo: Do not access a value that was just moved away This fixes a regression introduced in r206310. llvm-svn: 206328	2014-04-15 22:09:36 +00:00
David Blaikie	ec649acb82	Use unique_ptr to manage ownership of child Regions within llvm::Region llvm-svn: 206310	2014-04-15 18:32:43 +00:00
Craig Topper	9f008867c0	[C++11] More 'nullptr' conversion. In some cases just using a boolean check instead of comparing to nullptr. llvm-svn: 206243	2014-04-15 04:59:12 +00:00
Akira Hatanaka	5638b89944	Fix a bug in which BranchProbabilityInfo wasn't setting branch weights of basic blocks inside loops correctly. Previously, BranchProbabilityInfo::calcLoopBranchHeuristics would determine the weights of basic blocks inside loops even when it didn't have enough information to estimate the branch probabilities correctly. This patch fixes the function to exit early if it doesn't see any exit edges or back edges and let the later heuristics determine the weights. This fixes PR18705 and <rdar://problem/15991090>. Differential Revision: http://reviews.llvm.org/D3363 llvm-svn: 206194	2014-04-14 16:56:19 +00:00
Duncan P. N. Exon Smith	689a50736e	blockfreq: Rename BlockFrequencyImpl to BlockFrequencyInfoImpl This is a shared implementation class for BlockFrequencyInfo and MachineBlockFrequencyInfo, not for BlockFrequency, a related (but distinct) class. No functionality change. <rdar://problem/14292693> llvm-svn: 206083	2014-04-11 23:20:58 +00:00
Duncan P. N. Exon Smith	37bd529964	blockfreq: Use getSuccessorIndex() No functionality change. <rdar://problem/14292693> llvm-svn: 206082	2014-04-11 23:20:52 +00:00
Tobias Grosser	c3d9db2336	Delinearize: Extend informationin -analyze output llvm-svn: 205838	2014-04-09 07:53:49 +00:00
Sebastian Pop	b2fdacf3f2	divide by the result of the gcd used to fail with 'Step should divide Start with no remainder.' llvm-svn: 205802	2014-04-08 21:21:13 +00:00
Sebastian Pop	9738e83a7d	handle special cases when findGCD returns 1 used to fail with 'Step should divide Start with no remainder.' llvm-svn: 205801	2014-04-08 21:21:10 +00:00
Sebastian Pop	b5b84e0963	in findGCD of multiply expr return the gcd we used to return 1 instead of the gcd llvm-svn: 205800	2014-04-08 21:21:05 +00:00
Eric Christopher	beb2cd6b7c	Handle vlas during inline cost computation if they'll be turned into a constant size alloca by inlining. Ran a run over the testsuite, no results out of the noise, fixes the testcase in the PR. PR19115. llvm-svn: 205710	2014-04-07 13:36:21 +00:00
Hal Finkel	b4e001cc81	Use TopTTI->getGEPCost from within getUserCost The implementation of getUserCost had duplicated (and hard-coded) the default logic in getGEPCost. Instead, it is better to use getGEPCost directly, which limits the default logic to the implementation of one function, and allows targets to override the behavior. No functionality change intended. llvm-svn: 205346	2014-04-01 18:50:06 +00:00
Arnold Schwaighofer	1a444489e9	PR15967 Fix in basicaa for faulty returning no alias. This commit consist of two parts. The first part fix the PR15967. The wrong conclusion was made when the MaxLookup limit was reached. The fix introduce a out parameter (MaxLookupReached) to DecomposeGEPExpression that the function aliasGEP can act upon. The second part is introducing the constant MaxLookupSearchDepth to make sure that DecomposeGEPExpression and GetUnderlyingObject use the same search depth. This is a small cleanup to clarify the original algorithm. Patch by Karl-Johan Karlsson! llvm-svn: 204859	2014-03-26 21:30:19 +00:00
Duncan P. N. Exon Smith	3dbe10503a	blockfreq: Implement Pass::releaseMemory() Implement Pass::releaseMemory() in BlockFrequencyInfo and MachineBlockFrequencyInfo. Just delete the private implementation when not in use. Switch to a std::unique_ptr to make the logic more clear. <rdar://problem/14292693> llvm-svn: 204741	2014-03-25 18:01:38 +00:00

... 7 8 9 10 11 ...

5631 Commits