llvm-project

Commit Graph

Author	SHA1	Message	Date
Kai Nacke	a56bb78021	Add strchr(p, 0) -> p + strlen(p) to SimplifyLibCalls Add the missing transformation strchr(p, 0) -> p + strlen(p) to SimplifyLibCalls and remove the ToDo comment. Reviewer: Duncan P.N. Exon Smith llvm-svn: 200736	2014-02-04 05:55:16 +00:00
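The transformation is safe because `strchr` treats the terminating NUL as part of the string, so searching for 0 always stops at index `strlen(p)`. The minimal C++ sketch below checks that identity. The helper name `find_terminator` is hypothetical and is not the SimplifyLibCalls code itself.

```cpp
#include <cstring>

// Hypothetical helper: the expression strchr(p, 0) is rewritten into.
// strchr(p, 0) returns a pointer to the terminating NUL, which always
// sits at offset strlen(p) from the start of the string.
const char* find_terminator(const char* p) {
  return p + std::strlen(p);
}
```

The rewrite trades a character search for a length computation plus a pointer add, which later passes may be able to simplify further.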
Nick Lewycky	00703e76dc	Self-memcpy-elision and memcpy of constant byte to memset transforms don't care how many bytes you were trying to transfer. Sink that safety test after those transforms. Noticed by inspection. llvm-svn: 200726	2014-02-04 00:18:54 +00:00
Reid Kleckner	d47a59a4f8	inalloca: Don't remove dead arguments in the presence of inalloca args It disturbs the layout of the parameters in memory and registers, leading to problems in the backend. The plan for optimizing internal inalloca functions going forward is to essentially SROA the argument memory and demote any captured arguments (things that aren't trivially written by a load or store) to an indirect pointer to a static alloca. llvm-svn: 200717	2014-02-03 20:42:49 +00:00
Duncan P. N. Exon Smith	1ff08e389f	Lower llvm.expect intrinsic correctly for i1 LowerExpectIntrinsic previously only understood the idiom of an expect intrinsic followed by a comparison with zero. For llvm.expect.i1, the comparison would be stripped by the early-cse pass. Patch by Daniel Micay. llvm-svn: 200664	2014-02-02 22:43:55 +00:00
Arnold Schwaighofer	17455633c7	LoopVectorizer: Enable unrolling of conditional stores and the load/store unrolling heuristic by default Benchmarking on x86_64 (thanks Chandler!) and ARM has shown those options speed up some benchmarks while not causing any interesting regressions. llvm-svn: 200621	2014-02-02 03:12:34 +00:00
Chandler Carruth	1665152cce	[LPM] Apply a really big hammer to fix PR18688 by recursively reforming LCSSA when we promote to SSA registers inside of LICM. Currently, this is actually necessary. The promotion logic in LICM uses SSAUpdater which doesn't understand how to place LCSSA PHI nodes. Teaching it to do so would be a very significant undertaking. It may be worthwhile and I've left a FIXME about this in the code as well as starting a thread on llvmdev to try to figure out the right long-term solution. For now, the PR needs to be fixed. Short of using the promotion SSAUpdater to place both the LCSSA PHI nodes and the promoted PHI nodes, I don't see a cleaner or cheaper way of achieving this. Fortunately, LCSSA is relatively lazy and sparse -- it should only update instructions which need it. We can also skip the recursive variant when we don't promote to SSA values. llvm-svn: 200612	2014-02-01 13:35:14 +00:00
Eli Bendersky	fc49d19834	Remove some unused #includes llvm-svn: 200611	2014-02-01 13:12:54 +00:00
Reid Kleckner	a04504fe97	Revert "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..." This reverts commit r200576. It broke 32-bit self-host builds by vectorizing two calls to @llvm.bswap.i64, which we then fail to expand. llvm-svn: 200602	2014-02-01 01:37:30 +00:00
Chandler Carruth	b3da389e30	[SLPV] Recognize vectorizable intrinsics during SLP vectorization and transform accordingly. Based on similar code from Loop vectorization. Subsequent commits will include vectorization of function calls to vector intrinsics and form function calls to vector library calls. Patch by Raul Silvera! (Much delayed due to my not running dcommit) llvm-svn: 200576	2014-01-31 21:14:40 +00:00
Chandler Carruth	c12224cb93	[vectorizer] Tweak the way we do small loop runtime unrolling in the loop vectorizer to not do so when runtime pointer checks are needed and share code with the new (not yet enabled) load/store saturation runtime unrolling. Also ensure that we only consider the runtime checks when the loop hasn't already been vectorized. If it has, the runtime check cost has already been paid. I've fleshed out a test case to cover the scalar unrolling as well as the vector unrolling and comment clearly why we are or aren't following the pattern. llvm-svn: 200530	2014-01-31 10:51:08 +00:00
Bob Wilson	055a0b4ca2	Fix a bug in gcov instrumentation introduced by r195513. <rdar://15930350> The entry block of a function starts with all the static allocas. The change in r195513 splits the block before those allocas, which has the effect of turning them into dynamic allocas. That breaks all sorts of things. Change to split after the initial allocas, and also add a comment explaining why the block is split. llvm-svn: 200515	2014-01-31 05:24:01 +00:00
Chandler Carruth	d4be9dc02d	[LPM] Fix PR18643, another scary place where loop transforms failed to preserve loop simplify of enclosing loops. The problem here starts with LoopRotation which ends up cloning code out of the latch into the new preheader it is building. This can create a new edge from the preheader into the exit block of the loop which breaks LoopSimplify form. The code tries to fix this by splitting the critical edge between the latch and the exit block to get a new exit block that only the latch dominates. This sadly isn't sufficient. The exit block may be an exit block for multiple nested loops. When we clone an edge from the latch of the inner loop to the new preheader being built in the outer loop, we create an exiting edge from the outer loop to this exit block. Despite breaking the LoopSimplify form for the inner loop, this is fine for the outer loop. However, when we split the edge from the inner loop to the exit block, we create a new block which is in neither the inner nor outer loop as the new exit block. This is a predecessor to the old exit block, and so the split itself takes the outer loop out of LoopSimplify form. We need to split every edge entering the exit block from inside a loop nested more deeply than the exit block in order to preserve all of the loop simplify constraints. Once we try to do that, a problem with splitting critical edges surfaces. Previously, we used a very brute-force approach to update LoopSimplify form by re-computing it for all exit blocks. We don't need to do this, and doing this much will sometimes but not always overlap with the LoopRotate bug fix. Instead, the code needs to specifically handle the cases which can start to violate LoopSimplify -- they aren't that common. We need to see if the destination of the split edge was a loop exit block in simplified form for the loop of the source of the edge. For this to be true, all the predecessors need to be in the exact same loop as the source of the edge being split. If the dest block was originally in this form, we have to split all of the edges back into this loop to recover it. The old mechanism of doing this was conservatively correct because at least one of the exiting blocks it rewrote was the DestBB and so the DestBB's predecessors were fixed. But this is a much more targeted way of doing it. Making it targeted is important, because ballooning the set of edges touched prevents LoopRotate from being able to split edges it needs to split to preserve loop simplify in a coherent way -- the critical edge splitting would sometimes find the other edges in need of splitting but not others. Many, many thanks for help from Nick reducing these test cases mightily. And helping lots with the analysis here as this one was quite tricky to track down. llvm-svn: 200393	2014-01-29 13:16:53 +00:00
Chandler Carruth	66f0b16360	[LPM] Fix PR18642, a pretty nasty bug in IndVars that "never mattered" because of the inside-out run of LoopSimplify in the LoopPassManager and the fact that LoopSimplify couldn't be "preserved" across two independent LoopPassManagers. Anyways, in that case, IndVars wasn't correctly preserving an LCSSA PHI node because it thought it was rewriting (via SCEV) the incoming value to a loop invariant value. While it may well be invariant for the current loop, it may be rewritten in terms of an enclosing loop's values. This in and of itself is fine, as the LCSSA PHI node in the enclosing loop for the inner loop value we're rewriting will have its own LCSSA PHI node if used outside of the enclosing loop. With me so far? Well, the current loop and the enclosing loop may share an exiting block and exit block, and when they do they also share LCSSA PHI nodes. In this case, it's not valid to RAUW through the LCSSA PHI node. Expected crazy test included. llvm-svn: 200372	2014-01-29 04:40:19 +00:00
Arnold Schwaighofer	1aab75ab49	LoopVectorizer: Don't count the induction variable multiple times When estimating register pressure, don't count the induction variable multiple times. It is unlikely to be unrolled. This is currently disabled and hidden behind a flag ("enable-ind-var-reg-heur"). llvm-svn: 200371	2014-01-29 04:36:12 +00:00
Rafael Espindola	ab73c493ea	Fix pr14893. When simplifycfg moves an instruction, it must drop metadata it doesn't know is still valid once the preconditions change. In particular, it must drop the range and tbaa metadata. The patch implements this with a utility function to drop all metadata not in a white list. llvm-svn: 200322	2014-01-28 16:56:46 +00:00
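The "drop everything not on a white list" idea can be sketched independently of LLVM's data structures. This minimal C++ model assumes an instruction's metadata is a map from kind name to payload. The type `MetadataMap`, the function `dropUnknownMetadataKinds`, and the kind names used in the usage line are hypothetical illustrations, not the utility added by this patch.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical model: an instruction's attached metadata as kind -> payload.
using MetadataMap = std::map<std::string, std::string>;

// Keep only the metadata kinds that are known to stay valid after the
// instruction moves; drop everything else (e.g. "range" and "tbaa").
void dropUnknownMetadataKinds(MetadataMap& md,
                              const std::set<std::string>& known) {
  for (auto it = md.begin(); it != md.end();) {
    if (known.count(it->first) == 0)
      it = md.erase(it);  // erase returns the iterator to the next entry
    else
      ++it;
  }
}
```

For example, `dropUnknownMetadataKinds(md, {"dbg"})` would keep only a hypothetical "dbg" entry. A white list is the conservative choice here: any metadata kind the pass does not explicitly know about is dropped rather than risk keeping a now-invalid fact.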
Chandler Carruth	b783628560	[vectorizer] Completely disable the block frequency guidance of the loop vectorizer, placing it behind an off-by-default flag. It turns out that block frequency isn't what we want at all, here or elsewhere. This has been I think a nagging feeling for several of us working with it, but Arnold has given some really nice simple examples where the results are so comprehensively wrong that they aren't useful. I'm planning to email the dev list with a summary of why it's not really useful and a couple of ideas about how to better structure these types of heuristics. llvm-svn: 200294	2014-01-28 09:10:41 +00:00
Reid Kleckner	26af2cae05	Update optimization passes to handle inalloca arguments Summary: I searched Transforms/ and Analysis/ for 'ByVal' and updated those call sites to check for inalloca if appropriate. I added tests for any change that would allow an optimization to fire on inalloca. Reviewers: nlewycky Differential Revision: http://llvm-reviews.chandlerc.com/D2449 llvm-svn: 200281	2014-01-28 02:38:36 +00:00
Chandler Carruth	d84f776e8a	[LPM] Fix PR18616 where the shifts to the loop pass manager to extract LCSSA from it caused a crasher with the LoopUnroll pass. This crasher is really nasty. We destroy LCSSA form in a surprising way. When unrolling a loop into an outer loop, we not only need to restore LCSSA form for the outer loop, but for all children of the outer loop. This is somewhat obvious in retrospect, but hey! While this seems pretty heavy-handed, it's not that bad. Fundamentally, we only do this when we unroll a loop, which is already a heavyweight operation. We're unrolling all of these hypothetical inner loops as well, so their size and complexity is already on the critical path. This is just adding another pass over them to re-canonicalize. I have a test case from PR18616 that is great for reproducing this, but pretty useless to check in as it relies on many 10s of nested empty loops that get unrolled and deleted in just the right order. =/ What's worse is that investigating this has exposed another source of failure that is likely to be even harder to test. I'll try to come up with test cases for these fixes, but I want to get the fixes into the tree first as they're causing crashes in the wild. llvm-svn: 200273	2014-01-28 01:25:38 +00:00
Arnold Schwaighofer	18865db3c1	LoopVectorize: Support conditional stores by scalarizing The vectorizer takes a loop like this and widens all instructions except for the store. The stores are scalarized/unrolled and hidden behind an "if" block. for (i = 0; i < 128; ++i) { if (a[i] < 10) a[i] += val; } for (i = 0; i < 128; i+=2) { v = a[i:i+1]; v0 = (extract v, 0) + val; v1 = (extract v, 1) + val; if ((extract v, 0) < 10) a[i] = v0; if ((extract v, 1) < 10) a[i+1] = v1; } The vectorizer relies on subsequent optimizations to sink instructions into the conditional block where they are anticipated. The flag "vectorize-num-stores-pred" controls whether and how many stores to handle this way. Vectorization of conditional stores is disabled by default for now. This patch also adds a change to the heuristic when the flag "enable-loadstore-runtime-unroll" is enabled (off by default). It unrolls small loops until load/store ports are saturated. This heuristic uses TTI's getMaxUnrollFactor as a measure for load/store ports. I also added a second flag -enable-cond-stores-vec. It will enable vectorization of conditional stores. But there is no cost model for vectorization of conditional stores in place yet so this will not do well at the moment. rdar://15892953 Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll -vectorize-num-stores-pred=1 (before the BFI change): Performance Regressions: Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower) Applications/siod/siod 2.18% Performance Improvements: mesa -4.42% libquantum -4.15% With a patch that slightly changes the register heuristics (by subtracting the induction variable on both sides of the register pressure equation, as the induction variable is probably not really unrolled): Performance Regressions: Benchmarks/Ptrdist/yacr2/yacr2 7.73% Applications/siod/siod 1.97% Performance Improvements: libquantum -13.05% (we now also unroll quantum_toffoli) mesa -4.27% llvm-svn: 200270	2014-01-28 01:01:53 +00:00
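To make the "widen everything except the store" shape concrete, here is a hand-written C++ model of a vectorization factor of 2 applied to the example loop. It is only an illustration under the assumption that the values stay small enough that the speculative add cannot overflow; it is not vectorizer output. The load, add, and compare are computed for both lanes up front, and each lane's store is guarded by its own condition.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 128;

// Scalar reference loop from the commit message.
void scalarLoop(std::array<int, N>& a, int val) {
  for (std::size_t i = 0; i < N; ++i)
    if (a[i] < 10)
      a[i] += val;
}

// Hand-written model of the VF=2 result: the load, add, and compare are
// "widened" (computed for both lanes), while each store is scalarized
// behind its own lane's condition. Assumes v + val does not overflow,
// since the add now runs even for lanes that are not stored.
void vectorizedByTwo(std::array<int, N>& a, int val) {
  for (std::size_t i = 0; i < N; i += 2) {
    int v[2] = {a[i], a[i + 1]};          // v = a[i:i+1]
    int w[2] = {v[0] + val, v[1] + val};  // widened add
    bool c[2] = {v[0] < 10, v[1] < 10};   // widened compare
    if (c[0]) a[i] = w[0];                // scalarized store, lane 0
    if (c[1]) a[i + 1] = w[1];            // scalarized store, lane 1
  }
}
```

Both functions should leave the array in the same state. Computing the add for every lane is what leaves later passes the job of sinking it into the guarded blocks.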
Manman Ren	f1cb16e481	PGO branch weight: keep halving the weights until they can fit into uint32. When folding branches to common destination, the updated branch weights can exceed uint32 by more than a factor of 2. We should keep halving the weights until they can fit into uint32. llvm-svn: 200262	2014-01-27 23:39:03 +00:00
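The fix amounts to repeating the halving step instead of applying it once. Below is a minimal C++ sketch that assumes the folded weights are held as 64-bit values. The function name `fitWeightsToUint32` is hypothetical. Halving every weight together keeps their ratios approximately intact, which is what matters for branch probabilities.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: repeatedly halve all weights until the largest one
// fits in 32 bits, then narrow them. Ratios are preserved up to rounding.
std::vector<std::uint32_t> fitWeightsToUint32(std::vector<std::uint64_t> weights) {
  constexpr std::uint64_t kMax = std::numeric_limits<std::uint32_t>::max();
  auto largest = [&] {
    std::uint64_t m = 0;
    for (std::uint64_t w : weights) m = std::max(m, w);
    return m;
  };
  // A single halving is not enough when a weight exceeds 2 * kMax.
  while (largest() > kMax)
    for (std::uint64_t& w : weights) w /= 2;

  std::vector<std::uint32_t> out;
  out.reserve(weights.size());
  for (std::uint64_t w : weights) out.push_back(static_cast<std::uint32_t>(w));
  return out;
}
```

For weights of 2^34 and 2^30, three halvings are needed, giving 2^31 and 2^27.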
Chandler Carruth	e24f3973eb	[vectorize] Initial version of respecting PGO in the vectorizer: treat cold loops as-if they were being optimized for size. Nothing fancy here. Simple test case included. The nice thing is that we can now incrementally build on top of this to drive other heuristics. All of the infrastructure work is done to get the profile information into this layer. The remaining work necessary to make this a fully general purpose loop unroller for very hot loops is to make it a fully general purpose loop unroller. Things I know of but am not going to have time to benchmark and fix in the immediate future: 1) Don't disable the entire pass when the target is lacking vector registers. This really doesn't make any sense any more. 2) Teach the unroller at least and the vectorizer potentially to handle non-if-converted loops. This is trivial for the unroller but hard for the vectorizer. 3) Compute the relative hotness of the loop and thread that down to the various places that make cost tradeoffs (very likely only the unroller makes sense here, and then only when dealing with loops that are small enough for unrolling to not completely blow out the LSD). I'm still dubious how useful hotness information will be. So far, my experiments show that if we can get the correct logic for determining when unrolling actually helps performance, the code size impact is completely unimportant and we can unroll in all cases. But at least we'll no longer burn code size on cold code. One somewhat unrelated idea that I've had forever but not had time to implement: mark all functions which are only reachable via the global constructors rigging in the module as optsize. This would also decrease the impact of any more aggressive heuristics here on code size. llvm-svn: 200219	2014-01-27 13:11:50 +00:00
Benjamin Kramer	9e709bce86	ConstantHoisting: We can't insert instructions directly in front of a PHI node. Insert before the terminating instruction of the dominating block instead. llvm-svn: 200218	2014-01-27 13:11:43 +00:00
Chandler Carruth	edfa37effa	[vectorizer] Add an override for the target instruction cost and use it to stabilize a test that really is trying to test generic behavior and not a specific target's behavior. llvm-svn: 200215	2014-01-27 11:41:50 +00:00
Chandler Carruth	2bb03ba605	[vectorizer] Simplify code to use existing helpers on the Function object and fewer pointless variables. Also, add a clarifying comment and a FIXME because the code which disables all vectorization if we can't use implicit floating point instructions just makes no sense at all. llvm-svn: 200214	2014-01-27 11:27:37 +00:00
Chandler Carruth	147c23278f	[vectorizer] Teach the loop vectorizer's unroller to only unroll by powers of two. This is essentially always the correct thing given the impact on alignment, scaling factors that can be used in addressing modes, etc. Also, fix the management of the unroll vs. small loop cost to more accurately model things with this world. Enhance a test case to actually exercise more of the unroll machinery if using synthetic constants rather than a specific target model. Before this change, with the added flags this test will unroll 3 times instead of either 2 or 4 (the two sensible answers). While I don't expect this to make a huge difference, if there are lots of loops sitting right on the edge of hitting the 'small unroll' factor, they might change behavior. However, I've benchmarked moving the small loop cost up and down in many various ways and by a huge factor (2x) without seeing more than 0.2% code size growth. Small adjustments such as the series that led up here have led to about 1% improvement on some benchmarks, but it is very close to the noise floor so I mostly checked that nothing regressed. Let me know if you see bad behavior on other targets but I don't expect this to be a sufficiently dramatic change to trigger anything. llvm-svn: 200213	2014-01-27 11:12:24 +00:00
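The power-of-two rule can be sketched as a small rounding helper. This C++ version assumes the candidate factor is rounded down, so the example from the message (3) becomes 2. The message itself only says that 2 and 4 are the sensible answers, and the name `roundDownToPowerOfTwo` is hypothetical.

```cpp
// Hypothetical helper: round a candidate unroll factor down to the nearest
// power of two. A factor of 0 is treated as "no unrolling", i.e. 1.
unsigned roundDownToPowerOfTwo(unsigned uf) {
  if (uf == 0) return 1;
  unsigned p = 1;
  // Compare against uf / 2 so that doubling p never overflows.
  while (p <= uf / 2) p *= 2;
  return p;
}
```

Rounding down rather than up keeps the unrolled body no larger than the original cost estimate allowed.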
Chandler Carruth	7f90b4530b	[vectorizer] Add some flags which are useful for conducting experiments with the unrolling behavior in the loop vectorizer. No functionality changed at this point. These are a bit hack-y, but talking with Hal, there doesn't seem to be a cleaner way to easily experiment with different thresholds here and he was also interested in them so I wanted to commit them. Suggestions for improvement are very welcome here. llvm-svn: 200212	2014-01-27 11:12:19 +00:00
Chandler Carruth	328998b2f7	[vectorizer] Fix a trivial oversight where we always requested the number of vector registers rather than toggling between vector and scalar register number based on VF. I don't have a test case as I spotted this by inspection and on X86 it only makes a difference if your target is lacking SSE and thus has no vector registers. If someone wants to add a test case for this for ARM or somewhere else where this is more significant, that would be awesome. Also made the variable name a bit more sensible while I'm here. llvm-svn: 200211	2014-01-27 11:12:14 +00:00
Chandler Carruth	56612b204a	[vectorizer] Clean up the handling of unvectorized loop unrolling in the LoopVectorize pass. The logic here doesn't make much sense. We only unrolled if the unvectorized loop was a reduction loop with a single basic block and small loop body. The reduction part in particular doesn't make much sense. Instead, if we just fall through to the vectorized unroll logic, we unroll when there is a vectorized reduction that could be hacked on by the SLP vectorizer or when the loop is small, which makes more sense. This is mostly a cleanup and nothing in the test suite really exercises this, but I did run benchmarks across this change and saw no really significant changes. llvm-svn: 200198	2014-01-27 08:17:58 +00:00
Chandler Carruth	3aebcb99f7	[LPM] Conclude my immediate work by making the LoopVectorizer a FunctionPass. With this change the loop vectorizer no longer is a loop pass and can readily depend on function analyses. In particular, with this change we no longer have to form a loop pass manager to run the loop vectorizer which simplifies the entire pass management of LLVM. The next step here is to teach the loop vectorizer to leverage profile information through the profile information providing analysis passes. llvm-svn: 200074	2014-01-25 10:01:55 +00:00
Chandler Carruth	8765cf702f	[LPM] Make LCSSA a utility with a FunctionPass that applies it to all the loops in a function, and teach LICM to work in the presence of LCSSA. Previously, LCSSA was a loop pass. That made passes requiring it also be loop passes and unable to depend on function analysis passes easily. It also caused outer loops to have a different "canonical" form from inner loops during analysis. Instead, we go into LCSSA form and preserve it through the loop pass manager run. Note that this has the same problem as LoopSimplify that prevents enabling its verification -- loop passes which run at the end of the loop pass manager and don't preserve these are valid, but the subsequent loop pass runs of outer loops that do preserve this pass trigger too much verification and fail because the inner loop no longer verifies. The other problem this exposed is that LICM was completely unable to handle LCSSA form. It didn't preserve it and it actually would give up on moving instructions in many cases when they were used by an LCSSA phi node. I've taught LICM to support detecting LCSSA-form PHI nodes and to hoist and sink around them. This may actually let LICM fire significantly more because we put everything into LCSSA form to rotate the loop before running LICM. =/ Now LICM should handle that fine and preserve it correctly. The down side is that LICM has to require LCSSA in order to preserve it. This is just a fact of life for LCSSA. It's entirely possible we should completely remove LCSSA from the optimizer. The test updates are essentially accommodating LCSSA phi nodes in the output of LICM, and the fact that we now completely sink every instruction in ashr-crash below the loop bodies prior to unrolling. With this change, LCSSA is computed only three times in the pass pipeline. One of them could be removed (and potentially a SCEV run and a separate LoopPassManager entirely!) if we had a LoopPass variant of InstCombine that ran InstCombine on the loop body but refused to combine away LCSSA PHI nodes. Currently, this also prevents loop unrolling from being in the same loop pass manager as rotate, LICM, and unswitch. There is one thing that I really don't like -- preserving LCSSA in LICM is quite expensive. We end up having to re-run LCSSA twice for some loops after LICM runs because LICM can undo LCSSA both in the current loop and the parent loop. I don't really see good solutions to this other than to completely move away from LCSSA and using tools like SSAUpdater instead. llvm-svn: 200067	2014-01-25 04:07:24 +00:00
Juergen Ributzka	f26beda7c7	Revert "Revert "Add Constant Hoisting Pass" (r200034)" This reverts commit r200058 and adds the using directive for ARMTargetTransformInfo to silence two g++ overload warnings. llvm-svn: 200062	2014-01-25 02:02:55 +00:00
Hans Wennborg	4d67a2e85a	Revert "Add Constant Hoisting Pass" (r200034) This commit caused -Woverloaded-virtual warnings. The two new TargetTransformInfo::getIntImmCost functions were only added to the superclass, and to the X86 subclass. The other targets were not updated, and the warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was hiding the two new getIntImmCost variants. We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost" to the various subclasses, or turning it off, but I suspect that it's wrong to leave the functions unimplemented in those targets. The default implementations return TCC_Free, which I don't think is right e.g. for ARM. llvm-svn: 200058	2014-01-25 01:18:18 +00:00
Juergen Ributzka	4f3df4ad64	Add Constant Hoisting Pass Retry commit r200022 with a fix for the build bot errors. Constant expressions have (unlike instructions) module scope use lists and therefore may have users in different functions. The fix is to simply ignore these out-of-function uses. llvm-svn: 200034	2014-01-24 20:18:00 +00:00
Benjamin Kramer	09b0f88a7f	InstCombine: Don't try to use aggregate elements of ConstantExprs. PR18600. llvm-svn: 200028	2014-01-24 19:02:37 +00:00
Juergen Ributzka	50e7e80d00	Revert "Add Constant Hoisting Pass" This reverts commit r200022 to unbreak the build bots. llvm-svn: 200024	2014-01-24 18:40:30 +00:00
Juergen Ributzka	38b67d0caf	Add Constant Hoisting Pass This pass identifies expensive constants to hoist and coalesces them to better prepare it for SelectionDAG-based code generation. This works around the limitations of the basic-block-at-a-time approach. First it scans all instructions for integer constants and calculates their cost. If the constant can be folded into the instruction (the cost is TCC_Free) or the cost is just a simple operation (TCC_Basic), then we don't consider it expensive and leave it alone. This is the default behavior and the default implementation of getIntImmCost will always return TCC_Free. If the cost is more than TCC_Basic, then the integer constant can't be folded into the instruction and it might be beneficial to hoist the constant. Similar constants are coalesced to reduce register pressure and materialization code. When a constant is hoisted, it is also hidden behind a bitcast to force it to be live-out of the basic block. Otherwise the constant would be just duplicated and each basic block would have its own copy in the SelectionDAG. The SelectionDAG recognizes such constants as opaque and doesn't perform certain transformations on them, which would create a new expensive constant. This optimization is only applied to integer constants in instructions and simple (this means not nested) constant cast expressions. For example: %0 = load i64* inttoptr (i64 big_constant to i64*) Reviewed by Eric llvm-svn: 200022	2014-01-24 18:23:08 +00:00
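The candidate-selection step described above can be sketched in a few lines. In this C++ model the cost classes only borrow the TCC_* names, and the thresholds in `getIntImmCost` (16-bit immediates fold for free, 32-bit ones need one simple instruction, wider ones are expensive) are a made-up target model, not any real backend's costs. Coalescing is simplified to merging identical values; the real pass also merges "similar" constants.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical cost classes, named after the TCC_* levels in the message.
enum class ImmCost { Free, Basic, Expensive };

// Hypothetical target model: small immediates fold into the instruction,
// 32-bit ones take one simple instruction, anything wider is expensive.
ImmCost getIntImmCost(std::int64_t c) {
  if (c >= INT16_MIN && c <= INT16_MAX) return ImmCost::Free;
  if (c >= INT32_MIN && c <= INT32_MAX) return ImmCost::Basic;
  return ImmCost::Expensive;
}

// Keep only constants whose cost is more than Basic, and coalesce identical
// values so each one would be materialized once (value -> number of uses).
std::map<std::int64_t, unsigned> findHoistCandidates(
    const std::vector<std::int64_t>& constants) {
  std::map<std::int64_t, unsigned> candidates;
  for (std::int64_t c : constants)
    if (getIntImmCost(c) > ImmCost::Basic)
      ++candidates[c];
  return candidates;
}
```

A constant used several times is the most attractive hoisting target, since one materialization then serves every use.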
Alp Toker	cb40291100	Fix known typos Sweep the codebase for common typos. Includes some changes to visible function names that were misspelt. llvm-svn: 200018	2014-01-24 17:20:08 +00:00
Chandler Carruth	cc497b6ab5	[LPM] Fix a logic error in LICM spotted by inspection. We completely skipped promotion in LICM if the loop has a preheader or dedicated exits, but not both. We hoist if there is a preheader, and sink if there are dedicated exits, but either hoisting or sinking can move loop invariant code out of the loop! I have no idea if this has a practical consequence. If anyone has ideas for a test case, let me know. llvm-svn: 199966	2014-01-24 02:24:47 +00:00
Chandler Carruth	abfa3e5652	[cleanup] Use the type-based preservation method rather than a string literal that bakes a pass name and forces parsing it in the pass manager. llvm-svn: 199963	2014-01-24 01:59:49 +00:00
Rafael Espindola	2a05ea5c0e	Remove tail marker when changing an argument to an alloca. Argument promotion can replace an argument of a call with an alloca. This requires clearing the tail marker as it is very likely that the callee is now using an alloca in the caller. This fixes pr14710. llvm-svn: 199909	2014-01-23 17:19:42 +00:00
Chandler Carruth	aa7fa5e4b2	[LPM] Make LoopSimplify no longer a LoopPass and instead both a utility function and a FunctionPass. This has many benefits. The motivating use case was to be able to compute function analysis passes after running LoopSimplify (to avoid invalidating them) and then to run other passes which require LoopSimplify. Specifically passes like unrolling and vectorization are critical to wire up to BranchProbabilityInfo and BlockFrequencyInfo so that they can be profile aware. For the LoopVectorize pass the only things in the way are LoopSimplify and LCSSA. This fixes LoopSimplify and LCSSA is next on my list. There are also a bunch of other benefits of doing this: - It is now very feasible to make more passes preserve LoopSimplify because they can simply run it after changing a loop. Because subsequent passes can assume LoopSimplify is preserved we can reduce the runs of this pass to the times when we actually mutate a loop structure. - The new pass manager should be able to more easily support loop passes factored in this way. - We can at long, long last observe that LoopSimplify is preserved across SCEV. This halves the number of times we run LoopSimplify!!! Now, getting here wasn't trivial. First off, the interfaces used by LoopSimplify are all over the map regarding how analyses are updated. We end up with weird "pass" parameters as a consequence. I'll try to clean at least some of this up later -- I'll have to have it all clean for the new pass manager. Next up I discovered a really frustrating bug. LoopUnroll claims to preserve LoopSimplify. That's actually a lie. But the way the LoopPassManager ends up running the passes, it always ran LoopSimplify on the unrolled-into loop, rectifying this oversight before any verification could kick in and point out that in fact nothing was preserved. So I've added code to the unroller to actually simplify the surrounding loop when it succeeds at unrolling. The only functional change in the test suite is that we now catch a case that was previously missed because SCEV and other loop transforms see their containing loops as simplified and thus don't miss some opportunities. One test case has been converted to check that we catch this case rather than checking that we miss it but at least don't get the wrong answer. Note that I have #if-ed out all of the verification logic in LoopSimplify! This is a temporary workaround while extracting these bits from the LoopPassManager. Currently, there is no way to have a pass in the LoopPassManager which preserves LoopSimplify along with one which does not. The LPM will try to verify on each loop in the nest that LoopSimplify holds but the now-Function-pass cannot distinguish what loop is being verified and so must try to verify all of them. The innermost loop is clearly no longer simplified as there is a pass which didn't even attempt to preserve it. =/ Once I get LCSSA out (and maybe LoopVectorize and some other fixes) I'll be able to re-enable this check and catch any places where we are still failing to preserve LoopSimplify. If this causes problems I can back this out and try to commit all of this at once, but so far this seems to work and allow much more incremental progress. llvm-svn: 199884	2014-01-23 11:23:19 +00:00
Matt Arsenault	84de61148b	Handle an addrspacecast case in memcpyopt llvm-svn: 199836	2014-01-22 21:53:19 +00:00
Tim Northover	bc6659c4e9	Loop strength reduce: fix function name. llvm-svn: 199801	2014-01-22 13:27:00 +00:00
Chandler Carruth	4de315430c	[SROA] Fix a bug which could cause the common type finding to return inconsistent results for different orderings of alloca slices. The fundamental issue is that it is just always a mistake to return early from this function. There is no effective early exit to leverage. This patch stops trying to do so and simplifies the code a bit as a consequence. Original diagnosis and patch by James Molloy with some name tweaks by me in part reflecting feedback from Duncan Smith on the mailing list. llvm-svn: 199771	2014-01-21 23:16:05 +00:00
Owen Anderson	1664dc8973	Fix all the remaining lost-fast-math-flags bugs I've been able to find. The most important of these are cases in the generic logic for combining BinaryOperators. This logic hadn't been updated to handle FastMathFlags, and it took me a while to detect it because it doesn't show up in a simple search for CreateFAdd. llvm-svn: 199629	2014-01-20 07:44:53 +00:00
Benjamin Kramer	b80e1699b3	InstCombine: Modernize a bunch of cast combines. Also make them vector-aware. llvm-svn: 199608	2014-01-19 20:05:13 +00:00
Benjamin Kramer	970f4959d4	InstCombine: Hoist 3 copies of AddOne/SubOne into a header. llvm-svn: 199605	2014-01-19 16:56:10 +00:00
Benjamin Kramer	7a74bd4703	InstCombine: Replace a hand-rolled version of isKnownToBeAPowerOfTwo with the real thing. llvm-svn: 199604	2014-01-19 16:48:41 +00:00
Benjamin Kramer	72196f3ae5	InstCombine: Teach most integer add/sub/mul/div combines how to deal with vectors. llvm-svn: 199602	2014-01-19 15:24:22 +00:00
Benjamin Kramer	76b15d04ff	InstCombine: Refactor fmul/fdiv combines to handle vectors. llvm-svn: 199598	2014-01-19 13:36:27 +00:00
Chandler Carruth	1bf38c6a71	Fix a really nasty SROA bug with how we handled out-of-bounds memcpy intrinsics. Reported on the list by Evan with a couple of attempts to fix, but it took a while to dig down to the root cause. There are two overlapping bugs here, both centering around the circumstance of discovering a memcpy operand which is known to be completely outside the bounds of the alloca. First, we need to kill the other side of the memcpy if it was added to this alloca. Otherwise we'll factor it into our slicing and try to rewrite it even though we know for a fact that it is dead. This is made more tricky because we can visit the sides in either order. So we have to both kill the other side and skip instructions marked as dead. The latter really should be goodness in every case, but here it is a matter of correctness. Second, we need to actually remove the uses of the alloca by the memcpy when queuing it for later deletion. Otherwise it may still be using the alloca when we go to promote it (if the rewrite re-uses the existing alloca instruction). Do this by factoring out the use-clobbering used for nixing a Phi argument and re-using it across the operands of a to-be-deleted instruction. llvm-svn: 199590	2014-01-19 12:16:54 +00:00
Arnold Schwaighofer	cc742dd9e4	LoopVectorizer: A reduction that has multiple uses of the reduction value is not a reduction. Really. Under certain circumstances (the use list of an instruction has to be set up right - hence the extra pass in the test case) we would not recognize when a value in a potential reduction cycle was used multiple times by the reduction cycle. Fixes PR18526. radar://15851149 llvm-svn: 199570	2014-01-19 03:18:31 +00:00
Nick Lewycky	a6a17d77d2	Don't refuse to transform constexpr(call(arg, ...)) to call(constexpr(arg), ...) just because the function has multiple return values even if their return types are the same. Patch by Eduard Burtescu! llvm-svn: 199564	2014-01-18 22:47:12 +00:00
Benjamin Kramer	fea9ac99b0	InstCombine: Make the (fmul X, -1.0) -> (fsub -0.0, X) transform handle vectors too. PR18532. llvm-svn: 199553	2014-01-18 16:43:14 +00:00
Owen Anderson	48b842ef7c	Fix more instances of dropped fast math flags when optimizing FADD instructions. All found by inspection (aka grep). llvm-svn: 199528	2014-01-18 00:48:14 +00:00
Kostya Serebryany	714c67c31e	[asan] extend asan-coverage (still experimental). - Add a mode for collecting per-block coverage (-asan-coverage=2). So far the implementation is naive (all blocks are instrumented), so the performance overhead on top of asan could be as high as 30%. - Make sure the one-time calls to __sanitizer_cov are moved to the function bottom, which in turn required copying the original debug info into the call insn. Here is the performance data on SPEC 2006 (train data, comparing asan with asan-coverage={0,1,2}):
benchmark, asan+cov0, asan+cov1, diff 0-1, asan+cov2, diff 0-2, diff 1-2
400.perlbench, 65.60, 65.80, 1.00, 76.20, 1.16, 1.16
401.bzip2, 65.10, 65.50, 1.01, 75.90, 1.17, 1.16
403.gcc, 1.64, 1.69, 1.03, 2.04, 1.24, 1.21
429.mcf, 21.90, 22.60, 1.03, 23.20, 1.06, 1.03
445.gobmk, 166.00, 169.00, 1.02, 205.00, 1.23, 1.21
456.hmmer, 88.30, 87.90, 1.00, 91.00, 1.03, 1.04
458.sjeng, 210.00, 222.00, 1.06, 258.00, 1.23, 1.16
462.libquantum, 1.73, 1.75, 1.01, 2.11, 1.22, 1.21
464.h264ref, 147.00, 152.00, 1.03, 160.00, 1.09, 1.05
471.omnetpp, 115.00, 116.00, 1.01, 140.00, 1.22, 1.21
473.astar, 133.00, 131.00, 0.98, 142.00, 1.07, 1.08
483.xalancbmk, 118.00, 120.00, 1.02, 154.00, 1.31, 1.28
433.milc, 19.80, 20.00, 1.01, 20.10, 1.02, 1.01
444.namd, 16.20, 16.20, 1.00, 17.60, 1.09, 1.09
447.dealII, 41.80, 42.20, 1.01, 43.50, 1.04, 1.03
450.soplex, 7.51, 7.82, 1.04, 8.25, 1.10, 1.05
453.povray, 14.00, 14.40, 1.03, 15.80, 1.13, 1.10
470.lbm, 33.30, 34.10, 1.02, 34.10, 1.02, 1.00
482.sphinx3, 12.40, 12.30, 0.99, 13.00, 1.05, 1.06
llvm-svn: 199488	2014-01-17 11:00:30 +00:00
Quentin Colombet	dc0b2ea2bc	[opt][PassInfo] Allow opt to run passes that need a target machine. When registering a pass, a pass can now specify a second constructor that takes as an argument a pointer to TargetMachine. The PassInfo class has been updated to reflect that possibility. If such a constructor exists, opt will use it instead of the default constructor when instantiating the pass. Since such IR passes are supposed to be rare, no specific support has been added to this commit to allow an easy registration of such a pass. In other words, for such a pass, the initialization function has to be hand-written (see CodeGenPrepare for instance). Now, codegenprepare can be tested using opt: opt -codegenprepare -mtriple=mytriple input.ll llvm-svn: 199430	2014-01-16 21:44:34 +00:00
Owen Anderson	e7321660c1	Fix two cases where we could lose fast math flags when optimizing FADD expressions. llvm-svn: 199427	2014-01-16 21:26:02 +00:00
Owen Anderson	4557a156e3	Fix an instance where we would drop fast math flags when performing an fdiv to reciprocal multiply transformation. llvm-svn: 199425	2014-01-16 21:07:52 +00:00
Owen Anderson	e8537fc7e0	Fix a bug in InstCombine where we failed to preserve fast math flags when optimizing an FMUL expression. llvm-svn: 199424	2014-01-16 20:59:41 +00:00
Owen Anderson	f74cfe031f	Teach InstCombine that (fmul X, -1.0) can be simplified to (fneg X), which LLVM expresses as (fsub -0.0, X). llvm-svn: 199420	2014-01-16 20:36:42 +00:00
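The -0.0 in this fold matters because of signed zeros: 0.0 - X would map X = +0.0 to +0.0, whereas X * -1.0 yields -0.0. The C sketch below (function names are illustrative, not taken from LLVM) compares the three forms under default IEEE-754 semantics, i.e. without -ffast-math:

```c
#include <math.h>

/* Original form: multiply by -1.0. */
double neg_by_fmul(double x) { return x * -1.0; }

/* Form produced by the fold: subtract from negative zero. */
double neg_by_fsub_negzero(double x) { return -0.0 - x; }

/* Naive alternative: subtract from positive zero. Not equivalent,
   because it loses the sign flip when x is +0.0. */
double neg_by_fsub_poszero(double x) { return 0.0 - x; }
```

For x = +0.0 the first two return -0.0 while the third returns +0.0; for x = -0.0 the first two both return +0.0.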
Evgeniy Stepanov	13665367a0	[asan] Remove -fsanitize-address-zero-base-shadow command line flag from clang, and disable zero-base shadow support on all platforms where it is not the default behavior. - It is completely unused, as far as we know. - It is ABI-incompatible with non-zero-base shadow, which means all objects in a process must be built with the same setting. Failing to do so results in a segmentation fault at runtime. - It introduces a backward dependency of compiler-rt on user code, which is uncommon and complicates testing. This is the LLVM part of a larger change. llvm-svn: 199371	2014-01-16 10:19:12 +00:00
Hans Wennborg	4744ac1733	Switch-to-lookup tables: set threshold to 3 cases There has been an old FIXME to find the right cut-off for when it's worth analyzing and potentially transforming a switch to a lookup table. The switches always have two or more cases. I could not measure any speed-up by transforming a switch with two cases. A switch with three cases gets a nice speed-up, and I couldn't measure any compile-time regression, so I think this is the right threshold. In a Clang self-host, this causes 480 new switches to be transformed, and reduces the final binary size by 8 KB. llvm-svn: 199294	2014-01-15 05:00:27 +00:00
Arnold Schwaighofer	dc4c9460a2	LoopVectorize: Only strip casts from integer types when replacing symbolic strides Fixes PR18480. llvm-svn: 199291	2014-01-15 03:35:46 +00:00
Matt Arsenault	2d353d1a10	Do pointer cast simplifications on addrspacecast llvm-svn: 199254	2014-01-14 20:00:45 +00:00
Matt Arsenault	f08a44f903	Remove a check for an illegal condition. Bitcasts can't be between address spaces anymore. llvm-svn: 199253	2014-01-14 19:56:57 +00:00
Matt Arsenault	e55a2c2e6b	Make nocapture analysis work with addrspacecast llvm-svn: 199246	2014-01-14 19:11:52 +00:00
Duncan P. N. Exon Smith	93be7c4fb3	Reapply "LTO: add API to set strategy for -internalize" Reapply r199191, reverted in r199197 because it carelessly broke Other/link-opts.ll. The problem was that calling createInternalizePass("main") would select createInternalizePass(bool("main")) instead of createInternalizePass(ArrayRef<const char *>("main")). This commit fixes the bug. The original commit message follows. Add API to LTOCodeGenerator to specify a strategy for the -internalize pass. This is a new attempt at Bill's change in r185882, which he reverted in r188029 due to problems with the gold linker. This puts the onus on the linker to decide whether (and what) to internalize. In particular, running internalize before outputting an object file may change a 'weak' symbol into an internal one, even though that symbol could be needed by an external object file --- e.g., with arclite. This patch enables three strategies: - LTO_INTERNALIZE_FULL: the default (and the old behaviour). - LTO_INTERNALIZE_NONE: skip -internalize. - LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden visibility. LTO_INTERNALIZE_FULL should be used when linking an executable. Outputting an object file (e.g., via ld -r) is more complicated, and depends on whether hidden symbols should be internalized. E.g., for ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs is passed, and LTO_INTERNALIZE_HIDDEN can be used otherwise. However, LTO_INTERNALIZE_FULL is inappropriate, since the output object file will eventually need to link with others. lto_codegen_set_internalize_strategy() sets the strategy for subsequent calls to lto_codegen_write_merged_modules() and lto_codegen_compile*(). <rdar://problem/14334895> llvm-svn: 199244	2014-01-14 18:52:17 +00:00
Nico Rieck	7157bb765e	Decouple dllexport/dllimport from linkage Representing dllexport/dllimport as distinct linkage types prevents using these attributes on templates and inline functions. Instead of introducing further mixed linkage types to include linkonce and weak ODR, the old import/export linkage types are replaced with a new separate visibility-like specifier: define available_externally dllimport void @f() {} @Var = dllexport global i32 1, align 4 Linkage for dllexported globals and functions is now equal to their linkage without dllexport. Imported globals and functions must be either declarations with external linkage, or definitions with AvailableExternallyLinkage. llvm-svn: 199218	2014-01-14 15:22:47 +00:00
Nico Rieck	9d2e0df049	Revert "Decouple dllexport/dllimport from linkage" Revert this for now until I fix an issue in Clang with it. This reverts commit r199204. llvm-svn: 199207	2014-01-14 12:38:32 +00:00
Nico Rieck	e43aaf7967	Decouple dllexport/dllimport from linkage Representing dllexport/dllimport as distinct linkage types prevents using these attributes on templates and inline functions. Instead of introducing further mixed linkage types to include linkonce and weak ODR, the old import/export linkage types are replaced with a new separate visibility-like specifier: define available_externally dllimport void @f() {} @Var = dllexport global i32 1, align 4 Linkage for dllexported globals and functions is now equal to their linkage without dllexport. Imported globals and functions must be either declarations with external linkage, or definitions with AvailableExternallyLinkage. llvm-svn: 199204	2014-01-14 11:55:03 +00:00
NAKAMURA Takumi	23c0ab53b2	Revert r199191, "LTO: add API to set strategy for -internalize" Please also update Other/link-opts.ll next time. llvm-svn: 199197	2014-01-14 09:40:18 +00:00
Duncan P. N. Exon Smith	43ea3478bf	LTO: add API to set strategy for -internalize Add API to LTOCodeGenerator to specify a strategy for the -internalize pass. This is a new attempt at Bill's change in r185882, which he reverted in r188029 due to problems with the gold linker. This puts the onus on the linker to decide whether (and what) to internalize. In particular, running internalize before outputting an object file may change a 'weak' symbol into an internal one, even though that symbol could be needed by an external object file --- e.g., with arclite. This patch enables three strategies: - LTO_INTERNALIZE_FULL: the default (and the old behaviour). - LTO_INTERNALIZE_NONE: skip -internalize. - LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden visibility. LTO_INTERNALIZE_FULL should be used when linking an executable. Outputting an object file (e.g., via ld -r) is more complicated, and depends on whether hidden symbols should be internalized. E.g., for ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs is passed, and LTO_INTERNALIZE_HIDDEN can be used otherwise. However, LTO_INTERNALIZE_FULL is inappropriate, since the output object file will eventually need to link with others. lto_codegen_set_internalize_strategy() sets the strategy for subsequent calls to lto_codegen_write_merged_modules() and lto_codegen_compile*(). <rdar://problem/14334895> llvm-svn: 199191	2014-01-14 06:37:26 +00:00
Chandler Carruth	73523021d0	[PM] Split DominatorTree into a concrete analysis result object which can be used by both the new pass manager and the old. This removes it from any of the virtual mess of the pass interfaces and lets it derive cleanly from the DominatorTreeBase<> template. In turn, tons of boilerplate interface can be nuked and it turns into a very straightforward extension of the base DominatorTree interface. The old analysis pass is now a simple wrapper. The names and style of this split should match the split between CallGraph and CallGraphWrapperPass. All of the users of DominatorTree have been updated to match using many of the same tricks as with CallGraph. The goal is that the common type remains the resulting DominatorTree rather than the pass. This will make subsequent work toward the new pass manager significantly easier. Also in numerous places things became cleaner because I switched from re-running the pass (!!! midway through some other pass's run !!!) to directly recomputing the domtree. llvm-svn: 199104	2014-01-13 13:07:17 +00:00
Chandler Carruth	e509db410a	[PM] Pull the generic graph algorithms and data structures for dominator trees into the Support library. These are all expressed in terms of the generic GraphTraits and CFG, with no reliance on any concrete IR types. Putting them in support clarifies that and makes the fact that the static analyzer in Clang uses them much more sane. When moving the Dominators.h file into the IR library I claimed that this was the right home for it but not something I planned to work on. Oops. So why am I doing this? It happens to be one step toward breaking the requirement that IR verification can only be performed from inside of a pass context, which completely blocks the implementation of verification for the new pass manager infrastructure. Fixing it will also allow removing the concept of the "preverify" step (WTF???) and allow the verifier to cleanly flag functions which fail verification in a way that precludes even computing dominance information. Currently, that results in a fatal error even when you ask the verifier to not fatally error. It's awesome like that. The yak shaving will continue... llvm-svn: 199095	2014-01-13 10:52:56 +00:00
Chandler Carruth	5ad5f15cff	[cleanup] Move the Dominators.h and Verifier.h headers into the IR directory. These passes are already defined in the IR library, and it doesn't make any sense to have the headers in Analysis. Long term, I think there is going to be a much better way to divide these matters. The dominators code should be fully separated into the abstract graph algorithm and have that put in Support where it becomes obvious that even Clang's CFGBlocks can use it. Then the verifier can manually construct dominance information from the Support-driven interface while the Analysis library can provide a pass which caches, reconstructs, and supports a nice update API. But those are very long term, and so I don't want to leave the really confusing structure until that day arrives. llvm-svn: 199082	2014-01-13 09:26:24 +00:00
Chandler Carruth	07baed53e8	Re-sort #include lines again, prior to moving headers around. llvm-svn: 199080	2014-01-13 08:04:33 +00:00
Hans Wennborg	ac114a3ce7	Switch-to-lookup tables: Don't require a result for the default case when the lookup table doesn't have any holes. This means we can build a lookup table for switches like this: switch (x) { case 0: return 1; case 1: return 2; case 2: return 3; case 3: return 4; default: exit(1); } The default case doesn't yield a constant result here, but that doesn't matter, since a default result is only necessary for filling holes in the lookup table, and this table doesn't have any holes. This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB off the resulting clang binary. llvm-svn: 199025	2014-01-12 00:44:41 +00:00
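At the source level, the lookup-table form of the switch above behaves roughly like the hand-written C below. This is a sketch under assumptions: the table name f_table and the function names are hypothetical, and SimplifyCFG works on IR (emitting a constant global array plus a range check), not on C:

```c
#include <stdlib.h>

/* The switch from the commit message. */
int f_switch(int x) {
  switch (x) {
  case 0: return 1;
  case 1: return 2;
  case 2: return 3;
  case 3: return 4;
  default: exit(1);
  }
}

/* Lookup-table equivalent: the table covers 0..3 with no holes,
   so no default result has to be stored in it. */
static const int f_table[4] = {1, 2, 3, 4};

int f_lookup(int x) {
  /* One unsigned compare rejects both x < 0 and x > 3. */
  if ((unsigned)x < 4u)
    return f_table[x];
  exit(1); /* out-of-range inputs still reach the original default */
}
```

The default block only has to stay reachable through the range check; its non-constant behavior never needs a table entry.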
Arnold Schwaighofer	66c742aeea	LoopVectorizer: Enable strided memory accesses versioning per default I saw no compile or execution time regressions on x86_64 -mavx -O3. radar://13075509 llvm-svn: 199015	2014-01-11 20:40:34 +00:00
NAKAMURA Takumi	41c409ce0d	LoopVectorize.cpp: Appease MSC16. Excuse me, I hope msc16 builders would be fine till its end day. Introduce nullptr then. ;) llvm-svn: 199001	2014-01-11 09:59:27 +00:00
Diego Novillo	9518b63bfc	Extend and simplify the sample profile input file. 1- Use the line_iterator class to read profile files. 2- Allow comments in the profile file. Lines starting with '#' are completely ignored while reading the profile. 3- Add parsing support for discriminators and indirect call samples. Our external profiler can emit more profile information that we are currently not handling. This patch does not add new functionality to support this information, but it allows profile files to provide it. I will add actual support later on (for at least one of these features, I need support for DWARF discriminators in Clang). A sample line may contain the following additional information: Discriminator. This is used if the sampled program was compiled with DWARF discriminator support (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This is currently only emitted by GCC and we just ignore it. Potential call targets and samples. If present, this line contains a call instruction. This models both direct and indirect calls. Each called target is listed together with the number of samples. For example, 130: 7 foo:3 bar:2 baz:7 The above means that at relative line offset 130 there is a call instruction that calls one of foo(), bar() and baz(), with baz() being the most frequent call target. Differential Revision: http://llvm-reviews.chandlerc.com/D2355 4- Simplify format of profile input file. This implements earlier suggestions to simplify the format of the sample profile file. The symbol table is not necessary and function profiles do not need to know the number of samples in advance. Differential Revision: http://llvm-reviews.chandlerc.com/D2419 llvm-svn: 198973	2014-01-10 23:23:51 +00:00
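A line such as "130: 7 foo:3 bar:2 baz:7" can be read with a few sscanf calls. The sketch below is a hypothetical C reader, not the pass's own parser (which is built on line_iterator); struct sample_line, MAX_TARGETS and parse_sample_line are invented for illustration, and the sketch does not handle discriminators:

```c
#include <stdio.h>
#include <string.h>

#define MAX_TARGETS 8

struct sample_line {
  unsigned offset;  /* line offset relative to the function header */
  unsigned samples; /* samples collected at that line */
  int num_targets;  /* call targets listed on the line, if any */
  char target[MAX_TARGETS][64];
  unsigned target_samples[MAX_TARGETS];
};

/* Returns 1 on success, 0 for comment lines or malformed input. */
int parse_sample_line(const char *line, struct sample_line *out) {
  int consumed = 0;
  if (line[0] == '#')
    return 0; /* comment lines are ignored */
  if (sscanf(line, "%u: %u%n", &out->offset, &out->samples, &consumed) != 2)
    return 0;
  out->num_targets = 0;
  const char *p = line + consumed;
  char name[64];
  unsigned count;
  int n;
  /* Each remaining token has the form name:count. */
  while (out->num_targets < MAX_TARGETS &&
         sscanf(p, " %63[^: ]:%u%n", name, &count, &n) == 2) {
    strcpy(out->target[out->num_targets], name);
    out->target_samples[out->num_targets] = count;
    out->num_targets++;
    p += n;
  }
  return 1;
}
```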
Diego Novillo	0accb3d2bc	Propagation of profile samples through the CFG. This adds a propagation heuristic to convert instruction samples into branch weights. It implements a similar heuristic to the one implemented by Dehao Chen on GCC. The propagation proceeds in 3 phases: 1- Assignment of block weights. All the basic blocks in the function are initially assigned the same weight as their most frequently executed instruction. 2- Creation of equivalence classes. Since samples may be missing from blocks, we can fill in the gaps by setting the weights of all the blocks in the same equivalence class to the same weight. To compute the concept of equivalence, we use dominance and loop information. Two blocks B1 and B2 are in the same equivalence class if B1 dominates B2, B2 post-dominates B1 and both are in the same loop. 3- Propagation of block weights into edges. This uses a simple propagation heuristic. The following rules are applied to every block B in the CFG: - If B has a single predecessor/successor, then the weight of that edge is the weight of the block. - If all the edges are known except one, and the weight of the block is already known, the weight of the unknown edge will be the weight of the block minus the sum of all the known edges. If the sum of all the known edges is larger than B's weight, we set the unknown edge weight to zero. - If there is a self-referential edge, and the weight of the block is known, the weight for that edge is set to the weight of the block minus the weight of the other incoming edges to that block (if known). Since this propagation is not guaranteed to converge for every CFG, we only allow it to proceed for a limited number of iterations (controlled by -sample-profile-max-propagate-iterations). It currently uses the same GCC default of 100. Before propagation starts, the pass builds (for each block) a list of unique predecessors and successors. This is necessary to handle identical edges in multiway branches.
Since we visit all blocks and all edges of the CFG, it is cleaner to build these lists once at the start of the pass. Finally, the patch fixes the computation of relative line locations. The profiler emits lines relative to the function header. To discover it, we traverse the compilation unit looking for the subprogram corresponding to the function. The line number of that subprogram is the line where the function begins. That becomes line zero for all the relative locations. llvm-svn: 198972	2014-01-10 23:23:46 +00:00
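The "all edges known except one" rule from the propagation heuristic reduces to a clamped subtraction. A minimal C sketch, assuming edge and block weights are unsigned 64-bit counts (infer_unknown_edge is a hypothetical name, not a function in the pass); the single-predecessor/successor rule falls out as the case with no known edges:

```c
#include <stdint.h>

/* Weight of the one unknown edge of a block whose weight is known:
   block weight minus the sum of the known edges, clamped to zero when
   the known edges already exceed the block weight. */
uint64_t infer_unknown_edge(uint64_t block_weight,
                            const uint64_t *known_edges, int num_known) {
  uint64_t sum = 0;
  for (int i = 0; i < num_known; ++i)
    sum += known_edges[i];
  return sum > block_weight ? 0 : block_weight - sum;
}
```

The clamp matters because sampled counts are noisy: without it an unsigned subtraction would wrap to a huge weight.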
Arnold Schwaighofer	c2e9d759f2	LoopVectorizer: Handle strided memory accesses by versioning for (i = 0; i < N; ++i) A[i * Stride1] += B[i * Stride2]; We take loops like this and check that the symbolic strides 'Stride1/2' are one and drop to the scalar loop if they are not. This is currently disabled by default and hidden behind the flag 'enable-mem-access-versioning'. radar://13075509 llvm-svn: 198950	2014-01-10 18:20:32 +00:00
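The versioning can be pictured at the source level as two copies of the loop behind a runtime stride check. A hedged C sketch (strided_add is a hypothetical name); it leaves out any runtime aliasing checks the vectorizer may also need, and the fast path is simply the loop the vectorizer would then widen:

```c
/* Versioned form of: for (i = 0; i < N; ++i) A[i * Stride1] += B[i * Stride2]; */
void strided_add(int *A, const int *B, long N, long Stride1, long Stride2) {
  if (Stride1 == 1 && Stride2 == 1) {
    /* Fast path: unit strides give contiguous, vectorizable accesses. */
    for (long i = 0; i < N; ++i)
      A[i] += B[i];
  } else {
    /* Fallback: the original scalar loop with symbolic strides. */
    for (long i = 0; i < N; ++i)
      A[i * Stride1] += B[i * Stride2];
  }
}
```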
Chandler Carruth	d48cdbf0c3	Put the functionality for printing a value to a raw_ostream as an operand into the Value interface just like the core print method is. That gives a more consistent organization to the IR printing interfaces -- they are all attached to the IR objects themselves. Also, update all the users. This removes the 'Writer.h' header which contained only a single function declaration. llvm-svn: 198836	2014-01-09 02:29:41 +00:00
Hao Liu	26abebbb2c	Fix a bug that generated an undef operand when optimising shufflevector and insertelement in InstCombine. llvm-svn: 198730	2014-01-08 03:06:15 +00:00
Chandler Carruth	9aca918df9	Move the LLVM IR asm writer header files into the IR directory, as they are part of the core IR library in order to support dumping and other basic functionality. Rename the 'Assembly' include directory to 'AsmParser' to match the library name and the only functionality left there -- printing has been in the core IR library for quite some time. Update all of the #includes to match. All of this started because I wanted to have the layering in good shape before I started adding support for printing LLVM IR using the new pass infrastructure, and commandline support for the new pass infrastructure. llvm-svn: 198688	2014-01-07 12:34:26 +00:00
Chandler Carruth	8a8cd2bab9	Re-sort all of the includes with ./utils/sort_includes.py so that subsequent changes are easier to review. About to fix some layering issues, and wanted to separate out the necessary churn. Also comment and sink the include of "Windows.h" in three .inc files to match the usage in Memory.inc. llvm-svn: 198685	2014-01-07 11:48:04 +00:00
Andrew Trick	e4a18605e0	Reapply r198654 "indvars: sink truncates outside the loop." This doesn't seem to have actually broken anything. It was paranoia on my part. Trying again now that bots are more stable. This is a follow up of the r198338 commit that added truncates for lcssa phi nodes. Sinking the truncates below the phis cleans up the loop and simplifies subsequent analysis within the indvars pass. llvm-svn: 198678	2014-01-07 06:59:12 +00:00
Andrew Trick	3c0ed08996	Revert "indvars: sink truncates outside the loop." This reverts commit r198654. One of the bots reported a SciMark failure. llvm-svn: 198659	2014-01-07 01:50:58 +00:00
Andrew Trick	0b8e3b2cb4	indvars: sink truncates outside the loop. This is a follow up of the r198338 commit that added truncates for lcssa phi nodes. Sinking the truncates below the phis cleans up the loop and simplifies subsequent analysis within the indvars pass. llvm-svn: 198654	2014-01-07 01:02:55 +00:00
Andrew Trick	b70d9780ac	80 col. comment. llvm-svn: 198653	2014-01-07 01:02:52 +00:00
Andrew Trick	6796ab424c	Reapply r198478 "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things." Now with a fix for PR18384: ValueHandleBase::ValueIsDeleted. We need to invalidate SCEV's loop info when we delete a block, even if no values are hoisted. llvm-svn: 198631	2014-01-06 19:43:14 +00:00
Alp Toker	f929e09b10	Add missed cleanup from r198456 All other uses of this macro in LLVM/clang have been moved to the function definition so follow suit (and the usage advice) here too for consistency. llvm-svn: 198516	2014-01-04 22:47:48 +00:00
Alp Toker	5e9f3265f8	Revert "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things." This commit was the source of crasher PR18384: While deleting: label %for.cond127 An asserting value handle still pointed to this value! UNREACHABLE executed at llvm/lib/IR/Value.cpp:671! Reverting to get the builders green, feel free to re-land after fixing up. (Renato has a handy isolated repro if you need it.) This reverts commit r198478. llvm-svn: 198503	2014-01-04 17:00:45 +00:00
Andrew Trick	aceac9746d	Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things. getSCEV for an ashr instruction creates an intermediate zext expression when it truncates its operand. The operand is initially inside the loop, so the narrow zext expression has a non-loop-invariant loop disposition. LoopSimplify then runs on an outer loop, hoists the ashr operand, and properly invalidates the SCEVs that are mapped to the value. The SCEV expression for the ashr is now an AddRec with the hoisted value as the now loop-invariant start value. The LoopDisposition of this wide value was properly invalidated during LoopSimplify. However, if we later get the ashr SCEV again, we again try to create the intermediate zext expression. We get the same SCEV that we did earlier, and it is still cached because it was never mapped to a Value. When we try to create a new AddRec we abort because we're using the old non-loop-invariant LoopDisposition. I don't have a solution for this other than to clear LoopDisposition when LoopSimplify hoists things. I think the long-term strategy should be to perform LoopSimplify on all loops before computing SCEV and before running any loop opts on individual loops. It's possible we may want to rerun LoopSimplify on individual loops, but it should rarely do anything, so it should rarely require invalidating SCEV. llvm-svn: 198478	2014-01-04 05:52:49 +00:00
Nico Weber	7408c7066a	Add a LLVM_DUMP_METHOD macro. The motivation is to mark dump methods as used in debug builds so that they can be called from lldb, but to not do so in release builds so that they can be dead-stripped. There's lots of potential follow-up work suggested in the thread "Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on cfe-dev, but everyone seems to agree on this subset. Macro name chosen by fair coin toss. llvm-svn: 198456	2014-01-03 22:53:37 +00:00
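A macro along these lines can be sketched with GCC/Clang function attributes. DEMO_DUMP_METHOD below is a hypothetical stand-in modeled on the description above, not a copy of LLVM's definition of LLVM_DUMP_METHOD, and it assumes a compiler that accepts __attribute__((noinline, used)):

```c
#include <stdio.h>

/* Debug builds (NDEBUG not defined): "used" makes the compiler keep the
   function even if nothing references it, so a debugger can call it.
   Release builds: drop "used" so an unreferenced dump can be stripped.
   "noinline" keeps a callable out-of-line copy in both cases. */
#if !defined(NDEBUG)
#define DEMO_DUMP_METHOD __attribute__((noinline, used))
#else
#define DEMO_DUMP_METHOD __attribute__((noinline))
#endif

struct point { int x, y; };

/* Hypothetical dump helper; returns the number of characters printed. */
DEMO_DUMP_METHOD int point_dump(const struct point *p) {
  return printf("(%d, %d)\n", p->x, p->y);
}
```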
David Peixotto	ea9ba446d5	Fix loop rerolling pass failure with non-constant loop lower bound The loop rerolling pass was failing with an assertion failure from a failed cast on loops like this: void foo(int *A, int *B, int m, int n) { for (int i = m; i < n; i+=4) { A[i+0] = B[i+0] * 4; A[i+1] = B[i+1] * 4; A[i+2] = B[i+2] * 4; A[i+3] = B[i+3] * 4; } } The code was casting the SCEV-expanded code for the new induction variable to a phi-node. When the loop had a non-constant lower bound, the SCEV expander would end the code expansion with an add instead of a phi node and the cast would fail. It looks like the cast to a phi node was only needed to get the induction variable value coming from the backedge to compute the end of loop condition. This patch changes the loop reroller to compare the induction variable to the number of times the backedge is taken instead of the iteration count of the loop. In other words, we stop the loop when the current value of the induction variable == IterationCount-1. Previously, the exit test compared the induction variable value from the next iteration against IterationCount. This problem only seems to occur on 32-bit targets. For some reason, the loop is not rerolled on 64-bit targets. PR18290 llvm-svn: 198425	2014-01-03 17:20:01 +00:00
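Written back as C, the rerolled loop and its new exit test look roughly like the sketch below: the exit compares the current induction variable with the backedge-taken count (iteration count minus one) instead of comparing the next value with the iteration count. foo_rerolled is a hypothetical name, the sketch assumes A and B are large enough, and the pass itself rewrites IR rather than C:

```c
/* The hand-unrolled loop from the commit message. */
void foo(int *A, const int *B, int m, int n) {
  for (int i = m; i < n; i += 4) {
    A[i + 0] = B[i + 0] * 4;
    A[i + 1] = B[i + 1] * 4;
    A[i + 2] = B[i + 2] * 4;
    A[i + 3] = B[i + 3] * 4;
  }
}

/* Rerolled form: one statement per iteration, 4x as many iterations. */
void foo_rerolled(int *A, const int *B, int m, int n) {
  if (m >= n)
    return;                                 /* original loop runs 0 times */
  long orig_trips = ((long)n - m + 3) / 4;  /* iterations of foo's loop */
  long backedge_taken = orig_trips * 4 - 1; /* iteration count - 1 */
  for (long j = 0;; ++j) {
    A[m + j] = B[m + j] * 4;
    if (j == backedge_taken) /* compare the current IV, not the next one */
      break;
  }
}
```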
Hal Finkel	decb024c86	Disable compare sinking in CodeGenPrepare when multiple condition registers are available As noted in the comment above CodeGenPrepare::OptimizeInst, which aggressively sinks compares to reduce pressure on the condition register(s), for targets such as PowerPC with multiple condition registers, this may not be the right thing to do. This adds an HasMultipleConditionRegisters boolean to TLI, and CodeGenPrepare::OptimizeInst is skipped when HasMultipleConditionRegisters is true. This functionality will be used by the PowerPC backend in an upcoming commit. Especially when the PowerPC backend starts tracking individual condition register bits as separate allocatable entities (which will happen in this upcoming commit), this sinking from CodeGenPrepare::OptimizeInst is significantly suboptimal. llvm-svn: 198354	2014-01-02 21:13:43 +00:00
Andrew Trick	b6bc783060	indvars: cleanup the IV visitor. It does more than gather sext/zext info. llvm-svn: 198353	2014-01-02 21:12:11 +00:00
Matt Arsenault	461c8e0a8c	Delete unread globals through addrspacecast llvm-svn: 198346	2014-01-02 20:01:43 +00:00
Matt Arsenault	da1deabb16	Fix addrspacecast with metadata globals llvm-svn: 198345	2014-01-02 19:53:49 +00:00
Andrew Trick	020dd898fc	indvars: insert truncate at loop boundary to avoid redundant IVs. When widening an IV to remove s/zext, we generally try to eliminate the original narrow IV. However, LCSSA phi nodes outside the loop were still using the original IV. Clean this up more aggressively to avoid redundancy in generated code. llvm-svn: 198338	2014-01-02 19:29:38 +00:00
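In C terms, widening a 32-bit IV to 64 bits and truncating at the loop boundary looks roughly like the pair of functions below. The names are hypothetical, the sketch assumes a 64-bit long, and indvars performs this on IR, where the post-loop use of the IV is an LCSSA phi:

```c
/* Narrow IV: 'i' is 32-bit, so each a[i] access needs a sign extension
   to 64-bit address arithmetic, and 'i' is also used after the loop. */
long sum_narrow(const long *a, int n, int *last) {
  long s = 0;
  int i;
  for (i = 0; i < n; ++i)
    s += a[(long)i]; /* sext inside the loop */
  *last = i;         /* use of the IV outside the loop */
  return s;
}

/* Widened IV: only the 64-bit IV remains inside the loop; the 32-bit
   value needed afterwards is recovered with a single truncate. */
long sum_wide(const long *a, int n, int *last) {
  long s = 0;
  long iw;
  for (iw = 0; iw < (long)n; ++iw)
    s += a[iw];
  *last = (int)iw; /* truncate at the loop boundary */
  return s;
}
```

Keeping the truncate outside the loop is what lets the narrow IV disappear entirely instead of surviving just to feed the exit value.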
Nico Weber	1226531099	Set LLVM_EXPORTED_SYMBOL_FILE in CMakeLists whose corresponding Makefiles do so. (unittests/ExecutionEngine/JIT/CMakeLists.txt is still missing for now, since it handles export files in a strange way: It generates a .exports file from a .def file instead of the other way round.) llvm-svn: 198183	2013-12-29 23:06:49 +00:00
Alexander Potapenko	4f0335f863	[ASan] Fix the test for __asan_gen_ globals and actually fix http://llvm.org/bugs/show_bug.cgi?id=17976 by setting the correct linkage (as stated in the bug). llvm-svn: 198018	2013-12-25 16:46:27 +00:00
Alexander Potapenko	daf96ae81b	[ASan] Make sure none of the __asan_gen_ global strings end up in the symbol table, add a test. This should fix http://llvm.org/bugs/show_bug.cgi?id=17976 Another test checking for the global variables' locations and prefixes on Darwin will be committed separately. llvm-svn: 198017	2013-12-25 14:22:15 +00:00
Andrew Trick	0ba77a0740	Add support to indvars for optimizing sadd.with.overflow. Split sadd.with.overflow into add + sadd.with.overflow to allow analysis and optimization. This should ideally be done after InstCombine, which can perform code motion (eventually indvars should run after all canonical instcombines). We want ISEL to recombine the add and the check, at least on x86. This is currently under an option for reducing live induction variables: -liv-reduce. The next step is reducing liveness of IVs that are live out of the overflow check paths. Once the related optimizations are fully developed, reviewed and tested, I do expect this to become default. llvm-svn: 197926	2013-12-23 23:31:49 +00:00
Richard Sandiford	1fb5c13e3a	Fix Scalarizer insertion point when replacing PHIs with insertelements If the Scalarizer scalarized a vector PHI but could not scalarize all uses of it, it would insert a series of insertelements to reconstruct the vector PHI value from the scalar ones. The problem was that it would emit these insertelements immediately after the PHI, even if there were other PHIs after it. llvm-svn: 197909	2013-12-23 14:51:56 +00:00
Richard Sandiford	3548cbb980	Fix Scalarizer handling of vector GEPs with multiple index operands The old code only worked for one index operand. Also handle "inbounds". llvm-svn: 197908	2013-12-23 14:45:00 +00:00
Kostya Serebryany	530e207d8a	[asan] don't unpoison redzones on function exit in use-after-return mode. Summary: Before this change the instrumented code before Ret instructions looked like: <Unpoison Frame Redzones> if (Frame != OriginalFrame) // I.e. Frame is fake <Poison Complete Frame> Now the instrumented code looks like: if (Frame != OriginalFrame) // I.e. Frame is fake <Poison Complete Frame> else <Unpoison Frame Redzones> Reviewers: eugenis Reviewed By: eugenis CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2458 llvm-svn: 197907	2013-12-23 14:15:08 +00:00
Kostya Serebryany	ff7bde1582	[asan] produce fewer stores when poisoning stack shadow llvm-svn: 197904	2013-12-23 09:24:36 +00:00
Justin Bogner	0ba3f211c4	Transforms: Don't create bad weights when eliminating dead cases If we happen to eliminate every case in a switch that has branch weights, we currently try to create metadata for the one remaining branch, triggering an assert. Instead, we need to check that the metadata we're trying to create is sensible. llvm-svn: 197791	2013-12-20 08:21:30 +00:00
Kay Tiong Khoo	e37d52095e	Stay classy (and legal) LLVM. Remove links to 3rd party SMT solver whose links may not be permanent. llvm-svn: 197713	2013-12-19 18:35:54 +00:00
Kay Tiong Khoo	a570b5adb5	Improved fix for PR17827 (instcombine of shift/and/compare). This change fixes the case of arithmetic shift right - do not attempt to fold that case. This change also relaxes the conditions when attempting to fold the logical shift right and shift left cases. No additional IR-level test cases included at this time. See http://llvm.org/bugs/show_bug.cgi?id=17827 for proofs that these are correct transformations. llvm-svn: 197705	2013-12-19 18:07:17 +00:00
Evgeniy Stepanov	a284e559d7	[dfsan] Simplify code after r197677. llvm-svn: 197679	2013-12-19 14:37:03 +00:00
Evgeniy Stepanov	a9164e9e2a	Add an explicit insert point argument to SplitBlockAndInsertIfThen. Currently SplitBlockAndInsertIfThen requires that the branch condition is an Instruction itself, which is very inconvenient because it is sometimes an Operator, or even a Constant. llvm-svn: 197677	2013-12-19 13:29:56 +00:00
Arnold Schwaighofer	50b8302c55	LoopVectorizer: Don't if-convert constant expressions that can trap A phi node operand or an instruction operand could be a constant expression that can trap (division). Check that we don't vectorize such cases. PR16729 radar://15653590 llvm-svn: 197449	2013-12-17 01:11:01 +00:00
Yi Jiang	6ab044ee35	Enable double to float shrinking optimizations for binary functions like 'fmin/fmax'. Fix radar:15283121 llvm-svn: 197434	2013-12-16 22:42:40 +00:00
Hal Finkel	f59fd7dcb4	Fix a use-after-free error in GlobalOpt CleanupConstantGlobalUsers GlobalOpt's CleanupConstantGlobalUsers function uses a worklist array to manage constant users to be visited. The pointers in this array need to be weak handles because when we delete a constant array, we may also be holding a pointer to one of its elements (or an element of one of its elements if we're dealing with an array of arrays) in the worklist. Fixes PR17347. llvm-svn: 197178	2013-12-12 20:45:24 +00:00
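The hazard behind this fix can be modeled outside LLVM. In the sketch below, `std::weak_ptr` stands in for LLVM's weak value handles, and a hypothetical `Constant` struct owns its elements. Destroying an array therefore also destroys elements that may still be sitting in the worklist. This is an analogy under those assumptions, not GlobalOpt's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an LLVM constant: an array owns its elements.
struct Constant {
  std::vector<std::shared_ptr<Constant>> elements;
};

// Visits each worklist entry in order. When the visited entry is the array
// being cleaned up, the caller's owner and the local copy are released; if
// those were the only owners, the array and its elements are destroyed.
// Weak handles let later entries detect that and be skipped.
// Returns how many entries were still alive when visited.
int processWorklist(std::vector<std::weak_ptr<Constant>> &worklist,
                    std::shared_ptr<Constant> &arrayToDelete) {
  int visited = 0;
  for (std::size_t i = 0; i < worklist.size(); ++i) {
    std::shared_ptr<Constant> c = worklist[i].lock();
    if (!c)
      continue; // Already destroyed: a raw pointer here would be a use-after-free.
    ++visited;
    if (c == arrayToDelete) {
      c.reset();
      arrayToDelete.reset();
    }
  }
  return visited;
}
```

If the worklist held raw pointers, the entries for the two elements would dangle once the array is destroyed. With weak handles, `lock()` turns that use-after-free into a skipped entry.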
Hal Finkel	26fc4c29c6	Initialize the barrier pass in llvm::initializeIPO The barrier pass is a temporary hack and should go away soon. Nevertheless, if we don't initialize it, then opt will not understand -barrier, and this will break bugpoint (because -barrier will be among the passes it dumps from the default pass manager). llvm-svn: 197177	2013-12-12 20:45:08 +00:00
Yi Jiang	f92a574246	Resubmit r196544: Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) -> __exp10(x) llvm-svn: 197109	2013-12-12 01:55:04 +00:00
NAKAMURA Takumi	8bc9bfaa5a	Prune redundant dependencies in LLVMBuild.txt. llvm-svn: 196988	2013-12-11 00:30:57 +00:00
Reid Kleckner	30b2a9a59f	[asan] Fix the coverage.cc test broken by r196939 It was failing because ASan was adding all of the following to one function: - dynamic alloca - stack realignment - inline asm This patch avoids making the static alloca dynamic when coverage is used. ASan should probably not be inserting empty inline asm blobs to inhibit duplicate tail elimination. llvm-svn: 196973	2013-12-10 21:49:28 +00:00
NAKAMURA Takumi	396d4d3c7e	Add proper dependencies to LLVMBuild.txt in llvm/lib. I'll prune redundant deps in LLVMBuild.txt, later. llvm-svn: 196881	2013-12-10 05:39:34 +00:00
NAKAMURA Takumi	e3afe2ef62	Whitespaces. llvm-svn: 196880	2013-12-10 05:39:12 +00:00
Justin Bogner	a41a7b3ee5	Transforms: Don't create bad branch weights when folding a switch This avoids creating branch weight metadata of length one when we fold cases into the default of a switch instruction, which was triggering an assert. llvm-svn: 196845	2013-12-10 00:13:41 +00:00
Manman Ren	2e06c8c777	Revert 196544 due to internal bot failures. llvm-svn: 196732	2013-12-08 20:28:33 +00:00
Mark Seaborn	1b3dd3527e	Fix inlining to not lose the "cleanup" clause from landingpads This fixes PR17872. This bug can lead to C++ destructors not being called when they should be, when an exception is thrown. llvm-svn: 196711	2013-12-08 00:51:21 +00:00
Mark Seaborn	ef3dbb93ec	Fix inlining to not produce duplicate landingpad clauses Before this change, inlining one "invoke" into an outer "invoke" call site can lead to the outer landingpad's catch/filter clauses being copied multiple times into the resulting landingpad. This happens: * when the inlined function contains multiple "resume" instructions, because forwardResume() copies the clauses but is called multiple times; * when the inlined function contains a "resume" and a "call", because HandleCallsInBlockInlinedThroughInvoke() copies the clauses but is redundant with forwardResume(). Fix this by deduplicating the code. This problem doesn't lead to any incorrect execution; it's only untidy. This change will make fixing PR17872 a little easier. llvm-svn: 196710	2013-12-08 00:50:58 +00:00
Jakub Staszak	3ab283c157	Don't #include heavy Dominators.h file in LoopInfo.h. This change reduces overall time of LLVM compilation by ~1%. llvm-svn: 196667	2013-12-07 21:20:17 +00:00
Matt Arsenault	bbf18c6958	Fix assert with copy from global through addrspacecast llvm-svn: 196638	2013-12-07 02:58:45 +00:00
Duncan P. N. Exon Smith	ce5f93efd5	Don't use isNullValue to evaluate ConstantExpr ConstantExpr can evaluate to false even when isNullValue gives false. Fixes PR18143. llvm-svn: 196611	2013-12-06 21:48:36 +00:00
Kostya Serebryany	152d48d360	[asan] fix ndebug build with strict warnings (-Wunused-variable) llvm-svn: 196574	2013-12-06 09:26:09 +00:00
Kostya Serebryany	4fb7801b3f	[asan] rewrite asan's stack frame layout Summary: Rewrite asan's stack frame layout. First, most of the stack layout logic is moved into a separate file to make it more testable and (potentially) useful for other projects. Second, make the frames more compact by using adaptive redzones (smaller for small objects, larger for large objects). Third, try to minimize gaps due to large alignments (this is hypothetical since today we don't see many stack vars aligned by more than 32). The frames indeed become more compact. I'll still need to run more benchmarks before committing, but I am asking for review now to get early feedback. This change will be accompanied by a trivial change in compiler-rt tests to match the new frame sizes. Reviewers: samsonov, dvyukov Reviewed By: samsonov CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2324 llvm-svn: 196568	2013-12-06 09:00:17 +00:00
Yi Jiang	01cfa94212	Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) -> __exp10(x) llvm-svn: 196544	2013-12-05 22:42:50 +00:00
Renato Golin	729a3ae90a	Add #pragma vectorize enable/disable to LLVM The intended behaviour is to force vectorization on or off in the presence of the flag, and to keep the default behaviour in its absence. Tests were added to make sure all cases are covered in opt. No tests were added in other tools with the assumption that they should use the PassManagerBuilder in the same way. This patch also removes the outdated -late-vectorize flag, which was on by default and not helping much. The pragma metadata is being attached to the same place as other loop metadata, but nothing forbids one from attaching it to a function (to enable #pragma optimize) or basic blocks (to hint the basic-block vectorizers), etc. The logic should be the same all around. Patches to Clang to produce the metadata will be submitted after the initial implementation is agreed upon and committed. Patches to other vectorizers (such as SLP and BB) will be added once we're happy with the pass manager changes. llvm-svn: 196537	2013-12-05 21:20:02 +00:00
Michael Gottesman	2bf0173b16	Change std::deque => std::vector. No functionality change. There is no reason to use std::deque here over std::vector. Thus, given the performance differences between the two, it makes sense to change deque to vector. llvm-svn: 196524	2013-12-05 18:42:12 +00:00
Rafael Espindola	cdbde3aacc	Fix non-deterministic behavior. We use CSEBlocks to initialize a worklist: SmallVector<BasicBlock *, 8> CSEWorkList(CSEBlocks.begin(), CSEBlocks.end()); so it must have a deterministic order. llvm-svn: 196520	2013-12-05 18:28:01 +00:00
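Iterating a set keyed by pointers can follow allocation addresses, which differ between runs. The usual remedy, in the spirit of LLVM's `SetVector`, is an insertion-ordered set that keeps a hash set for membership and a vector for order. The class below is a hypothetical minimal sketch of that idea, not that API.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

template <typename T> class InsertionOrderedSet {
public:
  // Inserts v if absent; returns false (and keeps the first position) otherwise.
  bool insert(const T &v) {
    if (!seen_.insert(v).second)
      return false;
    order_.push_back(v);
    return true;
  }
  // Iteration follows insertion order, never hash or address order.
  typename std::vector<T>::const_iterator begin() const { return order_.begin(); }
  typename std::vector<T>::const_iterator end() const { return order_.end(); }

private:
  std::unordered_set<T> seen_; // Membership test only; never iterated.
  std::vector<T> order_;       // Deterministic iteration order.
};
```

A worklist initialized from `begin()`/`end()` is then identical on every run, which is the property the CSEWorkList initialization above relies on.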
Arnold Schwaighofer	7ee53cac80	SLPVectorizer: An in-tree vectorized entry cannot also be a scalar external use We were creating external uses for scalar values in MustGather entries that also had a ScalarToTreeEntry (they also are present in a vectorized tuple). This meant we would keep a value 'alive' both as a scalar and as a vector, causing havoc. This is not necessary because when we create a MustGather vector we explicitly create external use entries for the insertelement instructions of the MustGather vector elements. Fixes PR18129. radar://15582184 llvm-svn: 196508	2013-12-05 15:14:40 +00:00
Kostya Serebryany	2460c3fc73	[tsan] fix PR18146: sometimes a variable written into vptr could have an integer type (after other optimizations) llvm-svn: 196507	2013-12-05 15:03:02 +00:00
Alp Toker	f907b891da	Correct word hyphenations This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. llvm-svn: 196471	2013-12-05 05:44:44 +00:00
Yuchen Wu	c15bf89122	llvm-cov: Replace size() with empty() in bool check. llvm-svn: 196400	2013-12-04 19:18:23 +00:00
Daniel Jasper	87a24d5c27	Un-revert r196358: "llvm-cov: Added support for function checksums." And add the proper fix. llvm-svn: 196367	2013-12-04 08:57:17 +00:00
Daniel Jasper	c176b5d1d6	Revert r196358: "llvm-cov: Added support for function checksums." This currently breaks clang/test/CodeGen/code-coverage.c. The root cause is that the newly introduced access to Funcs[j] is out of bounds. llvm-svn: 196365	2013-12-04 08:23:33 +00:00
Yuchen Wu	06655f3570	llvm-cov: Added support for function checksums. The function checksums are hashed from the concatenation of the function name and line number. llvm-svn: 196358	2013-12-04 06:00:17 +00:00
Yunzhong Gao	9163e8bce6	Teach the internalize pass to skip dllexported symbols because they could be referenced in a way that even the linker does not see. Differential Revision: http://llvm-reviews.chandlerc.com/D2280 llvm-svn: 196300	2013-12-03 18:05:14 +00:00
Kay Tiong Khoo	d7b00cac10	Use local variable for repeated use rather than 'get' method. No functional change intended. llvm-svn: 196164	2013-12-02 22:23:32 +00:00
Kay Tiong Khoo	64b732005f	Move variables to where they are used and give them better names. No functional change intended. llvm-svn: 196163	2013-12-02 22:20:40 +00:00
Kay Tiong Khoo	564560f911	Rename variables to be consistent (CST -> Cst). No functional change intended. llvm-svn: 196161	2013-12-02 22:11:56 +00:00
Mark Seaborn	d91fa22b06	InlineFunction.cpp: Remove a return value that is always false Remove some associated dead code. This cleanup is associated with PR17872. llvm-svn: 196147	2013-12-02 20:50:59 +00:00
Kay Tiong Khoo	5389f74655	Conservative fix for PR17827 - don't optimize a shift + and + compare sequence where the shift is logical unless the comparison is unsigned llvm-svn: 196129	2013-12-02 18:43:59 +00:00
Kostya Serebryany	08b9cf56be	[tsan] fix instrumentation of vector vptr updates (https://code.google.com/p/thread-sanitizer/issues/detail?id=43 ) llvm-svn: 196079	2013-12-02 08:07:15 +00:00
Bill Wendling	cbcb02c35a	Use accessor methods instead. llvm-svn: 196006	2013-12-01 03:40:42 +00:00
Bill Wendling	2798f1ef58	Use 'unsigned char' to get this past gcc error message: error: invalid conversion from 'unsigned char' to '{anonymous}::Sequence' llvm-svn: 196004	2013-12-01 03:36:07 +00:00
Stephen Canon	c454964c47	Rein in overzealous InstCombine of fptrunc(OP(fpextend, fpextend)). llvm-svn: 195934	2013-11-28 21:38:05 +00:00
Nadav Rotem	b0082d246a	PR1860 - We can't save a list of ExtractElement instructions to CSE because some of these instructions may be removed and optimized in future iterations. Instead we save a list of basic blocks that we need to CSE. llvm-svn: 195791	2013-11-26 22:24:25 +00:00
Arnold Schwaighofer	a2c8e008d2	LoopVectorizer: Truncate i64 trip counts of i32 phis if necessary In signed arithmetic we could end up with an i64 trip count for an i32 phi. Because it is signed arithmetic, we know that this is only defined if the i32 does not wrap. It is therefore safe to truncate the i64 trip count to an i32 value. Fixes PR18049. llvm-svn: 195787	2013-11-26 22:11:23 +00:00
Diego Novillo	c0dd1037c8	Refactor some code in SampleProfile.cpp I'm adding new functionality in the sample profiler. This will require more data to be kept around for each function, so I moved the structure SampleProfile that we keep for each function into a separate class. There are no functional changes in this patch. It simply provides a new home for all the new data that I need to propagate weights through edges. There are also some renames and other minor edits throughout. llvm-svn: 195780	2013-11-26 20:37:33 +00:00
Nadav Rotem	f9f8482e3a	PR18060 - When we RAUW values with ExtractElement instructions in some cases we generate PHI nodes with multiple entries from the same basic block but with different values. Enabling CSE on ExtractElement instructions makes sure that all of the RAUWed instructions are the same. llvm-svn: 195773	2013-11-26 17:29:19 +00:00
Stepan Dyatkovskiy	abb8505dc5	PR17925 bugfix. Short description: this issue concerns treating pointers as integers. We treat pointers as different if they reference different address spaces. At the same time, we treat pointers as equal to integers of the machine address width. This was a source of false positives. Consider the following case on a 32-bit machine: void foo0(i32 addrspace(1)* %p) void foo1(i32 addrspace(2)* %p) void foo2(i32 %p) foo0 != foo1, while foo1 == foo2 and foo0 == foo2. As you can see, this breaks transitivity, which means the result depends on the order in which functions appear in the module. The following order causes merging of foo0 and foo1: foo2, foo0, foo1. First, foo0 is merged with foo2 and foo0 is erased. Second, foo1 is merged with foo2. Depending on the order, functions could be merged that we don't expect to be. The fix: forbid treating any pointer as an integer, except for pointers in address space 0. llvm-svn: 195769	2013-11-26 16:11:03 +00:00
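The transitivity break can be reproduced with a simplified, hypothetical type model. `oldEqual` follows the old rule described above, and `newEqual` adds the address-space-0 restriction. Both are illustrations, not the function-merging comparator itself.

```cpp
#include <cassert>

// Hypothetical simplified type model.
struct Ty {
  bool isPointer;
  unsigned addrSpace; // Only meaningful for pointers.
  unsigned bits;      // Integer width; pointers are PtrBits wide.
};

constexpr unsigned PtrBits = 32; // 32-bit machine, as in the example.

// Old rule: pointers in different address spaces differ, yet any pointer
// equals an integer of pointer width. This relation is not transitive.
bool oldEqual(Ty a, Ty b) {
  if (a.isPointer && b.isPointer)
    return a.addrSpace == b.addrSpace;
  if (a.isPointer != b.isPointer) {
    const Ty &intTy = a.isPointer ? b : a;
    return intTy.bits == PtrBits;
  }
  return a.bits == b.bits;
}

// Fixed rule: only address-space-0 pointers may be treated as integers.
bool newEqual(Ty a, Ty b) {
  if (a.isPointer && b.isPointer)
    return a.addrSpace == b.addrSpace;
  if (a.isPointer != b.isPointer) {
    const Ty &ptrTy = a.isPointer ? a : b;
    const Ty &intTy = a.isPointer ? b : a;
    return ptrTy.addrSpace == 0 && intTy.bits == PtrBits;
  }
  return a.bits == b.bits;
}
```

Under `oldEqual`, the addrspace(1) and addrspace(2) pointers each equal `i32` but not each other, which mirrors foo0, foo1 and foo2 above. Under `newEqual`, neither of them equals `i32`, so the relation no longer depends on module order.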
Chandler Carruth	6378cf539f	[PM] Split the CallGraph out from the ModulePass which creates the CallGraph. This makes the CallGraph a totally generic analysis object that is the container for the graph data structure and the primary interface for querying and manipulating it. The pass logic is separated into its own class. For compatibility reasons, the pass provides wrapper methods for most of the methods on CallGraph -- they all just forward. This will allow the new pass manager infrastructure to provide its own analysis pass that constructs the same CallGraph object and makes it available. The idea is that in the new pass manager, the analysis pass's 'run' method returns a concrete analysis 'result'. Here, that result is a 'CallGraph'. The 'run' method will typically do only minimal work, deferring much of the work into the implementation of the result object in order to be lazy about computing things, but when (like DomTree) there is some up-front computation, the analysis does it prior to handing the result back to the querying pass. I know some of this is fairly ugly. I'm happy to change it around if folks can suggest a cleaner interim state, but there is going to be some amount of unavoidable ugliness during the transition period. The good thing is that this is very limited and will naturally go away when the old pass infrastructure goes away. It won't hang around to bother us later. Next up is the initial new-PM-style call graph analysis. =] llvm-svn: 195722	2013-11-26 04:19:30 +00:00
Chandler Carruth	57458517ef	Migrate metadata information from scalar to vector instructions during SLP vectorization. Based on the code in BBVectorizer. Fixes PR17741. Patch by Raul Silvera, reviewed by Hal and Nadav. Reformatted by my driving of clang-format. =] llvm-svn: 195528	2013-11-23 00:48:34 +00:00
Yuchen Wu	c87ca32163	llvm-cov: Split entry blocks in GCNOProfiling.cpp. gcov expects every function to contain an entry block that unconditionally branches into the next block. clang does not implement basic blocks in this manner, so gcov did not output correct branch info if the entry block branched to multiple blocks. This change splits every function's entry block into an empty block and a block with the rest of the instructions. The instrumentation code will take care of the rest. llvm-svn: 195513	2013-11-22 23:07:45 +00:00
Manman Ren	cb14bbcc48	Debug Info: move StripDebugInfo from StripSymbols.cpp to DebugInfo.cpp. We can share the implementation between StripSymbols and dropping debug info for metadata versions that do not match. Also update the comments to match the implementation. A follow-on patch will drop the "Debug Info Version" module flag in StripDebugInfo. llvm-svn: 195505	2013-11-22 22:06:31 +00:00
Matt Arsenault	6ea0aade26	StructurizeCFG: Fix verification failure with some loops. If the beginning of the loop was also the entry block of the function, branches were inserted to the entry block which isn't allowed. If this occurs, create a new dummy function entry block that branches to the start of the loop. llvm-svn: 195493	2013-11-22 19:24:39 +00:00
Matt Arsenault	9fb6e0ba58	StructurizeCFG: Fix inverting a branch on an argument llvm-svn: 195492	2013-11-22 19:24:37 +00:00
Rafael Espindola	6597992c69	Add a fixed version of r195470 back. The fix is simply to use CurI instead of I when handling aliases to avoid accessing an invalid iterator. original message: Convert linkonce* to weak* instead of strong. Also refactor the logic into a helper function. This is an important improvement on mingw where the linker complains about mixed weak and strong symbols. Converting to weak ensures that the symbol is not dropped, but keeps it in a comdat, making the linker happy. llvm-svn: 195477	2013-11-22 17:58:12 +00:00
Rafael Espindola	77aa674cc4	Revert "Convert linkonce* to weak* instead of strong." This reverts commit r195470. Debugging failure in some bots. llvm-svn: 195472	2013-11-22 17:09:34 +00:00
Richard Sandiford	8ee1b77de3	Add a Scalarizer pass. llvm-svn: 195471	2013-11-22 16:58:05 +00:00
Rafael Espindola	5574032575	Convert linkonce* to weak* instead of strong. Also refactor the logic into a helper function. This is an important improvement on mingw where the linker complains about mixed weak and strong symbols. Converting to weak ensures that the symbol is not dropped, but keeps it in a comdat, making the linker happy. llvm-svn: 195470	2013-11-22 16:14:30 +00:00
Arnold Schwaighofer	1756e1ea92	SLPVectorizer: Fix whitespace errors. llvm-svn: 195468	2013-11-22 15:47:17 +00:00
Yi Jiang	79a2b0a6d1	SLP Vectorizer: Extract cost will only be added once even if the scalar has multiple external uses. llvm-svn: 195406	2013-11-22 01:57:02 +00:00
Peter Collingbourne	0be79e1ade	Introduce two command-line flags for the instrumentation pass to control whether the labels of pointers should be ignored in load and store instructions The new command line flags are -dfsan-ignore-pointer-label-on-store and -dfsan-ignore-pointer-label-on-load. Their default value matches the current labelling scheme. Additionally, the function __dfsan_union_load is marked as readonly. Patch by Lorenzo Martignoni! Differential Revision: http://llvm-reviews.chandlerc.com/D2187 llvm-svn: 195382	2013-11-21 23:20:54 +00:00
Evgeniy Stepanov	cb5bdffc4e	[msan] Propagate condition origin in select instruction. llvm-svn: 195349	2013-11-21 12:00:24 +00:00
Yuchen Wu	2a9d96992d	llvm-cov: Don't assume FileChecksum was generated. For cases where emitProfileArcs() was called but emitProfileNotes() was not, set the CfgChecksum to 0. llvm-svn: 195311	2013-11-21 04:53:39 +00:00
Yuchen Wu	664dc7678b	llvm-cov: Fixed some bugs related to file checksum. Added call to update CfgChecksum. Made FileChecksum a vector, separate for each source file. llvm-svn: 195309	2013-11-21 04:01:05 +00:00
Yuchen Wu	babe749125	llvm-cov: Added file checksum to gcno and gcda files. Instead of permanently outputting "MVLL" as the file checksum, clang will create gcno and gcda checksums by hashing the destination block numbers of every arc. This allows for llvm-cov to check if the two gcov files are synchronized. Regenerated the test files so they contain the checksum. Also added negative test to ensure error when the checksums don't match. llvm-svn: 195191	2013-11-20 04:15:05 +00:00
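As a sketch of the idea, assume the checksum is a hash over the destination block numbers of the arcs, taken in order. Any change to those arcs then changes the checksum, which is what lets llvm-cov detect unsynchronized gcno/gcda files. FNV-1a is used here purely as an assumption, because the commit does not say which hash clang uses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical checksum: 32-bit FNV-1a over the little-endian bytes of each
// arc's destination block number, in arc order.
std::uint32_t arcChecksum(const std::vector<std::uint32_t> &arcDestinations) {
  std::uint32_t h = 2166136261u; // FNV-1a 32-bit offset basis.
  for (std::uint32_t dest : arcDestinations) {
    for (int byte = 0; byte < 4; ++byte) {
      h ^= (dest >> (8 * byte)) & 0xFFu;
      h *= 16777619u; // FNV-1a 32-bit prime.
    }
  }
  return h;
}
```

A gcno and gcda pair produced from the same CFG hash to the same value. A stale file whose arcs differ produces a different value.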
Arnold Schwaighofer	8bc4a0ba14	SLPVectorizer: Fix stale entries in Value pointer array We are slicing an array of Value pointers and processing those slices in a loop. The problem is that we might invalidate a later slice by vectorizing a former slice. Use a WeakVH to track the pointer. If the pointer is deleted or RAUW'ed we can tell. The test case will only fail when running with libgmalloc. radar://15498655 llvm-svn: 195162	2013-11-19 22:20:20 +00:00
Arnold Schwaighofer	5f7c48ebff	SLPVectorizer: Fix whitespace errors llvm-svn: 195161	2013-11-19 22:20:18 +00:00
Chandler Carruth	a126200665	Fix an issue where SROA computed different results based on the relative order of slices of the alloca which have exactly the same size and other properties. This was found by a perniciously unstable sort implementation used to flush out buggy uses of the algorithm. The fundamental idea is that findCommonType should return the best common type it can find across all of the slices in the range. There were two bugs here previously: 1) We would accept an integer type smaller than a byte-width multiple, and if there were different bit-width integer types, we would accept the first one. This caused an actual failure in the testcase updated here when the sort order changed. 2) If we found a bad combination of types or a non-load, non-store use before an integer typed load or store we would bail, but if we found the integer typed load or store first, we would use it. The correct behavior is to always use an integer typed operation which covers the partition if one exists. While a clever debugging sort algorithm found problem #1 in our existing test cases, I have no useful test case ideas for #2. I spotted it by inspection when looking at this code. llvm-svn: 195118	2013-11-19 09:03:18 +00:00
Michael Ilseman	d930c19d20	Add support for software expansion of 64-bit integer division instructions. Patch by Dmitri Shtilman! llvm-svn: 195116	2013-11-19 06:54:19 +00:00
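Software expansion replaces a hardware divide with a loop of simpler operations. Below is a minimal sketch of one classic approach, restoring shift-subtract division on unsigned 64-bit values. It illustrates the technique and is not necessarily the instruction sequence LLVM emits. Signed division can be reduced to this case by dividing magnitudes and fixing up signs.

```cpp
#include <cassert>
#include <cstdint>

// Restoring shift-subtract division: one quotient bit per iteration, using
// only shifts, compares and subtraction.
std::uint64_t udiv64(std::uint64_t dividend, std::uint64_t divisor,
                     std::uint64_t *remainder) {
  assert(divisor != 0 && "division by zero is undefined");
  std::uint64_t quotient = 0, rem = 0;
  for (int bit = 63; bit >= 0; --bit) {
    bool carry = (rem >> 63) != 0;              // Bit about to be shifted out of rem.
    rem = (rem << 1) | ((dividend >> bit) & 1); // Bring down the next dividend bit.
    if (carry || rem >= divisor) {
      rem -= divisor; // Wraps to the correct value when carry is set.
      quotient |= std::uint64_t{1} << bit;
    }
  }
  if (remainder)
    *remainder = rem;
  return quotient;
}
```

The carry check matters for divisors above 2^63, where the partial remainder can exceed 64 bits for one step.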
Adrian Prantl	8e10fdbc0f	Debug info: Let LowerDbgDeclare perform the dbg.declare -> dbg.value lowering only for load/stores to scalar allocas. The resulting values confuse the backend and don't add anything because we can describe array-allocas with a dbg.declare intrinsic just fine. rdar://problem/15464571 llvm-svn: 195052	2013-11-18 23:04:38 +00:00
Alexey Samsonov	a788b940f7	[ASan] Fix PR17867 - make sure ASan doesn't crash if use-after-scope and use-after-return are combined. llvm-svn: 195014	2013-11-18 14:53:55 +00:00
Arnold Schwaighofer	b72cb4ec49	LoopVectorizer: Extend the induction variable to a larger type In some cases the loop exit count computation can overflow. Extend the type to prevent most of those cases. The problem is loops like: int main () { int a = 1; char b = 0; lbl: a &= 4; b--; if (b) goto lbl; return a; } The backedge count is 255. The induction variable type is i8. If we add one to 255 to get the exit count we overflow to zero. To work around this issue we extend the type of the induction variable to i32 in the case of i8 and i16. PR17532 llvm-svn: 195008	2013-11-18 13:14:32 +00:00
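The overflow is easy to reproduce with 8-bit arithmetic. The sketch uses `std::uint8_t` instead of `char`, whose signedness is implementation-defined. It compares the exit count computed in the induction variable's own width with the widened computation. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Exit count = backedge-taken count + 1, computed in the IV's 8-bit type.
std::uint8_t exitCountNarrow(std::uint8_t backedgeCount) {
  return static_cast<std::uint8_t>(backedgeCount + 1); // 255 + 1 wraps to 0.
}

// Widening to 32 bits first, as the commit does for i8 and i16, avoids the wrap.
std::uint32_t exitCountWide(std::uint8_t backedgeCount) {
  return static_cast<std::uint32_t>(backedgeCount) + 1;
}

// Simulates the loop from the message: b starts at 0 and the body runs until
// b-- brings it back to 0, i.e. 256 times (255 backedges).
std::uint32_t simulateTripCount() {
  std::uint8_t b = 0;
  std::uint32_t trips = 0;
  do {
    ++trips;
    --b;
  } while (b != 0);
  return trips;
}
```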
NAKAMURA Takumi	f9c8339a4e	Utils/LoopUnroll.cpp: Tweak (StringRef)OldName to be valid until it is used, since r194601. eraseFromParent() invalidates OldName. llvm-svn: 194970	2013-11-17 18:05:34 +00:00
Hal Finkel	29aeb20518	Add a loop rerolling flag to the PassManagerBuilder This adds a boolean member variable to the PassManagerBuilder to control loop rerolling (just like we have for unrolling and the various vectorization options). This is necessary for control by the frontend. Loop rerolling remains disabled by default at all optimization levels. llvm-svn: 194966	2013-11-17 16:02:50 +00:00
Hal Finkel	66cd3f1ba3	Add the cold attribute to error-reporting call sites Generally speaking, control flow paths with error reporting calls are cold. So far, error reporting calls are calls to perror and calls to fprintf, fwrite, etc. with stderr as the stream. This can be extended in the future. The primary motivation is to improve block placement (the cold attribute affects the static branch prediction heuristics). llvm-svn: 194943	2013-11-17 02:06:35 +00:00
Hal Finkel	67107ea1af	Fix ndebug-build unused variable in loop rerolling llvm-svn: 194941	2013-11-17 01:21:54 +00:00
Hal Finkel	bf45efde2d	Add a loop rerolling pass This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The transformation aims to take loops like this: for (int i = 0; i < 3200; i += 5) { a[i] += alpha * b[i]; a[i + 1] += alpha * b[i + 1]; a[i + 2] += alpha * b[i + 2]; a[i + 3] += alpha * b[i + 3]; a[i + 4] += alpha * b[i + 4]; } and turn them into this: for (int i = 0; i < 3200; ++i) { a[i] += alpha * b[i]; } and loops like this: for (int i = 0; i < 500; ++i) { x[3*i] = foo(0); x[3*i+1] = foo(0); x[3*i+2] = foo(0); } and turn them into this: for (int i = 0; i < 1500; ++i) { x[i] = foo(0); } There are two motivations for this transformation: 1. Code-size reduction (especially relevant, obviously, when compiling for code size). 2. Providing greater choice to the loop vectorizer (and generic unroller) to choose the unrolling factor (and a better ability to vectorize). The loop vectorizer can take vector lengths and register pressure into account when choosing an unrolling factor, for example, and a pre-unrolled loop limits that choice. This is especially problematic if the manual unrolling was optimized for a machine different from the current target. The current implementation is limited to single basic-block loops only. The rerolling recognition should work regardless of how the loop iterations are intermixed within the loop body (subject to dependency and side-effect constraints), but the significant restriction is that the order of the instructions in each iteration must be identical. This seems sufficient to capture all current use cases. This pass is not currently enabled by default at any optimization level. llvm-svn: 194939	2013-11-16 23:59:05 +00:00
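Rerolling is only valid when the rerolled loop computes the same result as the unrolled one. The first example above can be checked directly. The sketch uses `double` and integer-valued inputs so that the comparison is exact regardless of floating-point contraction. The function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t N = 3200; // Trip count from the example; divisible by 5.

// The manually unrolled form (unroll factor 5).
void updateUnrolled(std::vector<double> &a, const std::vector<double> &b, double alpha) {
  assert(a.size() >= N && b.size() >= N);
  for (std::size_t i = 0; i < N; i += 5) {
    a[i] += alpha * b[i];
    a[i + 1] += alpha * b[i + 1];
    a[i + 2] += alpha * b[i + 2];
    a[i + 3] += alpha * b[i + 3];
    a[i + 4] += alpha * b[i + 4];
  }
}

// The rerolled form: one element per iteration, same operations per element.
void updateRerolled(std::vector<double> &a, const std::vector<double> &b, double alpha) {
  assert(a.size() >= N && b.size() >= N);
  for (std::size_t i = 0; i < N; ++i)
    a[i] += alpha * b[i];
}
```

Each element receives exactly one multiply-add in both versions, which is why instruction order within an iteration, not the unroll factor, is what the pass has to match.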
Hal Finkel	12100bf7e8	Apply the InstCombine fptrunc sqrt optimization to llvm.sqrt InstCombine, in visitFPTrunc, applies the following optimization to sqrt calls: (fptrunc (sqrt (fpext x))) -> (sqrtf x) but does not apply the same optimization to llvm.sqrt. This is a problem because, to enable vectorization, Clang generates llvm.sqrt instead of sqrt in fast-math mode, and because this optimization is being applied to sqrt and not applied to llvm.sqrt, sometimes the fast-math code is slower. This change makes InstCombine apply this optimization to llvm.sqrt as well. This fixes the specific problem in PR17758, although the same underlying issue (optimizations applied to libcalls are not applied to intrinsics) exists for other optimizations in SimplifyLibCalls. llvm-svn: 194935	2013-11-16 21:29:08 +00:00
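The fold is sound under some assumptions: IEEE-754 binary32/binary64, correctly rounded square roots as IEEE-754 requires, and no excess-precision evaluation such as x87. Under those assumptions, rounding the double-precision root to float gives the same value as computing the float root directly. Double has at least twice float's precision plus two bits, which is the known condition for this double rounding to be harmless. Below is a sketch of both sides of the rewrite:

```cpp
#include <cassert>
#include <cmath>

// Left side of the fold: (fptrunc (sqrt (fpext x))).
float sqrtViaDouble(float x) {
  return static_cast<float>(std::sqrt(static_cast<double>(x)));
}

// Right side of the fold: (sqrtf x), via the float overload of std::sqrt.
float sqrtSingle(float x) { return std::sqrt(x); }
```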
Benjamin Kramer	03f3e248eb	InstCombine: fold (A >> C) == (B >> C) --> (A^B) < (1 << C) for constant Cs. This is common in bitfield code. llvm-svn: 194925	2013-11-16 16:00:48 +00:00
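The identity behind this fold can be checked exhaustively for 8-bit values. The sketch models the shifts as logical (unsigned) right shifts. The shifted values agree exactly when `A` and `B` differ only in their low `C` bits, and that is what `(A ^ B) < (1 << C)` tests.

```cpp
#include <cassert>
#include <cstdint>

// Original form: compare the bits that survive a right shift by c.
bool shiftedEqual(std::uint8_t a, std::uint8_t b, unsigned c) {
  return (a >> c) == (b >> c);
}

// Folded form: A and B may differ only below bit c (unsigned compare).
bool foldedForm(std::uint8_t a, std::uint8_t b, unsigned c) {
  return static_cast<unsigned>(a ^ b) < (1u << c);
}

// Checks the identity for every 8-bit A, B and every shift amount c in [0, 8).
bool foldHoldsFor8Bits() {
  for (unsigned c = 0; c < 8; ++c)
    for (unsigned a = 0; a < 256; ++a)
      for (unsigned b = 0; b < 256; ++b) {
        auto x = static_cast<std::uint8_t>(a), y = static_cast<std::uint8_t>(b);
        if (shiftedEqual(x, y, c) != foldedForm(x, y, c))
          return false;
      }
  return true;
}
```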
Arnold Schwaighofer	dbb7b87d7a	LoopVectorizer: Use abi alignment for accesses with no alignment When we vectorize a scalar access with no alignment specified, we have to set the target's abi alignment of the scalar access on the vectorized access. Using the same alignment of zero would be wrong because most targets will have a bigger abi alignment for vector types. This probably fixes PR17878. llvm-svn: 194876	2013-11-15 23:09:33 +00:00
Manman Ren	bc37658a7f	ArgumentPromotion: correctly transfer TBAA tags and alignments. We used to use std::map<IndicesVector, LoadInst*> for OriginalLoads, so when we promote two arguments, they both write to OriginalLoads, causing the loads created for the two arguments to have the same original load, and therefore the same TBAA tag and alignment. The fix is to use std::map<std::pair<Argument, IndicesVector>, LoadInst*> for OriginalLoads, so each Argument will write to different parts of the map. PR17906 llvm-svn: 194846	2013-11-15 20:41:15 +00:00
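The collision can be modeled with standard containers. The `Load` struct and its `argNo` field below are hypothetical stand-ins for LLVM's `LoadInst` and `Argument`. Only the choice of map key mirrors the fix described above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using IndicesVector = std::vector<unsigned>;

struct Load {
  int argNo;             // Hypothetical argument identifier.
  IndicesVector indices; // Indices of the promoted load.
  std::string tbaaTag;   // Metadata that should be transferred.
};

// Old keying: indices alone, so two arguments with equal indices collide and
// the later load overwrites the earlier one.
std::string tagForOld(const std::vector<Load> &loads, int argNo, const IndicesVector &idx) {
  std::map<IndicesVector, const Load *> original;
  for (const Load &l : loads)
    original[l.indices] = &l;
  (void)argNo; // This key cannot distinguish arguments.
  return original.at(idx)->tbaaTag;
}

// Fixed keying: (argument, indices), so each argument keeps its own load.
std::string tagForNew(const std::vector<Load> &loads, int argNo, const IndicesVector &idx) {
  std::map<std::pair<int, IndicesVector>, const Load *> original;
  for (const Load &l : loads)
    original[{l.argNo, l.indices}] = &l;
  return original.at({argNo, idx})->tbaaTag;
}
```

With the old key, argument 0 picks up argument 1's tag. With the pair key, each argument keeps its own.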
Kostya Serebryany	0604c62d7b	[asan] use GlobalValue::PrivateLinkage for coverage guard to save quite a bit of code size llvm-svn: 194800	2013-11-15 09:52:05 +00:00
Bob Wilson	da4147c743	Reapply "[asan] Poor man's coverage that works with ASan" I was able to successfully run a bootstrapped LTO build of clang with r194701, so this change does not seem to be the cause of our failing buildbots. llvm-svn: 194789	2013-11-15 07:16:09 +00:00
Matt Arsenault	a9e95abcbf	Add instcombine visitor for addrspacecast llvm-svn: 194786	2013-11-15 05:45:08 +00:00
Bob Wilson	ae73587c4b	Revert "[asan] Poor man's coverage that works with ASan" This reverts commit 194701. Apple's bootstrapped LTO builds have been failing, and this change (along with compiler-rt 194702-194704) is the only thing on the blamelist. I will either reapply these changes or help debug the problem, depending on whether this fixes the buildbots. llvm-svn: 194780	2013-11-15 03:28:22 +00:00
Kostya Serebryany	6da3f74061	[asan] Poor man's coverage that works with ASan llvm-svn: 194701	2013-11-14 13:27:41 +00:00
Evgeniy Stepanov	585813e33d	[msan] Fast path optimization for wrap-indirect-calls feature of MemorySanitizer. Indirect call wrapping helps MSanDR (dynamic instrumentation companion tool for MSan) to catch all cases where execution leaves a compiler-instrumented module by allowing the tool to rewrite targets of indirect calls. This change is an optimization that skips wrapping for calls when the target is inside the current module. This relies on the linker providing symbols at the beginning and end of the module code (or code + data, does not really matter). Gold linker provides such symbols by default. GNU (BFD) linker needs a link flag: -Wl,--defsym=__executable_start=0. More info: https://code.google.com/p/memory-sanitizer/wiki/MSanDR#Native_exec llvm-svn: 194697	2013-11-14 12:29:04 +00:00
Jakub Staszak	86a7492f0d	Use StringRef instead of std::string llvm-svn: 194601	2013-11-13 20:09:11 +00:00
Alexey Samsonov	aa19c0a1c3	Fix -Wdelete-non-virtual-dtor warnings by making SampleProfile methods non-virtual llvm-svn: 194568	2013-11-13 13:09:39 +00:00
Diego Novillo	8d6568b56b	SampleProfileLoader pass. Initial setup. This adds a new scalar pass that reads a file with samples generated by 'perf' during runtime. The samples read from the profile are incorporated and emitted as IR metadata reflecting that profile. The profile file is assumed to have been generated by an external profile source. The profile information is converted into IR metadata, which is later used by the analysis routines to estimate block frequencies, edge weights and other related data. External profile information files have no fixed format; each profiler is free to define its own. This includes both the on-disk representation of the profile and the kind of profile information stored in the file. A common kind of profile is based on sampling (e.g., perf), which essentially counts how many times each line of the program has been executed during the run. The SampleProfileLoader pass is organized as a scalar transformation. On startup, it reads the file given in -sample-profile-file to determine what kind of profile it contains. This file is assumed to contain profile information for the whole application. The profile data in the file is read and incorporated into the internal state of the corresponding profiler. To facilitate testing, I've organized the profilers to support two file formats: text and native. The native format is whatever on-disk representation the profiler wants to support; I think this will mostly be bitcode files, but it could be anything the profiler wants to support. To do this, every profiler must implement the SampleProfile::loadNative() function. The text format is mostly meant for debugging. Records are separated by newlines, but each profiler is free to interpret records as it sees fit. Profilers must implement the SampleProfile::loadText() function. Finally, the pass will call SampleProfile::emitAnnotations() for each function in the current translation unit. 
This function needs to translate the loaded profile into IR metadata, which the analyzer will later be able to use. This patch implements the first steps towards the above design. I've implemented a sample-based flat profiler. The format of the profile is fairly simplistic. Each sampled function contains a list of relative line locations (from the start of the function) together with a count representing how many samples were collected at that line during execution. I generate this profile using perf and a separate converter tool. Currently, I have only implemented a text format for these profiles. I am interested in initial feedback on the whole approach before I send the other parts of the implementation for review. This patch implements: - The SampleProfileLoader pass. - The base ExternalProfile class with the core interface. - A SampleProfile sub-class using the above interface. The profiler generates branch weight metadata on every branch instruction that matches the profiles. - A text loader class to assist the implementation of SampleProfile::loadText(). - Basic unit tests for the pass. Additionally, the patch uses profile information to compute branch weights based on instruction samples. This patch converts instruction samples into branch weights. It does a fairly simplistic conversion: Given a multi-way branch instruction, it calculates the weight of each branch based on the maximum sample count gathered from each target basic block. Note that this assignment of branch weights is somewhat lossy and can be misleading. If a basic block has more than one incoming branch, all the incoming branches will get the same weight. In reality, it may be that only one of them is the most heavily taken branch. I will adjust this assignment in subsequent patches. llvm-svn: 194566	2013-11-13 12:22:21 +00:00
Nadav Rotem	ea186b9515	Update the docs to match the function name. llvm-svn: 194537	2013-11-13 01:12:01 +00:00
Nadav Rotem	0ed2fdb5af	Fold (iszero(A&K1) | iszero(A&K2)) -> (A&(K1|K2)) != (K1|K2) if we know that K1 and K2 are 'one-hot' (only one bit is on). llvm-svn: 194525	2013-11-12 22:38:59 +00:00
Nadav Rotem	53d32211b7	FoldBranchToCommonDest merges branches into a single branch with or/and of the condition. It has a heuristic for estimating when some of the dependencies are processed by out-of-order processors. This patch adds another rule to the heuristic that says that if the "BonusInstruction" that we speculatively execute is used by the condition of the second branch then it is okay to hoist it. This change exposes more opportunities for other passes to transform the code. It does not matter that much that we if-convert the code because the selectiondag builder splits or/and branches into multiple branches when profitable. llvm-svn: 194524	2013-11-12 22:37:16 +00:00
Rafael Espindola	dd8757abbc	Correctly merge constants with explicit and implicit alignments. Constant merge can merge a constant with implicit alignment with one that has explicit alignment. Before this change it was assuming that the explicit alignment was higher than the implicit one, causing the result to be under-aligned in some cases. Fixes pr17815. Patch by Chris Smowton! llvm-svn: 194506	2013-11-12 20:21:43 +00:00
Benjamin Kramer	7c30260ab3	SimplifyCFG: Use existing constant folding logic when forming switch tables. Both simpler and more powerful than the hand-rolled folding logic. llvm-svn: 194475	2013-11-12 12:24:36 +00:00
Shuxin Yang	f1ec34bdfd	Correct a glitch in r194424 which may invalidate iterator. llvm-svn: 194457	2013-11-12 08:33:03 +00:00
Yuchen Wu	062f24c973	llvm-cov: Added call to update run/program counts. Also updated test files that were generated from this change. llvm-svn: 194453	2013-11-12 04:59:08 +00:00
Shuxin Yang	3168ab3376	Fix PR17952. The symptom is that an assertion is triggered. The assertion was added by me to detect the situation when a value is propagated from dead blocks. (We can certainly get rid of the assertion; it is safe to do so, because propagating a value from a dead block to a live join node is certainly OK.) The root cause of this bug is: edge-splitting is conducted on the fly, and the edge being split could be a dead edge; therefore the block that splits the critical edge needs to be flagged "dead" as well. There are 3 ways to fix this bug: 1) Get rid of the assertion as I mentioned earlier. 2) When a dead edge is split, flag the inserted block "dead". 3) Proactively split the critical edges connecting dead and live blocks when new dead blocks are revealed. This fix goes for 3) with an additional 2 LOC. The test case was added by Rafael the other day. llvm-svn: 194424	2013-11-11 22:00:23 +00:00
Renato Golin	3f67a7de36	Move debug message in vectorizer No functional change, just better reporting. llvm-svn: 194388	2013-11-11 16:27:35 +00:00
Evgeniy Stepanov	560e089355	[msan] Propagate origin for insertvalue, extractvalue. llvm-svn: 194374	2013-11-11 13:37:10 +00:00
Bill Wendling	fed6c220ec	Revert "Resurrect r191017 " GVN proceeds in the presence of dead code" plus a fix to PR17307 & 17308." This causes PR17852. This reverts commit d93e8a06b2ca09ab18f390cd514b7443e2e571f7. Conflicts: test/Transforms/GVN/cond_br2.ll llvm-svn: 194348	2013-11-10 07:34:34 +00:00
Matt Arsenault	c900303e2f	Use type form of getIntPtrType. This should be inconsequential and is work towards removing the default address space arguments. llvm-svn: 194347	2013-11-10 04:46:57 +00:00
Nadav Rotem	5ba1c6ced8	SimplifyCFG has a heuristic for out-of-order processors that decides when it is worthwhile to merge branches. It tries to estimate if the operands of the instruction that we want to hoist are ready. This commit marks function arguments as 'ready' because they require no calculation. This boosts libquantum and a few other workloads from the testsuite. llvm-svn: 194346	2013-11-10 04:13:31 +00:00
Matt Arsenault	5bcefabcda	Teach MergeFunctions about address spaces llvm-svn: 194342	2013-11-10 01:44:37 +00:00
Hal Finkel	1a642aef37	Remove dead code from LoopUnswitch LoopUnswitch's code simplification routine has logic to convert conditional branches into unconditional branches, after unswitching makes the condition constant, and then remove any blocks that renders dead. Unfortunately, this code is dead, currently broken, and furthermore, has never been alive (at least as far back as 2006). No functionality change intended. llvm-svn: 194277	2013-11-08 19:58:21 +00:00
Michael Gottesman	24b2f6fdda	[objc-arc] Convert the one directional retain/release relation assert to a conditional check + fail. Due to the previously added overflow checks, we can have a retain/release relation that is one directional. This occurs specifically when we run into an additive overflow causing us to drop state in only one direction. If that occurs, we should bail and not optimize that retain/release instead of asserting. Apologies for the size of the testcase. It is necessary to cause the additive cfg overflow to trigger. rdar://15377890 llvm-svn: 194083	2013-11-05 16:02:40 +00:00
Hal Finkel	081eaef6fa	Add a runtime unrolling parameter to the LoopUnroll pass constructor As with the other loop unrolling parameters (the unrolling threshold, partial unrolling, etc.) runtime unrolling can now also be controlled via the constructor. This will be necessary for moving non-trivial unrolling late in the pass manager (after loop vectorization). No functionality change intended. llvm-svn: 194027	2013-11-05 00:08:03 +00:00
Shuxin Yang	d1382b6c31	Remove dead code llvm-svn: 194017	2013-11-04 21:44:01 +00:00
Benjamin Kramer	9e7f7c7fdb	SLPVectorizer: Use properlyDominates to satisfy the irreflexivity of a strict weak ordering. STL debug mode checks this. llvm-svn: 194015	2013-11-04 21:34:55 +00:00
Matt Arsenault	243140f2fd	Scalarize select vector arguments when extracted. When the elements are extracted from a select on vectors or a vector select, do the select on the extracted scalars from the input if there is only one use. llvm-svn: 194013	2013-11-04 20:36:06 +00:00
Benjamin Kramer	191ba00b83	SLPVectorizer: Add a missing pair of parens. No functionality change. llvm-svn: 193958	2013-11-03 12:54:32 +00:00
Benjamin Kramer	91e8f3c348	SLPVectorizer: When CSEing generated gathers only scan blocks containing them. Instead of doing a RPO traversal of the whole function remember the blocks containing gathers (typically <= 2) and scan them in dominator-first order. The actual CSE is still quadratic, but I'm not confident that adding a scoped hash table here is worth it as we're only looking at the generated instructions and not arbitrary code. llvm-svn: 193956	2013-11-03 12:27:52 +00:00
David Majnemer	120f4a06fd	Revert "Inliner: Handle readonly attribute per argument when adding memcpy" This reverts commit r193356, it caused PR17781. A reduced test case covering this regression has been added to the test suite. llvm-svn: 193955	2013-11-03 12:22:13 +00:00
David Majnemer	927df85de0	Spell "Actual" correctly llvm-svn: 193954	2013-11-03 11:09:39 +00:00
Bob Wilson	d8d92d90fa	Convert calls to __sinpi and __cospi into __sincospi_stret This adds a SimplifyLibCalls case which converts the special __sinpi and __cospi (float & double variants) into a __sincospi_stret where appropriate to remove duplicated work. Patch by Tim Northover llvm-svn: 193943	2013-11-03 06:48:38 +00:00
Benjamin Kramer	089c1e4f6d	SLPVectorizer: Remove duplicated function. llvm-svn: 193927	2013-11-02 14:46:27 +00:00
Benjamin Kramer	568a1cd9df	LoopVectorize: Remove quadratic behavior from the local CSE. Doing this with a hash map doesn't change behavior and avoids calling isIdenticalTo O(n^2) times. This should probably eventually move into a utility class shared with EarlyCSE and the limited CSE in the SLPVectorizer. llvm-svn: 193926	2013-11-02 13:39:00 +00:00
Arnold Schwaighofer	d0789cdffe	LoopVectorizer: Move cse code into its own function llvm-svn: 193895	2013-11-01 23:28:54 +00:00
Arnold Schwaighofer	a846a7f8f0	LoopVectorizer: Perform redundancy elimination on induction variables When the loop vectorizer was part of the SCC inliner pass manager, gvn would run after the loop vectorizer, followed by instcombine. This way redundancy (multiple uses) was removed and instcombine could perform scalarization on the induction variables. Having moved the loop vectorizer to later, we no longer run any form of redundancy elimination before we perform instcombine. This caused vectorized induction variables to survive that did not before. On a recent iMac this helps linpack back from 6000Mflops to 7000Mflops. This should also help lpbench and paq8p. I ran a Release (without Asserts) build over the test-suite and did not see any negative impact on compile time. radar://15339680 llvm-svn: 193891	2013-11-01 22:18:19 +00:00
Benjamin Kramer	1fbcdca9e3	LoopVectorize: Look for consecutive access in GEPs with trailing zero indices If we have a pointer to a single-element struct we can still build wide loads and stores to it (if there is no padding). llvm-svn: 193860	2013-11-01 14:09:50 +00:00
Arnold Schwaighofer	70a4665f55	LoopVectorizer: If dependency checks fail try runtime checks When a dependence check fails we can still try to vectorize loops with runtime array bounds checks. This helps linpack to vectorize a loop in dgefa. And we are back to 2x of the scalar performance on a corei7-avx. radar://15339680 llvm-svn: 193853	2013-11-01 03:05:07 +00:00
Arnold Schwaighofer	1ca922e296	LoopVectorizer: Clear all member data structures in RuntimeCheck.reset() Clear all data structures when resetting the RuntimeCheck data structure. No test case. This was exposed by an upcoming change. llvm-svn: 193852	2013-11-01 03:05:04 +00:00
Manman Ren	87a2adc7fe	Do not convert "call asm" to "invoke asm" in Inliner. Given that backend does not handle "invoke asm" correctly ("invoke asm" will be handled by SelectionDAGBuilder::visitInlineAsm, which does not have the right setup for LPadToCallSiteMap) and we already made the assumption that inline asm does not throw in InstCombiner::visitCallSite, we are going to make the same assumption in Inliner to make sure we don't convert "call asm" to "invoke asm". If it becomes necessary to add support for "invoke asm" later on, we will need to modify the backend as well as remove the assumptions that inline asm does not throw. Fix rdar://15317907 llvm-svn: 193808	2013-10-31 21:56:03 +00:00
Rafael Espindola	282a47037b	Use LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN instead of the "dso list". There are two ways one could implement hiding of linkonce_odr symbols in LTO: * LLVM tells the linker which symbols can be hidden if not used from native files. * The linker tells LLVM which symbols are not used from other object files, but will be put in the dso symbol table if present. GOLD's API is the second option. It was implemented almost 1:1 in llvm by passing the list down to internalize. LLVM already had partial support for the first option. It is also very similar to how ld64 handles hiding these symbols when not doing LTO. This patch then * removes the APIs for the DSO list. * marks all linkonce_odr unnamed_addr global values, and other linkonce_odr values whose address is not used, as LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN. * makes the gold plugin responsible for handling the API mismatch. llvm-svn: 193800	2013-10-31 20:51:58 +00:00
Rafael Espindola	6554e5a94d	Merge CallGraph and BasicCallGraph. llvm-svn: 193734	2013-10-31 03:03:55 +00:00
Matt Arsenault	38b8ecf378	Teach scalarrepl about address spaces llvm-svn: 193720	2013-10-30 22:54:58 +00:00
Matt Arsenault	614ea99da7	Fix GVN creating bitcast between address spaces llvm-svn: 193710	2013-10-30 19:05:41 +00:00
Arnold Schwaighofer	77af0f6e82	ARM cost model: Account for zero cost scalar SROA instructions By vectorizing a series of srl, or, ... instructions we have obfuscated the intention so much that the backend does not know how to fold this code away. radar://15336950 llvm-svn: 193573	2013-10-29 01:33:53 +00:00
Arnold Schwaighofer	86252451c4	SLPVectorizer: Use vector type for vectorized memory operations No test case, because with the current cost model we don't see a difference. An upcoming ARM memory cost model change will expose and test this bug. radar://15332579 llvm-svn: 193572	2013-10-29 01:33:50 +00:00
Shuxin Yang	2e1890e18b	Revert r193251 : Use address-taken to disambiguate global variable and indirect memops. llvm-svn: 193489	2013-10-27 03:08:44 +00:00
Wan Xiaofei	be640b28c0	Quick look-up for block in loop. This patch implements quick look-up for block in loop by maintaining a hash set for blocks. It improves the efficiency of loop analysis a lot, the biggest improvement could be 5-6%(458.sjeng). Below are the compilation time for our benchmark in llc before & after the patch. Benchmark llc - trunk llc - patched 401.bzip2 0.339081 100.00% 0.329657 102.86% 403.gcc 19.853966 100.00% 19.605466 101.27% 429.mcf 0.049823 100.00% 0.048451 102.83% 433.milc 0.514898 100.00% 0.510217 100.92% 444.namd 1.109328 100.00% 1.103481 100.53% 445.gobmk 4.988028 100.00% 4.929114 101.20% 456.hmmer 0.843871 100.00% 0.825865 102.18% 458.sjeng 0.754238 100.00% 0.714095 105.62% 464.h264ref 2.9668 100.00% 2.90612 102.09% 471.omnetpp 4.556533 100.00% 4.511886 100.99% bitmnp01 0.038168 100.00% 0.0357 106.91% idctrn01 0.037745 100.00% 0.037332 101.11% libquake2 3.78689 100.00% 3.76209 100.66% libquake_ 2.251525 100.00% 2.234104 100.78% linpack 0.033159 100.00% 0.032788 101.13% matrix01 0.045319 100.00% 0.043497 104.19% nbench 0.333161 100.00% 0.329799 101.02% tblook01 0.017863 100.00% 0.017666 101.12% ttsprk01 0.054337 100.00% 0.053057 102.41% Reviewer : Andrew Trick <atrick@apple.com>, Hal Finkel <hfinkel@anl.gov> Approver : Andrew Trick <atrick@apple.com> Test : Pass make check-all & llvm test-suite llvm-svn: 193460	2013-10-26 03:08:02 +00:00
Andrew Trick	57243da70f	Fix SCEVExpander: don't try to expand quadratic recurrences outside a loop. Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu (affecting trunk and 3.3) When SCEV expands a recurrence outside of a loop it attempts to scale by the stride of the recurrence. Chained recurrences don't work that way. We could compute binomial coefficients, but would have to guarantee that the chained AddRec's are in a perfectly reduced form. llvm-svn: 193438	2013-10-25 21:35:56 +00:00
Rafael Espindola	7749d7ccc7	Handle calls and invokes in GlobalStatus. This patch teaches GlobalStatus to analyze a call that uses the global value as a callee, not as an argument. With this change internalize can handle the common use of linkonce_odr functions. This reduces the number of linkonce_odr functions in a LTO build of clang (checked with the emit-llvm gold plugin option) from 1730 to 60. llvm-svn: 193436	2013-10-25 21:29:52 +00:00
Hal Finkel	02f562df43	LoopVectorizer: Don't attempt to vectorize extractelement instructions The loop vectorizer does not currently understand how to vectorize extractelement instructions. The existing check, which excluded all vector-valued instructions, did not catch extractelement instructions because it checked only the return value. As a result, vectorization would proceed, producing illegal instructions like this: %58 = extractelement <2 x i32> %15, i32 0 %59 = extractelement i32 %58, i32 0 where the second extractelement is illegal because its first operand is not a vector. llvm-svn: 193434	2013-10-25 20:40:15 +00:00
Tom Stellard	bc7d87f07c	Inliner: Handle readonly attribute per argument when adding memcpy Patch by: Vincent Lejeune llvm-svn: 193356	2013-10-24 16:38:33 +00:00
Renato Golin	1ba143e140	Mark vector loops as already vectorized Make sure we mark all loops (scalar and vector) when vectorizing, so that we don't try to vectorize them anymore. Also, set unroll to 1, since this is what we check for on early exit. llvm-svn: 193349	2013-10-24 14:50:51 +00:00
Nuno Lopes	340b0463e6	fix PR17635: false positive with packed structures LLVM optimizers may widen accesses to packed structures that overflow the structure itself, but should be in bounds up to the alignment of the object llvm-svn: 193317	2013-10-24 09:17:24 +00:00
Juergen Ributzka	d04d096ecf	Fix a bug in LinearFunctionTestReplace that created invalid loop exit checks. Reviewed by Andy llvm-svn: 193303	2013-10-24 05:29:56 +00:00
Andrew Trick	ada2356ac9	Clarify comments in genLoopLimit. llvm-svn: 193292	2013-10-24 00:43:38 +00:00
Yuchen Wu	3197b25b27	Fixed comment typo in GCOVProfiling.cpp llvm-svn: 193268	2013-10-23 20:35:00 +00:00
Shuxin Yang	e4fb375995	Use address-taken to disambiguate global variable and indirect memops. Major steps include: 1). introduces a not-addr-taken bit-field in GlobalVariable 2). GlobalOpt pass sets "not-address-taken" if it proves a global variable doesn't have its address taken. 3). AA uses this info for disambiguation. llvm-svn: 193251	2013-10-23 17:28:19 +00:00
Eric Christopher	874fa0f6c7	Fix spelling, grammar, and match naming convention for test files. llvm-svn: 193130	2013-10-21 23:14:06 +00:00
Tom Stellard	e1631ddf93	SimplifyCFG: Don't duplicate calls to functions marked noduplicate v2 v2: - Use CI->cannotDuplicate() llvm-svn: 193115	2013-10-21 20:07:30 +00:00
Matt Arsenault	404c60a7c3	Use more type helper functions llvm-svn: 193109	2013-10-21 19:43:56 +00:00
Matt Arsenault	fa64659bd8	Teach SimplifyCFG about address spaces llvm-svn: 193104	2013-10-21 18:55:08 +00:00
Rafael Espindola	3d7fc25c7c	Optimize more linkonce_odr values during LTO. When a linkonce_odr value that is on the dso list is not unnamed_addr we can still look to see if anything is actually using its address. If not, it is safe to hide it. This patch implements that by moving GlobalStatus to Transforms/Utils and using it in Internalize. llvm-svn: 193090	2013-10-21 17:14:55 +00:00
Michael Gottesman	63c63ac21e	Fix the predecessor removal logic in r193045. Additionally some small comment/stylistic fixes are included as well. llvm-svn: 193068	2013-10-21 05:20:11 +00:00
Bill Wendling	90dd90afcb	Don't eliminate a partially redundant load if it's in a landing pad. A landing pad can be jumped to only by the unwind edge of an invoke instruction. If we eliminate a partially redundant load in a landing pad, it will create a basic block that violates this constraint. It then leads to other problems down the line if it tries to merge that basic block with the landing pad. Avoid this by not eliminating the load in a landing pad. PR17621 llvm-svn: 193064	2013-10-21 04:09:17 +00:00
Michael Gottesman	c024f3258a	Teach simplify-cfg how to correctly create covered lookup tables for switches on iN with N >= 3. One optimization simplify-cfg performs is converting switches to lookup tables if the switch has > 4 cases. This is done by: 1. Finding the max/min case value and calculating the switch case range. 2. Creating a lookup table basic block. 3. Performing a check in the switch's BB to see if the input value is in the switch's case range. If the input value satisfies said predicate, branch to the lookup table BB; otherwise branch to the switch's default destination BB using the default value as the result. The conditional check consists of subtracting the min case value of the table from any input iN value and then ensuring that said value is unsigned less than the size of the lookup table represented as an iN value. If the lookup table is a covered lookup table, the size of the table will be 2^N, which is 0 as an iN value. Thus the comparison will be an `icmp ult` of an iN value against 0, which is always false, yielding the incorrect result. This patch fixes this problem by recognizing if we have a covered lookup table and, if we do, unconditionally jumping to the lookup table BB, since the covering property of the lookup table implies that every input value can be handled by said BB. rdar://15268442 llvm-svn: 193045	2013-10-20 07:04:37 +00:00
Bill Wendling	4fea22c63b	Perform an intelligent splice of the predecessor with the single successor. If the predecessor's being spliced into a landing pad, then we need the PHIs to come first and the rest of the predecessor's code to come after the landing pad instruction. llvm-svn: 193035	2013-10-19 11:27:12 +00:00
Nadav Rotem	7f27e0b0ce	Mark some command line flags as hidden llvm-svn: 193013	2013-10-18 23:38:13 +00:00
Rafael Espindola	045a78fa7e	Rename fields of GlobalStatus to match the coding style. llvm-svn: 192910	2013-10-17 18:18:52 +00:00
Rafael Espindola	27797baee7	rename SafeToDestroyConstant to isSafeToDestroyConstant and clang-format. llvm-svn: 192907	2013-10-17 18:06:32 +00:00
Rafael Espindola	026c9cbefe	Simplify the interface of AnalyzeGlobal a bit and rename to analyzeGlobal. No functionality change. llvm-svn: 192906	2013-10-17 18:00:25 +00:00
Evgeniy Stepanov	21a9c93a4d	[msan] Use zero-extension in shadow cast by default. Switch to sign-extension in r192575 caused 7% perf loss on 482.sphinx3. llvm-svn: 192882	2013-10-17 10:53:50 +00:00
Dmitry Vyukov	b1ad5720a2	tsan: implement no_sanitize_thread attribute If a function has no_sanitize_thread attribute, do not instrument memory accesses in it. llvm-svn: 192871	2013-10-17 07:20:06 +00:00
Arnold Schwaighofer	a66582470b	SLPVectorizer: Don't vectorize volatile memory operations radar://15231682 Reapply r192799, http://lab.llvm.org:8011/builders/lldb-x86_64-debian-clang/builds/8226 showed that the bot is still broken even with this out. llvm-svn: 192820	2013-10-16 17:52:40 +00:00
Arnold Schwaighofer	06a0324f6a	Revert "SLPVectorizer: Don't vectorize volatile memory operations" This speculatively reverts commit 192799. It might have broken a linux buildbot. llvm-svn: 192816	2013-10-16 17:19:40 +00:00
Arnold Schwaighofer	5078ea2bd9	SLPVectorizer: Don't vectorize volatile memory operations radar://15231682 llvm-svn: 192799	2013-10-16 16:09:00 +00:00
Kostya Serebryany	d3d23bec66	[asan] Optimize accesses to global arrays with constant index Summary: Given a global array G[N], which is declared in this CU and has static initializer avoid instrumenting accesses like G[i], where 'i' is a constant and 0<=i<N. Also add a bit of stats. This eliminates ~1% of instrumentations on SPEC2006 and also partially helps when asan is being run together with coverage. Reviewers: samsonov Reviewed By: samsonov CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D1947 llvm-svn: 192794	2013-10-16 14:06:14 +00:00
Benjamin Kramer	c97850be76	LoopVectorize: Properly reflect PODness in comments. llvm-svn: 192717	2013-10-15 16:19:54 +00:00
Craig Topper	ef9e993eaa	Remove x86_sse42_crc32_64_8 intrinsic. It has no functional difference from x86_sse42_crc32_32_8 and was not mapped to a clang builtin. I'm not even sure why this form of the instruction is even called out explicitly in the docs. Also add AutoUpgrade support to convert it into the other intrinsic with appropriate trunc and zext. llvm-svn: 192672	2013-10-15 05:20:47 +00:00
Rafael Espindola	8c1d78ad51	Remove lib/Transforms/Instrumentation/ProfilingUtils.* They were leftover from the old profiling support. Patch by Alastair Murray. llvm-svn: 192605	2013-10-14 16:46:46 +00:00
Chris Lattner	94fc4bed1f	Basic blocks typically have few predecessors. Use a SmallDenseMap to avoid a heap allocation when this is the case. llvm-svn: 192602	2013-10-14 16:05:55 +00:00
Evgeniy Stepanov	be83d8f693	[msan] Instrument x86._cvt intrinsics. Currently MSan checks that arguments of cvt intrinsics are fully initialized. That's too much to ask: some of them only operate on lower half, or even quarter, of the input register. llvm-svn: 192599	2013-10-14 15:16:25 +00:00
Evgeniy Stepanov	9b5517b127	[msan] Fix handling of scalar select of vectors. llvm-svn: 192575	2013-10-14 09:52:09 +00:00
Arnold Schwaighofer	58864d2d5f	SLPVectorizer: Sort PHINodes based on their opcode Before this patch we relied on the order of phi nodes when we looked for phi nodes of the same type. This could prevent vectorization of cases where there was a phi node of a second type in between phi nodes of some type. This is important for vectorization of an internal graphics kernel. On the test suite + external on x86_64 (and on a run on armv7s) it showed no impact on either performance or compile time. radar://15024459 llvm-svn: 192537	2013-10-12 18:56:27 +00:00
Tobias Grosser	5cff1e2d78	LoopVectorize: Add missing INITIALIZE_PASS_DEPENDENCY macros Contributed-by: Peter Zotov <whitequark@whitequark.org> llvm-svn: 192536	2013-10-12 18:29:15 +00:00
Renato Golin	dd943a8919	Better info when debugging vectorizer llvm-svn: 192460	2013-10-11 16:14:39 +00:00
Shuxin Yang	1cab418ce2	Fix a bug in Dead Argument Elimination. If a function seen at compile time is not necessarily the one linked to the binary being built, it is illegal to change the actual arguments passed to it. e.g. -------------------------- void foo(int lol) { // foo() has linkage satisfying isWeakForLinker() // "lol" is not used at all. } void bar(int lol2) { // xform to foo(undef) is illegal, as the compiler does not know which // instance of foo() will be linked to the binary being built. foo(lol2); } ----------------------------- Such functions can be captured by isWeakForLinker(). NOTE that mayBeOverridden() is insufficient for this purpose as it doesn't include linkage types like AvailableExternallyLinkage and LinkOnceODRLinkage. Take linkonce_odr* as an example: it indicates a set of EQUIVALENT globals that can be merged at link-time. However, the semantics of EQUIVALENT functions include parameters. Changing parameters breaks the assumption. Thanks to John McCall for help, especially for the explanation of the subtle difference between linkage types. rdar://11546243 llvm-svn: 192302	2013-10-09 17:21:44 +00:00
Arnold Schwaighofer	0caddfc731	LoopVectorize: External uses must use the last value in a reduction cycle Otherwise, we don't perform operations that would have been performed on the scalar version. Fixes PR17498. llvm-svn: 192133	2013-10-07 21:05:43 +00:00
Alexey Samsonov	a1944e6d26	Revert r191834 until we measure the effect of this on benchmarks and maybe find a better way to fix it llvm-svn: 192121	2013-10-07 19:03:24 +00:00
Hal Finkel	f5a3eaea55	UpdatePHINodes in BasicBlockUtils should not crash on duplicate predecessors UpdatePHINodes has an optimization to reuse an existing PHI node, where it first deletes all of its entries and then replaces them. Unfortunately, in the case where we had duplicate predecessors (which are allowed so long as the associated PHI entries have the same value), the loop removing the existing PHI entries from the to-be-reused PHI would assert (if that PHI was not the one which had the duplicates). llvm-svn: 192001	2013-10-04 23:41:05 +00:00
Arnold Schwaighofer	698d4ac8a8	SLPVectorizer: Sort inputs to commutative binary operations Sort the operands of the other entries in the current vectorization root according to the first entry's operands opcodes. %conv0 = uitofp ... %load0 = load float ... = fmul %conv0, %load0 = fmul %load0, %conv1 = fmul %load0, %conv2 Make sure that we recursively vectorize <%conv0, %conv1, %conv2> and <%load0, %load0, %load0>. This makes it more likely to obtain vectorizable trees. We have to be careful when we sort that we don't destroy 'good' existing ordering implied by source order. radar://15080067 llvm-svn: 191977	2013-10-04 20:39:16 +00:00
Owen Anderson	5797bfd4a3	Pull fptrunc's upwards through selects when one of the select's selectands was a constant. This has a number of benefits, including producing small immediates (easier to materialize, smaller constant pools) as well as being more likely to allow the fptrunc to fuse with a preceding instruction (truncating selects are unusual). llvm-svn: 191929	2013-10-03 21:08:05 +00:00
Rafael Espindola	cda2911caa	Optimize linkonce_odr unnamed_addr functions during LTO. Generalize the API so we can distinguish symbols that are needed just for a DSO symbol table from those that are used from some native .o. The symbols that are only wanted for the dso symbol table can be dropped if llvm can prove every other dso has a copy (linkonce_odr) and the address is not important (unnamed_addr). llvm-svn: 191922	2013-10-03 18:29:09 +00:00
Matt Arsenault	bfa37e546d	Make gep i8* X, -(ptrtoint Y) transform work with address spaces llvm-svn: 191920	2013-10-03 18:15:57 +00:00
Matt Arsenault	0be1cb1c7b	Don't use runtime bounds check between address spaces. Don't vectorize with a runtime check if it requires a comparison between pointers with different address spaces. The values can't be assumed to be directly comparable. Previously it would create an illegal bitcast. llvm-svn: 191862	2013-10-02 22:38:17 +00:00
Yi Jiang	8fd1a806d5	Apply slp vectorization on fully-vectorizable tree of height 2 llvm-svn: 191852	2013-10-02 20:20:39 +00:00
Matt Arsenault	39d592fe48	Fix debug printing spacing. Fix missing newlines, missing and extra spaces in printed messages. llvm-svn: 191851	2013-10-02 20:04:29 +00:00
Matt Arsenault	cccbe16785	Fix comment grammar and capitalization. llvm-svn: 191850	2013-10-02 20:04:26 +00:00
Benjamin Kramer	b9add84ef6	SLPVectorizer: Make store chain finding more aggressive with GetUnderlyingObject. This recursively strips all GEPs like the existing code. It also handles bitcasts and other operations that do not change the pointer value. llvm-svn: 191847	2013-10-02 19:06:06 +00:00
Tom Stellard	d3e916eb6a	StructurizeCFG: Add dependency on LowerSwitch pass Switch instructions were crashing the StructurizeCFG pass, and it's probably easier anyway if we don't need to handle them in this pass. Reviewed-by: Christian König <christian.koenig@amd.com> llvm-svn: 191841	2013-10-02 17:04:59 +00:00
Chandler Carruth	ea56494625	Remove the very substantial, largely unmaintained legacy PGO infrastructure. This was essentially work toward PGO based on a design that had several flaws, partially dating from a time when LLVM had a different architecture, and with an effort to modernize it abandoned without being completed. Since then, it has bitrotted for several years further. The result is nearly unusable, and isn't helping any of the modern PGO efforts. Instead, it is getting in the way, adding confusion about PGO in LLVM and distracting everyone with maintenance on essentially dead code. Removing it paves the way for modern efforts around PGO. Among other effects, this removes the last of the runtime libraries from LLVM. Those are being developed in the separate 'compiler-rt' project now, with somewhat different licensing, specifically more appropriate for runtimes. llvm-svn: 191835	2013-10-02 15:42:23 +00:00
Alexey Samsonov	31540172d0	Remove "localize global" optimization Summary: As discussed in http://llvm-reviews.chandlerc.com/D1754, this optimization isn't really valid for C, and fires too rarely anyway. Reviewers: rafael, nicholas Reviewed By: nicholas CC: rnk, llvm-commits, nicholas Differential Revision: http://llvm-reviews.chandlerc.com/D1769 llvm-svn: 191834	2013-10-02 15:31:34 +00:00
Matt Arsenault	517d84e268	Don't merge tiny functions. It's silly to merge functions like these: define void @foo(i32 %x) { ret void } define void @bar(i32 %x) { ret void } to get define void @bar(i32) { tail call void @foo(i32 %0) ret void } llvm-svn: 191786	2013-10-01 18:05:30 +00:00
Rafael Espindola	44fee4e0eb	Remove several unused variables. Patch by Alp Toker. llvm-svn: 191757	2013-10-01 13:32:03 +00:00
Matt Arsenault	5ea37f8d89	Fix code duplication llvm-svn: 191716	2013-10-01 00:01:14 +00:00
Matt Arsenault	8468062c6e	Use right address space size in InstCombineCompares The test's output doesn't change, but this ensures this is actually hit with a different address space. llvm-svn: 191701	2013-09-30 21:11:01 +00:00

11351 Commits