llvm-project

Commit Graph

Author	SHA1	Message	Date
Rafael Espindola	ff856f4c4e	Note the PR number. llvm-svn: 199932	2014-01-23 20:17:12 +00:00
Rafael Espindola	2a05ea5c0e	Remove tail marker when changing an argument to an alloca. Argument promotion can replace an argument of a call with an alloca. This requires clearing the tail marker as it is very likely that the callee is now using an alloca in the caller. This fixes pr14710. llvm-svn: 199909	2014-01-23 17:19:42 +00:00
Chandler Carruth	aa7fa5e4b2	[LPM] Make LoopSimplify no longer a LoopPass and instead both a utility function and a FunctionPass. This has many benefits. The motivating use case was to be able to compute function analysis passes after running LoopSimplify (to avoid invalidating them) and then to run other passes which require LoopSimplify. Specifically passes like unrolling and vectorization are critical to wire up to BranchProbabilityInfo and BlockFrequencyInfo so that they can be profile aware. For the LoopVectorize pass the only things in the way are LoopSimplify and LCSSA. This fixes LoopSimplify and LCSSA is next on my list. There are also a bunch of other benefits of doing this: - It is now very feasible to make more passes preserve LoopSimplify because they can simply run it after changing a loop. Because subsequence passes can assume LoopSimplify is preserved we can reduce the runs of this pass to the times when we actually mutate a loop structure. - The new pass manager should be able to more easily support loop passes factored in this way. - We can at long, long last observe that LoopSimplify is preserved across SCEV. This halves the number of times we run LoopSimplify!!! Now, getting here wasn't trivial. First off, the interfaces used by LoopSimplify are all over the map regarding how analysis are updated. We end up with weird "pass" parameters as a consequence. I'll try to clean at least some of this up later -- I'll have to have it all clean for the new pass manager. Next up I discovered a really frustrating bug. LoopUnroll claims to preserve LoopSimplify. That's actually a lie. But the way the LoopPassManager ends up running the passes, it always ran LoopSimplify on the unrolled-into loop, rectifying this oversight before any verification could kick in and point out that in fact nothing was preserved. So I've added code to the unroller to actually simplify the surrounding loop when it succeeds at unrolling. The only functional change in the test suite is that we now catch a case that was previously missed because SCEV and other loop transforms see their containing loops as simplified and thus don't miss some opportunities. One test case has been converted to check that we catch this case rather than checking that we miss it but at least don't get the wrong answer. Note that I have #if-ed out all of the verification logic in LoopSimplify! This is a temporary workaround while extracting these bits from the LoopPassManager. Currently, there is no way to have a pass in the LoopPassManager which preserves LoopSimplify along with one which does not. The LPM will try to verify on each loop in the nest that LoopSimplify holds but the now-Function-pass cannot distinguish what loop is being verified and so must try to verify all of them. The inner most loop is clearly no longer simplified as there is a pass which didn't even attempt to preserve it. =/ Once I get LCSSA out (and maybe LoopVectorize and some other fixes) I'll be able to re-enable this check and catch any places where we are still failing to preserve LoopSimplify. If this causes problems I can back this out and try to commit all of this at once, but so far this seems to work and allow much more incremental progress. llvm-svn: 199884	2014-01-23 11:23:19 +00:00
Matt Arsenault	88b3cc7090	Add CHECK-LABELs llvm-svn: 199846	2014-01-22 22:32:58 +00:00
Matt Arsenault	84de61148b	Handle an addrspacecast case in memcpyopt llvm-svn: 199836	2014-01-22 21:53:19 +00:00
Owen Anderson	1664dc8973	Fix all the remaining lost-fast-math-flags bugs I've been able to find. The most important of these are cases in the generic logic for combining BinaryOperators. This logic hadn't been updated to handle FastMathFlags, and it took me a while to detect it because it doesn't show up in a simple search for CreateFAdd. llvm-svn: 199629	2014-01-20 07:44:53 +00:00
Benjamin Kramer	b80e1699b3	InstCombine: Modernize a bunch of cast combines. Also make them vector-aware. llvm-svn: 199608	2014-01-19 20:05:13 +00:00
Benjamin Kramer	7a74bd4703	InstCombine: Replace a hand-rolled version of isKnownToBeAPowerOfTwo with the real thing. llvm-svn: 199604	2014-01-19 16:48:41 +00:00
Benjamin Kramer	72196f3ae5	InstCombine: Teach most integer add/sub/mul/div combines how to deal with vectors. llvm-svn: 199602	2014-01-19 15:24:22 +00:00
Benjamin Kramer	76b15d04ff	InstCombine: Refactor fmul/fdiv combines to handle vectors. llvm-svn: 199598	2014-01-19 13:36:27 +00:00
Chandler Carruth	1bf38c6a71	Fix a really nasty SROA bug with how we handled out-of-bounds memcpy intrinsics. Reported on the list by Evan with a couple of attempts to fix, but it took a while to dig down to the root cause. There are two overlapping bugs here, both centering around the circumstance of discovering a memcpy operand which is known to be completely outside the bounds of the alloca. First, we need to kill the other side of the memcpy if it was added to this alloca. Otherwise we'll factor it into our slicing and try to rewrite it even though we know for a fact that it is dead. This is made more tricky because we can visit the sides in either order. So we have to both kill the other side and skip instructions marked as dead. The latter really should be goodness in every case, but here is a matter of correctness. Second, we need to actually remove the uses of the alloca by the memcpy when queuing it for later deletion. Otherwise it may still be using the alloca when we go to promote it (if the rewrite re-uses the existing alloca instruction). Do this by factoring out the use-clobbering used when for nixing a Phi argument and re-using it across the operands of a to-be-deleted instruction. llvm-svn: 199590	2014-01-19 12:16:54 +00:00
Arnold Schwaighofer	cc742dd9e4	LoopVectorizer: A reduction that has multiple uses of the reduction value is not a reduction. Really. Under certain circumstances (the use list of an instruction has to be set up right - hence the extra pass in the test case) we would not recognize when a value in a potential reduction cycle was used multiple times by the reduction cycle. Fixes PR18526. radar://15851149 llvm-svn: 199570	2014-01-19 03:18:31 +00:00
Nick Lewycky	a6a17d77d2	Don't refuse to transform constexpr(call(arg, ...)) to call(constexpr(arg), ...)) just because the function has multiple return values even if their return types are the same. Patch by Eduard Burtescu! llvm-svn: 199564	2014-01-18 22:47:12 +00:00
Benjamin Kramer	fea9ac99b0	InstCombine: Make the (fmul X, -1.0) -> (fsub -0.0, X) transform handle vectors too. PR18532. llvm-svn: 199553	2014-01-18 16:43:14 +00:00
Owen Anderson	48b842ef7c	Fix more instances of dropped fast math flags when optimizing FADD instructions. All found by inspection (aka grep). llvm-svn: 199528	2014-01-18 00:48:14 +00:00
Owen Anderson	e7321660c1	Fix two cases where we could lose fast math flags when optimizing FADD expressions. llvm-svn: 199427	2014-01-16 21:26:02 +00:00
Owen Anderson	4557a156e3	Fix an instance where we would drop fast math flags when performing an fdiv to reciprocal multiply transformation. llvm-svn: 199425	2014-01-16 21:07:52 +00:00
Owen Anderson	e8537fc7e0	Fix a bug in InstCombine where we failed to preserve fast math flags when optimizing an FMUL expression. llvm-svn: 199424	2014-01-16 20:59:41 +00:00
Owen Anderson	f74cfe031f	Teach InstCombine that (fmul X, -1.0) can be simplified to (fneg X), which LLVM expresses as (fsub -0.0, X). llvm-svn: 199420	2014-01-16 20:36:42 +00:00
Andrew Trick	ee5aa7f71a	Fix PR18449: SCEV needs more precise max BECount for multi-exit loop. llvm-svn: 199299	2014-01-15 06:42:11 +00:00
Hans Wennborg	4744ac1733	Switch-to-lookup tables: set threshold to 3 cases There has been an old FIXME to find the right cut-off for when it's worth analyzing and potentially transforming a switch to a lookup table. The switches always have two or more cases. I could not measure any speed-up by transforming a switch with two cases. A switch with three cases gets a nice speed-up, and I couldn't measure any compile-time regression, so I think this is the right threshold. In a Clang self-host, this causes 480 new switches to be transformed, and reduces the final binary size with 8 KB. llvm-svn: 199294	2014-01-15 05:00:27 +00:00
Arnold Schwaighofer	dc4c9460a2	LoopVectorize: Only strip casts from integer types when replacing symbolic strides Fixes PR18480. llvm-svn: 199291	2014-01-15 03:35:46 +00:00
Matt Arsenault	2d353d1a10	Do pointer cast simplifications on addrspacecast llvm-svn: 199254	2014-01-14 20:00:45 +00:00
Matt Arsenault	e55a2c2e6b	Make nocapture analysis work with addrspacecast llvm-svn: 199246	2014-01-14 19:11:52 +00:00
Hans Wennborg	ac114a3ce7	Switch-to-lookup tables: Don't require a result for the default case when the lookup table doesn't have any holes. This means we can build a lookup table for switches like this: switch (x) { case 0: return 1; case 1: return 2; case 2: return 3; case 3: return 4; default: exit(1); } The default case doesn't yield a constant result here, but that doesn't matter, since a default result is only necessary for filling holes in the lookup table, and this table doesn't have any holes. This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB off the resulting clang binary. llvm-svn: 199025	2014-01-12 00:44:41 +00:00
Benjamin Kramer	c10563d14e	Fix broken CHECK lines. llvm-svn: 199016	2014-01-11 21:06:00 +00:00
NAKAMURA Takumi	a64d0bccc8	llvm/test/Transforms/SampleProfile/syntax.ll: Eliminate locale-sensitive message check. llvm-svn: 199000	2014-01-11 09:23:52 +00:00
Diego Novillo	9518b63bfc	Extend and simplify the sample profile input file. 1- Use the line_iterator class to read profile files. 2- Allow comments in profile file. Lines starting with '#' are completely ignored while reading the profile. 3- Add parsing support for discriminators and indirect call samples. Our external profiler can emit more profile information that we are currently not handling. This patch does not add new functionality to support this information, but it allows profile files to provide it. I will add actual support later on (for at least one of these features, I need support for DWARF discriminators in Clang). A sample line may contain the following additional information: Discriminator. This is used if the sampled program was compiled with DWARF discriminator support (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This is currently only emitted by GCC and we just ignore it. Potential call targets and samples. If present, this line contains a call instruction. This models both direct and indirect calls. Each called target is listed together with the number of samples. For example, 130: 7 foo:3 bar:2 baz:7 The above means that at relative line offset 130 there is a call instruction that calls one of foo(), bar() and baz(). With baz() being the relatively more frequent call target. Differential Revision: http://llvm-reviews.chandlerc.com/D2355 4- Simplify format of profile input file. This implements earlier suggestions to simplify the format of the sample profile file. The symbol table is not necessary and function profiles do not need to know the number of samples in advance. Differential Revision: http://llvm-reviews.chandlerc.com/D2419 llvm-svn: 198973	2014-01-10 23:23:51 +00:00
Diego Novillo	0accb3d2bc	Propagation of profile samples through the CFG. This adds a propagation heuristic to convert instruction samples into branch weights. It implements a similar heuristic to the one implemented by Dehao Chen on GCC. The propagation proceeds in 3 phases: 1- Assignment of block weights. All the basic blocks in the function are initial assigned the same weight as their most frequently executed instruction. 2- Creation of equivalence classes. Since samples may be missing from blocks, we can fill in the gaps by setting the weights of all the blocks in the same equivalence class to the same weight. To compute the concept of equivalence, we use dominance and loop information. Two blocks B1 and B2 are in the same equivalence class if B1 dominates B2, B2 post-dominates B1 and both are in the same loop. 3- Propagation of block weights into edges. This uses a simple propagation heuristic. The following rules are applied to every block B in the CFG: - If B has a single predecessor/successor, then the weight of that edge is the weight of the block. - If all the edges are known except one, and the weight of the block is already known, the weight of the unknown edge will be the weight of the block minus the sum of all the known edges. If the sum of all the known edges is larger than B's weight, we set the unknown edge weight to zero. - If there is a self-referential edge, and the weight of the block is known, the weight for that edge is set to the weight of the block minus the weight of the other incoming edges to that block (if known). Since this propagation is not guaranteed to finalize for every CFG, we only allow it to proceed for a limited number of iterations (controlled by -sample-profile-max-propagate-iterations). It currently uses the same GCC default of 100. Before propagation starts, the pass builds (for each block) a list of unique predecessors and successors. This is necessary to handle identical edges in multiway branches. Since we visit all blocks and all edges of the CFG, it is cleaner to build these lists once at the start of the pass. Finally, the patch fixes the computation of relative line locations. The profiler emits lines relative to the function header. To discover it, we traverse the compilation unit looking for the subprogram corresponding to the function. The line number of that subprogram is the line where the function begins. That becomes line zero for all the relative locations. llvm-svn: 198972	2014-01-10 23:23:46 +00:00
Arnold Schwaighofer	c2e9d759f2	LoopVectorizer: Handle strided memory accesses by versioning for (i = 0; i < N; ++i) A[i * Stride1] += B[i * Stride2]; We take loops like this and check that the symbolic strides 'Strided1/2' are one and drop to the scalar loop if they are not. This is currently disabled by default and hidden behind the flag 'enable-mem-access-versioning'. radar://13075509 llvm-svn: 198950	2014-01-10 18:20:32 +00:00
Hao Liu	26abebbb2c	Fix a bug about generating undef operand when optimising shuffle vector and insert element in instruction combine. llvm-svn: 198730	2014-01-08 03:06:15 +00:00
Andrew Trick	e4a18605e0	Reapply r198654 "indvars: sink truncates outside the loop." This doesn't seem to have actually broken anything. It was paranoia on my part. Trying again now that bots are more stable. This is a follow up of the r198338 commit that added truncates for lcssa phi nodes. Sinking the truncates below the phis cleans up the loop and simplifies subsequent analysis within the indvars pass. llvm-svn: 198678	2014-01-07 06:59:12 +00:00
Andrew Trick	3c0ed08996	Revert "indvars: sink truncates outside the loop." This reverts commit r198654. One of the bots reported a SciMark failure. llvm-svn: 198659	2014-01-07 01:50:58 +00:00
Andrew Trick	0b8e3b2cb4	indvars: sink truncates outside the loop. This is a follow up of the r198338 commit that added truncates for lcssa phi nodes. Sinking the truncates below the phis cleans up the loop and simplifies subsequent analysis within the indvars pass. llvm-svn: 198654	2014-01-07 01:02:55 +00:00
Andrew Trick	6796ab424c	Reapply r198478 "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things." Now with a fix for PR18384: ValueHandleBase::ValueIsDeleted. We need to invalidate SCEV's loop info when we delete a block, even if no values are hoisted. llvm-svn: 198631	2014-01-06 19:43:14 +00:00
Alp Toker	5e9f3265f8	Revert "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things." This commit was the source of crasher PR18384: While deleting: label %for.cond127 An asserting value handle still pointed to this value! UNREACHABLE executed at llvm/lib/IR/Value.cpp:671! Reverting to get the builders green, feel free to re-land after fixing up. (Renato has a handy isolated repro if you need it.) This reverts commit r198478. llvm-svn: 198503	2014-01-04 17:00:45 +00:00
Andrew Trick	aceac9746d	Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things. getSCEV for an ashr instruction creates an intermediate zext expression when it truncates its operand. The operand is initially inside the loop, so the narrow zext expression has a non-loop-invariant loop disposition. LoopSimplify then runs on an outer loop, hoists the ashr operand, and properly invalidate the SCEVs that are mapped to value. The SCEV expression for the ashr is now an AddRec with the hoisted value as the now loop-invariant start value. The LoopDisposition of this wide value was properly invalidated during LoopSimplify. However, if we later get the ashr SCEV again, we again try to create the intermediate zext expression. We get the same SCEV that we did earlier, and it is still cached because it was never mapped to a Value. When we try to create a new AddRec we abort because we're using the old non-loop-invariant LoopDisposition. I don't have a solution for this other than to clear LoopDisposition when LoopSimplify hoists things. I think the long-term strategy should be to perform LoopSimplify on all loops before computing SCEV and before running any loop opts on individual loops. It's possible we may want to rerun LoopSimplify on individual loops, but it should rarely do anything, so rarely require invalidating SCEV. llvm-svn: 198478	2014-01-04 05:52:49 +00:00
David Peixotto	ea9ba446d5	Fix loop rerolling pass failure with non-consant loop lower bound The loop rerolling pass was failing with an assertion failure from a failed cast on loops like this: void foo(int A, int B, int m, int n) { for (int i = m; i < n; i+=4) { A[i+0] = B[i+0] * 4; A[i+1] = B[i+1] * 4; A[i+2] = B[i+2] * 4; A[i+3] = B[i+3] * 4; } } The code was casting the SCEV-expanded code for the new induction variable to a phi-node. When the loop had a non-constant lower bound, the SCEV expander would end the code expansion with an add insted of a phi node and the cast would fail. It looks like the cast to a phi node was only needed to get the induction variable value coming from the backedge to compute the end of loop condition. This patch changes the loop reroller to compare the induction variable to the number of times the backedge is taken instead of the iteration count of the loop. In other words, we stop the loop when the current value of the induction variable == IterationCount-1. Previously, the comparison was comparing the induction variable value from the next iteration == IterationCount. This problem only seems to occur on 32-bit targets. For some reason, the loop is not rerolled on 64-bit targets. PR18290 llvm-svn: 198425	2014-01-03 17:20:01 +00:00
Arnold Schwaighofer	833a82ecde	BasicAA: Use reachabilty instead of dominance for checking value equality in phi cycles This allows the value equality check to work even if we don't have a dominator tree. Also add some more comments. I was worried about compile time impacts and did not implement reachability but used the dominance check in the initial patch. The trade-off was that the dominator tree was required. The llvm utility function isPotentiallyReachable cuts off the recursive search after 32 visits. Testing did not show any compile time regressions showing my worries unjustfied. No compile time or performance regressions at O3 -flto -mavx on test-suite + externals. Addresses review comments from r198290. llvm-svn: 198400	2014-01-03 05:47:03 +00:00
Matt Arsenault	461c8e0a8c	Delete unread globals through addrspacecast llvm-svn: 198346	2014-01-02 20:01:43 +00:00
Matt Arsenault	da1deabb16	Fix addrspacecast with metadata globals llvm-svn: 198345	2014-01-02 19:53:49 +00:00
Andrew Trick	020dd898fc	indvars: insert truncate at loop boundary to avoid redundant IVs. When widening an IV to remove s/zext, we generally try to eliminate the original narrow IV. However, LCSSA phi nodes outside the loop were still using the original IV. Clean this up more aggressively to avoid redundancy in generated code. llvm-svn: 198338	2014-01-02 19:29:38 +00:00
Arnold Schwaighofer	0d10a9d579	BasicAA: Fix value equality and phi cycles When there are cycles in the value graph we have to be careful interpreting "Value" identity as "value" equivalence. We interpret the value of a phi node as the value of its operands. When we check for value equivalence now we make sure that the "Value" dominates all cycles (phis). %0 = phi [%noaliasval, %addr2] %l = load %ptr %addr1 = gep @a, 0, %l %addr2 = gep @a, 0, (%l + 1) store %ptr ... Before this patch we would return NoAlias for (%0, %addr1) which is wrong because the value of the load is from different iterations of the loop. Tested on x86_64 -mavx at O3 and O3 -flto with no performance or compile time regressions. PR18068 radar://15653794 llvm-svn: 198290	2014-01-02 03:31:36 +00:00
Chandler Carruth	f5689f8304	Disable transforms that introduce calls to exp10*() on Linux due to widespread glibc bugs. The glibc implementation of exp10 has a very serious precision bug in version 2.15 (and older versions). This is still very widely used (the current Ubuntu LTS for example uses it) and so it isn't reasonable to make transforms that produce these functions. This fixes many miscompiles introduced when we started transforming pow(10.0, ...) into exp10, and it may have fixed other latent miscompiles where exp10 provided sufficient precision but exp10f did not. This is all really horrible. The primary bug has been fixed for over a year and glibc 2.18 works correctly for the test cases I have, but it will be 2017 before the LTS using 2.15 is no longer supported by Ubuntu (and thus reasonable for folks to be relying on). =[ We're either going to need to live without these optimizations, or find a way to switch behavior more dynamically than using simply the fact that the OS is "Linux". To make matters worse, there appears to be significant testing and fixing of numerous other bugs in the exp10 family of functions right now in glibc. While those haven't been causing problems I've seen in the wild, it gives me concerns that we may need to wait until an even later release of glibc before we can reliably transform code into exp10. llvm-svn: 198093	2013-12-28 02:40:19 +00:00
Andrew Trick	0ba77a0740	Add support to indvars for optimizing sadd.with.overflow. Split sadd.with.overflow into add + sadd.with.overflow to allow analysis and optimization. This should ideally be done after InstCombine, which can perform code motion (eventually indvars should run after all canonical instcombines). We want ISEL to recombine the add and the check, at least on x86. This is currently under an option for reducing live induction variables: -liv-reduce. The next step is reducing liveness of IVs that are live out of the overflow check paths. Once the related optimizations are fully developed, reviewed and tested, I do expect this to become default. llvm-svn: 197926	2013-12-23 23:31:49 +00:00
Richard Sandiford	1fb5c13e3a	Fix Scalarizer insertion point when replacing PHIs with insertelements If the Scalarizer scalarized a vector PHI but could not scalarize all uses of it, it would insert a series of insertelements to reconstruct the vector PHI value from the scalar ones. The problem was that it would emit these insertelements immediately after the PHI, even if there were other PHIs after it. llvm-svn: 197909	2013-12-23 14:51:56 +00:00
Richard Sandiford	3548cbb980	Fix Scalarizer handling of vector GEPs with multiple index operands The old code only worked for one index operand. Also handle "inbounds". llvm-svn: 197908	2013-12-23 14:45:00 +00:00
Justin Bogner	0ba3f211c4	Transforms: Don't create bad weights when eliminating dead cases If we happen to eliminate every case in a switch that has branch weights, we currently try to create metadata for the one remaining branch, triggering an assert. Instead, we need to check that the metadata we're trying to create is sensible. llvm-svn: 197791	2013-12-20 08:21:30 +00:00
Justin Bogner	668eb1f746	test: Make a branchweight test more specific llvm-svn: 197790	2013-12-20 08:21:27 +00:00
Justin Bogner	f71b18e972	test: Prefer CHECK-LABEL to CHECK in branchweight tests llvm-svn: 197789	2013-12-20 08:21:24 +00:00

1 2 3 4 5 ...

4148 Commits