llvm-project

Commit Graph

Author	SHA1	Message	Date
Owen Anderson	48b842ef7c	Fix more instances of dropped fast math flags when optimizing FADD instructions. All found by inspection (aka grep). llvm-svn: 199528	2014-01-18 00:48:14 +00:00
Owen Anderson	e7321660c1	Fix two cases where we could lose fast math flags when optimizing FADD expressions. llvm-svn: 199427	2014-01-16 21:26:02 +00:00
Owen Anderson	4557a156e3	Fix an instance where we would drop fast math flags when performing an fdiv to reciprocal multiply transformation. llvm-svn: 199425	2014-01-16 21:07:52 +00:00
Owen Anderson	e8537fc7e0	Fix a bug in InstCombine where we failed to preserve fast math flags when optimizing an FMUL expression. llvm-svn: 199424	2014-01-16 20:59:41 +00:00
Owen Anderson	f74cfe031f	Teach InstCombine that (fmul X, -1.0) can be simplified to (fneg X), which LLVM expresses as (fsub -0.0, X). llvm-svn: 199420	2014-01-16 20:36:42 +00:00
Andrew Trick	ee5aa7f71a	Fix PR18449: SCEV needs more precise max BECount for multi-exit loop. llvm-svn: 199299	2014-01-15 06:42:11 +00:00
Hans Wennborg	4744ac1733	Switch-to-lookup tables: set threshold to 3 cases There has been an old FIXME to find the right cut-off for when it's worth analyzing and potentially transforming a switch to a lookup table. The switches always have two or more cases. I could not measure any speed-up by transforming a switch with two cases. A switch with three cases gets a nice speed-up, and I couldn't measure any compile-time regression, so I think this is the right threshold. In a Clang self-host, this causes 480 new switches to be transformed, and reduces the final binary size with 8 KB. llvm-svn: 199294	2014-01-15 05:00:27 +00:00
Arnold Schwaighofer	dc4c9460a2	LoopVectorize: Only strip casts from integer types when replacing symbolic strides Fixes PR18480. llvm-svn: 199291	2014-01-15 03:35:46 +00:00
Matt Arsenault	2d353d1a10	Do pointer cast simplifications on addrspacecast llvm-svn: 199254	2014-01-14 20:00:45 +00:00
Matt Arsenault	e55a2c2e6b	Make nocapture analysis work with addrspacecast llvm-svn: 199246	2014-01-14 19:11:52 +00:00
Hans Wennborg	ac114a3ce7	Switch-to-lookup tables: Don't require a result for the default case when the lookup table doesn't have any holes. This means we can build a lookup table for switches like this: switch (x) { case 0: return 1; case 1: return 2; case 2: return 3; case 3: return 4; default: exit(1); } The default case doesn't yield a constant result here, but that doesn't matter, since a default result is only necessary for filling holes in the lookup table, and this table doesn't have any holes. This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB off the resulting clang binary. llvm-svn: 199025	2014-01-12 00:44:41 +00:00
Benjamin Kramer	c10563d14e	Fix broken CHECK lines. llvm-svn: 199016	2014-01-11 21:06:00 +00:00
NAKAMURA Takumi	a64d0bccc8	llvm/test/Transforms/SampleProfile/syntax.ll: Eliminate locale-sensitive message check. llvm-svn: 199000	2014-01-11 09:23:52 +00:00
Diego Novillo	9518b63bfc	Extend and simplify the sample profile input file. 1- Use the line_iterator class to read profile files. 2- Allow comments in profile file. Lines starting with '#' are completely ignored while reading the profile. 3- Add parsing support for discriminators and indirect call samples. Our external profiler can emit more profile information that we are currently not handling. This patch does not add new functionality to support this information, but it allows profile files to provide it. I will add actual support later on (for at least one of these features, I need support for DWARF discriminators in Clang). A sample line may contain the following additional information: Discriminator. This is used if the sampled program was compiled with DWARF discriminator support (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This is currently only emitted by GCC and we just ignore it. Potential call targets and samples. If present, this line contains a call instruction. This models both direct and indirect calls. Each called target is listed together with the number of samples. For example, 130: 7 foo:3 bar:2 baz:7 The above means that at relative line offset 130 there is a call instruction that calls one of foo(), bar() and baz(). With baz() being the relatively more frequent call target. Differential Revision: http://llvm-reviews.chandlerc.com/D2355 4- Simplify format of profile input file. This implements earlier suggestions to simplify the format of the sample profile file. The symbol table is not necessary and function profiles do not need to know the number of samples in advance. Differential Revision: http://llvm-reviews.chandlerc.com/D2419 llvm-svn: 198973	2014-01-10 23:23:51 +00:00
Diego Novillo	0accb3d2bc	Propagation of profile samples through the CFG. This adds a propagation heuristic to convert instruction samples into branch weights. It implements a similar heuristic to the one implemented by Dehao Chen on GCC. The propagation proceeds in 3 phases: 1- Assignment of block weights. All the basic blocks in the function are initial assigned the same weight as their most frequently executed instruction. 2- Creation of equivalence classes. Since samples may be missing from blocks, we can fill in the gaps by setting the weights of all the blocks in the same equivalence class to the same weight. To compute the concept of equivalence, we use dominance and loop information. Two blocks B1 and B2 are in the same equivalence class if B1 dominates B2, B2 post-dominates B1 and both are in the same loop. 3- Propagation of block weights into edges. This uses a simple propagation heuristic. The following rules are applied to every block B in the CFG: - If B has a single predecessor/successor, then the weight of that edge is the weight of the block. - If all the edges are known except one, and the weight of the block is already known, the weight of the unknown edge will be the weight of the block minus the sum of all the known edges. If the sum of all the known edges is larger than B's weight, we set the unknown edge weight to zero. - If there is a self-referential edge, and the weight of the block is known, the weight for that edge is set to the weight of the block minus the weight of the other incoming edges to that block (if known). Since this propagation is not guaranteed to finalize for every CFG, we only allow it to proceed for a limited number of iterations (controlled by -sample-profile-max-propagate-iterations). It currently uses the same GCC default of 100. Before propagation starts, the pass builds (for each block) a list of unique predecessors and successors. This is necessary to handle identical edges in multiway branches. Since we visit all blocks and all edges of the CFG, it is cleaner to build these lists once at the start of the pass. Finally, the patch fixes the computation of relative line locations. The profiler emits lines relative to the function header. To discover it, we traverse the compilation unit looking for the subprogram corresponding to the function. The line number of that subprogram is the line where the function begins. That becomes line zero for all the relative locations. llvm-svn: 198972	2014-01-10 23:23:46 +00:00
Arnold Schwaighofer	c2e9d759f2	LoopVectorizer: Handle strided memory accesses by versioning for (i = 0; i < N; ++i) A[i * Stride1] += B[i * Stride2]; We take loops like this and check that the symbolic strides 'Strided1/2' are one and drop to the scalar loop if they are not. This is currently disabled by default and hidden behind the flag 'enable-mem-access-versioning'. radar://13075509 llvm-svn: 198950	2014-01-10 18:20:32 +00:00
Hao Liu	26abebbb2c	Fix a bug about generating undef operand when optimising shuffle vector and insert element in instruction combine. llvm-svn: 198730	2014-01-08 03:06:15 +00:00
Andrew Trick	e4a18605e0	Reapply r198654 "indvars: sink truncates outside the loop." This doesn't seem to have actually broken anything. It was paranoia on my part. Trying again now that bots are more stable. This is a follow up of the r198338 commit that added truncates for lcssa phi nodes. Sinking the truncates below the phis cleans up the loop and simplifies subsequent analysis within the indvars pass. llvm-svn: 198678	2014-01-07 06:59:12 +00:00
Andrew Trick	3c0ed08996	Revert "indvars: sink truncates outside the loop." This reverts commit r198654. One of the bots reported a SciMark failure. llvm-svn: 198659	2014-01-07 01:50:58 +00:00
Andrew Trick	0b8e3b2cb4	indvars: sink truncates outside the loop. This is a follow up of the r198338 commit that added truncates for lcssa phi nodes. Sinking the truncates below the phis cleans up the loop and simplifies subsequent analysis within the indvars pass. llvm-svn: 198654	2014-01-07 01:02:55 +00:00
Andrew Trick	6796ab424c	Reapply r198478 "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things." Now with a fix for PR18384: ValueHandleBase::ValueIsDeleted. We need to invalidate SCEV's loop info when we delete a block, even if no values are hoisted. llvm-svn: 198631	2014-01-06 19:43:14 +00:00
Alp Toker	5e9f3265f8	Revert "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things." This commit was the source of crasher PR18384: While deleting: label %for.cond127 An asserting value handle still pointed to this value! UNREACHABLE executed at llvm/lib/IR/Value.cpp:671! Reverting to get the builders green, feel free to re-land after fixing up. (Renato has a handy isolated repro if you need it.) This reverts commit r198478. llvm-svn: 198503	2014-01-04 17:00:45 +00:00
Andrew Trick	aceac9746d	Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things. getSCEV for an ashr instruction creates an intermediate zext expression when it truncates its operand. The operand is initially inside the loop, so the narrow zext expression has a non-loop-invariant loop disposition. LoopSimplify then runs on an outer loop, hoists the ashr operand, and properly invalidate the SCEVs that are mapped to value. The SCEV expression for the ashr is now an AddRec with the hoisted value as the now loop-invariant start value. The LoopDisposition of this wide value was properly invalidated during LoopSimplify. However, if we later get the ashr SCEV again, we again try to create the intermediate zext expression. We get the same SCEV that we did earlier, and it is still cached because it was never mapped to a Value. When we try to create a new AddRec we abort because we're using the old non-loop-invariant LoopDisposition. I don't have a solution for this other than to clear LoopDisposition when LoopSimplify hoists things. I think the long-term strategy should be to perform LoopSimplify on all loops before computing SCEV and before running any loop opts on individual loops. It's possible we may want to rerun LoopSimplify on individual loops, but it should rarely do anything, so rarely require invalidating SCEV. llvm-svn: 198478	2014-01-04 05:52:49 +00:00
David Peixotto	ea9ba446d5	Fix loop rerolling pass failure with non-consant loop lower bound The loop rerolling pass was failing with an assertion failure from a failed cast on loops like this: void foo(int A, int B, int m, int n) { for (int i = m; i < n; i+=4) { A[i+0] = B[i+0] * 4; A[i+1] = B[i+1] * 4; A[i+2] = B[i+2] * 4; A[i+3] = B[i+3] * 4; } } The code was casting the SCEV-expanded code for the new induction variable to a phi-node. When the loop had a non-constant lower bound, the SCEV expander would end the code expansion with an add insted of a phi node and the cast would fail. It looks like the cast to a phi node was only needed to get the induction variable value coming from the backedge to compute the end of loop condition. This patch changes the loop reroller to compare the induction variable to the number of times the backedge is taken instead of the iteration count of the loop. In other words, we stop the loop when the current value of the induction variable == IterationCount-1. Previously, the comparison was comparing the induction variable value from the next iteration == IterationCount. This problem only seems to occur on 32-bit targets. For some reason, the loop is not rerolled on 64-bit targets. PR18290 llvm-svn: 198425	2014-01-03 17:20:01 +00:00
Arnold Schwaighofer	833a82ecde	BasicAA: Use reachabilty instead of dominance for checking value equality in phi cycles This allows the value equality check to work even if we don't have a dominator tree. Also add some more comments. I was worried about compile time impacts and did not implement reachability but used the dominance check in the initial patch. The trade-off was that the dominator tree was required. The llvm utility function isPotentiallyReachable cuts off the recursive search after 32 visits. Testing did not show any compile time regressions showing my worries unjustfied. No compile time or performance regressions at O3 -flto -mavx on test-suite + externals. Addresses review comments from r198290. llvm-svn: 198400	2014-01-03 05:47:03 +00:00
Matt Arsenault	461c8e0a8c	Delete unread globals through addrspacecast llvm-svn: 198346	2014-01-02 20:01:43 +00:00
Matt Arsenault	da1deabb16	Fix addrspacecast with metadata globals llvm-svn: 198345	2014-01-02 19:53:49 +00:00
Andrew Trick	020dd898fc	indvars: insert truncate at loop boundary to avoid redundant IVs. When widening an IV to remove s/zext, we generally try to eliminate the original narrow IV. However, LCSSA phi nodes outside the loop were still using the original IV. Clean this up more aggressively to avoid redundancy in generated code. llvm-svn: 198338	2014-01-02 19:29:38 +00:00
Arnold Schwaighofer	0d10a9d579	BasicAA: Fix value equality and phi cycles When there are cycles in the value graph we have to be careful interpreting "Value" identity as "value" equivalence. We interpret the value of a phi node as the value of its operands. When we check for value equivalence now we make sure that the "Value" dominates all cycles (phis). %0 = phi [%noaliasval, %addr2] %l = load %ptr %addr1 = gep @a, 0, %l %addr2 = gep @a, 0, (%l + 1) store %ptr ... Before this patch we would return NoAlias for (%0, %addr1) which is wrong because the value of the load is from different iterations of the loop. Tested on x86_64 -mavx at O3 and O3 -flto with no performance or compile time regressions. PR18068 radar://15653794 llvm-svn: 198290	2014-01-02 03:31:36 +00:00
Chandler Carruth	f5689f8304	Disable transforms that introduce calls to exp10*() on Linux due to widespread glibc bugs. The glibc implementation of exp10 has a very serious precision bug in version 2.15 (and older versions). This is still very widely used (the current Ubuntu LTS for example uses it) and so it isn't reasonable to make transforms that produce these functions. This fixes many miscompiles introduced when we started transforming pow(10.0, ...) into exp10, and it may have fixed other latent miscompiles where exp10 provided sufficient precision but exp10f did not. This is all really horrible. The primary bug has been fixed for over a year and glibc 2.18 works correctly for the test cases I have, but it will be 2017 before the LTS using 2.15 is no longer supported by Ubuntu (and thus reasonable for folks to be relying on). =[ We're either going to need to live without these optimizations, or find a way to switch behavior more dynamically than using simply the fact that the OS is "Linux". To make matters worse, there appears to be significant testing and fixing of numerous other bugs in the exp10 family of functions right now in glibc. While those haven't been causing problems I've seen in the wild, it gives me concerns that we may need to wait until an even later release of glibc before we can reliably transform code into exp10. llvm-svn: 198093	2013-12-28 02:40:19 +00:00
Andrew Trick	0ba77a0740	Add support to indvars for optimizing sadd.with.overflow. Split sadd.with.overflow into add + sadd.with.overflow to allow analysis and optimization. This should ideally be done after InstCombine, which can perform code motion (eventually indvars should run after all canonical instcombines). We want ISEL to recombine the add and the check, at least on x86. This is currently under an option for reducing live induction variables: -liv-reduce. The next step is reducing liveness of IVs that are live out of the overflow check paths. Once the related optimizations are fully developed, reviewed and tested, I do expect this to become default. llvm-svn: 197926	2013-12-23 23:31:49 +00:00
Richard Sandiford	1fb5c13e3a	Fix Scalarizer insertion point when replacing PHIs with insertelements If the Scalarizer scalarized a vector PHI but could not scalarize all uses of it, it would insert a series of insertelements to reconstruct the vector PHI value from the scalar ones. The problem was that it would emit these insertelements immediately after the PHI, even if there were other PHIs after it. llvm-svn: 197909	2013-12-23 14:51:56 +00:00
Richard Sandiford	3548cbb980	Fix Scalarizer handling of vector GEPs with multiple index operands The old code only worked for one index operand. Also handle "inbounds". llvm-svn: 197908	2013-12-23 14:45:00 +00:00
Justin Bogner	0ba3f211c4	Transforms: Don't create bad weights when eliminating dead cases If we happen to eliminate every case in a switch that has branch weights, we currently try to create metadata for the one remaining branch, triggering an assert. Instead, we need to check that the metadata we're trying to create is sensible. llvm-svn: 197791	2013-12-20 08:21:30 +00:00
Justin Bogner	668eb1f746	test: Make a branchweight test more specific llvm-svn: 197790	2013-12-20 08:21:27 +00:00
Justin Bogner	f71b18e972	test: Prefer CHECK-LABEL to CHECK in branchweight tests llvm-svn: 197789	2013-12-20 08:21:24 +00:00
Arnold Schwaighofer	50b8302c55	LoopVectorizer: Don't if-convert constant expressions that can trap A phi node operand or an instruction operand could be a constant expression that can trap (division). Check that we don't vectorize such cases. PR16729 radar://15653590 llvm-svn: 197449	2013-12-17 01:11:01 +00:00
Yi Jiang	6ab044ee35	Enable double to float shrinking optimizations for binary functions like 'fmin/fmax'. Fix radar:15283121 llvm-svn: 197434	2013-12-16 22:42:40 +00:00
Joerg Sonnenberger	ddb582896a	There is no exp10 on NetBSD. llvm-svn: 197348	2013-12-15 20:36:17 +00:00
Chandler Carruth	37d25de459	[inliner] Fix PR18206 by preventing inlining functions that call setjmp through an invoke instruction. The original patch for this was written by Mark Seaborn, but I've reworked his test case into the existing returns_twice test case and implemented the fix by the prior refactoring to actually run the cost analysis over invoke instructions, and then here fixing our detection of the returns_twice attribute to work for both calls and invokes. We never noticed because we never saw an invoke. =[ llvm-svn: 197216	2013-12-13 08:00:01 +00:00
Chandler Carruth	0814d2adf0	[inliner] Completely change (and fix) how the inline cost analysis handles terminator instructions. The inline cost analysis inheritted some pretty rough handling of terminator insts from the original cost analysis, and then made it much, much worse by factoring all of the important analyses into a separate instruction visitor. That instruction visitor never visited the terminator. This works fine for things like conditional branches, but for many other things we simply computed The Wrong Value. First example are unconditional branches, which should be free but were counted as full cost. This is most significant for conditional branches where the condition simplifies and folds during inlining. We paid a 1 instruction tax on every branch in a straight line specialized path. =[ Oh, we also claimed that the unreachable instruction had cost. But it gets worse. Let's consider invoke. We never applied the call penalty. We never accounted for the cost of the arguments. Nope. Worse still, we didn't handle the correctness constraints of not inlining recursive invokes, or exception throwing returns_twice functions. Oops. See PR18206. Sadly, PR18206 requires yet another fix, but this refactoring is at least a huge step in that direction. llvm-svn: 197215	2013-12-13 07:59:56 +00:00
Mark Seaborn	b5649b853d	Fix spelling in comment in test: "themselve" -> "themselves" llvm-svn: 197180	2013-12-12 21:26:30 +00:00
Hal Finkel	f59fd7dcb4	Fix a use-after-free error in GlobalOpt CleanupConstantGlobalUsers GlobalOpt's CleanupConstantGlobalUsers function uses a worklist array to manage constant users to be visited. The pointers in this array need to be weak handles because when we delete a constant array, we may also be holding a pointer to one of its elements (or an element of one of its elements if we're dealing with an array of arrays) in the worklist. Fixes PR17347. llvm-svn: 197178	2013-12-12 20:45:24 +00:00
Yi Jiang	f92a574246	Resubmit r196544: Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) ―> __exp10(x) llvm-svn: 197109	2013-12-12 01:55:04 +00:00
Justin Bogner	a41a7b3ee5	Transforms: Don't create bad branch weights when folding a switch This avoids creating branch weight metadata of length one when we fold cases into the default of a switch instruction, which was triggering an assert. llvm-svn: 196845	2013-12-10 00:13:41 +00:00
Manman Ren	2e06c8c777	Revert 196544 due to internal bot failures. llvm-svn: 196732	2013-12-08 20:28:33 +00:00
Mark Seaborn	1b3dd3527e	Fix inlining to not lose the "cleanup" clause from landingpads This fixes PR17872. This bug can lead to C++ destructors not being called when they should be, when an exception is thrown. llvm-svn: 196711	2013-12-08 00:51:21 +00:00
Mark Seaborn	ef3dbb93ec	Fix inlining to not produce duplicate landingpad clauses Before this change, inlining one "invoke" into an outer "invoke" call site can lead to the outer landingpad's catch/filter clauses being copied multiple times into the resulting landingpad. This happens: * when the inlined function contains multiple "resume" instructions, because forwardResume() copies the clauses but is called multiple times; * when the inlined function contains a "resume" and a "call", because HandleCallsInBlockInlinedThroughInvoke() copies the clauses but is redundant with forwardResume(). Fix this by deduplicating the code. This problem doesn't lead to any incorrect execution; it's only untidy. This change will make fixing PR17872 a little easier. llvm-svn: 196710	2013-12-08 00:50:58 +00:00
Renato Golin	c6b580ac12	force vector width via cpu on vectorizer metadata enable llvm-svn: 196669	2013-12-07 21:46:08 +00:00
Matt Arsenault	bbf18c6958	Fix assert with copy from global through addrspacecast llvm-svn: 196638	2013-12-07 02:58:45 +00:00

1 2 3 4 5 ...

4134 Commits