llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	fbfdced30f	Convert tests to FileCheck llvm-svn: 185124	2013-06-28 01:29:35 +00:00
Arnold Schwaighofer	12ecb331af	LoopVectorize: Preserve debug location info radar://14169017 llvm-svn: 185122	2013-06-28 00:38:54 +00:00
Arnold Schwaighofer	38de7cd464	LoopVectorize: Cache edge masks created during if-conversion Otherwise, we end up with an exponential IR blowup. Fixes PR16472. llvm-svn: 185097	2013-06-27 20:31:06 +00:00
Arnold Schwaighofer	a2dd195fb3	LoopVectorize: Use vectorized loop invariant gep index anchored in loop Use vectorized instruction instead of original instruction anchored in the original loop. Fixes PR16452 and t2075.c of PR16455. llvm-svn: 185081	2013-06-27 15:11:55 +00:00
Manman Ren	31dee5bec9	Update testing case to make DI nodes have the correct format. llvm-svn: 185061	2013-06-27 06:40:18 +00:00
Arnold Schwaighofer	8db6347b9d	Fix spelling. llvm-svn: 185052	2013-06-27 01:01:11 +00:00
Arnold Schwaighofer	ccd6c9929b	LoopVectorize: Don't store a reversed value in the vectorized value map When we store values for reversed induction stores we must not store the reversed value in the vectorized value map. Another instruction might use this value. This fixes 3 test cases of PR16455. llvm-svn: 185051	2013-06-27 00:45:41 +00:00
Michael Gottesman	41748d7c86	Added support for the Builtin attribute. The Builtin attribute is an attribute that can be placed on function call site that signal that even though a function is declared as being a builtin, rdar://problem/13727199 llvm-svn: 185049	2013-06-27 00:25:01 +00:00
Nadav Rotem	4c5b2d1de6	Erase all of the instructions that we RAUWed llvm-svn: 184969	2013-06-26 17:16:09 +00:00
Nadav Rotem	f4ca3994b8	Do not add cse-ed instructions into the visited map because we don't want to consider them as a candidate for replacement of instructions to be visited. llvm-svn: 184966	2013-06-26 16:54:53 +00:00
Nadav Rotem	0794acc1da	SLPVectorizer: support slp-vectorization of PHINodes between basic blocks llvm-svn: 184888	2013-06-25 23:04:09 +00:00
Bob Wilson	acfc01dedf	Fix SROA to avoid unnecessary scalar conversions for 1-element vectors. When a 1-element vector alloca is promoted, a store instruction can often be rewritten without converting the value to a scalar and using an insertelement instruction to stuff it into the new alloca. This patch just adds a check to skip that conversion when it is unnecessary. This turns out to be really important for some ARM Neon operations where <1 x i64> is used to get around the fact that i64 is not a legal type. llvm-svn: 184870	2013-06-25 19:09:50 +00:00
Arnold Schwaighofer	b252c11ccc	Reapply 184685 after the SetVector iteration order fix. This should hopefully have fixed the stage2/stage3 miscompare on the dragonegg testers. "LoopVectorize: Use the dependence test utility class We now no longer need alias analysis - the cases that alias analysis would handle are now handled as accesses with a large dependence distance. We can now vectorize loops with simple constant dependence distances. for (i = 8; i < 256; ++i) { a[i] = a[i+4] * a[i+8]; } for (i = 8; i < 256; ++i) { a[i] = a[i-4] * a[i-8]; } We would be able to vectorize about 200 more loops (in many cases the cost model instructs us not to) in the test suite now. Results on x86-64 are a wash. I have seen one degradation in ammp. Interestingly, the function in which we now vectorize a loop is never executed so we probably see some instruction cache effects. There is a 2% improvement in h264ref. There is one or the other TSCV loop kernel that speeds up. radar://13681598" llvm-svn: 184724	2013-06-24 12:09:15 +00:00
Arnold Schwaighofer	58ca945f38	Revert "LoopVectorize: Use the dependence test utility class" This reverts commit cbfa1ca993363ca5c4dbf6c913abc957c584cbac. We are seeing a stage2 and stage3 miscompare on some dragonegg bots. llvm-svn: 184690	2013-06-24 06:10:41 +00:00
Arnold Schwaighofer	b914a7e2ef	LoopVectorize: Use the dependence test utility class We now no longer need alias analysis - the cases that alias analysis would handle are now handled as accesses with a large dependence distance. We can now vectorize loops with simple constant dependence distances. for (i = 8; i < 256; ++i) { a[i] = a[i+4] * a[i+8]; } for (i = 8; i < 256; ++i) { a[i] = a[i-4] * a[i-8]; } We would be able to vectorize about 200 more loops (in many cases the cost model instructs us not to) in the test suite now. Results on x86-64 are a wash. I have seen one degradation in ammp. Interestingly, the function in which we now vectorize a loop is never executed so we probably see some instruction cache effects. There is a 2% improvement in h264ref. There is one or the other TSCV loop kernel that speeds up. radar://13681598 llvm-svn: 184685	2013-06-24 03:55:48 +00:00
Nadav Rotem	210e86d7c4	SLP Vectorizer: Add support for vectorizing parts of the tree. Until now we detected the vectorizable tree and evaluated the cost of the entire tree. With this patch we can decide to trim out branches of the tree that are not profitable to vectorize. Also, increase the max depth from 6 to 12. In the worst possible case, where all of the code is made of diamond-shaped graphs, this can bring the cost to 2**10, but diamonds are not very common. llvm-svn: 184681	2013-06-24 02:52:43 +00:00
Nadav Rotem	0323925d51	SLP Vectorizer: Fix a bug in the code that does CSE on the generated gather sequences. Make sure that we don't replace and RAUW two sequences if one does not dominate the other. llvm-svn: 184674	2013-06-23 21:57:27 +00:00
Nadav Rotem	eb65e67eea	SLP Vectorizer: Implement a simple CSE optimization for the gather sequences. llvm-svn: 184660	2013-06-23 06:15:46 +00:00
Nadav Rotem	80de0a28f1	SLP Vectorizer: Implement multi-block slp-vectorization. Rewrote the SLP-vectorization as a whole-function vectorization pass. It is now able to vectorize chains across multiple basic blocks. It still does not vectorize PHIs, but this should be easy to do now that we scan the entire function. I removed the support for extracting values from trees. We are now able to vectorize more programs, but there are some serious regressions in many workloads (such as flops-6 and mandel-2). llvm-svn: 184647	2013-06-22 21:34:10 +00:00
Nadav Rotem	14a89c5428	SLPVectorization: Add a basic support for cross-basic block slp vectorization. We collect gather sequences when we vectorize basic blocks. Gather sequences are excellent hints for vectorization of other basic blocks. llvm-svn: 184444	2013-06-20 17:41:45 +00:00
Matt Arsenault	d46fce1141	Move StructurizeCFG out of R600 to generic Transforms. Register it with PassManager llvm-svn: 184343	2013-06-19 20:18:24 +00:00
Quentin Colombet	145eb97d3a	LSR: Fix the parameters used to compute the scaling factor cost. Prior to this change, the considered addressing modes may be invalid since the maximum and minimum offsets were not taken into account. This was causing an assertion failure. The added test case exercises that behavior. <rdar://problem/14199725> Assertion failed: (CurScaleCost >= 0 && "Legal addressing mode has an illegal cost!") llvm-svn: 184341	2013-06-19 19:59:41 +00:00
Nadav Rotem	1e9668ea81	SLPVectorizer: handle scalars that are extracted from vectors (using ExtractElementInst). llvm-svn: 184325	2013-06-19 17:33:16 +00:00
Nadav Rotem	86e848c849	SLPVectorizer: start constructing chains at stores that are not power of two. The type <3 x i8> is common in graphics and we want to be able to vectorize it. This change accelerates bullet by 12% and 471_omnetpp by 5%. llvm-svn: 184317	2013-06-19 15:57:29 +00:00
Nadav Rotem	e98da7f548	SLPVectorizer: vectorize compares and selects. llvm-svn: 184282	2013-06-19 05:49:52 +00:00
Pekka Jaaskelainen	eb90fd1c3b	Fix for a regression caused by the LoopVectorizer when vectorizing loops with memory accesses to non-zero address spaces. It simply dropped the AS info. Fixes PR16306. llvm-svn: 184103	2013-06-17 18:49:06 +00:00
Derek Schuff	ec9dc01b33	Fix DeleteDeadVarargs not to crash on functions referenced by BlockAddresses This pass was assuming that if hasAddressTaken() returns false for a function, the function's only uses are call sites. That's not true because there can be references by BlockAddresses too. Fix the pass to handle this case. Fix BlockAddress::replaceUsesOfWithOnConstant() to allow a function's type to be changed by RAUW'ing the function with a bitcast of the recreated function. Patch by Mark Seaborn. llvm-svn: 183933	2013-06-13 19:51:17 +00:00
Rafael Espindola	8d30480344	Always remove an alias when we rename the target. Should fix the dragonegg build bots. llvm-svn: 183845	2013-06-12 16:45:47 +00:00
Rafael Espindola	fb3fc0bf34	Convert test to FileCheck. llvm-svn: 183843	2013-06-12 16:35:53 +00:00
Rafael Espindola	a82555c0f8	Change how globalopt handles aliases in llvm.used. Instead of a custom implementation of replaceAllUsesWith, we just call replaceAllUsesWith and recreate llvm.used and llvm.compiler-used. This change is particularly interesting because it makes llvm see through what clang is doing with static used functions in extern "C" contexts. With this change, running clang -O2 in extern "C" { __attribute__((used)) static void foo() {} } produces @llvm.used = appending global [1 x i8] [i8 bitcast (void ()* @foo to i8*)], section "llvm.metadata" define internal void @foo() #0 { entry: ret void } llvm-svn: 183756	2013-06-11 17:48:06 +00:00
Tim Northover	64280fbba1	Make DeadArgumentElimination more conservative on variadic functions Variadic functions are particularly fragile in the face of ABI changes, so this limits how much the pass changes them llvm-svn: 183625	2013-06-09 02:17:27 +00:00
Shuxin Yang	140d592d84	Fix a potential bug in r183584. r183584 tries to derive some info from the code AFTER a call and apply this derived info to the code BEFORE the call, which is not always safe as the call in question may never return, and in this case, the derived info is invalid. Thanks to Duncan for pointing out this potential bug. rdar://14073661 llvm-svn: 183606	2013-06-08 04:56:05 +00:00
Shuxin Yang	bd254f2601	Fix an assertion in MemCpyOpt pass. The MemCpyOpt pass is capable of optimizing: callee(&S); copy N bytes from S to D. into: callee(&D); subject to some legality constraints. Assertion is triggered when the compiler tries to evaluate "sizeof(typeof(D))", while D is an opaque-typed, 'sret' formal argument of function being compiled. i.e. the signature of the func being compiled is something like this: T caller(...,%opaque* noalias nocapture sret %D, ...) The fix is that when we come across such a situation, instead of calling some utility functions to get the size of D's type (which will crash), we simply assume D has at least N bytes as implied by the copy-instruction. rdar://14073661 llvm-svn: 183584	2013-06-07 22:45:21 +00:00
Michael Gottesman	9e7261c874	[objc-arc] Ensure that the cfg path count does not overflow when we multiply TopDownPathCount/BottomUpPathCount. rdar://12480535 llvm-svn: 183489	2013-06-07 06:16:49 +00:00
Rafael Espindola	932470bcd9	Add a testcase from pr16244. llvm-svn: 183433	2013-06-06 19:15:23 +00:00
David Majnemer	29130c5e8d	IndVarSimplify: check if loop invariant expansion can trap IndVarSimplify is willing to move divide instructions outside of their loop bodies if they are invariant of the loop. However, it may not be safe to expand them if we do not know if they can trap. Instead, check to see if it is not safe to expand the instruction and skip the expansion. This fixes PR16041. Testcase by Rafael Ávila de Espíndola. llvm-svn: 183239	2013-06-04 17:51:58 +00:00
Rafael Espindola	a5e536ab0e	Second part of pr16069 The problem this time seems to be a thinko. We were assuming that in the CFG A | \ | B | / C speculating the basic block B would cause only the phi value for the B->C edge to be speculated. That is not true, the phi's are semantically in the edges, so if the A->B->C path is taken, any code needed for A->C is not executed and we have to consider it too when deciding to speculate B. llvm-svn: 183226	2013-06-04 14:11:59 +00:00
David Majnemer	c82f27af2a	SimplifyCFG: Do not transform PHI to select if doing so would be unsafe PR16069 is an interesting case where an incoming value to a PHI is a trap value while also being a 'ConstantExpr'. We do not consider this case when performing the 'HoistThenElseCodeToIf' optimization. Instead, make our modifications more conservative if we detect that we cannot transform the PHI to a select. llvm-svn: 183152	2013-06-03 20:43:12 +00:00
Nick Lewycky	3f715e260a	When determining the new index for an insertelement, we may not assume that an index greater than the size of the vector is invalid. The shuffle may be shrinking the size of the vector. Fixes a crash! Also drop the maximum recursion depth of the safety check for this optimization to five. llvm-svn: 183080	2013-06-01 20:51:31 +00:00
Andrew Trick	ee9143acf5	Prevent loop-unroll from making assumptions about undefined behavior. Fixes rdar:14036816, PR16130. There is an opportunity to compute precise trip counts for 'or' expressions and multi-exit loops. rdar:14038809: Optimize trip count computation for multi-exit loops. To do this we need to record the fact that ExitLimit assumes NSW. When it does not we can safely assume that the loop trip count is the minimum ExitLimit across all subexpressions and loop exits. llvm-svn: 183060	2013-05-31 23:34:46 +00:00
Arnold Schwaighofer	70a9be5297	LoopVectorize: PHIs with only outside users should prevent vectorization We check that instructions in the loop don't have outside users (except if they are reduction values). Unfortunately, we skipped this check for if-convertible PHIs. Fixes PR16184. llvm-svn: 183035	2013-05-31 19:53:50 +00:00
Quentin Colombet	8aa7abe2ae	Modify how the formulae are rated in Loop Strength Reduce. Namely, check if the target allows folding more than one register in the addressing mode and if yes, adjust the cost accordingly. Prior to this commit, reg1 + scale * reg2 accesses were artificially preferred to reg1 + reg2 accesses. Indeed, the cost model wrongly assumed that reg1 + reg2 needs a temporary register for the computation, whereas it was correctly estimated for reg1 + scale * reg2. <rdar://problem/13973908> llvm-svn: 183021	2013-05-31 17:20:29 +00:00
Rafael Espindola	65281bf36e	Simplify multiplications by vectors whose elements are powers of 2. Patch by Andrea Di Biagio. llvm-svn: 183005	2013-05-31 14:27:15 +00:00
Nick Lewycky	a2b7720618	Reapply r182909 with a fix to the calculation of the new indices for insertelement instructions. llvm-svn: 182976	2013-05-31 00:59:42 +00:00
Evgeniy Stepanov	2c14269883	Revert r182909. PR/16177 llvm-svn: 182919	2013-05-30 09:40:17 +00:00
Nick Lewycky	d7f27094c0	Swizzle vector inputs if it helps us eliminate shuffles. llvm-svn: 182909	2013-05-30 04:33:38 +00:00
Paul Redmond	5fdf836ba4	Add support for llvm.vectorizer metadata - llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic by making the root of additional loop metadata. - Loop::isAnnotatedParallel now looks for llvm.loop and associated llvm.mem.parallel_loop_access - document llvm.loop and update llvm.mem.parallel_loop_access - add support for llvm.vectorizer.width and llvm.vectorizer.unroll - document llvm.vectorizer.* metadata - add utility class LoopVectorizerHints for getting/setting loop metadata - use llvm.vectorizer.width=1 to indicate already vectorized instead of already_vectorized - update existing tests that used llvm.loop.parallel and llvm.vectorizer.already_vectorized Reviewed by: Nadav Rotem llvm-svn: 182802	2013-05-28 20:00:34 +00:00
Andrew Trick	e2431c64bc	Track IR ordering of SelectionDAG nodes 3/4. Remove the old IR ordering mechanism and switch to new one. Fix unit test failures. llvm-svn: 182704	2013-05-25 03:08:10 +00:00
Michael Gottesman	e67f40c514	[objc-arc] KnownSafe does not imply that it is safe to perform code motion across CFG edges since even if it is safe to remove RR pairs, we may still be able to move a retain/release into a loop. rdar://13949644 llvm-svn: 182670	2013-05-24 20:44:05 +00:00
Michael Gottesman	5a91bbf33a	[objc-arc] Make sure that multiple owners is propagated correctly through the pass via the usage of a global data structure. rdar://13750319 llvm-svn: 182669	2013-05-24 20:44:02 +00:00
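
The two loops quoted in the r184685 message differ in the direction of their dependence: the first reads elements ahead of the one it writes, the second reads elements written 4 and 8 iterations earlier. The sketch below is a plain-Python model of that difference, not LLVM's implementation. `scalar_loop`, `chunked_loop`, `MODULUS`, and `SEED` are hypothetical names introduced for illustration, and products are reduced modulo a prime only to keep the numbers small. A "vector" iteration is modelled as loading every lane of a chunk before storing any of them.

```python
# Hypothetical prime modulus; it only keeps the products bounded.
MODULUS = 1_000_003

# Hypothetical input: 264 non-zero values so that a[i + 8] is valid for i < 256.
SEED = [(7 * k + 3) % MODULUS for k in range(264)]


def scalar_loop(a, offset1, offset2, start=8, stop=256):
    """Reference semantics: one element per iteration, in program order."""
    a = list(a)
    for i in range(start, stop):
        a[i] = (a[i + offset1] * a[i + offset2]) % MODULUS
    return a


def chunked_loop(a, offset1, offset2, width, start=8, stop=256):
    """Model of a vectorized loop with `width` lanes per iteration."""
    a = list(a)
    for base in range(start, stop, width):
        lanes = range(base, min(base + width, stop))
        # All loads of the chunk happen before any store of the chunk.
        products = [(a[i + offset1] * a[i + offset2]) % MODULUS for i in lanes]
        for i, value in zip(lanes, products):
            a[i] = value
    return a


def matches_scalar(offset1, offset2, width):
    """True if the chunked model reproduces the scalar result on SEED."""
    return chunked_loop(SEED, offset1, offset2, width) == scalar_loop(
        SEED, offset1, offset2
    )
```

Under this model, `matches_scalar(4, 8, width)` holds for any chunk width, because the forward-reading loop never reads an element that an earlier lane has already written. `matches_scalar(-4, -8, width)` holds only while `width` does not exceed the smallest dependence distance (4 here); at width 8 a lane reads a value that the scalar loop would already have overwritten.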

3747 Commits