llvm-project

Commit Graph

Author	SHA1	Message	Date
Nadav Rotem	13306816fc	LoopVectorizer: Calculate the number of pointers to disambiguate at runtime based on the numbers of reads and writes. llvm-svn: 180593	2013-04-26 05:08:59 +00:00
Nadav Rotem	f43cbeee15	LoopVectorizer: No need to generate pointer disambiguation checks between readonly pointers. llvm-svn: 180570	2013-04-25 19:55:03 +00:00
Arnold Schwaighofer	a6578f7056	LoopVectorize: Scalarize padded types This patch disables memory-instruction vectorization for types that need padding bytes, e.g., x86_fp80 has a 10-byte store size with 6 bytes of padding on darwin x86_64. Because load/store vectorization is performed by bitcasting to a packed vector, which has an incompatible memory layout due to the lack of padding bytes, the present vectorizer produces inconsistent results for memory instructions of those types. This patch checks that the AllocSize of the scalar type equals the allocated size of each vector element, to ensure that there are no padding bytes and the array can be read/written using vector operations. Patch by Daisuke Takahashi! Fixes PR15758. llvm-svn: 180196	2013-04-24 16:16:01 +00:00
Arnold Schwaighofer	23a0589bce	LoopVectorizer: Bail out if we don't have datalayout we need it llvm-svn: 180195	2013-04-24 16:15:58 +00:00
Nadav Rotem	71c9d6d333	LoopVectorizer: Fix 15830. When scalarizing and unrolling stores make sure that the order in which the elements are scalarized is the same as the original order. This fixes a miscompilation in FreeBSD's regex library. llvm-svn: 180121	2013-04-23 17:12:42 +00:00
Pekka Jaaskelainen	d3c90e132a	Call the potentially costly isAnnotatedParallel() only once. Made the uniform write test's checks a bit stricter. llvm-svn: 180119	2013-04-23 16:44:43 +00:00
Pekka Jaaskelainen	6f2f66b63f	Refuse to (even try to) vectorize loops which have uniform writes, even if erroneously annotated with the parallel loop metadata. Fixes Bug 15794: "Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata" llvm-svn: 180081	2013-04-23 08:08:51 +00:00
Anat Shemer	10260a75e3	Changed back (relative to commit 179786) the operations executed when extract(cast) is transformed to cast(extract). It uses the Builder class as before. In addition the result node is added to the Worklist, so all the previous extract users will become the new scalar cast users. llvm-svn: 180045	2013-04-22 20:51:10 +00:00
David Blaikie	f55abeaf4c	Revert "Revert "PR14606: debug info imported_module support"" This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even though the debug info was clearly invalid on all of them, but this ought to fix it. llvm-svn: 179996	2013-04-22 06:12:31 +00:00
Benjamin Kramer	0212dc27ed	SROA: Don't crash on a select with two identical operands. This is an edge case that can happen if we modify a chain of multiple selects. Update all operands in that case and remove the assert. PR15805. llvm-svn: 179982	2013-04-21 17:48:39 +00:00
Arnold Schwaighofer	6eb32b31bd	Revert "SimplifyCFG: If convert single conditional stores" There is the temptation to make this transform dependent on target information as it is not going to be beneficial on all (sub)targets. Therefore, we should probably do this in MI Early-Ifconversion. This reverts commit r179957. Original commit message: "SimplifyCFG: If convert single conditional stores This transformation will transform a conditional store with a preceding unconditional store to the same location: a[i] = may-alias with a[i] load if (cond) a[i] = Y into an unconditional store. a[i] = X may-alias with a[i] load tmp = cond ? Y : X; a[i] = tmp We assume that on average the cost of a mispredicted branch is going to be higher than the cost of a second store to the same location, and that the secondary benefits of creating a bigger basic block for other optimizations to work on outweigh the potential case where the branch would be correctly predicted and the cost of executing the second store would be noticeably reflected in performance. hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With this change we are on par with gcc's performance (gcc also performs this transformation). There was a 1.2% performance improvement on an ARM swift chip. Other tests in the test-suite+external seem to be mostly uninfluenced in my experiments: This optimization was triggered on 41 tests such that the executable was different before/after the patch. Only 1 out of the 40 tests (dealII) was reproducible below 100% (by about .4%). Given that hmmer benefits so much I believe this to be a fair trade off. I am going to watch performance numbers across the buildbots and will revert this if anything unexpected comes up." llvm-svn: 179980	2013-04-21 13:09:04 +00:00
Nadav Rotem	c57af326a4	SLPVectorize: Add support for vectorization of casts. llvm-svn: 179975	2013-04-21 08:05:59 +00:00
Michael Gottesman	d5b701faf1	[objc-arc] Cleaned up tail-call-invariant-enforcement.ll. Specifically: 1. Added checks that unwind is being properly added to various instructions. 2. Fixed the declaration/calling of objc_release to have a return type of void. 3. Moved all checks to precede the functions and added checks to ensure that the checks would only match inside the specific function that we are attempting to check. llvm-svn: 179973	2013-04-21 02:59:44 +00:00
Michael Gottesman	77aa946321	[objc-arc] Check that objc-arc-expand properly handles all strictly forwarding calls and does not touch calls which are not strictly forwarding (i.e. objc_retainBlock). llvm-svn: 179972	2013-04-21 01:57:46 +00:00
Michael Gottesman	524052fec1	[objc-arc] Renamed the test file clang-arc-used-intrinsic-removed-if-isolated.ll -> intrinsic-use-isolated.ll to match the other test file intrinsic-use.ll. llvm-svn: 179971	2013-04-21 01:42:24 +00:00
Nadav Rotem	8aca44a623	Fix PR15800. Do not try to vectorize vectors and structs. llvm-svn: 179960	2013-04-20 22:29:43 +00:00
Arnold Schwaighofer	3546ccf465	SimplifyCFG: If convert single conditional stores This transformation will transform a conditional store with a preceding unconditional store to the same location: a[i] = may-alias with a[i] load if (cond) a[i] = Y into an unconditional store. a[i] = X may-alias with a[i] load tmp = cond ? Y : X; a[i] = tmp We assume that on average the cost of a mispredicted branch is going to be higher than the cost of a second store to the same location, and that the secondary benefits of creating a bigger basic block for other optimizations to work on outweigh the potential case where the branch would be correctly predicted and the cost of executing the second store would be noticeably reflected in performance. hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With this change we are on par with gcc's performance (gcc also performs this transformation). There was a 1.2% performance improvement on an ARM swift chip. Other tests in the test-suite+external seem to be mostly uninfluenced in my experiments: This optimization was triggered on 41 tests such that the executable was different before/after the patch. Only 1 out of the 40 tests (dealII) was reproducible below 100% (by about .4%). Given that hmmer benefits so much I believe this to be a fair trade off. I am going to watch performance numbers across the buildbots and will revert this if anything unexpected comes up. llvm-svn: 179957	2013-04-20 21:42:09 +00:00
Nuno Lopes	36e827602a	recommit tests llvm-svn: 179955	2013-04-20 17:39:52 +00:00
Nadav Rotem	83c7c41bc2	SLPVectorizer: Improve the cost model for loop invariant broadcast values. llvm-svn: 179930	2013-04-20 06:13:47 +00:00
Benjamin Kramer	630e6e1422	MergeFunc: Make pointer and integer types generate the same hash. The logic that actually compares the types considers pointers and integers the same if they are of the same size. This created a strange mismatch between hash and reality and made the test case for this fail on some platforms (yay, test cases). llvm-svn: 179905	2013-04-19 23:06:44 +00:00
Bill Wendling	24e8a0d5f0	Make variable match any name. llvm-svn: 179903	2013-04-19 22:30:43 +00:00
Bill Wendling	81c8cf5ef9	Try explicitly setting the target triple to see if this gets it to pass on ARM. llvm-svn: 179890	2013-04-19 21:24:51 +00:00
Chad Rosier	11ebe05643	Attempt to pacify this test for the buildbots. llvm-svn: 179874	2013-04-19 19:27:33 +00:00
Bill Wendling	b670649067	Add test to make sure that a int-to-ptr can be merged correctly. llvm-svn: 179869	2013-04-19 18:16:06 +00:00
Benjamin Kramer	ec1bb4fdaf	ConstantFolding: ComputeMaskedBits wants the scalar size for vectors. Fixes PR15791. llvm-svn: 179859	2013-04-19 16:56:24 +00:00
Jakub Staszak	9b59d14fc4	Revert 179826. Tests were worthless. llvm-svn: 179845	2013-04-19 09:32:30 +00:00
Eric Christopher	0e89ade8ff	Revert "PR14606: debug info imported_module support" This reverts commit r179836 as it seems to have caused test failures. llvm-svn: 179840	2013-04-19 07:47:16 +00:00
David Blaikie	88564f3cf7	PR14606: debug info imported_module support Adding another CU-wide list, in this case of imported_modules (since they should be relatively rare, it seemed better to add a list where each element had a "context" value, rather than add a (usually empty) list to every scope). This takes care of DW_TAG_imported_module, but to fully address PR14606 we'll need to expand this to cover DW_TAG_imported_declaration too. llvm-svn: 179836	2013-04-19 06:57:04 +00:00
Jakub Staszak	2c1daf75b9	Don't run expensive -O2 and -O3 in tests. llvm-svn: 179825	2013-04-19 01:10:45 +00:00
Anat Shemer	5570318f43	In the function InstCombiner::visitExtractElementInst() removed the limitation that extract is promoted over a cast only if the cast has only one use. llvm-svn: 179786	2013-04-18 19:56:44 +00:00
Anat Shemer	0c95efad7e	Added a function scalarizePHI() that scalarizes a vector phi instruction if it has only 2 uses: one to promote the vector phi in a loop and the other use is an extract operation of one element at a constant location. llvm-svn: 179783	2013-04-18 19:35:39 +00:00
Arnold Schwaighofer	4cd6aa110c	LoopVectorizer: Recognize min/max reductions A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y) sequence in LLVM. If we see such a sequence we can treat it just as any other commutative binary instruction and reduce it. This appears to help bzip2 by about 1.5% on an imac12,2. radar://12960601 llvm-svn: 179773	2013-04-18 17:22:34 +00:00
Benjamin Kramer	8df2cfb858	LoopVectorize: Use a set to avoid longer cycles in the reduction chain too. Fixes PR15748. llvm-svn: 179757	2013-04-18 14:29:13 +00:00
David Majnemer	81af06e003	Revert "Combine bit test + conditional or into simple math" It is causing stage2 builds to fail, let's get them running again. llvm-svn: 179750	2013-04-18 08:42:33 +00:00
David Majnemer	bdf0caf6b1	Combine bit test + conditional or into simple math Simplify: (select (icmp eq (and X, C1), 0), Y, (or Y, C2)) Into: (or (shl (and X, C1), C3), y) Where: C3 = Log(C2) - Log(C1) If: C1 and C2 are both powers of two llvm-svn: 179748	2013-04-18 07:30:07 +00:00
Michael Gottesman	323964ca9e	[objc-arc] Do not mismatch retains inside a for loop with releases outside said for loop in the presence of differing provenance caused by escaping blocks. This occurs due to an alloca representing a separate ownership from the original pointer. Thus consider the following pseudo-IR: objc_retain(%a) for (...) { objc_retain(%a) %block <- %a F(%block) objc_release(%block) } objc_release(%a) From the perspective of the optimizer, the %block is a separate provenance from the original %a. Thus the optimizer pairs up the inner retain for %a and the outer release from %a, resulting in segfaults. This is fixed by noting that the signature of a mismatch of retain/releases inside the for loop is a Use/CanRelease top down with a None bottom up (since bottom up the Retain-CanRelease-Use-Release sequence is completed by the inner objc_retain, but top down due to the differing provenance from the objc_release said sequence is not completed). In said case in CheckForCFGHazards, we now clear the state of %a implying that no pairing will occur. Additionally a test case is included. rdar://12969722 llvm-svn: 179747	2013-04-18 05:39:45 +00:00
Michael Gottesman	a15ab25238	Streamline arc-annotation test (removing some cases which do not add any extra coverage) and set it up to use FileCheck variables to make the test more robust. llvm-svn: 179745	2013-04-18 04:34:06 +00:00
Peter Collingbourne	37ae72b508	Do not optimise fprintf() calls if its return value is used. Differential Revision: http://llvm-reviews.chandlerc.com/D620 llvm-svn: 179661	2013-04-17 02:01:10 +00:00
Hans Wennborg	c9e1d99279	simplifycfg: Fix integer overflow converting switch into icmp. If a switch instruction has a case for every possible value of its type, with the same successor, SimplifyCFG would replace it with an icmp ult, but the computation of the bound overflows in that case, which inverts the test. Patch by Jed Davis! llvm-svn: 179587	2013-04-16 08:35:36 +00:00
Bill Wendling	3789171972	We are not able to bitcast a pointer to an integral value. Two return types are not equivalent if one is a pointer and the other is an integral. This is because we cannot bitcast a pointer to an integral value. PR15185 llvm-svn: 179569	2013-04-15 22:33:50 +00:00
Nadav Rotem	b9116e6966	SLPVectorizer: Make it a function pass and add code for hoisting the vector-gather sequence out of loops. llvm-svn: 179562	2013-04-15 22:00:26 +00:00
Eric Christopher	13637e900e	Revert "Recommit r179497 after fixing uninitialized variable." until I can fix the testcases here: http://lab.llvm.org:8011/builders/clang-native-arm-cortex-a9/builds/6952 This reverts commit r179512 due to testcases specifying triples that they didn't actually mean and causing failures on other platforms. llvm-svn: 179513	2013-04-15 07:31:37 +00:00
Eric Christopher	fc2beaa136	Recommit r179497 after fixing uninitialized variable. llvm-svn: 179512	2013-04-15 07:07:21 +00:00
Nadav Rotem	5d393c416f	SLPVectorizer: Add support for vectorizing trees that start at compare instructions. llvm-svn: 179504	2013-04-15 04:25:27 +00:00
Eric Christopher	1f140317e3	Revert "Remove some unused triple and data layout." This reverts commit r179497 and the accompanying commit as it broke random platforms that aren't osx. llvm-svn: 179499	2013-04-14 23:35:36 +00:00
Eric Christopher	4eebd14ad0	Remove some unused triple and data layout. llvm-svn: 179498	2013-04-14 23:32:44 +00:00
David Majnemer	1fae195557	Reorders two transforms that collide with each other One performs: (X == 13 | X == 14) -> X-13 <u 2 The other: (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1 The problem is that there are certain values of C1 and C2 that trigger both transforms but the first one blocks out the second, this generates suboptimal code. Reordering the transforms should be better in every case and allows us to do interesting stuff like turn: %shr = lshr i32 %X, 4 %and = and i32 %shr, 15 %add = add i32 %and, -14 %tobool = icmp ne i32 %add, 0 into: %and = and i32 %X, 240 %tobool = icmp ne i32 %and, 224 llvm-svn: 179493	2013-04-14 21:15:43 +00:00
Nadav Rotem	6ebddae118	Make the command line triple match the module triple. llvm-svn: 179492	2013-04-14 20:13:05 +00:00
Nadav Rotem	029208ceeb	Remove unused function attributes. llvm-svn: 179476	2013-04-14 05:47:04 +00:00
Nadav Rotem	54b413d157	SLPVectorizer: Add support for trees that don't start at binary operators, and add the cost of extracting values from the roots of the tree. llvm-svn: 179475	2013-04-14 05:15:53 +00:00
Nadav Rotem	0b9cf8567b	SLPVectorizer: add initial support for reduction variable vectorization. llvm-svn: 179470	2013-04-14 03:22:20 +00:00
Benjamin Kramer	adc1727c39	GlobalDCE: Fix an oversight in my last commit that could lead to crashes. There is a Constant with non-constant operands: blockaddress. llvm-svn: 179460	2013-04-13 16:11:14 +00:00
Benjamin Kramer	89ca4bc6d4	Fix a scalability issue with complex ConstantExprs. This is basically the same fix in three different places. We use a set to avoid walking the whole tree of a big ConstantExprs multiple times. For example: (select cmp, (add big_expr 1), (add big_expr 2)) We don't want to visit big_expr twice here, it may consist of thousands of nodes. The testcase exercises this by creating an insanely large ConstantExprs out of a loop. It's questionable if the optimizer should ever create those, but this can be triggered with real C code. Fixes PR15714. llvm-svn: 179458	2013-04-13 12:53:18 +00:00
Benjamin Kramer	e89c705030	InstCombine: Check the operand types before merging fcmp ord & fcmp ord. Fixes PR15737. llvm-svn: 179417	2013-04-12 21:56:23 +00:00
Nadav Rotem	8543ba3e52	SLPVectorizer: add support for vectorization of diamond shaped trees. We now perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from. llvm-svn: 179414	2013-04-12 21:16:54 +00:00
Nadav Rotem	87a0af6e1b	CostModel: increase the default cost of supported floating point operations from one to two. Fixed a few tests that changed because now the cost of one insert + a vector operation on two doubles is lower than two scalar operations on doubles. llvm-svn: 179413	2013-04-12 21:15:03 +00:00
David Majnemer	1a08accbb7	Simplify (A & ~B) in icmp if A is a power of 2 The transform will execute like so: (A & ~B) == 0 --> (A & B) != 0 (A & ~B) != 0 --> (A & B) == 0 llvm-svn: 179386	2013-04-12 17:25:07 +00:00
Arnold Schwaighofer	f9cea17f75	LoopVectorizer: integer division is not a reduction operation Don't classify idiv/udiv as a reduction operation. Integer division is lossy. For example : (1 / 2) * 4 != 4/2. Example: int a[] = { 2, 5, 2, 2} int x = 80; for() x /= a[i]; Scalar: x /= 2 // = 40 x /= 5 // = 8 x /= 2 // = 4 x /= 2 // = 2 Vectorized: <80, 1> / <2,5> //= <40,0> <40, 0> / <2,2> //= <20,0> 20*0 = 0 radar://13640654 llvm-svn: 179381	2013-04-12 15:15:19 +00:00
David Majnemer	b81cd63c4b	Optimize icmp involving addition better Allows LLVM to optimize sequences like the following: %add = add nsw i32 %x, 1 %cmp = icmp sgt i32 %add, %y into: %cmp = icmp sge i32 %x, %y as well as: %add1 = add nsw i32 %x, 20 %add2 = add nsw i32 %y, 57 %cmp = icmp sge i32 %add1, %add2 into: %add = add nsw i32 %y, 37 %cmp = icmp sle i32 %add, %x llvm-svn: 179316	2013-04-11 20:05:46 +00:00
Benjamin Kramer	a95f87494a	Fix for wrong instcombine on vector insert/extract When trying to collapse sequences of insertelement/extractelement instructions into single shuffle instructions, there is one specific case where the Instruction Combiner wrongly updates the resulting Mask of shuffle indexes. The problem is in function CollectShuffleElements. If we have a sequence of insert/extract element instructions like the one below: %tmp1 = extractelement <4 x float> %LHS, i32 0 %tmp2 = insertelement <4 x float> %RHS, float %tmp1, i32 1 %tmp3 = extractelement <4 x float> %RHS, i32 2 %tmp4 = insertelement <4 x float> %tmp2, float %tmp3, i32 3 Where: . %RHS will have a mask of [4,5,6,7] . %LHS will have a mask of [0,1,2,3] The Mask of shuffle indexes is wrongly computed to [4,1,6,7] instead of [4,0,6,7]. When analyzing %tmp2 in order to compute the Mask for the resulting shuffle instruction, the algorithm forgets to update the mask index at position 1 with the index associated to the element extracted from %LHS by instruction %tmp1. Patch by Andrea DiBiagio! llvm-svn: 179291	2013-04-11 15:10:09 +00:00
Benjamin Kramer	b50682e156	Add missing colons to check lines. llvm-svn: 179277	2013-04-11 12:41:41 +00:00
Benjamin Kramer	3960c1cd56	FileCheckize a bunch of tests. llvm-svn: 179276	2013-04-11 12:32:23 +00:00
Nadav Rotem	73dffa4184	Make the SLP store-merger less paranoid about function calls. We check for function calls when we check if it is safe to sink instructions. llvm-svn: 179207	2013-04-10 19:41:36 +00:00
Nadav Rotem	2d9dec322e	Add support for bottom-up SLP vectorization infrastructure. This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations. The infrastructure has three potential users: 1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]). 2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute. 3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization. This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code: void SAXPY(int *x, int *y, int a, int i) { x[i] = a * x[i] + y[i]; x[i+1] = a * x[i+1] + y[i+1]; x[i+2] = a * x[i+2] + y[i+2]; x[i+3] = a * x[i+3] + y[i+3]; } llvm-svn: 179117	2013-04-09 19:44:35 +00:00
Nadav Rotem	abcc64fd13	Revert r176408 and r176407 to address PR15540. llvm-svn: 179111	2013-04-09 18:16:05 +00:00
Michael Gottesman	ccc93e72e1	Converted 8x tests of SimplifyCFG to use FileCheck instead of grep. llvm-svn: 179087	2013-04-09 05:18:53 +00:00
Nadav Rotem	7b7585d153	Revert 179071 because it is not the right way to support non standard new/new[] operators. llvm-svn: 179084	2013-04-09 04:43:46 +00:00
Nadav Rotem	9dd90ac5b4	c++ new operators are not malloc-like functions because they do not return uninitialized memory. Users may override new-operators and implement any function that they like. llvm-svn: 179071	2013-04-08 23:40:47 +00:00
Chandler Carruth	0e8a52d18f	Fix PR15674 (and PR15603): a SROA think-o. The fix for PR14972 in r177055 introduced a real think-o in the store side, likely because I was much more focused on the load side. While we can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily widen a value to be stored, as that changes the width of memory access! Lock down the code path in the store rewriting which would do this to only handle the intended circumstance. All of the existing tests continue to pass, and I've added a test from the PR. llvm-svn: 178974	2013-04-07 11:47:54 +00:00
Michael Gottesman	31ba23aa56	An objc_retain can serve as a use for a different pointer. This is the counterpart to commit r160637, except it performs the action in the bottomup portion of the data flow analysis. llvm-svn: 178922	2013-04-05 22:54:32 +00:00
Michael Gottesman	1d8d25777d	Properly model precise lifetime when given an incomplete dataflow sequence. The normal dataflow sequence in the ARC optimizer consists of the following states: Retain -> CanRelease -> Use -> Release The optimizer before this patch stored the uses that determine the lifetime of the retainable object pointer when it bottom up hits a retain or when top down it hits a release. This is correct for an imprecise lifetime scenario since what we are trying to do is remove retains/releases while making sure that no ``CanRelease'' (which is usually a call) deallocates the given pointer before we get to the ``Use'' (since that would cause a segfault). If we are considering the precise lifetime scenario though, this is not correct. In such a situation, we DO care about the previous sequence, but additionally, we wish to track the uses resulting from the following incomplete sequences: Retain -> CanRelease -> Release (TopDown) Retain <- Use <- Release (BottomUp) NOTE This patch looks large but the most of it consists of updating test cases. Additionally this fix exposed an additional bug. I removed the test case that expressed said bug and will recommit it with the fix in a little bit. llvm-svn: 178921	2013-04-05 22:54:28 +00:00
Shuxin Yang	95adf5258f	Disable the optimization about promoting vector-element-access with symbolic index. This optimization is unstable at this moment; it 1) blocks us on a very important application 2) PR15200 3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll (the CHECK command compares the output against a wrong result) I personally believe this optimization should not have any impact on the autovectorized code, as the auto-vectorizer is supposed to put gather/scatter in a "right" way. Although in theory downstream optimizers might reveal some gather/scatter optimization opportunities, the chance is quite slim. For hand-crafted vectorizing code, in terms of redundancy elimination, load-CSE, copy-propagation and DSE can collectively achieve the same result, but in a much simpler way. On the other hand, these optimizers are able to improve the code in an incremental way; in contrast, SROA is sort of an all-or-none approach. However, SROA might slightly win in stack size, as it tries to figure out a stretch of memory that tightly covers the area accessed by the dynamic index. rdar://13174884 PR15200 llvm-svn: 178912	2013-04-05 21:07:08 +00:00
Arnold Schwaighofer	df6f67ed87	LoopVectorizer: Pass OperandValueKind information to the cost model Pass down the fact that an operand is going to be a vector of constants. This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86 back. It had degraded to scalar performance due to my previous shift cost change that made all shifts expensive on x86. radar://13576547 llvm-svn: 178809	2013-04-04 23:26:27 +00:00
Michael Gottesman	b8c8836594	Remove an optimization where we were changing an objc_autorelease into an objc_autoreleaseReturnValue. The semantics of ARC implies that a pointer passed into an objc_autorelease must live until some point (potentially down the stack) where an autorelease pool is popped. On the other hand, an objc_autoreleaseReturnValue just signifies that the object must live until the end of the given function at least. Thus objc_autorelease is stronger than objc_autoreleaseReturnValue in terms of the semantics of ARC* implying that performing the given strength reduction without any knowledge of how this relates to the autorelease pool pop that is further up the stack violates the semantics of ARC. *Even though objc_autoreleaseReturnValue if you know that no RV optimization will occur is more computationally expensive. llvm-svn: 178612	2013-04-03 02:57:24 +00:00
Bill Wendling	88d06c3b2d	Use a worklist to avoid a sneaky iterator invalidation. The iterator could be invalidated when it's recursively deleting a whole bunch of constant expressions in a constant initializer. Note: This was only reproducible if `opt' was run on a `.bc' file. If `opt' was run on a `.ll' file, it wouldn't crash. This is why the test first pushes the `.ll' file through `llvm-as' before feeding it to `opt'. PR15440 llvm-svn: 178531	2013-04-02 08:16:45 +00:00
Shuxin Yang	6662fd0f15	Correct assertion condition llvm-svn: 178484	2013-04-01 18:13:05 +00:00
Benjamin Kramer	52ceb44331	X86TTI: Add accurate costs for itofp operations, based on the actual instruction counts. llvm-svn: 178459	2013-04-01 10:23:49 +00:00
Shuxin Yang	7b0c94e207	Implement XOR reassociation. It is based on the following rules: rule 1: (x | c1) ^ c2 => (x & ~c1) ^ (c1^c2), only useful when c1=c2 rule 2: (x & c1) ^ (x & c2) = (x & (c1^c2)) rule 3: (x | c1) ^ (x | c2) = (x & c3) ^ c3 where c3 = c1 ^ c2 rule 4: (x | c1) ^ (x & c2) => (x & c3) ^ c1, where c3 = ~c1 ^ c2 It reduces an application's size (in terms of # of instructions) by 8.9%. Reviewed by Pete Cooper. Thanks a lot! rdar://13212115 llvm-svn: 178409	2013-03-30 02:15:01 +00:00
Michael Gottesman	9412830090	Updated test0 of retain-not-declared.ll to reflect the fact that objc-arc-expand runs before objc-arc/objc-arc-contract. Specifically, objc-arc-expand will make sure that the objc_retainAutoreleasedReturnValue, objc_autoreleaseReturnValue, and ret will all have %call as an argument. llvm-svn: 178382	2013-03-29 22:44:59 +00:00
Michael Gottesman	3b8f877860	Add clang.arc.used to ModuleHasARC so ARC always runs if said call is present in a module. clang.arc.used is an interesting call for ARC since ObjCARCContract needs to run to remove said intrinsic to avoid a linker error (since the call does not exist). llvm-svn: 178369	2013-03-29 21:15:23 +00:00
Michael Gottesman	49f9885a2a	Non-optimizable objc_retainBlock calls are not forwarding. Since we handle optimizable objc_retainBlocks through strength reduction in OptimizeIndividualCalls, we know that all code after that point will only see non-optimizable objc_retainBlock calls. IsForwarding is only called by functions after that point, so it is ok to just classify objc_retainBlock as non-forwarding. <rdar://problem/13249661>. llvm-svn: 178285	2013-03-28 20:11:30 +00:00
Michael Gottesman	158fdf699e	[ObjCARC] Strength reduce objc_retainBlock -> objc_retain if the objc_retainBlock is optimizable. If an objc_retainBlock has the copy_on_escape metadata attached to it AND if the block pointer argument only escapes down the stack, we are allowed to strength reduce the objc_retainBlock to an objc_retain and thus optimize it. Currently there is logic in the ARC data flow analysis to handle this case which is complicated and involves making distinctions between objc_retainBlock and objc_retain in certain places and considering them the same in others. This patch simplifies said code by: 1. Performing the strength reduction in the initial ARC peephole analysis (ObjCARCOpts::OptimizeIndividualCalls). 2. Changing the ARC dataflow analysis (which runs after the peephole analysis) to consider all objc_retainBlock calls to not be optimizable (since if the call was optimizable, we would have strength reduced it already). This patch leaves in the infrastructure in the ARC dataflow analysis to handle this case, which due to 2 will just be dead code. I am doing this on purpose to separate the removal of the old code from the testing of the new code. <rdar://problem/13249661>. llvm-svn: 178284	2013-03-28 20:11:19 +00:00
Akira Hatanaka	19468cafad	Remove -O3. llvm-svn: 178278	2013-03-28 19:34:14 +00:00
David Blaikie	5692e72f30	Revert "Adding DIImportedModules to DIScopes." This reverts commit 342d92c7a0adeabc9ab00f3f0d88d739fe7da4c7. Turns out we're going with a different schema design to represent DW_TAG_imported_modules so we won't need this extra field. llvm-svn: 178215	2013-03-28 02:44:59 +00:00
Akira Hatanaka	99866dd535	Check if Type is a vector before calling function Type::getVectorNumElements. llvm-svn: 178208	2013-03-28 01:28:02 +00:00
Michael Gottesman	46ebe53ead	Added back in the test for arc-annotations. The test was removed since I had not turned off the test during release builds. This fails since ARC annotations support is conditionally compiled out during release builds. I added the proper requires header to assuage this issue. llvm-svn: 178101	2013-03-27 00:09:58 +00:00
David Blaikie	a26d70358f	Adding DIImportedModules to DIScopes. This is just the basic groundwork for supporting DW_TAG_imported_module but I wanted to commit this before pushing support further into Clang or LLVM so that this rather churny change is isolated from the rest of the work. The major churn here is obviously adding another field (within the common DIScope prefix) to all DIScopes (files, classes, namespaces, lexical scopes, etc). This should be the last big churny change needed for DW_TAG_imported_module/using directive support/PR14606. llvm-svn: 178099	2013-03-27 00:07:26 +00:00
Ulrich Weigand	b1e02b2af2	Add test case for commit r178031. llvm-svn: 178038	2013-03-26 17:30:02 +00:00
Bill Wendling	1ee6d8cef4	Remove testcase. It's failing on some platforms but not others. llvm-svn: 177956	2013-03-26 01:10:03 +00:00
Bill Wendling	d78cfa81be	Hmm...not failing...odd llvm-svn: 177955	2013-03-26 01:08:02 +00:00
Bill Wendling	a5a44d4fd6	Temporarily XFAIL this test until Michael can look at it. llvm-svn: 177953	2013-03-26 00:46:31 +00:00
Michael Gottesman	cd4de0f9bb	[ObjCARC Annotations] Added support for displaying the state of pointers at the bottom/top of BBs of the ARC dataflow analysis for both bottomup and topdown analyses. This will allow for verification and analysis of the merge function of the data flow analyses in the ARC optimizer. The actual implementation of this feature is by introducing calls to the functions llvm.arc.annotation.{bottomup,topdown}.{bbstart,bbend} which are only declared. Each such call takes in a pointer to a global with the same name as the pointer whose provenance is being tracked and a pointer whose name is one of our Sequence states and points to a string that contains the same name. To ensure that the optimizer does not consider these annotations in any way, I made it so that the annotations are considered to be of IC_None type. A test case is included for this commit and the previous ObjCARCAnnotation commit. llvm-svn: 177952	2013-03-26 00:42:09 +00:00
John McCall	a237239097	Add an optimizer-side test case for ARC bug <rdar://13195034>, fixed in the frontend with @clang.arc.use. llvm-svn: 177928	2013-03-25 22:09:52 +00:00
Shuxin Yang	389ed4b8f7	Fix a bug in fast-math fadd/fsub simplification. The problem is that the code mistakenly took for granted that following constructor is able to create an APFloat from a SIGNED integer: APFloat::APFloat(const fltSemantics &ourSemantics, integerPart value) rdar://13486998 llvm-svn: 177906	2013-03-25 20:43:41 +00:00
NAKAMURA Takumi	951a9b169b	Disable, for now, llvm/test/Transforms/GCOVProfiling on win32. I'll investigate them later. llvm-svn: 177894	2013-03-25 19:47:20 +00:00
Arnaud A. de Grandmaison	3ee88e8a77	Address issues found by Duncan during post-commit review of r177856. llvm-svn: 177863	2013-03-25 11:47:38 +00:00
Arnaud A. de Grandmaison	9c383d68cf	InstCombine: simplify comparisons to zero of (shl %x, Cst) or (mul %x, Cst) This simplification happens at 2 places : - using the nsw attribute when the shl / mul is used by a sign test - when the shl / mul is compared for (in)equality to zero llvm-svn: 177856	2013-03-25 09:48:49 +00:00
John McCall	20182ac0c7	Kill every call to @clang.arc.use in the ARC contract phase. llvm-svn: 177769	2013-03-22 21:38:36 +00:00
Bill Wendling	d96a7a6be8	Update test. There may be multiple catches, but those will be cleaned up. llvm-svn: 177758	2013-03-22 20:36:39 +00:00
Arnaud A. de Grandmaison	f364bc63e7	InstCombine: Improve the result bitvect type when folding (cmp pred (load (gep GV, i)) C) to a bit test. The original code used i32, and i64 if legal. This introduced unneeded casts when they aren't legal, or when the index variable i has another type. In order of preference: try to use i's type; use the smallest fitting legal type (using an added DataLayout method); default to i32. A testcase checks that this works when the index gep operand is i16. Patch by : Ahmed Bougacha <ahmed.bougacha@gmail.com> Reviewed by : Duncan llvm-svn: 177712	2013-03-22 08:25:01 +00:00
David Blaikie	0d7d62e4b2	Move the DIFile in DISubprogram to the beginning to be a common prefix along with other DIScopes llvm-svn: 177674	2013-03-21 22:29:36 +00:00
David Blaikie	cc8d090163	Remove unused field in DISubprogram llvm-svn: 177661	2013-03-21 20:28:52 +00:00
Bill Wendling	5b98172115	Update some EH tests that were violating the new EH model. The landingpad instruction needs to be the first non-PHI instruction in the unwind destination block. llvm-svn: 177650	2013-03-21 18:30:10 +00:00
Meador Inge	6b6a161ccf	Move library call prototype attribute inference to functionattrs The simplify-libcalls pass implemented a doInitialization hook to infer function prototype attributes for well-known functions. Given that the simplify-libcalls pass is going away and that the functionattrs pass is already in place to deduce function attributes, I am moving this logic to the functionattrs pass. This approach was discussed during patch review: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121126/157465.html. llvm-svn: 177619	2013-03-21 00:55:59 +00:00
David Blaikie	43a729d165	Remove unused field in DICompileUnit llvm-svn: 177590	2013-03-20 22:34:33 +00:00
Nick Lewycky	2b7fe9fc93	Don't assume the test directory is writable, use %T to find a writable directory. llvm-svn: 177488	2013-03-20 05:59:40 +00:00
Arnaud A. de Grandmaison	87c473f0d1	IndVarSimplify: do not recompute an IV value outside of the loop if : - it is trivially known to be used inside the loop in a way that can not be optimized away - there is no use outside of the loop which can take advantage of the computation hoisting llvm-svn: 177432	2013-03-19 20:00:22 +00:00
Nick Lewycky	d67186337a	Emit the linkage name instead of the function name, when available. This means that we'll prefer to emit the mangled C++ name (pending a clang change). llvm-svn: 177371	2013-03-19 01:37:55 +00:00
Manman Ren	1217112d11	Check whether a pointer is non-null (isKnownNonNull) in isKnownNonZero. This handles the case where we have an inbounds GEP with alloca as the pointer. This fixes the regression in PR12750 and rdar://13286434. Note that we can also fix this by handling some GEP cases in isKnownNonNull. llvm-svn: 177321	2013-03-18 21:23:25 +00:00
David Tweed	d505b24277	Initially forgotten-to-svn-add test case for r177279. llvm-svn: 177280	2013-03-18 12:07:24 +00:00
Michael Gottesman	a8b60a4fda	Reduced dont-infinite-loop-during-block-escape-analysis.ll with bugpoint and moved it to retain-block-escape-analysis.ll. NOTE I verified that the original bug behind dont-infinite-loop-during-block-escape-analysis.ll occurs when using opt on retain-block-escape-analysis.ll. llvm-svn: 177240	2013-03-17 21:31:12 +00:00
David Blaikie	8fb8224578	Split out filename & directory from DIFile to start generalizing over DIScopes This is the first step to making all DIScopes have a common metadata prefix (so that things (using directives, for example) that can appear in any scope can be added to that common prefix). DIFile is itself a DIScope so the common prefix of all DIScopes cannot be a DIFile - instead it's the raw filename/directory name pair. llvm-svn: 177239	2013-03-17 21:13:55 +00:00
David Blaikie	2e488d1f0d	Generalize debug info test to be resilient to changes in metadata node numbering llvm-svn: 177238	2013-03-17 21:08:22 +00:00
Michael Gottesman	9782183126	The promised test case for r175939. This test makes sure that the ObjCARC escape analysis looks at the uses of instructions which copy the block pointer value by checking all four cases where that can occur. llvm-svn: 177232	2013-03-17 08:42:58 +00:00
Arnold Schwaighofer	9b55e31bcb	LoopVectorizer: Insert some white space to make test case more readable Also remove some unneeded function attributes. llvm-svn: 177114	2013-03-14 21:31:09 +00:00
Arnold Schwaighofer	4991ce9d49	Add missing asserts flag to test - it uses debug flags llvm-svn: 177102	2013-03-14 19:01:58 +00:00
Arnold Schwaighofer	c63cf3a0ae	LoopVectorize: Invert case when we use a vector cmp value to query select cost We generate a select with a vectorized condition argument when the condition is NOT loop invariant. Not the other way around. llvm-svn: 177098	2013-03-14 18:54:36 +00:00
Shuxin Yang	2eca602f8b	Perform factorization as a last resort of unsafe fadd/fsub simplification. Rules include: 1) x*y +/- x*z => x*(y +/- z) (the order of operands doesn't matter) 2) y/x +/- z/x => (y +/- z)/x The transformation is disabled if the new add/sub expr "y +/- z" is a denormal/NaN/infinity. rdar://12911472 llvm-svn: 177088	2013-03-14 18:08:26 +00:00
Chandler Carruth	a1c54bbe34	PR14972: SROA vs. GVN exposed a really bad bug in SROA. The fundamental problem is that SROA didn't allow for overly wide loads where the bits past the end of the alloca were masked away and the load was sufficiently aligned to ensure there is no risk of page fault, or other trapping behavior. With such widened loads, SROA would delete the load entirely rather than clamping it to the size of the alloca in order to allow mem2reg to fire. This was exposed by a test case that neatly arranged for GVN to run first, widening certain loads, followed by an inline step, and then SROA which miscompiles the code. However, I see no reason why this hasn't been plaguing us in other contexts. It seems deeply broken. Diagnosing all of the above took all of 10 minutes of debugging. The really annoying aspect is that fixing this completely breaks the pass. ;] There was an implicit reliance on the fact that no loads or stores extended past the alloca once we decided to rewrite them in the final stage of SROA. This was used to encode information about whether the loads and stores had been split across multiple partitions of the original alloca. That required threading explicit tracking of whether a use of a partition is split across multiple partitions. Once that was done, another problem arose: we allowed splitting of integer loads and stores iff they were loads and stores to the entire alloca. This is a really arbitrary limitation, and splitting at least some integer loads and stores is crucial to maximize promotion opportunities. My first attempt was to start removing the restriction entirely, but currently that does Very Bad Things by causing many common alloca patterns to be fully decomposed into i8 operations and lots of or-ing together to produce larger integers on demand. The code bloat is terrifying. 
That is still the right end-goal, but substantial work must be done to either merge partitions or ensure that small i8 values are eagerly merged in some other pass. Sadly, figuring all this out took essentially all the time and effort here. So the end result is that we allow splitting only when the load or store at least covers the alloca. That ensures widened loads and stores don't hurt SROA, and that we don't rampantly decompose operations more than we have previously. All of this was already fairly well tested, and so I've just updated the tests to cover the wide load behavior. I can add a test that crafts the pass ordering magic which caused the original PR, but that seems really brittle and to provide little benefit. The fundamental problem is that widened loads should Just Work. llvm-svn: 177055	2013-03-14 11:32:24 +00:00
Nick Lewycky	3d28d4dee7	Remove a change to the debug info in this test, that I made while testing something else and forgot to remove. llvm-svn: 177007	2013-03-14 05:28:10 +00:00
Nick Lewycky	d11060d971	Try using %S to find the emitted .gcno file. llvm-svn: 177006	2013-03-14 05:23:30 +00:00
Nick Lewycky	fdfed3e9c9	Refactor GCOV's six constructor arguments into a struct with a getter that constructs default arguments. It can now take default arguments from cl::opt'ions. Add a new -default-gcov-version=... option, and actually test it! Sink the reverse-order of the version into GCOVProfiling, hiding it from our users. llvm-svn: 177002	2013-03-14 05:13:26 +00:00
David Blaikie	0d221159a0	Remove the unused 4th operand for DIFile debug info metadata llvm-svn: 176983	2013-03-13 22:05:21 +00:00
David Blaikie	1ca2f36289	Refactor filename/directory in DICompileUnit into a DIFile This is the next step towards making the metadata for DIScopes have a common prefix rather than having to delegate based on their tag type. llvm-svn: 176913	2013-03-13 00:01:35 +00:00
David Blaikie	452c3ff649	Remove unused "isMain" field from DICompileUnit llvm-svn: 176910	2013-03-12 22:43:04 +00:00
David Blaikie	a4f770d51c	Update debug info test cases with empty SplitDebugFilename field. This could be 'null' or the empty string, DIDescriptor::getStringField coalesces the two cases anyway so it's just a matter of legible/efficient representation. The change in behavior of the DICompileUnit::get* functions could be subsumed by the full verification check - but ideally that should just be an assertion if we could front-load the actual debug info metadata failure paths. llvm-svn: 176907	2013-03-12 22:25:36 +00:00
Jan Wen Voung	6dc3076080	Revert the test moves from 176733. Use "REQUIRES: asserts" instead. llvm-svn: 176873	2013-03-12 16:27:52 +00:00
David Blaikie	47922fb006	Upgrading debug info test cases to be (more) compatible with the current debug info format. These cases were found by further work to remove support for debug info versioning. Common cleanups (other than changing the version info in the tag field) included adding the last parameter to compile_units (recently added for fission support) and other cases of trailing fields in lexical blocks, compile units, and subprograms. llvm-svn: 176834	2013-03-11 22:37:40 +00:00
Bill Wendling	9534d8885f	Don't remove a landing pad if the invoke requires a table entry. An invoke may require a table entry. For instance, when the function it calls is expected to throw. <rdar://problem/13360379> llvm-svn: 176827	2013-03-11 20:53:00 +00:00
Benjamin Kramer	fc0c7bf0d7	Fix test case. llvm-svn: 176773	2013-03-09 18:34:27 +00:00
Benjamin Kramer	01b75cc0f2	Test case hygiene. llvm-svn: 176772	2013-03-09 18:25:40 +00:00
Arnold Schwaighofer	4090b61ac3	LoopVectorizer: Ignore dbg.value instructions We want vectorization to happen at -g. Ignore calls to the dbg.value intrinsic and don't transfer them to the vectorized code. radar://13378964 llvm-svn: 176768	2013-03-09 15:56:34 +00:00
Jan Wen Voung	7857a64909	Disable statistics on Release builds and move tests that depend on -stats. Summary: Statistics are still available in Release+Asserts (any +Asserts builds), and stats can also be turned on with LLVM_ENABLE_STATS. Move some of the FastISel stats that were moved under DEBUG() back out of DEBUG(), since stats are disabled across the board now. Many tests depend on grepping "-stats" output. Move those into a orig_dir/Stats/. so that they can be marked as unsupported when building without statistics. Differential Revision: http://llvm-reviews.chandlerc.com/D486 llvm-svn: 176733	2013-03-08 22:56:31 +00:00
Benjamin Kramer	10a74ed434	Force cpu in test. llvm-svn: 176702	2013-03-08 17:01:18 +00:00
Benjamin Kramer	37c2d65c5a	Insert the reduction start value into the first bypass block to preserve domination. Fixes PR15344. llvm-svn: 176701	2013-03-08 16:58:37 +00:00
Andrew Trick	a0a5ca06b9	SimplifyCFG fix for volatile load/store. Fixes rdar:13349374. Volatile loads and stores need to be preserved even if the language standard says they are undefined. "volatile" in this context means "get out of the way compiler, let my platform handle it". Additionally, this is the only way I know of with llvm to write to the first page (when hardware allows) without dropping to assembly. llvm-svn: 176599	2013-03-07 01:03:35 +00:00
Jim Grosbach	95d2eb95c3	InstCombine: Don't shrink allocas when combining with a bitcast. When considering folding a bitcast of an alloca into the alloca itself, make sure we don't shrink the amount of memory being allocated, or things rapidly go sideways. rdar://13324424 llvm-svn: 176547	2013-03-06 05:44:53 +00:00
Nuno Lopes	589443bd93	recommit r172363 & r171325 (reverted in r172756) This adds minimalistic support for PHI nodes to llvm.objectsize() evaluation. Fingers crossed that it doesn't break clang bootstrap again. llvm-svn: 176408	2013-03-02 11:36:24 +00:00
Arnold Schwaighofer	20ef54f4c1	X86 cost model: Adjust cost for custom lowered vector multiplies This matters for example in following matrix multiply: int **mmult(int rows, int cols, int **m1, int **m2, int **m3) { int i, j, k, val; for (i=0; i<rows; i++) { for (j=0; j<cols; j++) { val = 0; for (k=0; k<cols; k++) { val += m1[i][k] * m2[k][j]; } m3[i][j] = val; } } return(m3); } Taken from the test-suite benchmark Shootout. We estimate the cost of the multiply to be 2 while we generate 9 instructions for it and end up being quite a bit slower than the scalar version (48% on my machine). Also, properly differentiate between avx1 and avx2. On avx-1 we still split the vector into 2 128bits and handle the subvector muls like above with 9 instructions. Only on avx-2 will we have a cost of 9 for v4i64. I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an add instead of a mul because with a mul we now no longer vectorize. I did verify that the mul would be indeed more expensive when vectorized with 3 kernels: for (i ...) r += a[i] * 3; for (i ...) m1[i] = m1[i] * 3; // This matches the test case in avx1.ll and a matrix multiply. In each case the vectorized version was considerably slower. radar://13304919 llvm-svn: 176403	2013-03-02 04:02:52 +00:00
Nadav Rotem	739e37a0d2	PR14448 - prevent the loop vectorizer from vectorizing the same loop twice. The LoopVectorizer often runs multiple times on the same function due to inlining. When this happens the loop vectorizer often vectorizes the same loops multiple times, increasing code size and adding unneeded branches. With this patch, the vectorizer during vectorization puts metadata on scalar loops and marks them as 'already vectorized' so that it knows to ignore them when it sees them a second time. PR14448. llvm-svn: 176399	2013-03-02 01:33:49 +00:00
Benjamin Kramer	12f98fae98	LoopVectorize: Don't hang forever if a PHI only has skipped PHI uses. Fixes PR15384. llvm-svn: 176366	2013-03-01 19:07:31 +00:00
Quentin Colombet	e684a6d4aa	Fix a bug in instcombine for fmul in fast math mode. The instcombine recognized pattern looks like: a = b * c d = a +/- Cst or a = b * c d = Cst +/- a When creating the new operands for fadd or fsub instruction following the related fmul, the first operand was created with the second original operand (M0 was created with C1) and the second with the first (M1 with Opnd0). The fix consists in creating the new operands with the appropriate original operand, i.e., M0 with Opnd0 and M1 with C1. llvm-svn: 176300	2013-02-28 21:12:40 +00:00
Benjamin Kramer	dc145816fd	LoopVectorize: Vectorize math builtin calls. This properly asks TargetLibraryInfo if a call is available and if it is, it can be translated into the corresponding LLVM builtin. We don't vectorize sqrt() yet because I'm not sure about the semantics for negative numbers. The other intrinsic should be exact equivalents to the libm functions. Differential Revision: http://llvm-reviews.chandlerc.com/D465 llvm-svn: 176188	2013-02-27 15:24:19 +00:00
Michael Ilseman	a7b93c1e5f	Constant fold vector bitcasts of halves similarly to how floats and doubles are folded. Test case included. llvm-svn: 176131	2013-02-26 22:51:07 +00:00
Benjamin Kramer	ee40b9a2d4	CVP: If we have a PHI with an incoming select, try to skip the select. This is a common pattern with dyn_cast and similar constructs, when the PHI no longer depends on the select it can often be turned into a simpler construct or even get hoisted out of the loop. PR15340. llvm-svn: 175995	2013-02-24 15:34:43 +00:00
Benjamin Kramer	b867fea5e6	Fix invalid IR in test, missing incoming value for PHI node. llvm-svn: 175994	2013-02-24 15:34:29 +00:00
Renato Golin	0890ace58a	Some more tests for the global structure vectorizer llvm-svn: 175964	2013-02-23 12:48:30 +00:00
Renato Golin	adc1b07002	More tests to global struct vectorizer llvm-svn: 175898	2013-02-22 16:18:31 +00:00
Bill Wendling	a032374ea0	Use references to attribute groups on the call/invoke instructions. Listing all of the attributes for the callee of a call/invoke instruction is way too much and makes the IR unreadable. Use references to attributes instead. llvm-svn: 175877	2013-02-22 09:09:42 +00:00
Renato Golin	cf928cb53f	Allow GlobalValues to vectorize with AliasAnalysis Storing the load/store instructions with the values and inspect them using Alias Analysis to make sure they don't alias, since the GEP pointer operand doesn't take the offset into account. Trying hard to not add any extra cost to loads and stores that don't overlap on global values, AA is only calculated if all of the previous attempts failed. Using biggest vector register size as the stride for the vectorization access, as we're being conservative and the cost model (which calculates the real vectorization factor) is only run after the legalization phase. We might re-think this relationship in the future, but for now, I'd rather be safe than sorry. llvm-svn: 175818	2013-02-21 22:39:03 +00:00


3747 Commits