llvm-project

Commit Graph

Author	SHA1	Message	Date
Cameron Zwarich	ca4c633489	Fix another case of <rdar://problem/9184212> that only occurs with code generated by llvm-gcc, since llvm-gcc uses 2 i64s for passing a 4 x float vector on ARM rather than an i64 array like Clang. llvm-svn: 129878	2011-04-20 21:48:38 +00:00
Frits van Bommel	d097212a08	Add test cases for Jay's r129641 and fix a 32-bit-centric testcase in a file with a 64-bit datalayout. llvm-svn: 129643	2011-04-16 14:31:50 +00:00
Chris Lattner	0ab5e2cded	Fix a ton of comment typos found by codespell. Patch by Luis Felipe Strano Moraes! llvm-svn: 129558	2011-04-15 05:18:47 +00:00
Eli Friedman	2395626605	Add an instcombine for constructs like a \| -(b != c); a select is more canonical, and generally leads to better code. Found while looking at an article about saturating arithmetic. llvm-svn: 129545	2011-04-14 22:41:27 +00:00
Owen Anderson	92651ec374	Fix an infinite alternation in JumpThreading where two transforms would repeatedly undo each other. The solution is to perform more aggressive constant folding to make one of the edges just folded away rather than trying to thread it. Fixes <rdar://problem/9284786>. Discovered with CSmith. llvm-svn: 129538	2011-04-14 21:35:50 +00:00
Mon P Wang	2e5528f0b2	Vectors with different number of elements of the same element type can have the same allocation size but different primitive sizes(e.g., <3xi32> and <4xi32>). When ScalarRepl promotes them, it can't use a bit cast but should use a shuffle vector instead. llvm-svn: 129472	2011-04-13 21:40:02 +00:00
Dan Gohman	1c6c34834b	Fix reassociate to use a worklist instead of recursing when new reassociation opportunities are exposed. This fixes a bug where the nested reassociation expects to be the IR to be consistent, but it isn't, because the outer reassociation has disconnected some of the operands. rdar://9167457 llvm-svn: 129324	2011-04-12 00:11:56 +00:00
Chris Lattner	e81d045d94	remove the StructRetPromotion pass. It is unused, not maintained and has some bugs. If this is interesting functionality, it should be reimplemented in the argpromotion pass. llvm-svn: 129314	2011-04-11 23:09:44 +00:00
Eli Friedman	9cca0715aa	Add back a couple checks removed by r129128; the fact that an intitializer is an array of structures doesn't imply it's a ConstantArray of ConstantStruct. llvm-svn: 129207	2011-04-09 09:11:09 +00:00
Chris Lattner	88974f4625	fix PR9523, a crash in looprotate on a non-canonical loop made out of indirectbr. llvm-svn: 129203	2011-04-09 07:25:58 +00:00
Eli Friedman	17822fcde9	PR9604; try to deal with RAUW updates correctly in the AST. I'm not convinced it's completely safe to cache the AST across LICM runs even with this fix, but this fix can't hurt. llvm-svn: 129198	2011-04-09 06:55:46 +00:00
Eli Friedman	4db39cefdb	Test for r129190. llvm-svn: 129197	2011-04-09 06:39:43 +00:00
Devang Patel	bc3d8b212f	Do not let debug info interfer with branch folding. llvm-svn: 129114	2011-04-07 23:11:25 +00:00
Devang Patel	197c35298a	While hoisting common code from if/else, hoist debug info intrinsics if they match. llvm-svn: 129078	2011-04-07 17:27:36 +00:00
Eli Friedman	c5f22a7815	PR9634: Don't unconditionally tell the AliasSetTracker that the PreheaderLoad is equivalent to any other relevant value; it isn't true in general. If it is equivalent, the LoopPromoter will tell the AST the equivalence. Also, delete the PreheaderLoad if it is unused. Chris, since you were the last one to make major changes here, can you check that this is sane? llvm-svn: 129049	2011-04-07 01:35:06 +00:00
Nadav Rotem	cc771acd77	This testcase passed even without the fix. Added the target info to make the test fail (without the fix). Thanks Dan. llvm-svn: 128999	2011-04-06 11:18:29 +00:00
Nadav Rotem	a069c6ce05	InstCombine optimizes gep(bitcast(x)) even when the bitcasts casts away address space info. We crash with an assert in this case. This change checks that the address space of the bitcasted pointer is the same as the gep ptr. llvm-svn: 128884	2011-04-05 14:29:52 +00:00
Eli Friedman	17bf4922c9	PR9446: RecursivelyDeleteTriviallyDeadInstructions can delete the instruction after the given instruction; make sure to handle that case correctly. (It's difficult to trigger; the included testcase involves a dead block, but I don't think that's a requirement.) While I'm here, get rid of the unnecessary warning about SimplifyInstructionsInBlock, since it should work correctly as far as I know. llvm-svn: 128782	2011-04-02 22:45:17 +00:00
Benjamin Kramer	d121765e64	InstCombine: Turn icmp + sext into bitwise/integer ops when the input has only one unknown bit. int test1(unsigned x) { return (x&8) ? 0 : -1; } int test3(unsigned x) { return (x&8) ? -1 : 0; } before (x86_64): _test1: andl $8, %edi cmpl $1, %edi sbbl %eax, %eax ret _test3: andl $8, %edi cmpl $1, %edi sbbl %eax, %eax notl %eax ret after: _test1: shrl $3, %edi andl $1, %edi leal -1(%rdi), %eax ret _test3: shll $28, %edi movl %edi, %eax sarl $31, %eax ret llvm-svn: 128732	2011-04-01 20:09:10 +00:00
Nadav Rotem	d74b72b8a9	Instcombile optimization: extractelement(cast) -> cast(extractelement) llvm-svn: 128683	2011-03-31 22:57:29 +00:00
Benjamin Kramer	5291054ef1	InstCombine: APFloat can't perform arithmetic on PPC double doubles, don't even try. Thanks Eli! llvm-svn: 128676	2011-03-31 21:35:49 +00:00
Benjamin Kramer	be209ab8a2	InstCombine: Fix transform to use the swapped predicate. Thanks Frits! llvm-svn: 128628	2011-03-31 10:46:03 +00:00
Benjamin Kramer	d159d94644	InstCombine: fold fcmp (fneg x), (fneg y) -> fcmp x, y llvm-svn: 128627	2011-03-31 10:12:22 +00:00
Benjamin Kramer	a8c5d0872d	InstCombine: fold fcmp pred (fneg x), C -> fcmp swap(pred) x, -C llvm-svn: 128626	2011-03-31 10:12:15 +00:00
Benjamin Kramer	cbb18e91a8	InstCombine: Shrink "fcmp (fpext x), C" to "fcmp x, C" if C can be losslessly converted to the type of x. Fixes PR9592. llvm-svn: 128625	2011-03-31 10:12:07 +00:00
Benjamin Kramer	2ccfbc8b71	InstCombine: fold fcmp (fpext x), (fpext y) -> fcmp x, y. llvm-svn: 128624	2011-03-31 10:11:58 +00:00
Bill Wendling	5034159c5f	* The DSE code that tested for overlapping needed to take into account the fact that one of the numbers is signed while the other is unsigned. This could lead to a wrong result when the signed was promoted to an unsigned int. * Add the data layout line to the testcase so that it will test the appropriate thing. Patch by David Terei! llvm-svn: 128577	2011-03-30 21:37:19 +00:00
Benjamin Kramer	af0ed953c5	Avoid turning a floating point division with a constant power of two into a denormal multiplication. Some platforms may treat denormals as zero, on other platforms multiplication with a subnormal is slower than dividing by a normal. llvm-svn: 128555	2011-03-30 17:02:54 +00:00
Benjamin Kramer	8564e0de96	InstCombine: If the divisor of an fdiv has an exact inverse, turn it into an fmul. Fixes PR9587. llvm-svn: 128546	2011-03-30 15:42:35 +00:00
Benjamin Kramer	272f2b0044	InstCombine: Add a few missing combines for ANDs and ORs of sign bit tests. On x86 we now compile "if (a < 0 && b < 0)" into testl %edi, %esi js IF.THEN llvm-svn: 128496	2011-03-29 22:06:41 +00:00
Cameron Zwarich	ff811cc475	Do some simple copy propagation through integer loads and stores when promoting vector types. This helps a lot with inlined functions when using the ARM soft float ABI. Fixes <rdar://problem/9184212>. llvm-svn: 128453	2011-03-29 05:19:52 +00:00
Nick Lewycky	8544228d5a	Teach the transformation that moves binary operators around selects to preserve the subclass optional data. llvm-svn: 128388	2011-03-27 19:51:23 +00:00
Frits van Bommel	0bb2ad2cf7	Constant folding support for calls to umul.with.overflow(), basically identical to the smul.with.overflow() code. llvm-svn: 128379	2011-03-27 14:26:13 +00:00
Nick Lewycky	83167df787	Add a small missed optimization: turn X == C ? X : Y into X == C ? C : Y. This removes one use of X which helps it pass the many hasOneUse() checks. In my analysis, this turns up very often where X = A >>exact B and that can't be simplified unless X has one use (except by increasing the lifetime of A which is generally a performance loss). llvm-svn: 128373	2011-03-27 07:30:57 +00:00
Cameron Zwarich	d4174ee43e	Fix a typo and add a test. llvm-svn: 128331	2011-03-26 04:58:50 +00:00
Bill Wendling	db40b5c899	PR9561: A store with a negative offset (via GEP) could erroniously say that it completely overlaps a previous store, thus mistakenly deleting that store. Check for this condition. llvm-svn: 128319	2011-03-26 01:20:37 +00:00
Cameron Zwarich	10ebc189ee	Fix PR9464 by correcting some math that just happened to be right in most cases that were hit in practice. llvm-svn: 128146	2011-03-23 05:25:55 +00:00
Anders Carlsson	ee6bc70d2f	Add an optimization to GlobalOpt that eliminates calls to __cxa_atexit, if the function passed is empty. llvm-svn: 127970	2011-03-20 17:59:11 +00:00
Andrew Trick	1c4b42d00f	Avoid creating canonical induction variables for non-native types. For example, on 32-bit architecture, don't promote all uses of the IV to 64-bits just because one use is a 64-bit cast. Alternate implementation of the patch by Arnaud de Grandmaison. llvm-svn: 127884	2011-03-18 16:50:32 +00:00
Eli Friedman	c17c9a78aa	FileCheck-ize and update test. llvm-svn: 127845	2011-03-18 01:10:31 +00:00
Devang Patel	aad34d882d	Try to not lose variable's debug info during instcombine. This is done by lowering dbg.declare intrinsic into dbg.value intrinsic. Radar 9143931. llvm-svn: 127834	2011-03-17 22:18:16 +00:00
Cameron Zwarich	0454253d7a	Only convert allocas to scalars if it is profitable. The profitability metric I chose is having a non-memcpy/memset use and being larger than any native integer type. Originally I chose having an access of a size smaller than the total size of the alloca, but this caused some minor issues on the spirit benchmark where SRoA runs again after some inlining. This fixes <rdar://problem/8613163>. llvm-svn: 127718	2011-03-16 00:13:44 +00:00
Cameron Zwarich	7b0f3c6a1a	Add native integer type TargetData to some existing tests. llvm-svn: 127717	2011-03-16 00:13:40 +00:00
Cameron Zwarich	0b8cdfb6ec	Do not add PHIs with no users when creating LCSSA form. Patch by Andrew Clinton. llvm-svn: 127674	2011-03-15 07:41:25 +00:00
Eli Friedman	c4414c6e92	PR9450: Make switch optimization in SimplifyCFG not dependent on the ordering of pointers in an std::map. llvm-svn: 127650	2011-03-15 02:23:35 +00:00
Eric Christopher	2139d3148f	If we don't know how long a string is we can't fold an _chk version to the normal version. Fixes rdar://9123638 llvm-svn: 127636	2011-03-15 00:25:41 +00:00
Benjamin Kramer	5acc751b6f	Teach ComputeMaskedBits about sub nsw. llvm-svn: 127548	2011-03-12 17:18:11 +00:00
Cameron Zwarich	338d362200	Roll r127459 back in: Optimize trivial branches in CodeGenPrepare, which often get created from the lowering of objectsize intrinsics. Unfortunately, a number of tests were relying on llc not optimizing trivial branches, so I had to add an option to allow them to continue to test what they originally tested. This fixes <rdar://problem/8785296> and <rdar://problem/9112893>. llvm-svn: 127498	2011-03-11 21:52:04 +00:00
Daniel Dunbar	94ccb27b43	Revert r127459, "Optimize trivial branches in CodeGenPrepare, which often get created from the", it broke some GCC test suite tests. llvm-svn: 127477	2011-03-11 19:30:30 +00:00
Benjamin Kramer	391a946fa9	ComputeMaskedBits: sub falls through to add, and sub doesn't have the same overflow semantics as add. Should fix the selfhost failures that started with r127463. llvm-svn: 127465	2011-03-11 14:46:49 +00:00
Benjamin Kramer	51897bcd3e	InstCombine: Fix a thinko where transform an icmp under the assumption that it's a zero comparison when it's not. Fixes PR9454. llvm-svn: 127464	2011-03-11 11:37:40 +00:00
Nick Lewycky	cc79973856	Teach ComputeMaskedBits about nsw on add. I don't think there's anything we can do with nuw here, but sub and mul should be given similar treatment. Fixes PR9343 #15! llvm-svn: 127463	2011-03-11 09:00:19 +00:00
Cameron Zwarich	cc27b3acc4	Optimize trivial branches in CodeGenPrepare, which often get created from the lowering of objectsize intrinsics. Unfortunately, a number of tests were relying on llc not optimizing trivial branches, so I had to add an option to allow them to continue to test what they originally tested. This fixes <rdar://problem/8785296> and <rdar://problem/9112893>. llvm-svn: 127459	2011-03-11 04:54:27 +00:00
Dan Gohman	154ed49784	Fix reassociate to postpone certain instruction deletions until after it has finished all of its reassociations, because its habit of unlinking operands and holding them in a datastructure while working means that it's not easy to determine when an instruction is really dead until after all its regular work is done. rdar://9096268. llvm-svn: 127424	2011-03-10 19:51:54 +00:00
Benjamin Kramer	b49b964b98	InstCombine: Turn umul_with_overflow into mul nuw if we can prove that it cannot overflow. This happens a lot in clang-compiled C++ code because it adds overflow checks to operator new[]: unsigned foo(unsigned n) { return new unsigned[n]; } We can optimize away the overflow check on 64 bit targets because (uint64_t)n4 cannot overflow. llvm-svn: 127418	2011-03-10 18:40:14 +00:00
Benjamin Kramer	1885d21700	Fix mistyped CHECK lines. llvm-svn: 127366	2011-03-09 22:07:31 +00:00
Devang Patel	13f8c7d48e	Preserve line number information while simplifying libcalls. llvm-svn: 127362	2011-03-09 21:27:52 +00:00
Cameron Zwarich	718918b07a	Add a test case for r127320. llvm-svn: 127321	2011-03-09 08:11:02 +00:00
Nick Lewycky	980104d1d6	Add another micro-optimization. Apologies for the lack of refactoring, but I gave up when I realized I couldn't come up with a good name for what the refactored function would be, to describe what it does. This is PR9343 test12, which is test3 with arguments reordered. Whoops! llvm-svn: 127318	2011-03-09 06:26:03 +00:00
Cameron Zwarich	3b649f4d01	Add support to scalar replacement for partial vector accesses of an alloca, e.g. a union of a float, <2 x float>, and <4 x float>. This mostly comes up with the use of vector intrinsics, especially in NEON when programmers know the layout of the register file. This enables codegen to eliminate a lot of the subregister traffic it would otherwise generate. This commit only enables this for a small number of floating-point cases, but a lot more integer cases. I assume this is okay for all ports, but I did not do extensive testing of the quality of code involving i512 vectors and the like. If there is a use case where this generates worse code than before, let me know and we can scale it back. This fixes <rdar://problem/9036264>. llvm-svn: 127317	2011-03-09 05:43:05 +00:00
Eli Friedman	a81a82dcaf	PR9346: Prevent SimplifyDemandedBits from incorrectly introducing INT_MIN % -1. llvm-svn: 127306	2011-03-09 01:28:35 +00:00
Eli Friedman	aac35b3fbb	PR9420; an instruction before an unreachable is guaranteed not to have any reachable uses, but there still might be uses in dead blocks. Use the standard solution of replacing all the uses with undef. This is a rare case because it's very sensitive to phase ordering in SimplifyCFG. llvm-svn: 127299	2011-03-09 00:48:33 +00:00
Duncan Sands	7dc3d47c34	Fix PR9331. Simplified version of a patch by Jakub Staszak. llvm-svn: 127243	2011-03-08 12:39:03 +00:00
Devang Patel	97d0be8ee1	While sinking an instruction, do not lose llvm.dbg.value intrinsic. llvm-svn: 127214	2011-03-08 03:06:19 +00:00
Devang Patel	d00c628f8f	Preserve line no. info. Radar `9097659` llvm-svn: 127182	2011-03-07 22:43:45 +00:00
Rafael Espindola	15a29867ed	Add test for r127138. llvm-svn: 127172	2011-03-07 21:28:14 +00:00
Nick Lewycky	ac55c79dd6	Tweak this test. We can analyze what happens and show that we still do the right thing, instead of merely being unable to analyze and the transform doesn't occur. llvm-svn: 127149	2011-03-07 02:10:18 +00:00
Nick Lewycky	e467979d0a	Add more analysis of the sign bit of an srem instruction. If the LHS is negative then the result could go either way. If it's provably positive then so is the srem. Fixes PR9343 #7! llvm-svn: 127146	2011-03-07 01:50:10 +00:00
Nick Lewycky	92db8e8e39	ConstantInt has some getters which return ConstantInt's or ConstantVector's of the value splatted into every element. Extend this to getTrue and getFalse which by providing new overloads that take Types that are either i1 or <N x i1>. Use it in InstCombine to add vector support to some code, fixing PR8469! llvm-svn: 127116	2011-03-06 03:36:19 +00:00
Nick Lewycky	9719a719c7	Thread comparisons over udiv/sdiv/ashr/lshr exact and lshr nuw/nsw whenever possible. This goes into instcombine and instsimplify because instsimplify doesn't need to check hasOneUse since it returns (almost exclusively) constants. This fixes PR9343 #4 #5 and #8! llvm-svn: 127064	2011-03-05 05:19:11 +00:00
Nick Lewycky	25cc338d88	Try once again to optimize "icmp (srem X, Y), Y" by turning the comparison into true/false or "icmp slt/sge Y, 0". llvm-svn: 127063	2011-03-05 04:28:48 +00:00
Nick Lewycky	41c529bd09	Revert broken srem logic from r126991. llvm-svn: 127021	2011-03-04 19:26:08 +00:00
Nick Lewycky	8e3a79da9f	Fold "icmp pred (srem X, Y), Y" like we do for urem. Handle signed comparisons in the urem case, though not the other way around. This is enough to get #3 from PR9343! llvm-svn: 126991	2011-03-04 10:06:52 +00:00
Nick Lewycky	3cec6f5563	Teach instruction simplify to use constant ranges to solve problems of the form "icmp pred %X, CI" and a number of examples where "%X = binop %Y, CI2". Some of these cases (div and rem) used to make it through opt -O2, but the others are probably now making code elsewhere redundant (probably instcombine). llvm-svn: 126988	2011-03-04 07:00:57 +00:00
Richard Osborne	af52c52569	Optimize fprintf -> iprintf if there are no floating point arguments and siprintf is available on the target. llvm-svn: 126940	2011-03-03 14:20:22 +00:00
Richard Osborne	2dfb888392	Optimize sprintf -> siprintf if there are no floating point arguments and siprintf is available on the target. llvm-svn: 126937	2011-03-03 14:09:28 +00:00
Richard Osborne	815de536e5	Optimize printf -> iprintf if there are no floating point arguments and iprintf is available on the target. Currently iprintf is only marked as being available on the XCore. llvm-svn: 126935	2011-03-03 13:17:51 +00:00
Anders Carlsson	da80afef99	Make InstCombiner::FoldAndOfICmps create a ConstantRange that's the intersection of the LHS and RHS ConstantRanges and return "false" when the range is empty. This simplifies some code and catches some extra cases. llvm-svn: 126744	2011-03-01 15:05:01 +00:00
Nick Lewycky	c9d20067cd	Optimize "icmp pred (urem X, Y), Y" --> true/false depending on pred. There's more work to do here, "icmp ult (urem X, 10), 11" doesn't optimize away yet. Fixes example 3 from PR9343! llvm-svn: 126741	2011-03-01 08:15:50 +00:00
Eli Friedman	683bbc16c4	Add an obvious missing safety check to DAE::RemoveDeadArgumentsFromCallers. llvm-svn: 126720	2011-03-01 00:33:47 +00:00
Dan Gohman	6564ca0c23	Delete obsolete test. llvm-svn: 126680	2011-02-28 19:58:14 +00:00
Frits van Bommel	8ae07996c9	Teach SimplifyCFG that (switch (select cond, X, Y)) is better expressed as a branch. Based on a patch by Alistair Lynn. llvm-svn: 126647	2011-02-28 09:44:07 +00:00
Nick Lewycky	66f4f22f7b	srem doesn't actually have the same resulting sign as its numerator, you could also have a zero when numerator = denominator. Reverts parts of r126635 and r126637. llvm-svn: 126644	2011-02-28 09:17:39 +00:00
Nick Lewycky	174a705497	Teach InstCombine to fold "(shr exact X, Y) == 0" --> X == 0, fixing #1 from PR9343. llvm-svn: 126643	2011-02-28 08:31:40 +00:00
Nick Lewycky	6b445419b0	The sign of an srem instruction is the sign of its dividend (the first argument), regardless of the divisor. Teach instcombine about this and fix test7 in PR9343! llvm-svn: 126635	2011-02-28 06:20:05 +00:00
Benjamin Kramer	ceb5daa567	Revert "SimplifyCFG: GEPs with just one non-constant index are also cheap." Yes, there are other types than i8* and GEPs on them can produce an add+multiply. We don't consider that cheap enough to be speculatively executed. llvm-svn: 126481	2011-02-25 10:33:33 +00:00
Benjamin Kramer	dfdca1a14d	SimplifyCFG: GEPs with just one non-constant index are also cheap. llvm-svn: 126452	2011-02-24 23:26:09 +00:00
Benjamin Kramer	27361a7124	SimplifyCFG: GEPs with constant indices are cheap enough to be executed unconditionally. llvm-svn: 126445	2011-02-24 22:46:11 +00:00
Chris Lattner	adf38b3e09	change instcombine to not turn a call to non-varargs bitcast of function prototype into a call to a varargs prototype. We do allow the xform if we have a definition, but otherwise we don't want to risk that we're changing the abi in a subtle way. On X86-64, for example, varargs require passing stuff in %al. llvm-svn: 126363	2011-02-24 05:10:56 +00:00
Cameron Zwarich	826308586c	Make LoopDeletion work on loops with multiple edges, as long as the incoming values from all of the loop's exiting blocks are equal. Patch by Andrew Clinton. llvm-svn: 126253	2011-02-22 22:25:39 +00:00
Benjamin Kramer	d5d7f37beb	InstCombine: Add a bunch of combines of the form x \| (y ^ z). We usually catch this kind of optimization through InstSimplify's distributive magic, but or doesn't distribute over xor in general. "A \| ~(A \| B) -> A \| ~B" hits 24 times on gcc.c. llvm-svn: 126081	2011-02-20 13:23:43 +00:00
Nick Lewycky	c8a1569950	Teach RecursivelyDeleteDeadPHINodes to handle multiple self-references. Patch by Andrew Clinton! llvm-svn: 126077	2011-02-20 08:38:20 +00:00
Eli Friedman	ef200db4fd	PR9218: SimplifyDemandedVectorElts can return a non-null value that is not the instruction passed in. Make sure to account for this correctly, instead of looping infinitely. llvm-svn: 126058	2011-02-19 22:42:40 +00:00
Chris Lattner	72a35fb974	rewrite the memset_pattern pattern generation stuff to accept any 2/4/8/16-byte constant, including globals. This makes us generate much more "pretty" pattern globals as well because it doesn't break it down to an array of bytes all the time. This enables us to handle stores of relocatable globals. This kicks in about 48 times in 254.gap, giving us stuff like this: @.memset_pattern40 = internal constant [2 x %struct.TypHeader* (%struct.TypHeader, %struct.TypHeader)] [%struct.TypHeader (%struct.TypHeader, %struct .TypHeader)* @IsFalse, %struct.TypHeader* (%struct.TypHeader, %struct.TypHeader)* @IsFalse], align 16 ... call void @memset_pattern16(i8* %scevgep5859, i8* bitcast ([2 x %struct.TypHeader* (%struct.TypHeader, %struct.TypHeader)] @.memset_pattern40 to i8* ), i64 %tmp75) nounwind llvm-svn: 126044	2011-02-19 19:56:44 +00:00
Chris Lattner	acf6b0776a	Stores of null pointers should turn into memset, we weren't recognizing them as splat values. llvm-svn: 126041	2011-02-19 19:35:49 +00:00
Chris Lattner	0f4a64011e	Implement rdar://9009151, transforming strided loop stores of unsplatable values into memset_pattern16 when it is available (recent darwins). This transforms lots of strided loop stores of ints for example, like 5 in vpr: Formed memset: call void @memset_pattern16(i8* %4, i8* getelementptr inbounds ([16 x i8]* @.memset_pattern9, i32 0, i32 0), i64 %tmp25) from store to: {%3,+,4}<%11> at: store i32 3, i32* %scevgep, align 4, !tbaa !4 llvm-svn: 126040	2011-02-19 19:31:39 +00:00
Duncan Sands	84653b3674	Add some transforms of the kind X-Y>X -> 0>Y which are valid when there is no overflow. These subsume some existing equality transforms, so zap those. llvm-svn: 125843	2011-02-18 16:25:37 +00:00
Chris Lattner	6b88c76f13	add a testcase for r125827 llvm-svn: 125831	2011-02-18 05:05:01 +00:00
Chris Lattner	1a924e770a	prevent jump threading from merging blocks when their address is taken (and used!). This prevents merging the blocks (invalidating the block addresses) in a case like this: #define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; }) void foo() { printf("%p\n", _THIS_IP_); printf("%p\n", _THIS_IP_); printf("%p\n", _THIS_IP_); } which fixes PR4151. llvm-svn: 125829	2011-02-18 04:43:06 +00:00
Chris Lattner	a8fed47eed	have instcombine preserve nsw/nuw/exact when sinking common operations through a phi. llvm-svn: 125790	2011-02-17 23:01:49 +00:00
Chris Lattner	abb8eb2c63	fix instcombine merging GEPs through a PHI to only make the result inbounds if all of the inputs are inbounds. llvm-svn: 125785	2011-02-17 22:21:26 +00:00
Nadav Rotem	7cc6d12ad0	Enhance constant folding of bitcast operations on vectors of floats. Add getAllOnesValue of FP numbers to Constants and APFloat. Add more tests. llvm-svn: 125776	2011-02-17 21:22:27 +00:00
Duncan Sands	e522001171	Transform "A + B >= A + C" into "B >= C" if the adds do not wrap. Likewise for some variations (some of these were already present so I unified the code). Spotted by my auto-simplifier as occurring a lot. llvm-svn: 125734	2011-02-17 07:46:37 +00:00
Chris Lattner	5592071768	preserve NUW/NSW when transforming add x,x llvm-svn: 125711	2011-02-17 02:23:02 +00:00
Chris Lattner	0ad64291d8	filecheckize llvm-svn: 125710	2011-02-17 02:21:03 +00:00
Chris Lattner	3eb0af94c4	fix PR9215, preventing -reassociate from clearing nsw/nuw when it swaps the LHS/RHS of a single binop. llvm-svn: 125700	2011-02-17 01:29:24 +00:00
Nick Lewycky	038124b671	Teach PatternMatch that splat vectors could be floating point as well as integer. Fixes PR9228! llvm-svn: 125613	2011-02-15 23:13:23 +00:00
Nadav Rotem	67d67a0385	Fix 9216 - Endless loop in InstCombine pass. The pattern "A&(A^B) -> A & ~B" recreated itself because ~B is actually a xor -1. llvm-svn: 125557	2011-02-15 07:13:48 +00:00
Devang Patel	3058398655	Do not hoist @llvm.dbg.value. Here, @llvm.dbg.value is "referring" a value that is modified inside loop. llvm-svn: 125529	2011-02-14 23:03:23 +00:00
Duncan Sands	d114ab331c	Teach instsimplify that X+Y>=X+Z is the same as Y>=Z if neither side overflows, plus some variations of this. According to my auto-simplifier this occurs a lot but usually in combination with max/min idioms. Because max/min aren't handled yet this unfortunately doesn't have much effect in the testsuite. llvm-svn: 125462	2011-02-13 17:15:40 +00:00
Nadav Rotem	0e162c57f8	Fix test llvm-svn: 125460	2011-02-13 16:13:16 +00:00
Nadav Rotem	27b848afb0	Fix a regression from r125393; It caused a crash in MultiSource/Benchmarks/Bullet. Opt hit an assertion with "opt -std-compile-opts" because Constant::getAllOnesValue doesn't know how to handle floats. This patch added a test to reproduce the problem and a check that the destination vector is of integer type. Thank you Benjamin! llvm-svn: 125459	2011-02-13 15:45:34 +00:00
Chris Lattner	333e27d74b	add PR# llvm-svn: 125455	2011-02-13 08:27:31 +00:00
Chris Lattner	43273affb9	implement instcombine folding for things like (x >> c) < 42. We were previously simplifying divisions, but not right shifts! llvm-svn: 125454	2011-02-13 08:07:21 +00:00
Daniel Dunbar	210ce0feb5	SimplifyLibCalls: Add missing legalize check on various printf to puts and putchar transforms, their return values are not compatible. llvm-svn: 125442	2011-02-12 18:19:57 +00:00
Daniel Dunbar	76c95562bc	tests: FileCheckize llvm-svn: 125441	2011-02-12 18:19:53 +00:00
Benjamin Kramer	1800d823de	Also fold (A+B) == A -> B == 0 when the add is commuted. llvm-svn: 125411	2011-02-11 21:46:48 +00:00
Nadav Rotem	10134c33f2	Fix 9173. Add more folding patterns to constant expressions of vector selects and vector bitcasts. llvm-svn: 125393	2011-02-11 19:37:55 +00:00
Cameron Zwarich	4c898c239e	Add a test for the LSR issue exposed by r125254. llvm-svn: 125325	2011-02-11 00:49:27 +00:00
Nick Lewycky	ac0b62c277	Tolerate degenerate phi nodes that can occur in the middle of optimization passes. Fixes PR9112. Patch by Jakub Staszak! llvm-svn: 125319	2011-02-10 23:54:10 +00:00
Cameron Zwarich	d8e66038f4	Rename 'loopsimplify' to 'loop-simplify'. llvm-svn: 125317	2011-02-10 23:38:10 +00:00
Chris Lattner	d86ded17ad	implement the first part of PR8882: when lowering an inbounds gep to explicit addressing, we know that none of the intermediate computation overflows. This could use review: it seems that the shifts certainly wouldn't overflow, but could the intermediate adds overflow if there is a negative index? Previously the testcase would instcombine to: define i1 @test(i64 %i) { %p1.idx.mask = and i64 %i, 4611686018427387903 %cmp = icmp eq i64 %p1.idx.mask, 1000 ret i1 %cmp } now we get: define i1 @test(i64 %i) { %cmp = icmp eq i64 %i, 1000 ret i1 %cmp } llvm-svn: 125271	2011-02-10 07:11:16 +00:00
Chris Lattner	6b657aed33	Enhance a bunch of transformations in instcombine to start generating exact/nsw/nuw shifts and have instcombine infer them when it can prove that the relevant properties are true for a given shift without them. Also, a variety of refactoring to use the new patternmatch logic thrown in for good luck. I believe that this takes care of a bunch of related code quality issues attached to PR8862. llvm-svn: 125267	2011-02-10 05:36:31 +00:00
Chris Lattner	98457101fc	Enhance the "compare with shift" and "compare with div" optimizations to be much more aggressive in the face of exact/nsw/nuw div and shifts. For example, these (which are the same except the first is 'exact' sdiv: define i1 @sdiv_icmp4_exact(i64 %X) nounwind { %A = sdiv exact i64 %X, -5 ; X/-5 == 0 --> x == 0 %B = icmp eq i64 %A, 0 ret i1 %B } define i1 @sdiv_icmp4(i64 %X) nounwind { %A = sdiv i64 %X, -5 ; X/-5 == 0 --> x == 0 %B = icmp eq i64 %A, 0 ret i1 %B } compile down to: define i1 @sdiv_icmp4_exact(i64 %X) nounwind { %1 = icmp eq i64 %X, 0 ret i1 %1 } define i1 @sdiv_icmp4(i64 %X) nounwind { %X.off = add i64 %X, 4 %1 = icmp ult i64 %X.off, 9 ret i1 %1 } This happens when you do something like: (ptr1-ptr2) == 42 where the pointers are pointers to non-unit types. llvm-svn: 125266	2011-02-10 05:23:05 +00:00
Chris Lattner	dcef03fba2	more cleanups, notably bitcast isn't used for "signed to unsigned type conversions". :) llvm-svn: 125265	2011-02-10 05:17:27 +00:00
Chris Lattner	9e4aa0259f	Teach instsimplify some tricks about exact/nuw/nsw shifts. improve interfaces to instsimplify to take this info. llvm-svn: 125196	2011-02-09 17:15:04 +00:00
Chris Lattner	206b065afb	merge two tests. llvm-svn: 125195	2011-02-09 17:06:41 +00:00
Nick Lewycky	292e78c3cd	When removing a function from the function set and adding it to deferred, we could end up removing a different function than we intended because it was functionally equivalent, then end up with a comparison of a function against itself in the next round of comparisons (the one in the function set and the one on the deferred list). To fix this, I introduce a choice in the form of comparison for ComparableFunctions, either normal or "pointer only" used to find exact Function*'s in lookups. Also add some debugging statements. llvm-svn: 125180	2011-02-09 06:32:02 +00:00
Benjamin Kramer	8d6a8c130b	SimplifyCFG: Track the number of used icmps when turning a icmp chain into a switch. If we used only one icmp, don't turn it into a switch. Also prevent the switch-to-icmp transform from creating identity adds, noticed by Marius Wachtler. llvm-svn: 125056	2011-02-07 22:37:28 +00:00
Chris Lattner	6e57b15228	teach instsimplify to transform (X / Y) * Y to X when the div is an exact udiv. llvm-svn: 124994	2011-02-06 22:05:31 +00:00
Chris Lattner	9c70414551	rename test. llvm-svn: 124993	2011-02-06 21:59:10 +00:00
Chris Lattner	35315d065b	enhance vmcore to know that udiv's can be exact, and add a trivial instcombine xform to exercise this. Nothing forms exact udivs yet though. This is progress on PR8862 llvm-svn: 124992	2011-02-06 21:44:57 +00:00
Anders Carlsson	d21b06a0db	When loading from a constant, fold inttoptr if the integer type and the resulting pointer type both have the same size. llvm-svn: 124987	2011-02-06 20:11:56 +00:00
Benjamin Kramer	62aa46b852	SimplifyCFG: Also transform switches that represent a range comparison but are not sorted into sub+icmp. This transforms another 1000 switches in gcc.c. llvm-svn: 124826	2011-02-03 22:51:41 +00:00
Duncan Sands	06504025d2	Improve threading of comparisons over select instructions (spotted by my auto-simplifier). This has a big impact on Ada code, but not much else. Unfortunately the impact is mostly negative! This is due to PR9004 (aka SCCP failing to resolve conditional branch conditions in the destination blocks of the branch), in which simple correlated expressions are not resolved but complicated ones are, so simplifying has a bad effect! llvm-svn: 124788	2011-02-03 09:37:39 +00:00
Duncan Sands	5747abab10	Reenable the transform "(X*Y)/Y->X" when the multiplication is known not to overflow (nsw flag), which was disabled because it breaks 254.gap. I have informed the GAP authors of the mistake in their code, and arranged for the testsuite to use -fwrapv when compiling this benchmark. llvm-svn: 124746	2011-02-02 20:52:00 +00:00
Benjamin Kramer	f4ea1d5f79	SimplifyCFG: Turn switches into sub+icmp+branch if possible. This makes the job of the later optzn passes easier, allowing the vast amount of icmp transforms to chew on it. We transform 840 switches in gcc.c, leading to a 16k byte shrink of the resulting binary on i386-linux. The testcase from README.txt now compiles into decl %edi cmpl $3, %edi sbbl %eax, %eax andl $1, %eax ret llvm-svn: 124724	2011-02-02 15:56:22 +00:00
Dan Gohman	08d2c98c23	Fix reassociate to clear optional flags, such as nsw. llvm-svn: 124712	2011-02-02 02:02:34 +00:00
Duncan Sands	cf0ff030a8	Have m_One also match constant vectors for which every element is 1. llvm-svn: 124655	2011-02-01 08:39:12 +00:00
Anders Carlsson	f23a6da271	Recognize and simplify (A+B) == A -> B == 0 A == (A+B) -> B == 0 llvm-svn: 124567	2011-01-30 22:01:13 +00:00
Duncan Sands	2e5a58da8f	Commit 124487 broke 254.gap. See if disabling the part that might be triggered by PR9088 fixes things. llvm-svn: 124561	2011-01-30 18:24:20 +00:00
Duncan Sands	b67edc6a29	Transform (X/Y)*Y into X if the division is exact. Instcombine already knows how to do this and more, but would only do it if X/Y had only one use. Spotted as the most common missed simplification in SPEC by my auto-simplifier, now that it knows about nuw/nsw/exact flags. This removes a bunch of multiplications from 447.dealII and 483.xalancbmk. It also removes a lot from tramp3d-v4, which results in much more inlining. llvm-svn: 124560	2011-01-30 18:03:50 +00:00
Nick Lewycky	97a2895e73	Add the select optimization recently added to instcombine to constant folding. This is the one where one of the branches of the select is another select on the same condition. llvm-svn: 124547	2011-01-29 20:35:06 +00:00
Frits van Bommel	c2549661af	Move InstCombine's knowledge of fdiv to SimplifyInstruction(). llvm-svn: 124534	2011-01-29 15:26:31 +00:00
Duncan Sands	2e9e4f1be3	Fix typo: should have been testing that X was odd, not V. llvm-svn: 124533	2011-01-29 13:27:00 +00:00
Evan Cheng	73c29178ac	Add a test for TCE return duplication. llvm-svn: 124527	2011-01-29 04:53:35 +00:00
Evan Cheng	d983eba7dc	Re-apply r124518 with fix. Watch out for invalidated iterator. llvm-svn: 124526	2011-01-29 04:46:23 +00:00
Evan Cheng	65b8ccf6ac	Revert r124518. It broke Linux self-host. llvm-svn: 124522	2011-01-29 02:43:04 +00:00
Evan Cheng	d4eff31476	Re-commit r124462 with fixes. Tail recursion elim will now dup ret into unconditional predecessor to enable TCE on demand. llvm-svn: 124518	2011-01-29 01:29:26 +00:00
Duncan Sands	771e82a863	My auto-simplifier noticed that ((X/Y)Y)/Y occurs several times in SPEC benchmarks, and that it can be simplified to X/Y. (In general you can only simplify (ZY)/Y to Z if the multiplication did not overflow; if Z has the form "X/Y" then this is the case). This patch implements that transform and moves some Div logic out of instcombine and into InstructionSimplify. Unfortunately instcombine gets in the way somewhat, since it likes to change (X/Y)Y into X-(X rem Y), so I had to teach instcombine about this too. Finally, thanks to the NSW/NUW flags, sometimes we know directly that "ZY" does not overflow, because the flag says so, so I added that logic too. This eliminates a bunch of divisions and subtractions in 447.dealII, and has good effects on some other benchmarks too. It seems to have quite an effect on tramp3d-v4 but it's hard to say if it's good or bad because inlining decisions changed, resulting in massive changes all over. llvm-svn: 124487	2011-01-28 16:51:11 +00:00
Evan Cheng	aaa9606b2f	Revert r124462. There are a few big regressions that I need to fix first. llvm-svn: 124478	2011-01-28 07:12:38 +00:00
Nick Lewycky	db34be0e31	Clean up the tests a little, make sure we match an instruction in the right test. llvm-svn: 124473	2011-01-28 05:13:17 +00:00
Nick Lewycky	b074e32641	Fold select + select where both selects are on the same condition. llvm-svn: 124469	2011-01-28 03:28:10 +00:00
Evan Cheng	417fca86c4	- Stop simplifycfg from duplicating "ret" instructions into unconditional branches. PR8575, rdar://5134905, rdar://8911460. - Allow codegen tail duplication to dup small return blocks after register allocation is done. llvm-svn: 124462	2011-01-28 02:19:21 +00:00
Nick Lewycky	13e04aef2a	Fix surprising missed optimization in mergefunc where we forgot to consider that relationships like "i8* null" is equivalent to "i32* null". llvm-svn: 124368	2011-01-27 08:38:19 +00:00
Duncan Sands	69bdb585b2	Fix PR9039, a use-after-free in reassociate. The issue was that the operand being factorized (and erased) could occur several times in Ops, resulting in freed memory being used when the next occurrence in Ops was analyzed. llvm-svn: 124287	2011-01-26 10:08:38 +00:00
Duncan Sands	9e9d5b25e2	In which I discover that zero+zero is zero, d'oh! llvm-svn: 124188	2011-01-25 15:14:15 +00:00
Duncan Sands	c78548d791	Turn off this test - the corresponding instsimplify logic has been disabled. llvm-svn: 124185	2011-01-25 12:31:43 +00:00
Duncan Sands	d395108394	According to my auto-simplifier the most common missed simplifications in optimized code are: (non-negative number)+(power-of-two) != 0 -> true and (x \| 1) != 0 -> true Instcombine knows about the second one of course, but only does it if X\|1 has only one use. These fire thousands of times in the testsuite. llvm-svn: 124183	2011-01-25 09:38:29 +00:00
Nick Lewycky	f1cec164ce	Teach mergefunc how to emit aliases safely again -- but keep it turned it off for now. It's controlled by the HasGlobalAliases variable which is not attached to any flag yet. llvm-svn: 124182	2011-01-25 08:56:50 +00:00
Chris Lattner	2bcec1297e	merge all the "crash tests" into crash.ll llvm-svn: 124101	2011-01-24 03:37:34 +00:00
Chris Lattner	b4017769ae	fix PR9017, a bug where we'd assert when promoting in unreachable code. llvm-svn: 124100	2011-01-24 03:29:07 +00:00
Chris Lattner	d83e7b0ff6	enhance SRoA to promote allocas that are used by PHI nodes. This often occurs because instcombine sinks loads and inserts phis. This kicks in on such apps as 175.vpr, eon, 403.gcc, xalancbmk and a bunch of times in spec2006 in some app that uses std::deque. This resolves the last of rdar://7339113. llvm-svn: 124090	2011-01-24 01:07:11 +00:00
Chris Lattner	a960725d18	Enhance SRoA to promote allocas that are used by selects in some common cases. This triggers a surprising number of times in SPEC2K6 because min/max idioms end up doing this. For example, code from the STL ends up looking like this to SRoA: %202 = load i64* %__old_size, align 8, !tbaa !3 %203 = load i64* %__old_size, align 8, !tbaa !3 %204 = load i64* %__n, align 8, !tbaa !3 %205 = icmp ult i64 %203, %204 %storemerge.i = select i1 %205, i64* %__n, i64* %__old_size %206 = load i64* %storemerge.i, align 8, !tbaa !3 We can now promote both the __n and the __old_size allocas. This addresses another chunk of rdar://7339113, poor codegen on stringswitch. llvm-svn: 124088	2011-01-23 22:04:55 +00:00
Chris Lattner	9491dee24e	Enhance SRoA to be more aggressive about scalarization of aggregate allocas that have PHI or select uses of their element pointers. This can often happen when instcombine sinks two loads into a successor, inserting a phi or select. With this patch, we can scalarize the alloca, but the pinned elements are not yet promoted. This is still a win for large aggregates where only one element is used. This fixes rdar://8904039 and part of rdar://7339113 (poor codegen on stringswitch). llvm-svn: 124070	2011-01-23 08:27:54 +00:00
Chris Lattner	a587ab7b94	remove an old hack that avoided creating MMX datatypes. The X86 backend has been fixed. llvm-svn: 124064	2011-01-23 06:40:33 +00:00
Dan Gohman	19e30d5a7d	Actually check memcpy lengths, instead of just commenting about how they should be checked. llvm-svn: 123999	2011-01-21 22:07:57 +00:00
Owen Anderson	a834200dbe	Just because we have determined that an (fcmp \| fcmp) is true for A < B, A == B, and A > B, does not mean we can fold it to true. We still need to check for A ? B (A unordered B). llvm-svn: 123993	2011-01-21 19:39:42 +00:00
Chris Lattner	b5e15d1907	fix PR9013, an infinite loop in instcombine. llvm-svn: 123968	2011-01-21 05:29:50 +00:00
Nick Lewycky	6a083cf820	Don't try to pull vector bitcasts that change the number of elements through a select. A vector select is pairwise on each element so we'd need a new condition with the right number of elements to select on. Fixes PR8994. llvm-svn: 123963	2011-01-21 02:30:43 +00:00
Nick Lewycky	39b12c059d	Add a constant folding of casts from zero to zero. Fixes PR9011! While here, I'd like to complain about how vector is not an aggregate type according to llvm::Type::isAggregateType(), but they're listed under aggregate types in the LangRef and zero vectors are stored as ConstantAggregateZero. llvm-svn: 123956	2011-01-21 01:12:09 +00:00
Duncan Sands	8fb2c3827c	At -O123 the early-cse pass is run before instcombine has run. According to my auto-simplier the transform most missed by early-cse is (zext X) != 0 -> X != 0. This patch adds this transform and some related logic to InstructionSimplify and removes some of the logic from instcombine (unfortunately not all because there are several situations in which instcombine can improve things by making new instructions, whereas instsimplify is not allowed to do this). At -O2 this often results in more than 15% more simplifications by early-cse, and results in hundreds of lines of bitcode being eliminated from the testsuite. I did see some small negative effects in the testsuite, for example a few additional instructions in three programs. One program, 483.xalancbmk, got an additional 35 instructions, which seems to be due to a function getting an additional instruction and then being inlined all over the place. llvm-svn: 123911	2011-01-20 13:21:55 +00:00
Rafael Espindola	fc355bc070	Add unnamed_addr when we can show that address of a global is not used. llvm-svn: 123834	2011-01-19 16:32:21 +00:00
Duncan Sands	99589d07e9	For completeness, generalize the (X + Y) - Y -> X transform and add X - (X + 1) -> -1. These were not recommended by my auto-simplifier since they don't fire often enough. However they do fire from time to time, for example they remove one subtraction from the final bitcode for 483.xalancbmk. llvm-svn: 123755	2011-01-18 11:50:19 +00:00
Duncan Sands	9b8e2bd8ef	Simplify (X<<1)-X into X. According to my auto-simplier this is the most common missed simplification in fully optimized code. It occurs sporadically in the testsuite, and many times in 403.gcc: the final bitcode has 131 fewer subtractions after this change. The reason that the multiplies are not eliminated is the same reason that instcombine did not catch this: they are used by other instructions (instcombine catches this with a more general transform which in general is only profitable if the operands have only one use). llvm-svn: 123754	2011-01-18 09:24:58 +00:00
Nick Lewycky	872a453ada	Test for lazy value info's ability to prove the absense of NULLs in pointers. llvm-svn: 123601	2011-01-16 21:57:20 +00:00
Anders Carlsson	d3db83349e	Teach DAE to look for functions whose arguments are unused, and change all callers to pass in an undefvalue instead. llvm-svn: 123596	2011-01-16 21:25:33 +00:00
Rafael Espindola	751677a040	Don't merge two constants if we care about the address of both. This fixes the original testcase in PR8927. It also causes a clang binary built with a patched clang to increase in size by 0.21%. We can probably get some of the size back by writing a pass that detects that a global never has its pointer compared and adds unnamed_addr to it (maybe extend global opt). It is also possible that there are some other cases clang could add unnamed_addr to. I will investigate extending globalopt next. llvm-svn: 123584	2011-01-16 17:05:09 +00:00
Owen Anderson	ec3b10fc56	Reduce and merge testcases. llvm-svn: 123579	2011-01-16 09:13:31 +00:00
Chris Lattner	e5f8de8639	fix PR8932, a case where arg promotion could infinitely promote. llvm-svn: 123574	2011-01-16 08:09:24 +00:00
Chris Lattner	6fab2e9418	if an alloca is only ever accessed as a unit, and is accessed with load/store instructions, then don't try to decimate it into its individual pieces. This will just make a mess of the IR and is pointless if none of the elements are individually accessed. This was generating really terrible code for std::bitset (PR8980) because it happens to be lowered by clang as an {[8 x i8]} structure instead of {i64}. The testcase now is optimized to: define i64 @test2(i64 %X) { br label %L2 L2: ; preds = %0 ret i64 %X } before we generated: define i64 @test2(i64 %X) { %sroa.store.elt = lshr i64 %X, 56 %1 = trunc i64 %sroa.store.elt to i8 %sroa.store.elt8 = lshr i64 %X, 48 %2 = trunc i64 %sroa.store.elt8 to i8 %sroa.store.elt9 = lshr i64 %X, 40 %3 = trunc i64 %sroa.store.elt9 to i8 %sroa.store.elt10 = lshr i64 %X, 32 %4 = trunc i64 %sroa.store.elt10 to i8 %sroa.store.elt11 = lshr i64 %X, 24 %5 = trunc i64 %sroa.store.elt11 to i8 %sroa.store.elt12 = lshr i64 %X, 16 %6 = trunc i64 %sroa.store.elt12 to i8 %sroa.store.elt13 = lshr i64 %X, 8 %7 = trunc i64 %sroa.store.elt13 to i8 %8 = trunc i64 %X to i8 br label %L2 L2: ; preds = %0 %9 = zext i8 %1 to i64 %10 = shl i64 %9, 56 %11 = zext i8 %2 to i64 %12 = shl i64 %11, 48 %13 = or i64 %12, %10 %14 = zext i8 %3 to i64 %15 = shl i64 %14, 40 %16 = or i64 %15, %13 %17 = zext i8 %4 to i64 %18 = shl i64 %17, 32 %19 = or i64 %18, %16 %20 = zext i8 %5 to i64 %21 = shl i64 %20, 24 %22 = or i64 %21, %19 %23 = zext i8 %6 to i64 %24 = shl i64 %23, 16 %25 = or i64 %24, %22 %26 = zext i8 %7 to i64 %27 = shl i64 %26, 8 %28 = or i64 %27, %25 %29 = zext i8 %8 to i64 %30 = or i64 %29, %28 ret i64 %30 } In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough PHIs are in play that instcombine backs off. It's better to not generate this stuff in the first place. llvm-svn: 123571	2011-01-16 06:18:28 +00:00
Chris Lattner	d55581ded8	enhance FoldOpIntoPhi in instcombine to try harder when a phi has multiple uses. In some cases, all the uses are the same operation, so instcombine can go ahead and promote the phi. In the testcase this pushes an add out of the loop. llvm-svn: 123568	2011-01-16 05:28:59 +00:00
Owen Anderson	4e54efd625	Improve the safety of my globalopt enhancement by ensuring that the bitcast of the stored value to the new store type is always. Also, add a testcase. llvm-svn: 123563	2011-01-16 04:33:33 +00:00
Chris Lattner	08f43456c9	fix PR8983, a broken assertion. llvm-svn: 123562	2011-01-16 03:43:53 +00:00
Chris Lattner	1e209b87ad	remove the partial specialization pass. It is unmaintained and has bugs. llvm-svn: 123554	2011-01-16 00:27:10 +00:00
Nick Lewycky	0296a481f9	Make constmerge a two-pass algorithm so that it won't miss merging opporuntities. Fixes PR8978. llvm-svn: 123541	2011-01-15 18:14:21 +00:00
Chris Lattner	af26390790	temporarily revert r123526. While working on a follow-on patch I realize that ConstantFoldTerminator doesn't preserve dominfo. llvm-svn: 123527	2011-01-15 07:51:19 +00:00
Chris Lattner	8df83c4a24	fix rdar://8785296 - -fcatch-undefined-behavior generates inefficient code The basic issue is that isel (very reasonably!) expects conditional branches to be folded, so CGP leaving around a bunch dead computation feeding conditional branches isn't such a good idea. Just fold branches on constants into unconditional branches. llvm-svn: 123526	2011-01-15 07:36:13 +00:00
Chris Lattner	1b93be501d	Now that instruction optzns can update the iterator as they go, we can have objectsize folding recursively simplify away their result when it folds. It is important to catch this here, because otherwise we won't eliminate the cross-block values at isel and other times. llvm-svn: 123524	2011-01-15 07:25:29 +00:00
Chris Lattner	9c10d587f6	implement an instcombine xform that canonicalizes casts outside of and-with-constant operations. This fixes rdar://8808586 which observed that we used to compile: union xy { struct x { _Bool b[15]; } x; __attribute__((packed)) struct y { __attribute__((packed)) unsigned long b0to7; __attribute__((packed)) unsigned int b8to11; __attribute__((packed)) unsigned short b12to13; __attribute__((packed)) unsigned char b14; } y; }; struct x foo(union xy *xy) { return xy->x; } into: _foo: ## @foo movq (%rdi), %rax movabsq $1095216660480, %rcx ## imm = 0xFF00000000 andq %rax, %rcx movabsq $-72057594037927936, %rdx ## imm = 0xFF00000000000000 andq %rax, %rdx movzbl %al, %esi orq %rdx, %rsi movq %rax, %rdx andq $65280, %rdx ## imm = 0xFF00 orq %rsi, %rdx movq %rax, %rsi andq $16711680, %rsi ## imm = 0xFF0000 orq %rdx, %rsi movl %eax, %edx andl $-16777216, %edx ## imm = 0xFFFFFFFFFF000000 orq %rsi, %rdx orq %rcx, %rdx movabsq $280375465082880, %rcx ## imm = 0xFF0000000000 movq %rax, %rsi andq %rcx, %rsi orq %rdx, %rsi movabsq $71776119061217280, %r8 ## imm = 0xFF000000000000 andq %r8, %rax orq %rsi, %rax movzwl 12(%rdi), %edx movzbl 14(%rdi), %esi shlq $16, %rsi orl %edx, %esi movq %rsi, %r9 shlq $32, %r9 movl 8(%rdi), %edx orq %r9, %rdx andq %rdx, %rcx movzbl %sil, %esi shlq $32, %rsi orq %rcx, %rsi movl %edx, %ecx andl $-16777216, %ecx ## imm = 0xFFFFFFFFFF000000 orq %rsi, %rcx movq %rdx, %rsi andq $16711680, %rsi ## imm = 0xFF0000 orq %rcx, %rsi movq %rdx, %rcx andq $65280, %rcx ## imm = 0xFF00 orq %rsi, %rcx movzbl %dl, %esi orq %rcx, %rsi andq %r8, %rdx orq %rsi, %rdx ret We now compile this into: _foo: ## @foo ## BB#0: ## %entry movzwl 12(%rdi), %eax movzbl 14(%rdi), %ecx shlq $16, %rcx orl %eax, %ecx shlq $32, %rcx movl 8(%rdi), %edx orq %rcx, %rdx movq (%rdi), %rax ret A small improvement :-) llvm-svn: 123520	2011-01-15 06:32:33 +00:00
Duncan Sands	d6f1a9584d	Turn X-(X-Y) into Y. According to my auto-simplifier this is the most common simplification present in fully optimized code (I think instcombine fails to transform some of these when "X-Y" has more than one use). Fires here and there all over the test-suite, for example it eliminates 8 subtractions in the final IR for 445.gobmk, 2 subs in 447.dealII, 2 in paq8p etc. llvm-svn: 123442	2011-01-14 15:26:10 +00:00
Duncan Sands	571fd9a606	Factorize common code out of the InstructionSimplify shift logic. Add in threading of shifts over selects and phis while there. This fires here and there in the testsuite, to not much effect. For example when compiling spirit it fires 5 times, during early-cse, resulting in 6 more cse simplifications, and 3 more terminators being folded by jump threading, but the final bitcode doesn't change in any interesting way: other optimizations would have caught the opportunity anyway, only later. llvm-svn: 123441	2011-01-14 14:44:12 +00:00
Duncan Sands	c3eb0f4b2e	Rename this test. llvm-svn: 123440	2011-01-14 14:16:33 +00:00
Chris Lattner	5e0fef8531	relax testcase a bit. llvm-svn: 123433	2011-01-14 07:46:33 +00:00
Duncan Sands	7f60dc1eb0	Move some shift transforms out of instcombine and into InstructionSimplify. While there, I noticed that the transform "undef >>a X -> undef" was wrong. For example if X is 2 then the top two bits must be equal, so the result can not be anything. I fixed this in the constant folder as well. Also, I made the transform for "X << undef" stronger: it now folds to undef always, even though X might be zero. This is in accordance with the LangRef, but I must admit that it is fairly aggressive. Also, I added "i32 X << 32 -> undef" following the LangRef and the constant folder, likewise fairly aggressive. llvm-svn: 123417	2011-01-14 00:37:45 +00:00
Bob Wilson	08713d3c5f	Extend SROA to handle arrays accessed as homogeneous structs and vice versa. This is a minor extension of SROA to handle a special case that is important for some ARM NEON operations. Some of the NEON intrinsics return multiple values, which are handled as struct types containing multiple elements of the same vector type. The corresponding return types declared in the arm_neon.h header have equivalent arrays. We need SROA to recognize that it can split up those arrays and structs into separate vectors, even though they are not always accessed with the same type. SROA already handles loads and stores of an entire alloca by using insertvalue/extractvalue to access the individual pieces, and that code works the same regardless of whether the type is a struct or an array. So, all that needs to be done is to check for compatible arrays and homogeneous structs. llvm-svn: 123381	2011-01-13 17:45:11 +00:00
Bob Wilson	12eec40c83	Make SROA more aggressive with allocas containing padding. SROA only split up structs and arrays one level at a time, so padding can only cause trouble if it is located in between the struct or array elements. llvm-svn: 123380	2011-01-13 17:45:08 +00:00
Duncan Sands	8d25a7c3a0	The most common simplification missed by instsimplify in unoptimized bitcode is "X != 0 -> X" when X is a boolean. This occurs a lot because of the way llvm-gcc converts gcc's conditional expressions. Add this, and a few other similar transforms for completeness. llvm-svn: 123372	2011-01-13 08:56:29 +00:00
Chris Lattner	dd5f60b7a7	revert 123144, reenabling the rest of memset formation. llvm-svn: 123302	2011-01-12 03:25:15 +00:00
Chris Lattner	654098f411	revert r123146 which disabled code that wasn't the root cause of the bootstrap miscompare issue. llvm-svn: 123299	2011-01-12 01:52:23 +00:00
Chris Lattner	054d2a8525	merge tests into one crash.ll test. llvm-svn: 123220	2011-01-11 07:50:07 +00:00
Chris Lattner	63fe78de68	remove a bogus assertion: the latch block of a loop is not neccesarily an uncond branch to the header. This fixes PR8955 (the assertion tripping). llvm-svn: 123219	2011-01-11 07:47:59 +00:00
Chandler Carruth	b1e7f557b7	Teach constant folding to perform conversions from constant floating point values to their integer representation through the SSE intrinsic calls. This is the last part of a README.txt entry for which I have real world examples. llvm-svn: 123206	2011-01-11 01:07:24 +00:00
Chandler Carruth	fdf4969149	FileCheck-ize a test, and move a no-longer calling test case to another file and make it actually test something... llvm-svn: 123205	2011-01-11 01:07:20 +00:00
Owen Anderson	d490c2d2ae	Fix a random missed optimization by making InstCombine more aggressive when determining which bits are demanded by a comparison against a constant. llvm-svn: 123203	2011-01-11 00:36:45 +00:00
Chandler Carruth	cf414cf0a6	Teach instcombine about the rest of the SSE and SSE2 conversion intrinsics element dependencies. Reviewed by Nick. llvm-svn: 123161	2011-01-10 07:19:37 +00:00
Chandler Carruth	7bb282ebb1	Fold two related tests into the newly FileCheck-ized test, migrating them to FileCheck as well. llvm-svn: 123154	2011-01-10 02:53:58 +00:00
Chandler Carruth	ef7aac5961	Clean up and FileCheck-ize a test. llvm-svn: 123153	2011-01-10 02:53:54 +00:00
Chris Lattner	ec1387cf4b	fix typo llvm-svn: 123148	2011-01-10 02:33:34 +00:00
Chris Lattner	4662bd4b13	another (more) aggressive attempt to bring llvm-gcc-i386-linux-selfhost back to life. llvm-svn: 123146	2011-01-10 00:47:34 +00:00
Chris Lattner	1017fa6746	temporarily disable memset formation from memsets in an effort to restore buildbot stability. llvm-svn: 123144	2011-01-09 23:52:48 +00:00
Tobias Grosser	cc21c4aa98	Instcombine: Fix pattern where the sext did not dominate the icmp using it llvm-svn: 123121	2011-01-09 16:00:11 +00:00
Chris Lattner	9a1d63ba9f	Merge memsets followed by neighboring memsets and other stores into larger memsets. Among other things, this fixes rdar://8760394 and allows us to handle "Example 2" from http://blog.regehr.org/archives/320, compiling it into a single 4096-byte memset: _mad_synth_mute: ## @mad_synth_mute ## BB#0: ## %entry pushq %rax movl $4096, %esi ## imm = 0x1000 callq ___bzero popq %rax ret llvm-svn: 123089	2011-01-08 21:19:19 +00:00
Chris Lattner	5120ebf184	fix an issue in IsPointerOffset that prevented us from recognizing that P and P+1 are relative to the same base pointer. llvm-svn: 123087	2011-01-08 21:07:56 +00:00
Chris Lattner	4dc1fd938f	enhance memcpyopt to merge a store and a subsequent memset into a single larger memset. llvm-svn: 123086	2011-01-08 20:54:51 +00:00
Chris Lattner	9dbbc49f74	merge two tests and filecheckify llvm-svn: 123082	2011-01-08 20:27:22 +00:00
Chris Lattner	59c82f850d	When loop rotation happens, it is very common for the duplicated condbr to be foldable into an uncond branch. When this happens, we can make a much simpler CFG for the loop, which is important for nested loop cases where we want the outer loop to be aggressively optimized. Handle this case more aggressively. For example, previously on phi-duplicate.ll we would get this: define void @test(i32 %N, double* %G) nounwind ssp { entry: %cmp1 = icmp slt i64 1, 1000 br i1 %cmp1, label %bb.nph, label %for.end bb.nph: ; preds = %entry br label %for.body for.body: ; preds = %bb.nph, %for.cond %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ] %arrayidx = getelementptr inbounds double* %G, i64 %j.02 %tmp3 = load double* %arrayidx %sub = sub i64 %j.02, 1 %arrayidx6 = getelementptr inbounds double* %G, i64 %sub %tmp7 = load double* %arrayidx6 %add = fadd double %tmp3, %tmp7 %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02 store double %add, double* %arrayidx10 %inc = add nsw i64 %j.02, 1 br label %for.cond for.cond: ; preds = %for.body %cmp = icmp slt i64 %inc, 1000 br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge for.cond.for.end_crit_edge: ; preds = %for.cond br label %for.end for.end: ; preds = %for.cond.for.end_crit_edge, %entry ret void } Now we get the much nicer: define void @test(i32 %N, double* %G) nounwind ssp { entry: br label %for.body for.body: ; preds = %entry, %for.body %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ] %arrayidx = getelementptr inbounds double* %G, i64 %j.01 %tmp3 = load double* %arrayidx %sub = sub i64 %j.01, 1 %arrayidx6 = getelementptr inbounds double* %G, i64 %sub %tmp7 = load double* %arrayidx6 %add = fadd double %tmp3, %tmp7 %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01 store double %add, double* %arrayidx10 %inc = add nsw i64 %j.01, 1 %cmp = icmp slt i64 %inc, 1000 br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body ret void } With all of these recent changes, we are now able to compile: void foo(char X) { for (int i = 0; i != 100; ++i) for (int j = 0; j != 100; ++j) X[j+i100] = 0; } into a single memset of 10000 bytes. This series of changes should also be helpful for other nested loop scenarios as well. llvm-svn: 123079	2011-01-08 19:59:06 +00:00
Chris Lattner	063dca0f6a	Three major changes: 1. Rip out LoopRotate's domfrontier updating code. It isn't needed now that LICM doesn't use DF and it is super complex and gross. 2. Make DomTree updating code a lot simpler and faster. The old loop over all the blocks was just to find a block?? 3. Change the code that inserts the new preheader to just use SplitCriticalEdge instead of doing an overcomplex reimplementation of it. No behavior change, except for the name of the inserted preheader. llvm-svn: 123072	2011-01-08 18:52:51 +00:00
Frits van Bommel	6a1fb8f235	Fix a bug in r123034 (trying to sext/zext non-integers) and clean up a little. llvm-svn: 123061	2011-01-08 10:51:36 +00:00
Chris Lattner	8c5defd0b0	Have loop-rotate simplify instructions (yay instsimplify!) as it clones them into the loop preheader, eliminating silly instructions like "icmp i32 0, 100" in fixed tripcount loops. This also better exposes the bigger problem with loop rotate that I'd like to fix: once this has been folded, the duplicated conditional branch often turns into an uncond branch. Not aggressively handling this is pessimizing later loop optimizations somethin' fierce by making "dominates all exit blocks" checks fail. llvm-svn: 123060	2011-01-08 08:24:46 +00:00
Tobias Grosser	fc3d7f664b	InstCombine: Match min/max hidden by sext/zext X = sext x; x >s c ? X : C+1 --> X = sext x; X <s C+1 ? C+1 : X X = sext x; x <s c ? X : C-1 --> X = sext x; X >s C-1 ? C-1 : X X = zext x; x >u c ? X : C+1 --> X = zext x; X <u C+1 ? C+1 : X X = zext x; x <u c ? X : C-1 --> X = zext x; X >u C-1 ? C-1 : X X = sext x; x >u c ? X : C+1 --> X = sext x; X <u C+1 ? C+1 : X X = sext x; x <u c ? X : C-1 --> X = sext x; X >u C-1 ? C-1 : X Instead of calculating this with mixed types promote all to the larger type. This enables scalar evolution to analyze this expression. PR8866 llvm-svn: 123034	2011-01-07 21:33:14 +00:00
Benjamin Kramer	134cde912a	Revert 122959, it needs more thought. Add it back to README.txt with additional notes. llvm-svn: 123030	2011-01-07 20:42:20 +00:00
Benjamin Kramer	ae67cc13a9	InstCombine: Turn _chk functions into the "unsafe" variant if length and max langth are equal. This happens when we take the (non-constant) length from a malloc. llvm-svn: 122961	2011-01-06 14:22:52 +00:00
Benjamin Kramer	799b011276	InstCombine: If we call llvm.objectsize on a malloc call we can replace it with the size passed to malloc. llvm-svn: 122959	2011-01-06 13:11:05 +00:00
Benjamin Kramer	a76cc117e0	InstCombine: Teach llvm.objectsize folding to look through GEPs. llvm-svn: 122958	2011-01-06 13:07:49 +00:00
Chris Lattner	5858e091a6	implement constant folding support for an exotic constant expr: ret i64 ptrtoint (i8* getelementptr ([1000 x i8]* @X, i64 1, i64 sub (i64 0, i64 ptrtoint ([1000 x i8]* @X to i64))) to i64) to "ret i64 1000". This allows us to correctly compute the trip count on a loop in PR8883, which occurs with std::fill on a char array. This allows us to transform it into a memset with a constant size. llvm-svn: 122950	2011-01-06 06:19:46 +00:00
Chris Lattner	c86e67e110	fix an off-by-one bug that caused a crash analyzing ashr's with huge shift amounts, PR8896 llvm-svn: 122814	2011-01-04 18:19:15 +00:00
Chris Lattner	8643810ede	Teach loop-idiom to turn a loop containing a memset into a larger memset when safe. The testcase is basically this nested loop: void foo(char X) { for (int i = 0; i != 100; ++i) for (int j = 0; j != 100; ++j) X[j+i100] = 0; } which gets turned into a single memset now. clang -O3 doesn't optimize this yet though due to a phase ordering issue I haven't analyzed yet. llvm-svn: 122806	2011-01-04 07:46:33 +00:00
Chris Lattner	bde6ec1db6	Duncan deftly points out that readnone functions aren't invalidated by stores, so they can be handled as 'simple' operations. llvm-svn: 122785	2011-01-03 23:38:13 +00:00
Chris Lattner	9e5e9ed79a	earlycse can do trivial with-a-block dead store elimination as well. This deletes 60 stores in 176.gcc that largely come from bitfield code. llvm-svn: 122736	2011-01-03 04:17:24 +00:00
Chris Lattner	e0e32a9ef0	now that loads are in their own table, we can implement store->load forwarding. This allows EarlyCSE to zap 600 more loads from 176.gcc. llvm-svn: 122732	2011-01-03 03:46:34 +00:00
Chris Lattner	0446bb23f8	add a testcase for readonly call CSE llvm-svn: 122730	2011-01-03 03:33:47 +00:00
Chris Lattner	b9a8efc960	Teach EarlyCSE to do trivial CSE of loads and read-only calls. On 176.gcc, this catches 13090 loads and calls, and increases the number of simple instructions CSE'd from 29658 to 36208. llvm-svn: 122727	2011-01-03 03:18:43 +00:00
Chris Lattner	8fac5db251	add DEBUG and -stats output to earlycse. Teach it to CSE the rest of the non-side-effecting instructions. llvm-svn: 122716	2011-01-02 23:19:45 +00:00
Chris Lattner	18ae5436b1	Enhance earlycse to do CSE of casts, instsimplify and die. Add a testcase. llvm-svn: 122715	2011-01-02 23:04:14 +00:00
Chris Lattner	9c69406f2b	fix a miscompilation of tramp3d-v4: when forming a memcpy, we have to make sure that the loop we're promoting into a memcpy doesn't mutate the input of the memcpy. Before we were just checking that the dest of the memcpy wasn't mod/ref'd by the loop. llvm-svn: 122712	2011-01-02 21:14:18 +00:00
Chris Lattner	5702a43c09	If a loop iterates exactly once (has backedge count = 0) then don't mess with it. We'd rather peel/unroll it than convert all of its stores into memsets. llvm-svn: 122711	2011-01-02 20:24:21 +00:00
Chris Lattner	8455b6e45e	enhance loop idiom recognition to scan all unconditionally executed blocks in a loop, instead of just the header block. This makes it more aggressive, able to handle Duncan's Ada examples. llvm-svn: 122704	2011-01-02 19:01:03 +00:00
Duncan Sands	64f1c0dcda	Fix PR8702 by not having LoopSimplify claim to preserve LCSSA form. As described in the PR, the pass could break LCSSA form when inserting preheaders. It probably would be easy enough to fix this, but since currently we always go into LCSSA form after running this pass, doing so is not urgent. llvm-svn: 122695	2011-01-02 13:38:21 +00:00
Chris Lattner	ddf58010bd	Allow loop-idiom to run on multiple BB loops, but still only scan the loop header for now for memset/memcpy opportunities. It turns out that loop-rotate is successfully rotating loops, but DOESN'T MERGE THE BLOCKS, turning "for loops" into 2 basic block loops that loop-idiom was ignoring. With this fix, we form many many more memcpy and memsets than before, including on the "history" loops in the viterbi benchmark, which look like this: for (j=0; j<MAX_history; ++j) { history_new[i][j+1] = history[2*i][j]; } Transforming these loops into memcpy's speeds up the viterbi benchmark from 11.98s to 3.55s on my machine. Woo. llvm-svn: 122685	2011-01-02 07:58:36 +00:00
Chris Lattner	85b6d81d41	teach loop idiom recognition to form memcpy's from simple loops. llvm-svn: 122678	2011-01-02 03:37:56 +00:00
Chris Lattner	1903c42b97	fix a globalopt crash on two Adobe-C++ testcases that the recent loop idiom pass exposed. llvm-svn: 122674	2011-01-01 22:31:46 +00:00
Chris Lattner	a3514441e0	add a validity check that was missed, fixing a crash on the new testcase. llvm-svn: 122662	2011-01-01 20:12:04 +00:00
Duncan Sands	772749aea1	Revert commit 122654 at the request of Chris, who reckons that instsimplify is the wrong hammer for this nail, and is probably right. llvm-svn: 122661	2011-01-01 20:08:02 +00:00
Chris Lattner	91a4435875	improve validity check to handle constant-trip-count loops more aggressively. In practice, this doesn't help anything though, see the todo. llvm-svn: 122660	2011-01-01 19:54:22 +00:00
Chris Lattner	8b3baf6d75	implement the "no aliasing accesses in loop" safety check. This pass should be correct now. llvm-svn: 122659	2011-01-01 19:39:01 +00:00
Duncan Sands	e3c539581c	Fix a README item by having InstructionSimplify do a mild form of value numbering, in which it considers (for example) "%a = add i32 %x, %y" and "%b = add i32 %x, %y" to be equal because the operands are equal and the result of the instructions only depends on the values of the operands. This has almost no effect (it removes 4 instructions from gcc-as-one-file), and perhaps slows down compilation: I measured a 0.4% slowdown on the large gcc-as-one-file testcase, but it wasn't statistically significant. llvm-svn: 122654	2011-01-01 16:12:09 +00:00
NAKAMURA Takumi	a834008959	test/Transforms/ConstProp/logicaltest.ll: FileCheck-ize. llvm-svn: 122620	2010-12-29 03:58:56 +00:00
Chris Lattner	29e14edc8d	implement enough of the memset inference algorithm to recognize and insert memsets. This is still missing one important validity check, but this is enough to compile stuff like this: void test0(std::vector<char> &X) { for (std::vector<char>::iterator I = X.begin(), E = X.end(); I != E; ++I) *I = 0; } void test1(std::vector<int> &X) { for (long i = 0, e = X.size(); i != e; ++i) X[i] = 0x01010101; } With: $ clang t.cpp -S -o - -O2 -emit-llvm \| opt -loop-idiom \| opt -O3 \| llc to: __Z5test0RSt6vectorIcSaIcEE: ## @_Z5test0RSt6vectorIcSaIcEE ## BB#0: ## %entry subq $8, %rsp movq (%rdi), %rax movq 8(%rdi), %rsi cmpq %rsi, %rax je LBB0_2 ## BB#1: ## %bb.nph subq %rax, %rsi movq %rax, %rdi callq ___bzero LBB0_2: ## %for.end addq $8, %rsp ret ... __Z5test1RSt6vectorIiSaIiEE: ## @_Z5test1RSt6vectorIiSaIiEE ## BB#0: ## %entry subq $8, %rsp movq (%rdi), %rax movq 8(%rdi), %rdx subq %rax, %rdx cmpq $4, %rdx jb LBB1_2 ## BB#1: ## %for.body.preheader andq $-4, %rdx movl $1, %esi movq %rax, %rdi callq _memset LBB1_2: ## %for.end addq $8, %rsp ret llvm-svn: 122573	2010-12-26 23:42:51 +00:00
Chris Lattner	6cf8d6cc6e	start using irbuilder to make mem intrinsics in a few passes. llvm-svn: 122572	2010-12-26 22:57:41 +00:00

... 3 4 5 6 7 ...

2490 Commits