llvm-project

Commit Graph

Author	SHA1	Message	Date
Anders Carlsson	f23a6da271	Recognize and simplify (A+B) == A -> B == 0 A == (A+B) -> B == 0 llvm-svn: 124567	2011-01-30 22:01:13 +00:00
Francois Pichet	326e4a2966	Unbreak the MSVC build. The DEBUG() call at line 606 demands to see raw_ostream's definition. I have no idea why this seems to only break MSVC. llvm-svn: 124545	2011-01-29 20:06:16 +00:00
Frits van Bommel	2a55951d08	Call SimplifyFDivInst() in InstCombiner::visitFDiv(). llvm-svn: 124535	2011-01-29 17:50:27 +00:00
Frits van Bommel	c2549661af	Move InstCombine's knowledge of fdiv to SimplifyInstruction(). llvm-svn: 124534	2011-01-29 15:26:31 +00:00
Evan Cheng	73c29178ac	Add a test for TCE return duplication. llvm-svn: 124527	2011-01-29 04:53:35 +00:00
Evan Cheng	d983eba7dc	Re-apply r124518 with fix. Watch out for invalidated iterator. llvm-svn: 124526	2011-01-29 04:46:23 +00:00
Evan Cheng	65b8ccf6ac	Revert r124518. It broke Linux self-host. llvm-svn: 124522	2011-01-29 02:43:04 +00:00
Evan Cheng	d4eff31476	Re-commit r124462 with fixes. Tail recursion elim will now dup ret into unconditional predecessor to enable TCE on demand. llvm-svn: 124518	2011-01-29 01:29:26 +00:00
Andrew Trick	24f5ff0f23	Implementation of path profiling. Modified patch by Adam Preuss. This builds on the existing framework for block tracing, edge profiling and optimal edge profiling. See -help-hidden for new flags. For documentation, see the technical report "Implementation of Path Profiling..." in llvm.org/pubs. llvm-svn: 124515	2011-01-29 01:09:53 +00:00
Duncan Sands	771e82a863	My auto-simplifier noticed that ((X/Y)Y)/Y occurs several times in SPEC benchmarks, and that it can be simplified to X/Y. (In general you can only simplify (ZY)/Y to Z if the multiplication did not overflow; if Z has the form "X/Y" then this is the case). This patch implements that transform and moves some Div logic out of instcombine and into InstructionSimplify. Unfortunately instcombine gets in the way somewhat, since it likes to change (X/Y)Y into X-(X rem Y), so I had to teach instcombine about this too. Finally, thanks to the NSW/NUW flags, sometimes we know directly that "ZY" does not overflow, because the flag says so, so I added that logic too. This eliminates a bunch of divisions and subtractions in 447.dealII, and has good effects on some other benchmarks too. It seems to have quite an effect on tramp3d-v4 but it's hard to say if it's good or bad because inlining decisions changed, resulting in massive changes all over. llvm-svn: 124487	2011-01-28 16:51:11 +00:00
Nick Lewycky	cfb284cf96	Rename functions to follow coding standard. Also rejiggers comments. No functionality change. llvm-svn: 124482	2011-01-28 08:43:14 +00:00
Nick Lewycky	aaf401241a	Add a doxygen comment for this class. llvm-svn: 124480	2011-01-28 08:19:00 +00:00
Nick Lewycky	564fcca856	Reorder for readability. (Chris, is this what you meant?) llvm-svn: 124479	2011-01-28 07:36:21 +00:00
Evan Cheng	aaa9606b2f	Revert r124462. There are a few big regressions that I need to fix first. llvm-svn: 124478	2011-01-28 07:12:38 +00:00
Nick Lewycky	c5eb3733f7	Reduce the number of functions we look at in the first pass, and preallocate the function equality set. llvm-svn: 124475	2011-01-28 05:48:15 +00:00
Nick Lewycky	b074e32641	Fold select + select where both selects are on the same condition. llvm-svn: 124469	2011-01-28 03:28:10 +00:00
Evan Cheng	417fca86c4	- Stop simplifycfg from duplicating "ret" instructions into unconditional branches. PR8575, rdar://5134905, rdar://8911460. - Allow codegen tail duplication to dup small return blocks after register allocation is done. llvm-svn: 124462	2011-01-28 02:19:21 +00:00
Benjamin Kramer	57e3d65884	Unbreak the build. llvm-svn: 124426	2011-01-27 20:30:54 +00:00
Nick Lewycky	e2d46d30ae	Expound upon this comparison! llvm-svn: 124406	2011-01-27 19:51:31 +00:00
Nick Lewycky	5a37e950e1	Use dyn_cast instead of isa+cast. llvm-svn: 124404	2011-01-27 19:42:43 +00:00
Nick Lewycky	13e04aef2a	Fix surprising missed optimization in mergefunc where we forgot to consider that relationships like "i8* null" is equivalent to "i32* null". llvm-svn: 124368	2011-01-27 08:38:19 +00:00
Duncan Sands	69bdb585b2	Fix PR9039, a use-after-free in reassociate. The issue was that the operand being factorized (and erased) could occur several times in Ops, resulting in freed memory being used when the next occurrence in Ops was analyzed. llvm-svn: 124287	2011-01-26 10:08:38 +00:00
Nick Lewycky	91543447a6	AttrListPtr has an overloaded operator== which does this for us, we should use it. No functionality change! llvm-svn: 124286	2011-01-26 09:23:19 +00:00
Nick Lewycky	82d4db8662	Teach mergefunc that intptr_t is the same width as a pointer. We still can't merge vector<intptr_t>::push_back() and vector<void>::push_back() because Enumerate() doesn't realize that "i64 null" and "i8** null" are equivalent. llvm-svn: 124285	2011-01-26 09:13:58 +00:00
Nick Lewycky	fb622f9920	There are no vectors of pointer or arrays, so we don't need to check vector elements for type equivalence. llvm-svn: 124284	2011-01-26 08:50:18 +00:00
Nick Lewycky	f1cec164ce	Teach mergefunc how to emit aliases safely again -- but keep it turned it off for now. It's controlled by the HasGlobalAliases variable which is not attached to any flag yet. llvm-svn: 124182	2011-01-25 08:56:50 +00:00
Dan Gohman	0f124e1987	Give GetUnderlyingObject a TargetData, to keep it in sync with BasicAA's DecomposeGEPExpression, which recently began using a TargetData. This fixes PR8968, though the testcase is awkward to reduce. Also, update several off GetUnderlyingObject's users which happen to have a TargetData handy to pass it in. llvm-svn: 124134	2011-01-24 18:53:32 +00:00
Chris Lattner	b4017769ae	fix PR9017, a bug where we'd assert when promoting in unreachable code. llvm-svn: 124100	2011-01-24 03:29:07 +00:00
Chris Lattner	23289c385a	fix PR9015, a crash linking recursive metadata. llvm-svn: 124099	2011-01-24 03:18:24 +00:00
Chris Lattner	d83e7b0ff6	enhance SRoA to promote allocas that are used by PHI nodes. This often occurs because instcombine sinks loads and inserts phis. This kicks in on such apps as 175.vpr, eon, 403.gcc, xalancbmk and a bunch of times in spec2006 in some app that uses std::deque. This resolves the last of rdar://7339113. llvm-svn: 124090	2011-01-24 01:07:11 +00:00
Chris Lattner	a960725d18	Enhance SRoA to promote allocas that are used by selects in some common cases. This triggers a surprising number of times in SPEC2K6 because min/max idioms end up doing this. For example, code from the STL ends up looking like this to SRoA: %202 = load i64* %__old_size, align 8, !tbaa !3 %203 = load i64* %__old_size, align 8, !tbaa !3 %204 = load i64* %__n, align 8, !tbaa !3 %205 = icmp ult i64 %203, %204 %storemerge.i = select i1 %205, i64* %__n, i64* %__old_size %206 = load i64* %storemerge.i, align 8, !tbaa !3 We can now promote both the __n and the __old_size allocas. This addresses another chunk of rdar://7339113, poor codegen on stringswitch. llvm-svn: 124088	2011-01-23 22:04:55 +00:00
Ted Kremenek	3c4408ceb6	Null initialize a few variables flagged by clang's -Wuninitialized-experimental warning. While these don't look like real bugs, clang's -Wuninitialized-experimental analysis is stricter than GCC's, and these fixes have the benefit of being general nice cleanups. llvm-svn: 124073	2011-01-23 17:05:06 +00:00
Chris Lattner	9491dee24e	Enhance SRoA to be more aggressive about scalarization of aggregate allocas that have PHI or select uses of their element pointers. This can often happen when instcombine sinks two loads into a successor, inserting a phi or select. With this patch, we can scalarize the alloca, but the pinned elements are not yet promoted. This is still a win for large aggregates where only one element is used. This fixes rdar://8904039 and part of rdar://7339113 (poor codegen on stringswitch). llvm-svn: 124070	2011-01-23 08:27:54 +00:00
Cameron Zwarich	07d6fe34b3	Convert two std::vectors to SmallVectors for a 3.4% speedup running -scalarrepl on test-suite + SPEC2000 & SPEC2006. llvm-svn: 124068	2011-01-23 08:03:04 +00:00
Chris Lattner	8acbb79506	have AllocaInfo store the alloca being inspected, simplifying callers. No functionality change. llvm-svn: 124067	2011-01-23 07:29:29 +00:00
Chris Lattner	3e56c29068	Rearrange some code a bit. Change MarkUnsafe to handle the "Transformation preventing inst" printing, so that -scalarrepl -debug will always print the rejected instruction. No functionality change. llvm-svn: 124066	2011-01-23 07:05:44 +00:00
Chris Lattner	a587ab7b94	remove an old hack that avoided creating MMX datatypes. The X86 backend has been fixed. llvm-svn: 124064	2011-01-23 06:40:33 +00:00
Dan Gohman	19e30d5a7d	Actually check memcpy lengths, instead of just commenting about how they should be checked. llvm-svn: 123999	2011-01-21 22:07:57 +00:00
Owen Anderson	a834200dbe	Just because we have determined that an (fcmp \| fcmp) is true for A < B, A == B, and A > B, does not mean we can fold it to true. We still need to check for A ? B (A unordered B). llvm-svn: 123993	2011-01-21 19:39:42 +00:00
Nick Lewycky	ae0275e018	SCCP doesn't actually preserve the CFG. It will delete and insert terminator instructions. llvm-svn: 123973	2011-01-21 08:38:09 +00:00
Chris Lattner	b5e15d1907	fix PR9013, an infinite loop in instcombine. llvm-svn: 123968	2011-01-21 05:29:50 +00:00
Chris Lattner	f4ca47bda8	update obsolete comment. llvm-svn: 123965	2011-01-21 05:08:26 +00:00
Nick Lewycky	6a083cf820	Don't try to pull vector bitcasts that change the number of elements through a select. A vector select is pairwise on each element so we'd need a new condition with the right number of elements to select on. Fixes PR8994. llvm-svn: 123963	2011-01-21 02:30:43 +00:00
Duncan Sands	8fb2c3827c	At -O123 the early-cse pass is run before instcombine has run. According to my auto-simplier the transform most missed by early-cse is (zext X) != 0 -> X != 0. This patch adds this transform and some related logic to InstructionSimplify and removes some of the logic from instcombine (unfortunately not all because there are several situations in which instcombine can improve things by making new instructions, whereas instsimplify is not allowed to do this). At -O2 this often results in more than 15% more simplifications by early-cse, and results in hundreds of lines of bitcode being eliminated from the testsuite. I did see some small negative effects in the testsuite, for example a few additional instructions in three programs. One program, 483.xalancbmk, got an additional 35 instructions, which seems to be due to a function getting an additional instruction and then being inlined all over the place. llvm-svn: 123911	2011-01-20 13:21:55 +00:00
Rafael Espindola	fc355bc070	Add unnamed_addr when we can show that address of a global is not used. llvm-svn: 123834	2011-01-19 16:32:21 +00:00
Chris Lattner	86d56c651d	fix rdar://8878965, a regression I introduced with the recent llvm.objectsize changes. llvm-svn: 123771	2011-01-18 20:53:04 +00:00
Cameron Zwarich	fc210c79b7	Convert a std::map to a DenseMap for another 1.7% speedup on -scalarrepl. llvm-svn: 123732	2011-01-18 04:50:38 +00:00
Cameron Zwarich	6968c41ac8	Make a std::vector a SmallVector<*, 32> like the other vectors in the same function. This seems to be about a 1.5% speedup of -scalarrepl on test-suite with SPEC2000 and SPEC2006. llvm-svn: 123731	2011-01-18 04:41:32 +00:00
Rafael Espindola	ecd5b9abe9	Reduce indentation and remove commented out code. llvm-svn: 123729	2011-01-18 04:36:06 +00:00
Cameron Zwarich	b703654edc	Remove code for updating dominance frontiers and some outdated references to dominance and post-dominance frontiers. llvm-svn: 123725	2011-01-18 04:11:31 +00:00
Cameron Zwarich	4694e69540	Remove outdated references to dominance frontiers. llvm-svn: 123724	2011-01-18 03:53:26 +00:00
Owen Anderson	459e079912	Remove dead code, that I apparently wrote a while back. We seem to be doing well enough without whatever this was trying to do. When/if someone has the time to do some empirical evaluations, it might be worth it to figure out what this code was trying to do and see if it's worth resurrecting/fixing. llvm-svn: 123684	2011-01-17 22:39:54 +00:00
Cameron Zwarich	b410858a5f	Roll r123609 back in with two changes that fix test failures with expensive checks enabled: 1) Use '<' to compare integers in a comparison function rather than '<='. 2) Use the uniqued set DefBlocks rather than Info.DefiningBlocks to initialize the priority queue. The speedup of scalarrepl on test-suite + SPEC2000 + SPEC2006 is a bit less, at just under 16% rather than 17%. llvm-svn: 123662	2011-01-17 17:38:41 +00:00
Cameron Zwarich	67431d7943	Roll out r123609 due to failures on the llvm-x86_64-linux-checks bot. llvm-svn: 123618	2011-01-17 07:26:51 +00:00
Cameron Zwarich	814cd9233e	Eliminate the use of dominance frontiers in PromoteMemToReg. In addition to eliminating a potentially quadratic data structure, this also gives a 17% speedup when running -scalarrepl on test-suite + SPEC2000 + SPEC2006. My initial experiment gave a greater speedup around 25%, but I moved the dominator tree level computation from dominator tree construction to PromoteMemToReg. Since this approach to computing IDFs has a much lower overhead than the old code using precomputed DFs, it is worth looking at using this new code for the second scalarrepl pass as well. llvm-svn: 123609	2011-01-17 01:08:59 +00:00
Anders Carlsson	d3db83349e	Teach DAE to look for functions whose arguments are unused, and change all callers to pass in an undefvalue instead. llvm-svn: 123596	2011-01-16 21:25:33 +00:00
Chris Lattner	7c9f4c9c2b	tidy up a comment, as suggested by duncan llvm-svn: 123590	2011-01-16 17:46:19 +00:00
Rafael Espindola	751677a040	Don't merge two constants if we care about the address of both. This fixes the original testcase in PR8927. It also causes a clang binary built with a patched clang to increase in size by 0.21%. We can probably get some of the size back by writing a pass that detects that a global never has its pointer compared and adds unnamed_addr to it (maybe extend global opt). It is also possible that there are some other cases clang could add unnamed_addr to. I will investigate extending globalopt next. llvm-svn: 123584	2011-01-16 17:05:09 +00:00
Chris Lattner	e5f8de8639	fix PR8932, a case where arg promotion could infinitely promote. llvm-svn: 123574	2011-01-16 08:09:24 +00:00
Chris Lattner	ed1fb92cfe	simplify a little llvm-svn: 123573	2011-01-16 07:11:21 +00:00
Chris Lattner	6fab2e9418	if an alloca is only ever accessed as a unit, and is accessed with load/store instructions, then don't try to decimate it into its individual pieces. This will just make a mess of the IR and is pointless if none of the elements are individually accessed. This was generating really terrible code for std::bitset (PR8980) because it happens to be lowered by clang as an {[8 x i8]} structure instead of {i64}. The testcase now is optimized to: define i64 @test2(i64 %X) { br label %L2 L2: ; preds = %0 ret i64 %X } before we generated: define i64 @test2(i64 %X) { %sroa.store.elt = lshr i64 %X, 56 %1 = trunc i64 %sroa.store.elt to i8 %sroa.store.elt8 = lshr i64 %X, 48 %2 = trunc i64 %sroa.store.elt8 to i8 %sroa.store.elt9 = lshr i64 %X, 40 %3 = trunc i64 %sroa.store.elt9 to i8 %sroa.store.elt10 = lshr i64 %X, 32 %4 = trunc i64 %sroa.store.elt10 to i8 %sroa.store.elt11 = lshr i64 %X, 24 %5 = trunc i64 %sroa.store.elt11 to i8 %sroa.store.elt12 = lshr i64 %X, 16 %6 = trunc i64 %sroa.store.elt12 to i8 %sroa.store.elt13 = lshr i64 %X, 8 %7 = trunc i64 %sroa.store.elt13 to i8 %8 = trunc i64 %X to i8 br label %L2 L2: ; preds = %0 %9 = zext i8 %1 to i64 %10 = shl i64 %9, 56 %11 = zext i8 %2 to i64 %12 = shl i64 %11, 48 %13 = or i64 %12, %10 %14 = zext i8 %3 to i64 %15 = shl i64 %14, 40 %16 = or i64 %15, %13 %17 = zext i8 %4 to i64 %18 = shl i64 %17, 32 %19 = or i64 %18, %16 %20 = zext i8 %5 to i64 %21 = shl i64 %20, 24 %22 = or i64 %21, %19 %23 = zext i8 %6 to i64 %24 = shl i64 %23, 16 %25 = or i64 %24, %22 %26 = zext i8 %7 to i64 %27 = shl i64 %26, 8 %28 = or i64 %27, %25 %29 = zext i8 %8 to i64 %30 = or i64 %29, %28 ret i64 %30 } In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough PHIs are in play that instcombine backs off. It's better to not generate this stuff in the first place. llvm-svn: 123571	2011-01-16 06:18:28 +00:00
Chris Lattner	7cd8cf7d24	Use an irbuilder to get some trivial constant folding when doing a store of a constant. llvm-svn: 123570	2011-01-16 05:58:24 +00:00
Chris Lattner	adb1a233b1	remove a dead check, this was needed before we had an explicit veto on uses of phis. llvm-svn: 123569	2011-01-16 05:37:55 +00:00
Chris Lattner	d55581ded8	enhance FoldOpIntoPhi in instcombine to try harder when a phi has multiple uses. In some cases, all the uses are the same operation, so instcombine can go ahead and promote the phi. In the testcase this pushes an add out of the loop. llvm-svn: 123568	2011-01-16 05:28:59 +00:00
Chris Lattner	ea7131a062	remove the AllowAggressive argument to FoldOpIntoPhi. It is forced to false in the first line of the function because it isn't a good idea, even for compares. llvm-svn: 123566	2011-01-16 05:14:26 +00:00
Chris Lattner	ff2e737714	more cleanups: use the IR builder. llvm-svn: 123565	2011-01-16 05:08:00 +00:00
Chris Lattner	25ce280511	tidy up code. llvm-svn: 123564	2011-01-16 04:37:29 +00:00
Owen Anderson	4e54efd625	Improve the safety of my globalopt enhancement by ensuring that the bitcast of the stored value to the new store type is always. Also, add a testcase. llvm-svn: 123563	2011-01-16 04:33:33 +00:00
Chris Lattner	8b4952fcf7	simplify this code, it is still broken but will follow up on llvm-commits. llvm-svn: 123558	2011-01-16 02:05:10 +00:00
Chris Lattner	1e209b87ad	remove the partial specialization pass. It is unmaintained and has bugs. llvm-svn: 123554	2011-01-16 00:27:10 +00:00
Nick Lewycky	4a1ff16b29	Add missing whitespace. llvm-svn: 123543	2011-01-15 18:42:52 +00:00
Nick Lewycky	0296a481f9	Make constmerge a two-pass algorithm so that it won't miss merging opporuntities. Fixes PR8978. llvm-svn: 123541	2011-01-15 18:14:21 +00:00
Benjamin Kramer	ed5f2e504e	Try to unbreak selfhost. llvm-svn: 123537	2011-01-15 11:25:34 +00:00
Nick Lewycky	540f9536c8	Add a cache that protects mergefunc's internals from more surprises in DenseSet. Also, replace tabs with spaces. Yes, it's 2011. llvm-svn: 123535	2011-01-15 10:16:23 +00:00
Chris Lattner	af26390790	temporarily revert r123526. While working on a follow-on patch I realize that ConstantFoldTerminator doesn't preserve dominfo. llvm-svn: 123527	2011-01-15 07:51:19 +00:00
Chris Lattner	8df83c4a24	fix rdar://8785296 - -fcatch-undefined-behavior generates inefficient code The basic issue is that isel (very reasonably!) expects conditional branches to be folded, so CGP leaving around a bunch dead computation feeding conditional branches isn't such a good idea. Just fold branches on constants into unconditional branches. llvm-svn: 123526	2011-01-15 07:36:13 +00:00
Chris Lattner	ee588defc6	simplify code, no functionality change. llvm-svn: 123525	2011-01-15 07:29:01 +00:00
Chris Lattner	1b93be501d	Now that instruction optzns can update the iterator as they go, we can have objectsize folding recursively simplify away their result when it folds. It is important to catch this here, because otherwise we won't eliminate the cross-block values at isel and other times. llvm-svn: 123524	2011-01-15 07:25:29 +00:00
Chris Lattner	7a2771440f	make the current instruction iterator an ivar, allowing xforms that potentially invalidate it (like inline asm lowering) to be sunk into their proper place, cleaning up a ton of code. llvm-svn: 123523	2011-01-15 07:14:54 +00:00
Chris Lattner	9c10d587f6	implement an instcombine xform that canonicalizes casts outside of and-with-constant operations. This fixes rdar://8808586 which observed that we used to compile: union xy { struct x { _Bool b[15]; } x; __attribute__((packed)) struct y { __attribute__((packed)) unsigned long b0to7; __attribute__((packed)) unsigned int b8to11; __attribute__((packed)) unsigned short b12to13; __attribute__((packed)) unsigned char b14; } y; }; struct x foo(union xy *xy) { return xy->x; } into: _foo: ## @foo movq (%rdi), %rax movabsq $1095216660480, %rcx ## imm = 0xFF00000000 andq %rax, %rcx movabsq $-72057594037927936, %rdx ## imm = 0xFF00000000000000 andq %rax, %rdx movzbl %al, %esi orq %rdx, %rsi movq %rax, %rdx andq $65280, %rdx ## imm = 0xFF00 orq %rsi, %rdx movq %rax, %rsi andq $16711680, %rsi ## imm = 0xFF0000 orq %rdx, %rsi movl %eax, %edx andl $-16777216, %edx ## imm = 0xFFFFFFFFFF000000 orq %rsi, %rdx orq %rcx, %rdx movabsq $280375465082880, %rcx ## imm = 0xFF0000000000 movq %rax, %rsi andq %rcx, %rsi orq %rdx, %rsi movabsq $71776119061217280, %r8 ## imm = 0xFF000000000000 andq %r8, %rax orq %rsi, %rax movzwl 12(%rdi), %edx movzbl 14(%rdi), %esi shlq $16, %rsi orl %edx, %esi movq %rsi, %r9 shlq $32, %r9 movl 8(%rdi), %edx orq %r9, %rdx andq %rdx, %rcx movzbl %sil, %esi shlq $32, %rsi orq %rcx, %rsi movl %edx, %ecx andl $-16777216, %ecx ## imm = 0xFFFFFFFFFF000000 orq %rsi, %rcx movq %rdx, %rsi andq $16711680, %rsi ## imm = 0xFF0000 orq %rcx, %rsi movq %rdx, %rcx andq $65280, %rcx ## imm = 0xFF00 orq %rsi, %rcx movzbl %dl, %esi orq %rcx, %rsi andq %r8, %rdx orq %rsi, %rdx ret We now compile this into: _foo: ## @foo ## BB#0: ## %entry movzwl 12(%rdi), %eax movzbl 14(%rdi), %ecx shlq $16, %rcx orl %eax, %ecx shlq $32, %rcx movl 8(%rdi), %edx orq %rcx, %rdx movq (%rdi), %rax ret A small improvement :-) llvm-svn: 123520	2011-01-15 06:32:33 +00:00
Chris Lattner	e20dd530d0	one more instcombine variant that is needed to work with future changes, no functionality change currently. llvm-svn: 123517	2011-01-15 05:50:18 +00:00
Chris Lattner	497459d5fd	fix typo llvm-svn: 123516	2011-01-15 05:42:47 +00:00
Chris Lattner	f3c4eefff8	Catch ~x < cst just like ~x < ~y, we currently handle this through means that are about to disappear. llvm-svn: 123515	2011-01-15 05:41:33 +00:00
Chris Lattner	311aa63c87	reduce indentation llvm-svn: 123514	2011-01-15 05:40:29 +00:00
Chris Lattner	b68ec5c339	Generalize LoadAndStorePromoter a bit and switch LICM to use it. llvm-svn: 123501	2011-01-15 00:12:35 +00:00
Owen Anderson	3e2f6cf7ae	Fix a false-positive warning. llvm-svn: 123480	2011-01-14 22:31:13 +00:00
Owen Anderson	9eb7cb48e4	Enhance GlobalOpt to be able evaluate initializers that involve stores through bitcasts, at least in simple cases. This fixes clang's CodeGenCXX/virtual-base-dtor.cpp llvm-svn: 123477	2011-01-14 22:19:20 +00:00
Chris Lattner	b498f9aff3	switch SRoA to use LoadAndStorePromoter instead of its own copy of the code. llvm-svn: 123457	2011-01-14 19:50:47 +00:00
Chris Lattner	95294b8796	Add a new LoadAndStorePromoter class, which implements the general "promote a bunch of load and stores" logic, allowing the code to be shared and reused. llvm-svn: 123456	2011-01-14 19:36:13 +00:00
Chris Lattner	9987a6f49b	split SROA into two passes: one that uses DomFrontiers (-scalarrepl) and one that uses SSAUpdater (-scalarrepl-ssa) llvm-svn: 123436	2011-01-14 08:13:00 +00:00
Chris Lattner	543384efb4	Implement full support for promoting allocas to registers using SSAUpdater instead of DomTree/DomFrontier. This may be interesting for reducing compile time. This is currently disabled, but seems to work just fine. When this is enabled, we eliminate two runs of dominator frontier, one in the "early per-function" optimizations and one in the "interlaced with inliner" function passes. llvm-svn: 123434	2011-01-14 07:50:47 +00:00
Chris Lattner	90f3a9a1c7	indentation llvm-svn: 123426	2011-01-14 04:23:53 +00:00
Duncan Sands	7f60dc1eb0	Move some shift transforms out of instcombine and into InstructionSimplify. While there, I noticed that the transform "undef >>a X -> undef" was wrong. For example if X is 2 then the top two bits must be equal, so the result can not be anything. I fixed this in the constant folder as well. Also, I made the transform for "X << undef" stronger: it now folds to undef always, even though X might be zero. This is in accordance with the LangRef, but I must admit that it is fairly aggressive. Also, I added "i32 X << 32 -> undef" following the LangRef and the constant folder, likewise fairly aggressive. llvm-svn: 123417	2011-01-14 00:37:45 +00:00
Bob Wilson	328e91bbe1	Fix whitespace. llvm-svn: 123396	2011-01-13 20:59:44 +00:00
Bob Wilson	c8056a952e	Check for empty structs, and for consistency, zero-element arrays. llvm-svn: 123383	2011-01-13 18:26:59 +00:00
Bob Wilson	08713d3c5f	Extend SROA to handle arrays accessed as homogeneous structs and vice versa. This is a minor extension of SROA to handle a special case that is important for some ARM NEON operations. Some of the NEON intrinsics return multiple values, which are handled as struct types containing multiple elements of the same vector type. The corresponding return types declared in the arm_neon.h header have equivalent arrays. We need SROA to recognize that it can split up those arrays and structs into separate vectors, even though they are not always accessed with the same type. SROA already handles loads and stores of an entire alloca by using insertvalue/extractvalue to access the individual pieces, and that code works the same regardless of whether the type is a struct or an array. So, all that needs to be done is to check for compatible arrays and homogeneous structs. llvm-svn: 123381	2011-01-13 17:45:11 +00:00
Bob Wilson	12eec40c83	Make SROA more aggressive with allocas containing padding. SROA only split up structs and arrays one level at a time, so padding can only cause trouble if it is located in between the struct or array elements. llvm-svn: 123380	2011-01-13 17:45:08 +00:00
Devang Patel	30f3ebbc1f	Use SmallVector instead of SmallPtrSet and avoid non-deterministic behavior. llvm-svn: 123318	2011-01-12 19:12:45 +00:00
Chris Lattner	dd5f60b7a7	revert 123144, reenabling the rest of memset formation. llvm-svn: 123302	2011-01-12 03:25:15 +00:00
Chris Lattner	654098f411	revert r123146 which disabled code that wasn't the root cause of the bootstrap miscompare issue. llvm-svn: 123299	2011-01-12 01:52:23 +00:00
Chris Lattner	fa7c29d255	revert r123149, reenabling an improvement to memcpyopt that wasn't the source of the bootstrap problem. llvm-svn: 123298	2011-01-12 01:43:46 +00:00
Jakob Stoklund Olesen	12cc296bd4	Remove the PR8954 workaround. llvm-svn: 123288	2011-01-11 22:56:41 +00:00
Jakob Stoklund Olesen	f2407aa98b	Fix a non-deterministic loop in llvm::MergeBlockIntoPredecessor. DT->changeImmediateDominator() trivially ignores identity updates, so there is really no need for the uniqueing provided by SmallPtrSet. I expect this to fix PR8954. llvm-svn: 123286	2011-01-11 22:54:38 +00:00
Cameron Zwarich	cb9c4f85ec	Dial back the speculative fix for PR8954 a bit, so that we only recompute dominators once at the beginning of GVN instead of once per iteration. llvm-svn: 123278	2011-01-11 22:14:42 +00:00
Cameron Zwarich	51eb403907	Attempt to fix the bootstrap buildbot. Rafael says this works for him on x86-64 Linux. llvm-svn: 123270	2011-01-11 20:23:34 +00:00
Owen Anderson	0022a4b417	Remove dead variable, const-ref-ize an APInt. llvm-svn: 123248	2011-01-11 18:26:37 +00:00
Chris Lattner	d41db8f9cb	this pass claims to preserve scev, make sure to tell it about deletions. llvm-svn: 123247	2011-01-11 18:14:50 +00:00
Frits van Bommel	8e158495f1	Factor the actual simplification out of SimplifyIndirectBrOnSelect and into a new helper function so it can be reused in e.g. an upcoming SimplifySwitchOnSelect. No functional change. llvm-svn: 123234	2011-01-11 12:52:11 +00:00
Chris Lattner	193ce7c4d1	update memdep when an instruction is deleted. This code isn't actually reached in the testcase in PR8954, but it's safe and good practice. llvm-svn: 123224	2011-01-11 08:19:16 +00:00
Chris Lattner	e2523b287c	when MergeBlockIntoPredecessor merges two blocks, update MemDep if it is floating around in the ether. llvm-svn: 123223	2011-01-11 08:16:49 +00:00
Chris Lattner	f6ae904e34	Fix FoldSingleEntryPHINodes to update memdep and AA when it deletes phi nodes. It is called from MergeBlockIntoPredecessor which is called from GVN, which claims to preserve these. I'm skeptical that this is the actual problem behind PR8954, but this is a stab in the right direction. llvm-svn: 123222	2011-01-11 08:13:40 +00:00
Chris Lattner	dfcfcb49fa	random cleanups llvm-svn: 123221	2011-01-11 08:00:40 +00:00
Chris Lattner	63fe78de68	remove a bogus assertion: the latch block of a loop is not neccesarily an uncond branch to the header. This fixes PR8955 (the assertion tripping). llvm-svn: 123219	2011-01-11 07:47:59 +00:00
Owen Anderson	d490c2d2ae	Fix a random missed optimization by making InstCombine more aggressive when determining which bits are demanded by a comparison against a constant. llvm-svn: 123203	2011-01-11 00:36:45 +00:00
Chandler Carruth	cf414cf0a6	Teach instcombine about the rest of the SSE and SSE2 conversion intrinsics element dependencies. Reviewed by Nick. llvm-svn: 123161	2011-01-10 07:19:37 +00:00
Chris Lattner	88bc848ab6	another random stab in the dark trying to fix llvm-gcc-i386-linux-selfhost llvm-svn: 123149	2011-01-10 02:34:11 +00:00
Chris Lattner	4662bd4b13	another (more) aggressive attempt to bring llvm-gcc-i386-linux-selfhost back to life. llvm-svn: 123146	2011-01-10 00:47:34 +00:00
Chris Lattner	1017fa6746	temporarily disable memset formation from memsets in an effort to restore buildbot stability. llvm-svn: 123144	2011-01-09 23:52:48 +00:00
Chris Lattner	caf5c0d037	fix a few old bugs (found by inspection) where we would zap instructions without informing memdep. This could cause nondeterminstic weirdness based on where instructions happen to get allocated, and will hopefully breath some life into some broken testers. llvm-svn: 123124	2011-01-09 19:26:10 +00:00
Tobias Grosser	cc21c4aa98	Instcombine: Fix pattern where the sext did not dominate the icmp using it llvm-svn: 123121	2011-01-09 16:00:11 +00:00
Cameron Zwarich	a42e5915bf	LoopInstSimplify preserves LoopSimplify. llvm-svn: 123117	2011-01-09 12:35:16 +00:00
Chris Lattner	a337f5ec5c	reduce indentation. Print <nuw> and <nsw> when dumping SCEV AddRec's that have the bit set. llvm-svn: 123104	2011-01-09 02:16:18 +00:00
Chris Lattner	7d6433ae76	fix a latent bug in memcpyoptimizer that my recent patches exposed: it wasn't updating memdep when fusing stores together. This fixes the crash optimizing the bullet benchmark. llvm-svn: 123091	2011-01-08 22:19:21 +00:00
Chris Lattner	ff6ed2ac5f	tryMergingIntoMemset can only handle constant length memsets. llvm-svn: 123090	2011-01-08 22:11:56 +00:00
Chris Lattner	9a1d63ba9f	Merge memsets followed by neighboring memsets and other stores into larger memsets. Among other things, this fixes rdar://8760394 and allows us to handle "Example 2" from http://blog.regehr.org/archives/320, compiling it into a single 4096-byte memset: _mad_synth_mute: ## @mad_synth_mute ## BB#0: ## %entry pushq %rax movl $4096, %esi ## imm = 0x1000 callq ___bzero popq %rax ret llvm-svn: 123089	2011-01-08 21:19:19 +00:00
Chris Lattner	5120ebf184	fix an issue in IsPointerOffset that prevented us from recognizing that P and P+1 are relative to the same base pointer. llvm-svn: 123087	2011-01-08 21:07:56 +00:00
Chris Lattner	4dc1fd938f	enhance memcpyopt to merge a store and a subsequent memset into a single larger memset. llvm-svn: 123086	2011-01-08 20:54:51 +00:00
Chris Lattner	c638147e9f	constify TargetData references. Split memset formation logic out into its own "tryMergingIntoMemset" helper function. llvm-svn: 123081	2011-01-08 20:24:01 +00:00
Chris Lattner	59c82f850d	When loop rotation happens, it is very common for the duplicated condbr to be foldable into an uncond branch. When this happens, we can make a much simpler CFG for the loop, which is important for nested loop cases where we want the outer loop to be aggressively optimized. Handle this case more aggressively. For example, previously on phi-duplicate.ll we would get this: define void @test(i32 %N, double* %G) nounwind ssp { entry: %cmp1 = icmp slt i64 1, 1000 br i1 %cmp1, label %bb.nph, label %for.end bb.nph: ; preds = %entry br label %for.body for.body: ; preds = %bb.nph, %for.cond %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ] %arrayidx = getelementptr inbounds double* %G, i64 %j.02 %tmp3 = load double* %arrayidx %sub = sub i64 %j.02, 1 %arrayidx6 = getelementptr inbounds double* %G, i64 %sub %tmp7 = load double* %arrayidx6 %add = fadd double %tmp3, %tmp7 %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02 store double %add, double* %arrayidx10 %inc = add nsw i64 %j.02, 1 br label %for.cond for.cond: ; preds = %for.body %cmp = icmp slt i64 %inc, 1000 br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge for.cond.for.end_crit_edge: ; preds = %for.cond br label %for.end for.end: ; preds = %for.cond.for.end_crit_edge, %entry ret void } Now we get the much nicer: define void @test(i32 %N, double* %G) nounwind ssp { entry: br label %for.body for.body: ; preds = %entry, %for.body %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ] %arrayidx = getelementptr inbounds double* %G, i64 %j.01 %tmp3 = load double* %arrayidx %sub = sub i64 %j.01, 1 %arrayidx6 = getelementptr inbounds double* %G, i64 %sub %tmp7 = load double* %arrayidx6 %add = fadd double %tmp3, %tmp7 %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01 store double %add, double* %arrayidx10 %inc = add nsw i64 %j.01, 1 %cmp = icmp slt i64 %inc, 1000 br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body ret void } With all of these recent changes, we are now able to compile: void foo(char X) { for (int i = 0; i != 100; ++i) for (int j = 0; j != 100; ++j) X[j+i100] = 0; } into a single memset of 10000 bytes. This series of changes should also be helpful for other nested loop scenarios as well. llvm-svn: 123079	2011-01-08 19:59:06 +00:00
Chris Lattner	30f318e5d1	split ssa updating code out to its own helper function. Don't bother moving the OrigHeader block anymore: we just merge it away anyway so its code layout doesn't matter. llvm-svn: 123077	2011-01-08 19:26:33 +00:00
Chris Lattner	2615130e1d	Implement a TODO: Enhance loopinfo to merge away the unconditional branch that it was leaving in loops after rotation (between the original latch block and the original header. With this change, it is possible for rotated loops to have just a single basic block, which is useful. llvm-svn: 123075	2011-01-08 19:10:28 +00:00
Chris Lattner	930b716e1b	various code cleanups, enhance MergeBlockIntoPredecessor to preserve loop info. llvm-svn: 123074	2011-01-08 19:08:40 +00:00
Chris Lattner	fee37c5fa3	inline preserveCanonicalLoopForm now that it is simple. llvm-svn: 123073	2011-01-08 18:55:50 +00:00
Chris Lattner	063dca0f6a	Three major changes: 1. Rip out LoopRotate's domfrontier updating code. It isn't needed now that LICM doesn't use DF and it is super complex and gross. 2. Make DomTree updating code a lot simpler and faster. The old loop over all the blocks was just to find a block?? 3. Change the code that inserts the new preheader to just use SplitCriticalEdge instead of doing an overcomplex reimplementation of it. No behavior change, except for the name of the inserted preheader. llvm-svn: 123072	2011-01-08 18:52:51 +00:00
Chris Lattner	30d95f9f87	reduce nesting. llvm-svn: 123071	2011-01-08 18:47:43 +00:00
Chris Lattner	7fab23bc1d	LoopRotate requires canonical loop form, so it always has preheaders and latch blocks. Reorder entry conditions to make hte pass faster and more logical. llvm-svn: 123069	2011-01-08 18:06:22 +00:00
Chris Lattner	d62691f4e8	use the LI ivar. llvm-svn: 123068	2011-01-08 17:49:51 +00:00
Chris Lattner	385f2ec6d8	some cleanups: remove dead arguments and eliminate ivars that are just passed to one function. llvm-svn: 123067	2011-01-08 17:48:33 +00:00
Chris Lattner	25ba40a0cc	fix an issue duncan pointed out, which could cause loop rotate to violate LCSSA form llvm-svn: 123066	2011-01-08 17:38:45 +00:00
Cameron Zwarich	b4ab257bcc	Fix coding style issues. llvm-svn: 123065	2011-01-08 17:07:11 +00:00
Cameron Zwarich	84986b298a	Make more passes preserve dominators (or state that they preserve dominators if they all ready do). This removes two dominator recomputations prior to isel, which is a 1% improvement in total llc time for 403.gcc. The only potentially suspect thing is making GCStrategy recompute dominators if it used a custom lowering strategy. llvm-svn: 123064	2011-01-08 17:01:52 +00:00
Cameron Zwarich	80bd9af7c5	Contract subloop bodies. However, it is still important to visit the phis at the top of subloop headers, as the phi uses logically occur outside of the subloop. llvm-svn: 123062	2011-01-08 15:52:22 +00:00
Frits van Bommel	6a1fb8f235	Fix a bug in r123034 (trying to sext/zext non-integers) and clean up a little. llvm-svn: 123061	2011-01-08 10:51:36 +00:00
Chris Lattner	8c5defd0b0	Have loop-rotate simplify instructions (yay instsimplify!) as it clones them into the loop preheader, eliminating silly instructions like "icmp i32 0, 100" in fixed tripcount loops. This also better exposes the bigger problem with loop rotate that I'd like to fix: once this has been folded, the duplicated conditional branch often turns into an uncond branch. Not aggressively handling this is pessimizing later loop optimizations somethin' fierce by making "dominates all exit blocks" checks fail. llvm-svn: 123060	2011-01-08 08:24:46 +00:00
Chris Lattner	43f8d16482	Revamp the ValueMapper interfaces in a couple ways: 1. Take a flags argument instead of a bool. This makes it more clear to the reader what it is used for. 2. Add a flag that says that "remapping a value not in the map is ok". 3. Reimplement MapValue to share a bunch of code and be a lot more efficient. For lookup failures, don't drop null values into the map. 4. Using the new flag a bunch of code can vaporize in LinkModules and LoopUnswitch, kill it. No functionality change. llvm-svn: 123058	2011-01-08 08:15:20 +00:00
Chris Lattner	2b3f20e6ec	two minor changes: switch to the standard ValueToValueMapTy map from ValueMapper.h (giving us access to its utilities) and add a fastpath in the loop rotation code, avoiding expensive ssa updator manipulation for values with nothing to update. llvm-svn: 123057	2011-01-08 07:21:31 +00:00
Tobias Grosser	fc3d7f664b	InstCombine: Match min/max hidden by sext/zext X = sext x; x >s c ? X : C+1 --> X = sext x; X <s C+1 ? C+1 : X X = sext x; x <s c ? X : C-1 --> X = sext x; X >s C-1 ? C-1 : X X = zext x; x >u c ? X : C+1 --> X = zext x; X <u C+1 ? C+1 : X X = zext x; x <u c ? X : C-1 --> X = zext x; X >u C-1 ? C-1 : X X = sext x; x >u c ? X : C+1 --> X = sext x; X <u C+1 ? C+1 : X X = sext x; x <u c ? X : C-1 --> X = sext x; X >u C-1 ? C-1 : X Instead of calculating this with mixed types promote all to the larger type. This enables scalar evolution to analyze this expression. PR8866 llvm-svn: 123034	2011-01-07 21:33:14 +00:00
Tobias Grosser	411e6eedff	Some whitespace fixes llvm-svn: 123033	2011-01-07 21:33:13 +00:00
Benjamin Kramer	134cde912a	Revert 122959, it needs more thought. Add it back to README.txt with additional notes. llvm-svn: 123030	2011-01-07 20:42:20 +00:00
Jay Foad	89afb43b1e	Remove all uses of the "ugly" method BranchInst::setUnconditionalDest(). llvm-svn: 123025	2011-01-07 20:25:56 +00:00

1 2 3 4 5 ...

7742 Commits