llvm-project

Commit Graph

Author	SHA1	Message	Date
Cameron Zwarich	3088e0a179	Make tTAILJMPr/tTAILJMPrND emit a tBX without a preceding MOV of PC to LR. This fixes <rdar://problem/9495913> llvm-svn: 132042	2011-05-25 04:45:27 +00:00
Rafael Espindola	fc9bae6f8b	Replace the -unwind-tables option with a per function flag. This is more LTO friendly as we can now correctly merge files compiled with or without -fasynchronous-unwind-tables. llvm-svn: 132033	2011-05-25 03:44:17 +00:00
Akira Hatanaka	aac670c1c8	Fix lowering of DYNAMIC_STACKALLOC nodes. llvm-svn: 132030	2011-05-25 02:20:00 +00:00
Eric Christopher	1b724948e9	Implement the arm 'L' asm modifier. Part of rdar://9119939 llvm-svn: 132024	2011-05-24 23:27:13 +00:00
Eric Christopher	b1dda56ac2	Implement the immediate part of the 'B' modifier. Part of rdar://9119939 llvm-svn: 132023	2011-05-24 23:15:43 +00:00
Eric Christopher	7617883ce3	Add support for the arm 'y' asm modifier. Fixes part of rdar://9444657 llvm-svn: 132011	2011-05-24 22:10:34 +00:00
Akira Hatanaka	2486729839	Test case for r132003. llvm-svn: 132005	2011-05-24 21:28:18 +00:00
Akira Hatanaka	ce4037ebcf	Fix test case. llvm-svn: 131988	2011-05-24 19:37:15 +00:00
Akira Hatanaka	0f30561bae	Revision 131986 test case. llvm-svn: 131987	2011-05-24 19:29:37 +00:00
Rafael Espindola	0f33be1b87	Fix the defaults for .eh_frame. We were marking it as writable. llvm-svn: 131951	2011-05-24 02:50:20 +00:00
Evan Cheng	88f9137fd7	- Teach SelectionDAG::isKnownNeverZero to return true (op x, c) when c is non-zero. - Teach X86 cmov optimization to eliminate the cmov from ctlz, cttz extension when the source of X86ISD::BSR / X86ISD::BSF is proven to be non-zero. rdar://9490949 llvm-svn: 131948	2011-05-24 01:48:22 +00:00
Akira Hatanaka	6af5bd2537	Add pattern for double-to-integer conversion. Patch by Sasa Stankovic. llvm-svn: 131927	2011-05-23 22:16:43 +00:00
Dan Gohman	6c4a319088	When checking for signed multiplication overflow, watch out for INT_MIN and -1. This fixes PR9845. llvm-svn: 131919	2011-05-23 21:07:39 +00:00
Akira Hatanaka	f9e5750fc8	Change StackDirection from StackGrowsUp to StackGrowsDown. The following improvements are accomplished as a result of applying this patch: - Fixed frame objects' offsets (relative to either the virtual frame pointer or the stack pointer) are set before instruction selection is completed. There is no need to wait until Prologue/Epilogue Insertion is run to set them. - Calculation of final offsets of fixed frame objects is straightforward. It is no longer necessary to assign negative offsets to fixed objects for incoming arguments in order to distinguish them from the others. - Since a fixed object has its relative offset set during instruction selection, there is no need to conservatively set its alignment to 4. - It is no longer necessary to reorder non-fixed frame objects in MipsFrameLowering::adjustMipsStackFrame. llvm-svn: 131915	2011-05-23 20:16:59 +00:00
Devang Patel	9987d3098b	Test case for r131908. llvm-svn: 131909	2011-05-23 17:49:29 +00:00
Devang Patel	c4d9a84159	While replacing all uses of a SDValue with another value, do not forget to transfer SDDbgValue. llvm-svn: 131907	2011-05-23 17:35:08 +00:00
Cameron Zwarich	bc90690b24	Fix <rdar://problem/9476260> by having tail calls always generate 32-bit branches in Darwin Thumb2 code. Tail calls are already disabled on Thumb1. llvm-svn: 131894	2011-05-23 01:57:17 +00:00
Renato Golin	4cd5187f5b	RTABI chapter 4.3.4 specifies __eabi_mem* calls. Specifically, __eabi_memset accepts parameters (ptr, size, value) in a different order than GNU's memset (ptr, value, size), therefore the special lowering in AAPCS mode. Implementation by Evzen Muller. llvm-svn: 131868	2011-05-22 21:41:23 +00:00
Benjamin Kramer	2fd48f2730	Implement mulo x, 2 -> addo x, x in DAGCombiner. llvm-svn: 131800	2011-05-21 18:31:55 +00:00
Benjamin Kramer	e08fb1dce9	Merge and FileCheckize test cases. llvm-svn: 131799	2011-05-21 18:31:48 +00:00
Eli Friedman	60afcc2a6f	Add fast-isel support for byval calls on x86. llvm-svn: 131764	2011-05-20 22:21:04 +00:00
Stuart Hastings	91f1d24736	Re-commit 131641 with fixes; de-pseudoize MOVSX16rr8 and friends. rdar://problem/8614450 llvm-svn: 131746	2011-05-20 19:04:40 +00:00
Akira Hatanaka	43407fe633	Make $fp and $ra callee-saved registers and let PrologEpilogInserter handle saving and restoring them. llvm-svn: 131745	2011-05-20 18:39:33 +00:00
Chad Rosier	ad00f3d0b9	Fixed regression due to commit 131709, which disables vararg tail call optimizations on Win64 llvm-svn: 131740	2011-05-20 17:49:39 +00:00
Benjamin Kramer	0bf26746d9	Rename the "sandybridge" subtarget to "corei7-avx", for GCC compatibility. llvm-svn: 131730	2011-05-20 15:11:26 +00:00
Cameron Zwarich	e0a52df6e5	Fix PR9960 by teaching SimpleRegisterCoalescing::AdjustCopiesBackFrom() to preserve the phikill flag. llvm-svn: 131717	2011-05-20 03:54:04 +00:00
Akira Hatanaka	fe4f9d5977	Fix bug in which nodes that write to argument registers do not get glued with the JALR node. Patch by Sasa Stankovic llvm-svn: 131714	2011-05-20 02:30:51 +00:00
Chad Rosier	552f8c4819	Don't attempt to tail call optimize for Win64. llvm-svn: 131709	2011-05-20 00:59:28 +00:00
Evan Cheng	e8d2e9eb35	Revert r131664 and fix it in instcombine instead. rdar://9467055 llvm-svn: 131708	2011-05-20 00:54:37 +00:00
Eli Friedman	22da799428	Add fast-isel support for zeroext and signext ret instructions on x86. llvm-svn: 131689	2011-05-19 22:16:13 +00:00
Eric Christopher	4014e5e208	Oddly people want to use the 'r' constraint for fp constants on x86. Fixes rdar://9218925 Fixes PR9601 llvm-svn: 131682	2011-05-19 21:33:47 +00:00
Eli Friedman	e53a77d3a6	Fix up this test to use explicit triples (Win64 passes a different number of arguments in registers). llvm-svn: 131676	2011-05-19 21:13:08 +00:00
Akira Hatanaka	9e6a8cca5d	Align i64 arguments to 64 bit boundaries. llvm-svn: 131668	2011-05-19 20:29:48 +00:00
Evan Cheng	2b9bd38678	crc32 with 64-bit output zeros upper 32-bits. rdar://9467055 llvm-svn: 131664	2011-05-19 18:57:12 +00:00
Stuart Hastings	ae012a7525	Move test to Transforms/InstCombine. llvm-svn: 131634	2011-05-19 05:53:22 +00:00
Tanya Lattner	1d11720ae4	Handle perfect shuffle case that generates a vrev for vectors of floats. Add test case. llvm-svn: 131582	2011-05-18 21:44:54 +00:00
Chad Rosier	f4e832b14e	Enables vararg functions that pass all arguments via registers to be optimized into tail-calls when possible. llvm-svn: 131560	2011-05-18 19:59:50 +00:00
Stuart Hastings	51d696766c	An imminent fix to the x86_64 byval logic will expose a flaw in the x86_64 sibcall logic. I've filed PR9943 for the sibcall problem, and this patch alters the testcase to work around the flaw. When PR9943 is fixed, this patch should be reverted. llvm-svn: 131557	2011-05-18 19:19:17 +00:00
Eli Friedman	3f46c3e702	Force a triple on a couple of tests; we don't support fast-isel of ret on Win64. llvm-svn: 131540	2011-05-18 17:16:37 +00:00
Stuart Hastings	38849debb5	Merge pmovzx test case into existing file. llvm-svn: 131539	2011-05-18 17:02:04 +00:00
Justin Holewinski	bbdcd17d44	PTX: add flag to disable mad/fma selection Patch by Dan Bailey llvm-svn: 131537	2011-05-18 15:42:23 +00:00
Tanya Lattner	48b182c3a4	In r131488 I misunderstood how VREV works. It splits the vector in half and splits each half. Therefore, the real problem was that we were using a VREV64 for a 4xi16, when we should have been using a VREV32. Updated test case and reverted change to the PerfectShuffle Table. llvm-svn: 131529	2011-05-18 06:42:21 +00:00
Eli Friedman	7d7ad8374f	Make some of the fast-isel tests actually test fast-isel (and fix test failures). llvm-svn: 131510	2011-05-18 00:00:10 +00:00
Stuart Hastings	5bd18b6638	X86 pmovsx/pmovzx ignore the upper half of their inputs. rdar://problem/6945110 llvm-svn: 131493	2011-05-17 22:13:31 +00:00
Tanya Lattner	c7e291b354	vrev is incorrectly defined in the perfect shuffle table. The ordering is backwards (should be 0x3210 versus 0x1032) which exposed a bug when doing a shuffle on a 4xi16. I've attached a test case. llvm-svn: 131488	2011-05-17 20:48:40 +00:00
Galina Kistanova	dd45646a47	Move test to the appropriate directory. llvm-svn: 131477	2011-05-17 19:06:43 +00:00
Eli Friedman	7b27942fe7	Add x86 fast-isel for calls returning first-class aggregates. rdar://9435872. This is r131438 with a couple small fixes. llvm-svn: 131474	2011-05-17 18:29:03 +00:00
Eli Friedman	7335e8a720	Back out r131444 and r131438; they're breaking nightly tests. I'll look into it more tomorrow. llvm-svn: 131451	2011-05-17 02:36:59 +00:00
Eli Friedman	e5f7f26df0	Fix test. llvm-svn: 131444	2011-05-17 00:39:14 +00:00
Evan Cheng	54459240e3	Add target triple so test doesn't fail on Windows machines. llvm-svn: 131439	2011-05-17 00:15:58 +00:00
Eli Friedman	83ba150f3a	Add x86 fast-isel for calls returning first-class aggregates. rdar://9435872. llvm-svn: 131438	2011-05-17 00:13:47 +00:00
Jakob Stoklund Olesen	4edf17d91f	Teach LiveInterval::isZeroLength about null SlotIndexes. When instructions are deleted, they leave tombstone SlotIndex entries. The isZeroLength method should ignore these null indexes. This causes RABasic to sometimes spill a callee-saved register in the abi-isel.ll test, so don't run that test with -regalloc=basic. Prioritizing register allocation according to spill weight can cause more registers to be used. llvm-svn: 131436	2011-05-16 23:50:05 +00:00
Eli Friedman	d4a3609d30	Remove dead code. Fix associated test to use FileCheck. llvm-svn: 131424	2011-05-16 21:28:22 +00:00
Eli Friedman	a4d4a0162d	Make fast-isel work correctly with s/uadd.with.overflow intrinsics. llvm-svn: 131420	2011-05-16 21:06:17 +00:00
Eli Friedman	9ac944774f	Basic fast-isel of extractvalue. Not too helpful on its own, given the IR clang generates for cases like this, but it should become more useful soon. llvm-svn: 131417	2011-05-16 20:27:46 +00:00
Rafael Espindola	df9db7ed92	Don't produce a vmovntdq if we don't have AVX support. llvm-svn: 131330	2011-05-14 00:30:01 +00:00
Rafael Espindola	e53b7d1a11	Make codegen able to handle values of empty types. This is one way to fix PR9900. I will keep it open until sable is able to comment on it. llvm-svn: 131294	2011-05-13 15:18:06 +00:00
Stuart Hastings	aa02c0847d	Since I can't reproduce the failures from 131261, re-trying with a simplified version. <rdar://problem/9298790> llvm-svn: 131274	2011-05-13 00:51:54 +00:00
Stuart Hastings	8d57d8ea64	Revert 131266 and 131261 due to buildbot complaints. rdar://problem/9298790 llvm-svn: 131269	2011-05-13 00:15:17 +00:00
Stuart Hastings	ef4940254f	Tweak 131261 (thumb2-cbnz.ll) to generate the intended cbnz. rdar://problem/9298790 llvm-svn: 131266	2011-05-13 00:10:03 +00:00
Stuart Hastings	89f1b47e3a	Non-fast-isel followup to 129634; correctly handle branches controlled by non-CMP expressions. The executable test case (129821) would test this as well, if we had an "-O0 -disable-arm-fast-isel" LLVM-GCC tester. Alas, the ARM assembly would be very difficult to check with FileCheck. The thumb2-cbnz.ll test is affected; it generates larger code (tst.w vs. cmp #0), but I believe the new version is correct. rdar://problem/9298790 llvm-svn: 131261	2011-05-12 23:36:41 +00:00
Galina Kistanova	9e56e51fab	Correction. Use explicit target triple in the test. llvm-svn: 131252	2011-05-12 21:55:34 +00:00
Evan Cheng	43054e6159	Re-enable branchfolding common code hoisting optimization. Fixed a liveness test bug and also taught it to update liveins. llvm-svn: 131241	2011-05-12 20:30:01 +00:00
Stuart Hastings	114ecbd0f4	Move this test to CodeGen/Thumb. rdar://problem/9416774 llvm-svn: 131196	2011-05-11 19:41:28 +00:00
Devang Patel	34a6620748	Identify end of prologue (and beginning of function body) using DW_LNS_set_prologue_end line table opcode. llvm-svn: 131194	2011-05-11 19:22:19 +00:00
Nadav Rotem	8a7beb80f0	Fixes a bug in the DAGCombiner. LoadSDNodes have two values (data, chain). If there is a store after the load node, then there is a chain, which means that there is another user. Thus, asking hasOneUser would fail. Instead we ask hasNUsesOfValue on the 'data' value. llvm-svn: 131183	2011-05-11 14:40:50 +00:00
Nadav Rotem	8f971c27fb	Add custom lowering of X86 vector SRA/SRL/SHL when the shift amount is a splat vector. llvm-svn: 131179	2011-05-11 08:12:09 +00:00
Rafael Espindola	2a09d65979	Revert 131172 as it is causing clang to miscompile itself. I will try to provide a reduced testcase. llvm-svn: 131176	2011-05-11 03:27:17 +00:00
Evan Cheng	05fc35e275	Add a late optimization to BranchFolding that hoist common instruction sequences at the start of basic blocks to their common predecessor. It's actually quite common (e.g. about 50 times in JM/lencod) and has shown to be a nice code size benefit. e.g. pushq %rax testl %edi, %edi jne LBB0_2 ## BB#1: xorb %al, %al popq %rdx ret LBB0_2: xorb %al, %al callq _foo popq %rdx ret => pushq %rax xorb %al, %al testl %edi, %edi je LBB0_2 ## BB#1: callq _foo LBB0_2: popq %rdx ret rdar://9145558 llvm-svn: 131172	2011-05-11 01:03:01 +00:00
Rafael Espindola	19c1a56287	Produce a __debug_frame section on darwin ARM when appropriate. llvm-svn: 131151	2011-05-10 21:04:45 +00:00
Justin Holewinski	3c0447259c	PTX: add test cases for cvt, fneg, and selp Patch by Dan Bailey llvm-svn: 131128	2011-05-10 14:53:13 +00:00
Benjamin Kramer	d724a590e5	X86: Add a bunch of peeps for add and sub of SETB. "b + ((a < b) ? 1 : 0)" compiles into cmpl %esi, %edi adcl $0, %esi instead of cmpl %esi, %edi sbbl %eax, %eax andl $1, %eax addl %esi, %eax This saves a register, a false dependency on %eax (Intel's CPUs still don't ignore it) and it's shorter. llvm-svn: 131070	2011-05-08 18:36:07 +00:00
Jakob Stoklund Olesen	a5c889982a	Emit a proper error message when register allocators run out of registers. This can't be just an assertion, users can always write impossible inline assembly. Such an assembly statement should be included in the error message. llvm-svn: 131024	2011-05-06 21:58:30 +00:00
Justin Holewinski	11d70b6b32	PTX: add PTX 2.3 language target Patch by Wei-Ren Chen llvm-svn: 130980	2011-05-06 11:40:36 +00:00
Eli Friedman	5401962643	Re-revert r130877; it's apparently causing a regression on 197.parser, possibly related to cbnz formation. llvm-svn: 130977	2011-05-06 05:23:07 +00:00
Rafael Espindola	a4982bddf3	Don't produce a __debug_frame. I tested both gdb on a bootstrapped clang and the gdb testsuite on OS X (snow leopard) and both are happy using __eh_frame. llvm-svn: 130937	2011-05-05 18:43:39 +00:00
Eli Friedman	441a01a2b8	Avoid extra vreg copies for arguments passed in registers. Specifically, this can make MachineCSE more effective in some cases (especially in small functions). PR8361 / part of rdar://problem/8259436 . llvm-svn: 130928	2011-05-05 16:53:34 +00:00
Jakob Stoklund Olesen	17d4f9bbcc	Prepare remaining tests for -join-physreg going away. llvm-svn: 130893	2011-05-04 23:54:59 +00:00
Jakob Stoklund Olesen	369bddf5ad	Fix a batch of x86 tests to be coalescer independent. Most of these tests require a single mov instruction that can come either before or after a 2-addr instruction. -join-physregs changes the behavior, but the results are equivalent. llvm-svn: 130891	2011-05-04 23:54:51 +00:00
Dan Gohman	dd550305e6	Give this test an explicit register allocator, so that it can work even if the default register allocator is changed. llvm-svn: 130883	2011-05-04 23:14:02 +00:00
Bill Wendling	2a40131f6b	SjLj EH could produce a machine basic block that legitimately has more than one landing pad as its successor. SjLj exception handling jumps to the correct landing pad via a switch statement that's generated right before code-gen. Loosen the constraint in the machine instruction verifier to allow for this. Note, this isn't the most rigorous check since we cannot determine where that switch statement came from. But it's marginally better than turning this check off when SjLj exceptions are used. <rdar://problem/9187612> llvm-svn: 130881	2011-05-04 22:54:05 +00:00
Eli Friedman	0fe4608af2	Re-commit r130862 with a minor change to avoid an iterator running off the edge in some cases. Original message: Teach MachineCSE how to do simple cross-block CSE involving physregs. This allows, for example, eliminating duplicate cmpl's on x86. Part of rdar://problem/8259436 . llvm-svn: 130877	2011-05-04 22:10:36 +00:00
Galina Kistanova	e53ae508ec	This test fails on ARM. The test shouldn't explicitly specify alignment (and alignment 4 is wrong) and requires hard-float. llvm-svn: 130875	2011-05-04 21:57:44 +00:00
Eli Friedman	3bd79ba856	Back out r130862; it appears to be breaking bootstrap. llvm-svn: 130867	2011-05-04 20:48:42 +00:00
Eli Friedman	a16fc2fec0	Teach MachineCSE how to do simple cross-block CSE involving physregs. This allows, for example, eliminating duplicate cmpl's on x86. Part of rdar://problem/8259436 . llvm-svn: 130862	2011-05-04 19:54:24 +00:00
Jakob Stoklund Olesen	28a93a49bb	Fix more register and coalescing dependencies. llvm-svn: 130859	2011-05-04 19:02:11 +00:00
Jakob Stoklund Olesen	d7fd7bfc31	Explicitly request physreg coalescing for a bunch of Thumb2 unit tests. These tests all follow the same pattern: mov r2, r0 movs r0, #0 $CMP r2, r1 it eq moveq r0, #1 bx lr The first 'mov' can be eliminated by rematerializing 'movs r0, #0' below the test instruction: $CMP r0, r1 mov.w r0, #0 it eq moveq r0, #1 bx lr So far, only physreg coalescing can do that. The register allocators won't yet split live ranges just to eliminate copies. They can learn, but this particular problem is not likely to show up in real code. It only appears because r0 is used for both the function argument and return value. llvm-svn: 130858	2011-05-04 19:02:07 +00:00
Jakob Stoklund Olesen	e7528c45ea	FileCheckize and break dependence on coalescing order. llvm-svn: 130856	2011-05-04 19:02:01 +00:00
Jakob Stoklund Olesen	067ba3c23c	Explicitly request -join-physregs for some tests that depend on it. llvm-svn: 130855	2011-05-04 19:01:59 +00:00
Devang Patel	39ecf816c5	Do not emit location expression size twice. llvm-svn: 130854	2011-05-04 19:00:57 +00:00
Akira Hatanaka	3bace5d223	Remove LLVM IR metadata in test case committed in r130847. llvm-svn: 130849	2011-05-04 18:28:36 +00:00
Akira Hatanaka	23e8ecf125	Prevent instructions using $gp from being placed between a jalr and the instruction that restores the clobbered $gp. llvm-svn: 130847	2011-05-04 17:54:27 +00:00
Jakob Stoklund Olesen	f1b401800a	Don't depend on the physreg coalescing order. llvm-svn: 130818	2011-05-04 01:01:47 +00:00
Jakob Stoklund Olesen	5b5abb4ea1	Don't run this test through -regalloc=basic. The basic allocator is really bad about hinting, so it doesn't eliminate all copies when physreg joining is disabled. llvm-svn: 130817	2011-05-04 01:01:44 +00:00
Jakob Stoklund Olesen	d3b2f44c9d	Fix register-dependent XCore tests llvm-svn: 130816	2011-05-04 01:01:41 +00:00
Jakob Stoklund Olesen	7f7fc82141	Fix register-dependent test in MSP430. llvm-svn: 130815	2011-05-04 01:01:39 +00:00
Jakob Stoklund Olesen	51b35f7bb1	Fix a bunch of ARM tests to be register allocation independent. llvm-svn: 130800	2011-05-03 22:31:21 +00:00
Bill Wendling	db0996c822	Replace the "movnt" intrinsics with a native store + nontemporal metadata bit. <rdar://problem/8460511> llvm-svn: 130791	2011-05-03 21:11:17 +00:00
Evan Cheng	93b5cdc5ab	Make the test less likely to fail with minor changes. llvm-svn: 130778	2011-05-03 19:09:32 +00:00
Bob Wilson	c5242b0e78	Remove test for iOS divmod function, since that is disabled for now. llvm-svn: 130769	2011-05-03 17:54:49 +00:00
Bruno Cardoso Lopes	168c9005b5	Add a few ARM coprocessor intrinsics. Testcases included llvm-svn: 130763	2011-05-03 17:29:22 +00:00
Dan Gohman	6136e94897	Add an unfolded offset field to LSR's Formula record. This is used to model constants which can be added to base registers via add-immediate instructions which don't require an additional register to materialize the immediate. llvm-svn: 130743	2011-05-03 00:46:49 +00:00
Rafael Espindola	5164e6e8b2	Add 130690 back. llvm-svn: 130693	2011-05-02 15:58:16 +00:00
Rafael Espindola	a392475865	Revert while I debug the tests that use march but not mtriple. llvm-svn: 130691	2011-05-02 15:42:31 +00:00
Rafael Espindola	c2aad4f2a3	Move ppc OS X to cfi too. I am building it on an old ppc mini, but it will take some time. llvm-svn: 130690	2011-05-02 15:00:52 +00:00
Rafael Espindola	fc8223670a	Add r130623 back now that ELF has been fixed to work with -fno-dwarf2-cfi-asm. llvm-svn: 130658	2011-05-01 15:44:13 +00:00
Rafael Espindola	750cb61553	GCC uses a different encoding of pointers in the FDE when using -fno-dwarf2-cfi-asm. Implement the same behavior. llvm-svn: 130637	2011-05-01 04:49:54 +00:00
Rafael Espindola	b7c2286055	Revert the previous patch while I figure out how to make llvm-gcc less aggressive about disabling cfi on linux :-( llvm-svn: 130626	2011-04-30 23:03:44 +00:00
Rafael Espindola	5265bc483e	Enable CFI on OS X. Currently the output should be almost identical to the one produced by CodeGen to make the transition easier. The only two differences I know of are: * Some files get an extra advance loc of size 0. This will be fixed when relaxations are enabled. * The optimization of declaring an EH symbol as an external variable is not implemented. This is a subset of adding the nounwind attribute, so if we really want this at -O0 we should probably do it at the IL level. llvm-svn: 130623	2011-04-30 22:29:54 +00:00
Jakob Stoklund Olesen	f5eaa8dc62	Allow folded spills in test. llvm-svn: 130599	2011-04-30 08:00:50 +00:00
Jakob Stoklund Olesen	edfabc9aad	Weekly fix of register allocation dependent unit tests. llvm-svn: 130567	2011-04-30 01:37:52 +00:00
Eli Friedman	4105ed1523	Make FastEmit_ri_ try a bit harder to succeed for supported operations; FastEmit_i can fail for non-Thumb2 ARM. Makes ARMSimplifyAddress work correctly, and reduces the number of fast-isel bailouts on non-Thumb ARM. llvm-svn: 130560	2011-04-29 23:34:52 +00:00
Eli Friedman	328bad02fa	Switch to ImmLeaf (which can be used by FastISel) for a few more common ARM/Thumb2 patterns. llvm-svn: 130552	2011-04-29 22:48:03 +00:00
Eli Friedman	dd937843d3	Fix run-line, again. :( llvm-svn: 130540	2011-04-29 21:33:03 +00:00
Eli Friedman	86caced370	Re-committing r130454, which does not in fact break anything. Fix a rather obscure crash caused by ARM fast-isel generating code which redefines a register. rdar://problem/9338332 . llvm-svn: 130539	2011-04-29 21:22:56 +00:00
Eric Christopher	8d46b47787	Add trunc->branch support, this won't help with clang's i8->i1 truncations for bools, but is a start. llvm-svn: 130534	2011-04-29 20:02:39 +00:00
Rafael Espindola	697edc89a5	Change DwarfCFIException's member variables to track what it actually emits: .cfi_personality, .cfi_lsda and the moves. llvm-svn: 130503	2011-04-29 14:48:51 +00:00
Andrew Trick	e794e17524	Teach Thumb2 isel to fold and->rotr ==> ROR. Generalization of Nate Begeman's patch! llvm-svn: 130502	2011-04-29 14:18:15 +00:00
Andrew Trick	65266ed4d7	Combine thumb2-ror tests. llvm-svn: 130498	2011-04-29 14:02:41 +00:00
Eli Friedman	517728b1ae	Revert r130454; apparently this doesn't actually work. llvm-svn: 130462	2011-04-28 23:55:14 +00:00
Eli Friedman	37b9ede969	Fix runline. llvm-svn: 130455	2011-04-28 23:12:24 +00:00
Eli Friedman	e4ecd42926	Fix a rather obscure crash caused by ARM fast-isel generating code which redefines a register. rdar://problem/9338332 . llvm-svn: 130454	2011-04-28 23:03:25 +00:00
Eli Friedman	7cd5101ad3	fast-isel sret calls, try 2. We actually do need to do something on x86-32. rdar://problem/9303592 . llvm-svn: 130429	2011-04-28 20:19:12 +00:00
Eli Friedman	3cf6d4032a	Actually revert r130348 correctly. llvm-svn: 130418	2011-04-28 18:20:24 +00:00
Eli Friedman	d5a80ca3c8	Revert r130348; causing buildbot issues on x86-32. llvm-svn: 130412	2011-04-28 18:06:10 +00:00
Devang Patel	3e021533cd	Teach dwarf writer to handle complex address expression for .debug_loc entries. This fixes clang generated blocks' variables' debug info. Radar 9279956. llvm-svn: 130373	2011-04-28 02:22:40 +00:00
Eli Friedman	33c133919a	Fix a silly mistake in r130338. llvm-svn: 130360	2011-04-28 00:42:03 +00:00
Justin Holewinski	18e6ac83ea	PTX: support for bitwise operations on predicates - selection of bitwise preds (AND, OR, XOR) - new bitwise.ll test Patch by Dan Bailey llvm-svn: 130353	2011-04-28 00:19:51 +00:00
Eli Friedman	8bd572fc58	fast-isel sret. We actually don't need to do anything special on x86. :) rdar://problem/9303592 . llvm-svn: 130348	2011-04-27 23:58:52 +00:00
Eli Friedman	406c471b69	Make the fast-isel code for literal 0.0 a bit shorter/faster, since 0.0 is common. rdar://problem/9303592 . llvm-svn: 130338	2011-04-27 22:41:55 +00:00
Evan Cheng	9808d31b9e	If converter was being too cute. It looked for root BBs (which don't have successors) and used an inverse depth-first search to traverse the BBs. However, that doesn't work when the CFG has infinite loops. Simply doing a linear traversal of all BBs works just fine. rdar://9344645 llvm-svn: 130324	2011-04-27 19:32:43 +00:00
Jakob Stoklund Olesen	71d3b895ba	Also add <imp-def> operands for defined and dead super-registers when rewriting. We cannot rely on the <imp-def> operands added by LiveIntervals in all cases as demonstrated by the test case. llvm-svn: 130313	2011-04-27 17:42:31 +00:00
Eli Friedman	0eea0293d9	Fix an edge case involving branches in fast-isel on x86. rdar://problem/9303306 . llvm-svn: 130272	2011-04-27 01:34:27 +00:00
Evan Cheng	1355bbdd11	Be careful about scheduling nodes above previous calls. It increases usage of callee-saved registers and introduces copies. Only allow it if scheduling a node above calls would end up lessening register pressure. Call operands also have added ABI restrictions for register allocation, so be extra careful with hoisting them above calls. rdar://9329627 llvm-svn: 130245	2011-04-26 21:31:35 +00:00
Evan Cheng	dbb86b8108	This test should be in MC. It breaks with changes to scheduling / register allocation so it's being removed. llvm-svn: 130243	2011-04-26 21:09:04 +00:00
Benjamin Kramer	1d4c835089	Force a triple on this test to unbreak windows buildbots. llvm-svn: 130226	2011-04-26 18:47:43 +00:00
Dan Gohman	7da91aee83	Fast-isel support for simple inline asms. llvm-svn: 130205	2011-04-26 17:18:34 +00:00
Rafael Espindola	580eebaa20	Add test for PR9743. llvm-svn: 130198	2011-04-26 14:17:42 +00:00
Chris Lattner	189ca1498f	don't emit the symbol name twice for local bss and common symbols. For example, don't emit: .comm _i,4,2 ## @i ## @i instead emit: .comm _i,4,2 ## @i llvm-svn: 130192	2011-04-26 06:14:13 +00:00
Eric Christopher	238a21f2d5	Make this test disable fast isel as it's not needed. llvm-svn: 130165	2011-04-25 22:39:46 +00:00
Akira Hatanaka	0e7ee666b7	Lower BlockAddress node when relocation-model is static. llvm-svn: 130131	2011-04-25 17:10:45 +00:00
Devang Patel	734f2218ac	A dbg.declare may not be in entry block, even if it is referring to an incoming argument. However, It is appropriate to emit DBG_VALUE referring to this incoming argument in entry block in MachineFunction. llvm-svn: 130129	2011-04-25 16:33:52 +00:00
Benjamin Kramer	ba446cc12a	Make tests more useful. lit needs a linter ... llvm-svn: 130126	2011-04-25 10:12:01 +00:00
Andrew Trick	76dca78cb4	Accidental function name mangling. llvm-svn: 130050	2011-04-23 04:08:15 +00:00
Andrew Trick	0ed5778a1e	Thumb2 and ARM add/subtract with carry fixes. Fixes Thumb2 ADCS and SBCS lowering: <rdar://problem/9275821>. t2ADCS/t2SBCS are now pseudo instructions, consistent with ARM, so the assembly printer correctly prints the 's' suffix. Fixes Thumb2 adde -> SBC matching to check for live/dead carry flags. Fixes the internal ARM machine opcode mnemonic for ADCS/SBCS. Fixes ARM SBC lowering to check for live carry (potential bug). llvm-svn: 130048	2011-04-23 03:55:32 +00:00
Andrew Trick	1a1f8d4640	whitespace llvm-svn: 130046	2011-04-23 03:24:11 +00:00
NAKAMURA Takumi	576273cf56	test/CodeGen/X86/shrink-compare.ll: Relax expressions for Win64. llvm-svn: 130039	2011-04-23 00:15:45 +00:00
Chris Lattner	6d277517d1	Recommit the fix for rdar://9289512 with a couple tweaks to fix bugs exposed by the gcc dejagnu testsuite: 1. The load may actually be used by a dead instruction, which would cause an assert. 2. The load may not be used by the current chain of instructions, and we could move it past a side-effecting instruction. Change how we process uses to define the problem away. llvm-svn: 130018	2011-04-22 21:59:37 +00:00
Benjamin Kramer	341c11da3b	DAGCombine: fold "(zext x) == C" into "x == (trunc C)" if the trunc is lossless. On x86 this allows to fold a load into the cmp, greatly reducing register pressure. movzbl (%rdi), %eax cmpl $47, %eax -> cmpb $47, (%rdi) This shaves 8k off gcc.o on i386. I'll leave applying the patch in README.txt to Chris :) llvm-svn: 130005	2011-04-22 18:47:44 +00:00
Benjamin Kramer	4c81624735	X86: Try to use a smaller encoding by transforming (X << C1) & C2 into (X & (C2 >> C1)) << C1. (Part of PR5039) This tends to happen a lot with bitfield code generated by clang. A simple example for x86_64 is uint64_t foo(uint64_t x) { return (x&1) << 42; } which used to compile into bloated code: shlq $42, %rdi ## encoding: [0x48,0xc1,0xe7,0x2a] movabsq $4398046511104, %rax ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x00,0x04,0x00,0x00] andq %rdi, %rax ## encoding: [0x48,0x21,0xf8] ret ## encoding: [0xc3] with this patch we can fold the immediate into the and: andq $1, %rdi ## encoding: [0x48,0x83,0xe7,0x01] movq %rdi, %rax ## encoding: [0x48,0x89,0xf8] shlq $42, %rax ## encoding: [0x48,0xc1,0xe0,0x2a] ret ## encoding: [0xc3] It's possible to save another byte by using 'andl' instead of 'andq' but I currently see no way of doing that without making this code even more complicated. See the TODOs in the code. llvm-svn: 129990	2011-04-22 15:30:40 +00:00
Evan Cheng	c0d2004e3c	In Thumb2 mode, lower frame index references to: add <rd>, sp, #<imm8> ldr <rd>, [sp, #<imm8>] When the offset from sp is a multiple of 4 and in the range 0-1020. This saves code size by utilizing 16-bit instructions. rdar://9321541 llvm-svn: 129971	2011-04-22 01:42:52 +00:00
Devang Patel	94ad6ac13c	Fix DWARF description of Q registers. llvm-svn: 129952	2011-04-21 23:22:35 +00:00
Devang Patel	3712c14be9	Fix DWARF description of S registers. llvm-svn: 129947	2011-04-21 22:48:26 +00:00
Devang Patel	be22131c28	Test case for r129922 llvm-svn: 129934	2011-04-21 20:16:43 +00:00
Daniel Dunbar	6309828206	Revert r1296656, "Fix rdar://9289512 - not folding load into compare at -O0...", which broke a couple GCC test suite tests at -O0. llvm-svn: 129914	2011-04-21 16:14:46 +00:00
Che-Liang Chiou	14c48e5d66	ptx: fix parameter ordering This patch depends on the prior fix r129908 that changes to use std::find, rather than std::binary_search, on unordered array. Patch by Dan Bailey llvm-svn: 129909	2011-04-21 10:56:58 +00:00
Evan Cheng	5f1ba4cd2d	Remove -use-divmod-libcall. Let targets opt in when they are available. llvm-svn: 129884	2011-04-20 22:20:12 +00:00
Stuart Hastings	1b06a10d62	Un-XFAIL this test for ARM. <rdar://problem/7662569> llvm-svn: 129875	2011-04-20 21:47:45 +00:00
Justin Holewinski	7d8895e767	PTX: Add intrinsics to list of built-in intrinsics, which allows them to be used by Clang. To help Clang integration, the PTX target has been split into two targets: ptx32 and ptx64, depending on the desired pointer size. - Add GCCBuiltin class to all intrinsics - Split PTX target into ptx32 and ptx64 llvm-svn: 129851	2011-04-20 15:37:17 +00:00
Eric Christopher	bcaedb5ce0	Rewrite the expander for umulo/smulo to remember to sign extend the input manually and pass all (now) 4 arguments to the mul libcall. Add a new ExpandLibCall for just this (copied gratuitously from type legalization). Fixes rdar://9292577 llvm-svn: 129842	2011-04-20 01:19:45 +00:00
Daniel Dunbar	ed3d5496dc	llc: Eliminate a use of getDarwinMajorNumber(). - As before, there is a minor semantic change here (evidenced by the test change) for Darwin triples that have no version component. I debated changing the default behavior of isOSVersionLT, but decided it made more sense for triples to be explicit. llvm-svn: 129805	2011-04-19 20:46:13 +00:00
Daniel Dunbar	4a7783b0c2	CodeGen: Eliminate a use of getDarwinMajorNumber(). - There is a minor semantic change here (evidenced by the test change) for Darwin triples that have no version component. I debated changing the default behavior of isOSVersionLT, but decided it made more sense for triples to be explicit. llvm-svn: 129802	2011-04-19 20:32:39 +00:00
Bob Wilson	0858c3aaed	This patch combines several changes from Evan Cheng for rdar://8659675. Making use of VFP / NEON floating point multiply-accumulate / subtraction is difficult on current ARM implementations for a few reasons. 1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause an additional pipeline stall. So it's frequently better to simply codegen vmul + vadd. 2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart. 3. A vmla followed by a vmla is a special case. Obviously, issuing back to back RAW vmla + vmla is very bad. But this isn't ideal either: vmul vadd vmla Instead, we want to expand the second vmla: vmla vmul vadd Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster. Up to now, isel simply avoided codegen'ing fp vmla / vmls. This works well enough but it isn't the optimal solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable. A. Add missing isel predicates which cause vmla to be codegen'ed. B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute a fmul and a fmla. C. Add additional isel checks for vmla, avoid cases where vmla is feeding into fp instructions (except for the #3 exceptional case). D. Add ARM hazard recognizer to model the vmla / vmls hazards. E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards. Enable these fp vmlx codegen changes for Cortex-A9. llvm-svn: 129775	2011-04-19 18:11:57 +00:00
Bob Wilson	d04a83f8f2	Add -mcpu=cortex-a9-mp. It's cortex-a9 with MP extension. rdar://8648637. llvm-svn: 129774	2011-04-19 18:11:52 +00:00
Bob Wilson	a2881ee8a4	Avoid some 's' 16-bit instructions which partially update CPSR (and add a false dependency) when they aren't dependent on the last CPSR-defining instruction. rdar://8928208 llvm-svn: 129773	2011-04-19 18:11:49 +00:00
Bob Wilson	df612ba006	Avoid write-after-write issue hazards for Cortex-A9. Add an avoidWriteAfterWrite() target hook to identify register classes that suffer from write-after-write hazards. For those register classes, try to avoid writing the same register in two consecutive instructions. This is currently disabled by default. We should not spill to avoid hazards! The command line flag -avoid-waw-hazard can be used to enable waw avoidance. llvm-svn: 129772	2011-04-19 18:11:45 +00:00
Eli Friedman	ee92a6b332	Add support for FastISel'ing varargs calls. llvm-svn: 129765	2011-04-19 17:22:22 +00:00
Jakob Stoklund Olesen	fb1249548f	Tighten test case a bit. Ideally, we would match an S-register to its containing D-register, but that requires arithmetic (divide by 2). llvm-svn: 129756	2011-04-19 06:14:45 +00:00
Chris Lattner	91328b317b	Implement support for x86 fastisel of small fixed-sized memcpys, which are generated en masse for C++ PODs. On my c++ test file, this cuts the fast isel rejects by 10x and shrinks the generated .s file by 5% llvm-svn: 129755	2011-04-19 05:52:03 +00:00
Chris Lattner	5f4b783426	Implement support for fast isel of calls of i1 arguments, even though they are illegal, when they are a truncate from something else. This eliminates fully half of all the fastisel rejections on a test c++ file I'm working with, which should make a substantial improvement for -O0 compile of c++ code. This fixed rdar://9297003 - fast isel bails out on all functions taking bools llvm-svn: 129752	2011-04-19 05:09:50 +00:00
Chris Lattner	d7f7c93914	Handle i1/i8/i16 constant integer arguments to calls by prepromoting them. Before we would bail out on i1 arguments altogether, now we just bail on non-constant ones. Also, we used to emit extraneous code. e.g. test12 was: movb $0, %al movzbl %al, %edi callq _test12 and test13 was: movb $0, %al xorl %edi, %edi movb %al, 7(%rsp) callq _test13f Now we get: movl $0, %edi callq _test12 and: movl $0, %edi callq _test13f llvm-svn: 129751	2011-04-19 04:42:38 +00:00
Chris Lattner	c59290a34c	be layout aware, to produce: testb $1, %al je LBB0_2 ## BB#1: ## %if.then movb $0, %al instead of: testb $1, %al jne LBB0_1 jmp LBB0_2 LBB0_1: ## %if.then movb $0, %al how 'bout that. llvm-svn: 129749	2011-04-19 04:26:32 +00:00
Chris Lattner	2c8a4c3b1b	fix rdar://9297006 - fast isel bails out on trunc to i1 -> bools cry, a common cause of fast isel rejects on c++ code. llvm-svn: 129748	2011-04-19 04:22:17 +00:00
Jakob Stoklund Olesen	bf78618db6	Make tests register allocation independent again. llvm-svn: 129739	2011-04-19 00:14:43 +00:00
Evan Cheng	4079133796	Do not lose mem_operands while lowering VLD / VST intrinsics. llvm-svn: 129738	2011-04-19 00:04:03 +00:00
Eric Christopher	c37aa0b26a	Fix a bug where we were counting the alias sets as completely used registers for fast allocation a different way. This has us updating used registers only when we're using that exact register. Fixes rdar://9207598 llvm-svn: 129711	2011-04-18 19:26:25 +00:00
Chris Lattner	48f75ad678	while we're at it, handle 'sdiv exact' of a power of 2 also, this fixes a few rejects on c++ iterator loops. llvm-svn: 129694	2011-04-18 07:00:40 +00:00
Chris Lattner	562d6e82bd	fix rdar://9297011 - udiv by power of two causing fast-isel rejects llvm-svn: 129693	2011-04-18 06:55:51 +00:00
Chris Lattner	07add49a4b	Implement major new fastisel functionality: the matcher can now handle immediates with value constraints on them (when defined as ImmLeaf's). This is particularly important for X86-64, where almost all reg/imm instructions take a i64immSExt32 immediate operand, which has a value constraint. Before this patch we ended up iseling the examples into such amazing code as: movabsq $7, %rax imulq %rax, %rdi movq %rdi, %rax ret now we produce: imulq $7, %rdi, %rax ret This dramatically shrinks the generated code at -O0 on x86-64. llvm-svn: 129691	2011-04-18 06:22:33 +00:00
Chris Lattner	353fda159d	relax this test to just check that the lock prefix is encoded properly, and to not rely on the register allocator's arbitrary operand choices. llvm-svn: 129690	2011-04-18 06:15:35 +00:00
Chris Lattner	b53ccb8e36	1. merge fast-isel-shift-imm.ll into fast-isel-x86-64.ll 2. implement rdar://9289501 - fast isel should fold trivial multiplies to shifts 3. teach tblgen to handle shift immediates that are different sizes than the shifted operands, eliminating some code from the X86 fast isel backend. 4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function instead of FastEmit_ri to simplify code. llvm-svn: 129666	2011-04-17 20:23:29 +00:00
Chris Lattner	eb729d48ff	fix an x86 fast isel issue where we'd completely give up on folding an address when we have a global variable base and an index. Instead, just give up on folding the global variable. Before we'd generate: _test: ## @test ## BB#0: movq _rtx_length@GOTPCREL(%rip), %rax leaq (%rax), %rax addq %rdi, %rax movzbl (%rax), %eax ret now we generate: _test: ## @test ## BB#0: movq _rtx_length@GOTPCREL(%rip), %rax movzbl (%rax,%rdi), %eax ret The difference is even more significant when there is a scale involved. This fixes rdar://9289558 - total fail with addr mode formation at -O0/x86-64 llvm-svn: 129664	2011-04-17 17:47:38 +00:00
Chris Lattner	4832660b4d	fix an oversight which caused us to compile the testcase (and other less trivial things) into a dummy lea. Before we generated: _test: ## @test movq _G@GOTPCREL(%rip), %rax leaq (%rax), %rax ret now we produce: _test: ## @test movq _G@GOTPCREL(%rip), %rax ret This is part of rdar://9289558 llvm-svn: 129662	2011-04-17 17:12:08 +00:00
Chris Lattner	045c43855c	Fix rdar://9289512 - not folding load into compare at -O0 The basic issue here is that bottom-up isel is matching the branch and compare, and was failing to fold the load into the branch/compare combo. Fixing this (by allowing folding into any instruction of a sequence that is selected) allows us to produce things like: cmpb $0, 52(%rax) je LBB4_2 instead of: movb 52(%rax), %cl cmpb $0, %cl je LBB4_2 This makes the generated -O0 code run a bit faster, but also speeds up compile time by putting less pressure on the register allocator and generating less code. This was one of the biggest classes of missing load folding. Implementing this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm) line count. llvm-svn: 129656	2011-04-17 06:35:44 +00:00
Eli Friedman	55f7bf3289	Remove working entry from README. llvm-svn: 129654	2011-04-17 02:36:27 +00:00
Chris Lattner	fba7ca63cc	fix rdar://9289583 - fast isel should handle non-canonical commutative binops allowing us to fold the immediate into the 'and' in this case: int test1(int i) { return 8&i; } llvm-svn: 129653	2011-04-17 01:16:47 +00:00
Eli Friedman	55b0acd624	PR9055: extend the fix to PR4050 (r70179) to apply to zext and anyext. Returning a new node makes the code try to replace the old node, which in the included testcase is killed by CSE. llvm-svn: 129650	2011-04-16 23:25:34 +00:00
Evan Cheng	b14ce09fca	Fix divmod libcall lowering. Convert to {S|U}DIVREM first and then expand the node to a libcall. rdar://9280991 llvm-svn: 129633	2011-04-16 03:08:26 +00:00
Akira Hatanaka	2cb3aa30dd	Re-enable test o32_cc_vararg.ll. llvm-svn: 129616	2011-04-15 22:23:09 +00:00
Cameron Zwarich	9c65e4d69c	Add ORR and EOR to the CMP peephole optimizer. It's hard to get isel to generate a case involving EOR, so I only added a test for ORR. llvm-svn: 129610	2011-04-15 21:24:38 +00:00
Rafael Espindola	9fef721830	Add this test back for Darwin. llvm-svn: 129607	2011-04-15 21:06:27 +00:00
Cameron Zwarich	0829b3065a	The AND instruction leaves the V flag unmodified, so it falls victim to the same problem as all of the other instructions we fold with CMPs. llvm-svn: 129602	2011-04-15 20:45:00 +00:00
Cameron Zwarich	93eae1571c	Add missing register forms of instructions to the ARM CMP-folding code. This fixes <rdar://problem/9287901>. llvm-svn: 129599	2011-04-15 20:28:28 +00:00
Akira Hatanaka	279169771b	Add pass that expands pseudo instructions into target instructions after register allocation. Define pseudos that get expanded into mtc1 or mfc1 instructions. llvm-svn: 129594	2011-04-15 19:52:08 +00:00
Rafael Espindola	a01cdb0e37	Add 129518 back with a fix for when we are producing eh just because of debug info. Change ELF systems to use CFI for producing the EH tables. This reduces the size of the clang binary in Debug builds from 690MB to 679MB. llvm-svn: 129571	2011-04-15 15:11:06 +00:00
NAKAMURA Takumi	b5e3e9dd27	Revert r129518, "Change ELF systems to use CFI for producing the EH tables. This reduces the" It broke several builds. llvm-svn: 129557	2011-04-15 03:35:57 +00:00
Evan Cheng	12bb05b75b	Fix another fcopysign lowering bug. If src is f64 and destination is f32, don't forget to right shift the source by 32 first. rdar://9287902 llvm-svn: 129556	2011-04-15 01:31:00 +00:00
Michael J. Spencer	30088ba110	Add 3DNow! intrinsics. llvm-svn: 129551	2011-04-15 00:32:41 +00:00
Evan Cheng	44887f9c7e	Follow up on r127913. Fix Thumb revsh isel. rdar://9286766 llvm-svn: 129548	2011-04-14 23:27:44 +00:00
Rafael Espindola	aa2a7cd828	Change ELF systems to use CFI for producing the EH tables. This reduces the size of the clang binary in Debug builds from 690MB to 679MB. llvm-svn: 129518	2011-04-14 15:18:53 +00:00
Andrew Trick	bfbd972b1f	In the pre-RA scheduler, maintain cmp+br proximity. This is done by pushing physical register definitions close to their use, which happens to handle flag definitions if they're not glued to the branch. This seems to be generally a good thing though, so I didn't need to add a target hook yet. The primary motivation is to generate code closer to what people expect and rule out missed opportunity from enabling macro-op fusion. As a side benefit, we get several 2-5% gains on x86 benchmarks. There is one regression: SingleSource/Benchmarks/Shootout/lists slows down by 10%. But this is an independent scheduler bug that will be tracked separately. See rdar://problem/9283108. Incidentally, pre-RA scheduling is only half the solution. Fixing the later passes is tracked by: <rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump Fixes: <rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion llvm-svn: 129508	2011-04-14 05:15:06 +00:00
Bill Wendling	410ec4aad1	As Dan pointed out, movzbl, movsbl, and friends are nicer than their alias (movzx/movsx) because they give more information. Revert that part of the patch. llvm-svn: 129498	2011-04-14 01:46:37 +00:00
Bill Wendling	7e07d6fb69	Have the X86 back-end emit the alias instead of what's being aliased. In most cases, it's much nicer and more informative reading the alias. llvm-svn: 129497	2011-04-14 01:11:51 +00:00
Cameron Zwarich	415b5e8341	Fix a typo in an ARM-specific DAG combine. This fixes <rdar://problem/9278274>. llvm-svn: 129468	2011-04-13 21:01:19 +00:00
Cameron Zwarich	9398197ef1	Fix a regression caused by r102515 where explicit alignment on globals is ignored. There was a test to catch this, but it was just blindly updated in a large change. This fixes another part of <rdar://problem/9275290>. llvm-svn: 129466	2011-04-13 20:36:04 +00:00
Cameron Zwarich	70be27e913	Fix an obvious problem with an alignment computation. AsmPrinter actually does the max itself, so it is not easy to write a test case for this, but I added a test case that would fail if the code in AsmPrinter were removed. llvm-svn: 129432	2011-04-13 09:02:43 +00:00
Cameron Zwarich	cdf59f7016	If a global variable has a specified alignment that is less than the preferred alignment for its type, use the minimum of the specified alignment and the ABI alignment. This fixes <rdar://problem/9275290>. llvm-svn: 129428	2011-04-13 06:03:16 +00:00
Andrew Trick	b53a00d2cb	Recommit r129383. PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency. Additional fixes: Do something reasonable for subtargets with generic itineraries by handling node latency the same as for an empty itinerary. Now nodes default to unit latency unless an itinerary explicitly specifies a zero cycle stage or it is a TokenFactor chain. Original fixes: UnitsSharePred was a source of randomness in the scheduler: node priority depended on the queue data structure. I rewrote the recent VRegCycle heuristics to completely replace the old heuristic without any randomness. To make the node latency adjustments work, I also needed to do something a little more reasonable with TokenFactor. I gave it zero latency to its consumers and always schedule it as low as possible. llvm-svn: 129421	2011-04-13 00:38:32 +00:00
Bill Wendling	b902f1dd88	Reapply r129401 with patch for clang. llvm-svn: 129419	2011-04-13 00:36:11 +00:00
Eric Christopher	28f4c729f7	Temporarily revert r129408 to see if it brings the bots back. llvm-svn: 129417	2011-04-13 00:20:59 +00:00
Eric Christopher	d829f43c06	Fix a bug where we were counting the alias sets as completely used registers for fast allocation. Fixes rdar://9207598 llvm-svn: 129408	2011-04-12 23:23:14 +00:00
Bill Wendling	dbfde42468	Revert r129401 for now. Clang is using the old way of doing things. llvm-svn: 129403	2011-04-12 22:59:27 +00:00
Bill Wendling	47c24875a1	Remove the unaligned load intrinsics in favor of using native unaligned loads. Now that we have a first-class way to represent unaligned loads, the unaligned load intrinsics are superfluous. First part of <rdar://problem/8460511>. llvm-svn: 129401	2011-04-12 22:46:31 +00:00
Andrew Trick	1b60ad6644	Revert 129383. It causes some targets to hit a scheduler assert. llvm-svn: 129385	2011-04-12 20:14:07 +00:00
Andrew Trick	c5dd24a542	PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency. UnitsSharePred was a source of randomness in the scheduler: node priority depended on the queue data structure. I rewrote the recent VRegCycle heuristics to completely replace the old heuristic without any randomness. To make these heuristic adjustments to node latency work, I also needed to do something a little more reasonable with TokenFactor. I gave it zero latency to its consumers and always schedule it as low as possible. llvm-svn: 129383	2011-04-12 19:54:36 +00:00
Cameron Zwarich	fbcd69b96a	Split a store of a VMOVDRR into two integer stores to avoid mixing NEON and ARM stores of arguments in the same cache line. This fixes the second half of <rdar://problem/8674845>. llvm-svn: 129345	2011-04-12 02:24:17 +00:00
Wesley Peck	1914c39bd4	Add scheduling information for the MBlaze backend. llvm-svn: 129311	2011-04-11 22:31:52 +00:00
Evan Cheng	ef42bea704	Look past copies when determining whether hoisting would end up inserting more copies. rdar://9266679 llvm-svn: 129297	2011-04-11 21:09:18 +00:00
Chris Lattner	214f114aa7	look for the verboten argument slot access in any order, thanks to Frits for pointing this out llvm-svn: 129217	2011-04-09 17:00:34 +00:00
Chris Lattner	af1bccec68	Fix a bug where RecursivelyDeleteTriviallyDeadInstructions could delete the instruction pointed to by CGP's current instruction iterator, leading to a crash on the testcase. This fixes PR9578. llvm-svn: 129200	2011-04-09 07:05:44 +00:00
Chris Lattner	418b1037b0	fix two completely broken tests, which were matching due to PR9629. llvm-svn: 129195	2011-04-09 06:34:38 +00:00
Chris Lattner	ea6afab4b0	remove a bunch of CHECK lines that aren't checking what they thought they were, because alternation was expanding wrong in {{}}'s. llvm-svn: 129194	2011-04-09 06:31:06 +00:00
Chris Lattner	41c80e89f3	have dag combine zap "store undef", which can be formed during call lowering with undef arguments. llvm-svn: 129185	2011-04-09 02:32:02 +00:00
Chris Lattner	1c42a4d159	don't test for codegen of 'store undef' llvm-svn: 129184	2011-04-09 02:31:26 +00:00
Evan Cheng	74d92c1924	Change -arm-trap-func= into a non-arm specific option. Now Intrinsic::trap is lowered into a call to the specified trap function at sdisel time. llvm-svn: 129152	2011-04-08 21:37:21 +00:00
Evan Cheng	9a3f2772f0	Add option to emit @llvm.trap as a function call instead of a trap instruction. rdar://9249183. llvm-svn: 129107	2011-04-07 20:31:12 +00:00
Andrew Trick	2ad0b37318	Added a check in the preRA scheduler for potential interference on an induction variable. The preRA scheduler is unaware of induction vars, so we look for potential "virtual register cycles" instead. Fixes <rdar://problem/8946719> Bad scheduling prevents coalescing llvm-svn: 129100	2011-04-07 19:54:57 +00:00
Akira Hatanaka	d6f1c58914	Fix handling of functions with internal linkage. llvm-svn: 129099	2011-04-07 19:51:44 +00:00
Tanya Lattner	266792a55a	Prevent ARM DAG Combiner from doing an AND or OR combine on an illegal vector type (vectors of size 3). Also included test cases. llvm-svn: 129074	2011-04-07 15:24:20 +00:00
Evan Cheng	a7c7b54dde	Change -arm-divmod-libcall to a target neutral option. llvm-svn: 129045	2011-04-07 00:58:44 +00:00
Owen Anderson	bdff1c997a	Teach the ARM peephole optimizer that RSB, RSC, ADC, and SBC can be used for folded comparisons, just like ADD and SUB. llvm-svn: 129038	2011-04-06 23:35:59 +00:00
Jakob Stoklund Olesen	1ec41e2bd9	These tests no longer require linear scan because reserved register coalescing is now universal. llvm-svn: 128936	2011-04-05 21:40:41 +00:00
Jakob Stoklund Olesen	6aa0fbf4c0	Run LiveDebugVariables in RegAllocBasic and RegAllocGreedy. llvm-svn: 128935	2011-04-05 21:40:37 +00:00
Jakob Stoklund Olesen	e20fec7732	Fix one more batch of X86 tests to be register allocation independent. llvm-svn: 128919	2011-04-05 20:20:30 +00:00
Jakob Stoklund Olesen	18fd84c79a	When dead code elimination removes all but one use, try to fold the single def into the remaining use. Rematerialization can leave single-use loads behind that we might as well fold whenever possible. llvm-svn: 128918	2011-04-05 20:20:26 +00:00
Johnny Chen	293875ef55	Fix test-llvm failures. llvm-svn: 128906	2011-04-05 18:41:40 +00:00
Stuart Hastings	345094777f	ARM doesn't support byval yet. XFAIL this test until it does. llvm-svn: 128891	2011-04-05 17:16:21 +00:00
Jakob Stoklund Olesen	76ad3debab	Ensure all defs referring to a virtual register are marked dead by addRegisterDead(). There can be multiple defs for a single virtual register when they are defining sub-registers. The missing <dead> flag was stopping the inline spiller from eliminating dead code after rematerialization. llvm-svn: 128888	2011-04-05 16:53:50 +00:00
Rafael Espindola	7dd4d6e2e8	Print visibility info for external variables. llvm-svn: 128887	2011-04-05 15:51:32 +00:00
Eric Christopher	f392a69ff7	Fix up testcase for previous commit. llvm-svn: 128870	2011-04-05 00:56:01 +00:00
Jakob Stoklund Olesen	bd09d45489	Fix register-dependent X86 tests. llvm-svn: 128867	2011-04-05 00:32:44 +00:00
Jakob Stoklund Olesen	2e85396509	Allow coalescing with reserved physregs in certain cases: When a virtual register has a single value that is defined as a copy of a reserved register, permit that copy to be joined. These virtual registers are usually copies of the stack pointer: %vreg75<def> = COPY %ESP; GR32:%vreg75 MOV32mr %vreg75, 1, %noreg, 0, %noreg, %vreg74<kill> MOV32mi %vreg75, 1, %noreg, 8, %noreg, 0 MOV32mi %vreg75<kill>, 1, %noreg, 4, %noreg, 0 CALLpcrel32 ... Coalescing these virtual registers early decreases register pressure. Previously, they were coalesced by RALinScan::attemptTrivialCoalescing after register allocation was completed. The lower register pressure causes the mcinst-lowering-cmp0.ll test case to fail because it depends on linear scan spilling a particular register. I am deleting 2008-08-05-SpillerBug.ll because it is counting the number of instructions emitted, and its revision history shows the 'correct' count being edited many times. llvm-svn: 128845	2011-04-04 21:00:03 +00:00
Jakob Stoklund Olesen	8296e30627	Disable the PowerPC/Atomics-64 test. The code inserted by PPCTargetLowering::EmitInstrWithCustomInserter for ppc64 is wrong, and I don't know how to fix it. It seems to be using the correct register classes for pointers, but it inserts all 32-bit instructions. llvm-svn: 128835	2011-04-04 17:57:26 +00:00
Jakob Stoklund Olesen	218661346a	Fix PowerPC tests to be register allocator independent. llvm-svn: 128827	2011-04-04 17:07:03 +00:00
Che-Liang Chiou	e34b271718	ptx: support setp's 4-operand format llvm-svn: 128767	2011-04-02 08:51:39 +00:00
Cameron Zwarich	6fe5c29430	Do some peephole optimizations to remove pointless VMOVs from Neon to integer registers that arise from argument shuffling with the soft float ABI. These instructions are particularly slow on Cortex A8. This fixes one half of <rdar://problem/8674845>. llvm-svn: 128759	2011-04-02 02:40:43 +00:00
Jim Grosbach	360c369967	LDRD/STRD instructions should print both Rt and Rt2 in the asm string. llvm-svn: 128736	2011-04-01 20:26:57 +00:00
Akira Hatanaka	93f898f643	Add code for analyzing FP branches. Clean up branch Analysis functions. llvm-svn: 128718	2011-04-01 17:39:08 +00:00
Evan Cheng	a6a992a662	Add test case. llvm-svn: 128707	2011-04-01 06:27:25 +00:00
Evan Cheng	0f86d6de50	FileCheck'ify test. llvm-svn: 128706	2011-04-01 03:36:33 +00:00
Jakob Stoklund Olesen	100f53fd25	Fix Thumb and Thumb2 tests to be register allocator independent. llvm-svn: 128690	2011-03-31 23:31:50 +00:00
Jakob Stoklund Olesen	0709342652	Provide a legal pointer register class when targeting thumb1. The LocalStackSlotAllocation pass was creating illegal registers. llvm-svn: 128687	2011-03-31 23:02:15 +00:00
Jakob Stoklund Olesen	903baeac27	Fix SystemZ tests llvm-svn: 128686	2011-03-31 23:02:12 +00:00
Jakob Stoklund Olesen	0888bcf542	Fix ARM tests to be register allocator independent. llvm-svn: 128680	2011-03-31 22:14:03 +00:00
Evan Cheng	38bf5adcea	Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier accumulator forwarding: vadd d3, d0, d1 vmul d3, d3, d2 => vmul d3, d0, d2 vmla d3, d1, d2 llvm-svn: 128665	2011-03-31 19:38:48 +00:00
Jakob Stoklund Olesen	f4c9754d5c	Fix Mips, Sparc, and XCore tests that were dependent on register allocation. Add an extra run with -regalloc=basic to keep them honest. llvm-svn: 128654	2011-03-31 18:42:43 +00:00
Akira Hatanaka	a535270d91	Added support for FP conditional move instructions and fixed bugs in handling of FP comparisons. llvm-svn: 128650	2011-03-31 18:26:17 +00:00
Jakob Stoklund Olesen	e6e6750670	Don't completely eliminate identity copies that also modify super register liveness. Turn them into noop KILL instructions instead. This lets the scavenger know when super-registers are killed and defined. llvm-svn: 128645	2011-03-31 17:55:25 +00:00
Jakob Stoklund Olesen	9a78835414	Mark all uses as <undef> when joining a copy. This way, shrinkToUses() will ignore the instruction that is about to be deleted, and we avoid leaving invalid live ranges that SplitKit doesn't like. Fix a misunderstanding in MachineVerifier about <def,undef> operands. The <undef> flag is valid on def operands where it has the same meaning as <undef> on a use operand. It only applies to sub-register defines which also read the full register. llvm-svn: 128642	2011-03-31 17:23:25 +00:00
Richard Osborne	9a827b30ab	Add XCore intrinsics for initializing / starting / synchronizing threads. llvm-svn: 128633	2011-03-31 15:13:13 +00:00
Jakob Stoklund Olesen	ae044c06bf	Pick a conservative register class when creating a small live range for remat. The rematerialized instruction may require a more constrained register class than the register being spilled. In the test case, the spilled register has been inflated to the DPR register class, but we are rematerializing a load of the ssub_0 sub-register which only exists for DPR_VFP2 registers. The register class is reinflated after spilling, so the conservative choice is only temporary. llvm-svn: 128610	2011-03-31 03:54:44 +00:00
Evan Cheng	ee9d45dd55	Don't try to create zero-sized stack objects. llvm-svn: 128586	2011-03-30 23:44:13 +00:00
Cameron Zwarich	53dd03d537	Add an ARM-specific SD node for VBSL so that forms with a constant first operand can be recognized. This fixes <rdar://problem/9183078>. llvm-svn: 128584	2011-03-30 23:01:21 +00:00
Evan Cheng	18381b4257	Add intrinsics @llvm.arm.neon.vmulls and @llvm.arm.neon.vmullu.* back. Frontends were lowering them to sext / zext + mul instructions. Unfortunately the optimization passes may hoist the extensions out of the loop and separate them. When that happens, the long multiplication instructions can be broken into several scalar instructions, causing significant performance issues. Note the vmla and vmls intrinsics are not added back. Frontend will codegen them as intrinsics vmull* + add / sub. Also note the isel optimizations for catching mul + sext / zext are not changed either. First part of rdar://8832507, rdar://9203134 llvm-svn: 128502	2011-03-29 23:06:19 +00:00
Cameron Zwarich	143f9aea2b	Add Neon SINT_TO_FP and UINT_TO_FP lowering from v4i16 to v4f32. Fixes <rdar://problem/8875309> and <rdar://problem/9057191>. llvm-svn: 128492	2011-03-29 21:41:55 +00:00
Rafael Espindola	6b2fac21ca	Reduce test case. llvm-svn: 128445	2011-03-29 02:18:54 +00:00
Evan Cheng	e2086e740f	Optimizing (zext A + zext B) * C, to (VMULL A, C) + (VMULL B, C) during isel lowering to fold the zero-extend's and take advantage of no-stall back to back vmul + vmla: vmull q0, d4, d6 vmlal q0, d5, d6 is faster than vaddl q0, d4, d5 vmovl q1, d6 vmul q0, q0, q1 This allows us to vmull + vmlal for: f = vmull_u8( vget_high_u8(s), c); f = vmlal_u8(f, vget_low_u8(s), c); rdar://9197392 llvm-svn: 128444	2011-03-29 01:56:09 +00:00
Bill Wendling	96f962fdff	In some cases, the "fail BB dominator" may be null after the BB was split (and becomes reachable when before it wasn't). Check to make sure that it's not null before trying to use it. llvm-svn: 128434	2011-03-28 23:02:18 +00:00
Jakob Stoklund Olesen	9a624fa993	Collect and coalesce DBG_VALUE instructions before emitting the function. Correctly terminate the range of register DBG_VALUEs when the register is clobbered or when the basic block ends. The code is now ready to deal with variables that are sometimes in a register and sometimes on the stack. We just need to teach emitDebugLoc to say 'stack slot'. llvm-svn: 128327	2011-03-26 02:19:36 +00:00
Eric Christopher	d553096688	Fix the bfi handling for or (and a mask) (and b mask). We need the two masks to match inversely for the code as is to work. For the example given we actually want: bfi r0, r2, #1, #1 not #0, however, given the way the pattern is written it's not possible at the moment. Fixes rdar://9177502 llvm-svn: 128320	2011-03-26 01:21:03 +00:00
Jakob Stoklund Olesen	1886a4c823	Emit fewer labels for debug info and stop emitting .loc directives for DBG_VALUEs. The .loc directives don't need labels, that is a leftover from when we created line number info manually. Instructions following a DBG_VALUE can share its label since the DBG_VALUE doesn't produce any code. llvm-svn: 128284	2011-03-25 17:20:59 +00:00
Devang Patel	71536de752	Move test in x86 specific area. llvm-svn: 128245	2011-03-24 22:39:09 +00:00
Devang Patel	e01b75cb89	Keep track of directory name and fix regression caused by Rafael's patch r119613. A better approach would be to move source id handling inside MC. llvm-svn: 128233	2011-03-24 20:30:50 +00:00
NAKAMURA Takumi	521eb7c11e	Target/X86: [PR8777][PR8778] Tweak alloca/chkstk for Windows targets. FIXME: Some cleanups would be needed. llvm-svn: 128206	2011-03-24 07:07:00 +00:00
Cameron Zwarich	4649f17db1	Do early taildup of ret in CodeGenPrepare for potential tail calls that have a void return type. This fixes PR9487. llvm-svn: 128197	2011-03-24 04:52:10 +00:00
Devang Patel	abc77347a7	Enable GlobalMerge on darwin. llvm-svn: 128183	2011-03-23 23:34:19 +00:00
Andrew Trick	4ab9a16569	Revert r128175. I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix. llvm-svn: 128181	2011-03-23 23:11:02 +00:00
Evan Cheng	425489d397	Cmp peephole optimization isn't always safe for signed arithmetic. int tries = INT_MAX; while (tries > 0) { tries--; } The check should be: subs r4, #1 cmp r4, #0 bgt LBB0_1 The subs can set the overflow V bit when r4 is INT_MAX+1 (which loop canonicalization apparently does in this case). cmp #0 would have cleared it while not changing the N and Z bits. Since BGT is dependent on the V bit, i.e. (N == V) && !Z, it is not safe to eliminate the cmp #0. rdar://9172742 llvm-svn: 128179	2011-03-23 22:52:04 +00:00
Eli Friedman	4c192305bf	PR9535: add support for splitting and scalarizing vector ISD::FP_ROUND. Also cleaning up some duplicated code while I'm here. llvm-svn: 128176	2011-03-23 22:18:48 +00:00
Andrew Trick	4046a0de91	Reapply Eli's r127852 now that the pre-RA scheduler can spill EFLAGS. (target-specific branchless method for double-width relational comparisons on x86) llvm-svn: 128175	2011-03-23 22:16:02 +00:00
Jakob Stoklund Olesen	ec0ac3ca40	Reapply r128045 and r128051 with fixes. This will extend the ranges of debug info variables in registers until they are clobbered. Fix 1: Don't mistake DBG_VALUE instructions referring to incoming arguments on the stack with DBG_VALUE instructions referring to variables in the frame pointer. This fixes the gdb test-suite failure. Fix 2: Don't trace through copies to physical registers setting up call arguments. These registers are call clobbered, and the source register is more likely to be a callee-saved register that can be extended through the call instruction. llvm-svn: 128114	2011-03-22 22:33:08 +00:00
Andrew Trick	b0f98bb5e9	Revert r128045 and r128051, debug info enhancements. Temporarily reverting these to see if we can get llvm-objdump to link. Hopefully this is not the problem. llvm-svn: 128097	2011-03-22 19:18:42 +00:00
Che-Liang Chiou	7413080cea	ptx: add analyze/insert/remove branch llvm-svn: 128084	2011-03-22 14:12:00 +00:00
Jakob Stoklund Olesen	9c057ee440	Don't emit 'DBG_VALUE %noreg, ...' to terminate user variable ranges. These ranges get completely jumbled by the post-ra scheduler, and it is not really reasonable to expect it to make sense of them. Instead, teach DwarfDebug to notice when user variables in registers are clobbered, and terminate the ranges there. llvm-svn: 128045	2011-03-22 00:21:41 +00:00
Dan Gohman	c1783b31a4	Fix fast-isel address mode folding to avoid folding instructions outside of the current basic block. This fixes PR9500, rdar://9156159. llvm-svn: 128041	2011-03-22 00:04:35 +00:00
Rafael Espindola	1557fd6d39	Write the section table and the section data in the same order that gnu as does. This makes it a lot easier to compare the output of both as the addresses are now a lot closer. llvm-svn: 127972	2011-03-20 18:44:20 +00:00
Daniel Dunbar	327cd36f74	Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR", it broke a lot of things. llvm-svn: 127954	2011-03-19 21:47:14 +00:00
Evan Cheng	824a711305	SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR to have single return block (at least getting there) for optimizations. This is general goodness but it would prevent some tailcall optimizations. One specific case is code like this: int f1(void); int f2(void); int f3(void); int f4(void); int f5(void); int f6(void); int foo(int x) { switch(x) { case 1: return f1(); case 2: return f2(); case 3: return f3(); case 4: return f4(); case 5: return f5(); case 6: return f6(); } } => LBB0_2: ## %sw.bb callq _f1 popq %rbp ret LBB0_3: ## %sw.bb1 callq _f2 popq %rbp ret LBB0_4: ## %sw.bb3 callq _f3 popq %rbp ret This patch teaches codegenprep to duplicate returns when the return value is a phi and where the phi operands are produced by tail calls followed by an unconditional branch: sw.bb7: ; preds = %entry %call8 = tail call i32 @f5() nounwind br label %return sw.bb9: ; preds = %entry %call10 = tail call i32 @f6() nounwind br label %return return: %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ] ret i32 %retval.0 This allows codegen to generate better code like this: LBB0_2: ## %sw.bb jmp _f1 ## TAILCALL LBB0_3: ## %sw.bb1 jmp _f2 ## TAILCALL LBB0_4: ## %sw.bb3 jmp _f3 ## TAILCALL rdar://9147433 llvm-svn: 127953	2011-03-19 17:17:39 +00:00
Nadav Rotem	e7a101ccab	Add support for legalizing UINT_TO_FP of vectors on platforms which do not have native support for this operation (such as X86). The legalized code uses two vector INT_TO_FP operations and is faster than scalarizing. llvm-svn: 127951	2011-03-19 13:09:10 +00:00
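One way to build an unsigned conversion out of two signed ones is to split each lane into 16-bit halves. The sketch below assumes that split and is not taken from the patch itself; `sint_to_fp` and `uint_to_fp_via_signed` are hypothetical names, and a Python list stands in for a vector.

```python
def sint_to_fp(x):
    """Stand-in for a signed INT_TO_FP on one 32-bit lane."""
    assert -(1 << 31) <= x < (1 << 31), "value must fit in a signed 32-bit lane"
    return float(x)

def uint_to_fp_via_signed(lanes):
    """Convert unsigned 32-bit lanes using only signed conversions."""
    out = []
    for x in lanes:
        hi = x >> 16          # upper half, always in 0..65535
        lo = x & 0xFFFF       # lower half, always in 0..65535
        # Both halves are non-negative when read as signed, so the signed
        # conversion is exact; recombine as hi * 2^16 + lo.
        out.append(sint_to_fp(hi) * 65536.0 + sint_to_fp(lo))
    return out

print(uint_to_fp_via_signed([0, 1, 0x80000000, 0xFFFFFFFF]))
```

A single signed conversion would read `0xFFFFFFFF` as -1, which is why the value has to be split before converting.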
Andrew Trick	e7537a0187	FileCheckize a test. (one-by-one until valgrind is happy) llvm-svn: 127925	2011-03-19 00:41:39 +00:00
Evan Cheng	dc1d626a3d	Match a few more obvious patterns to revsh. rdar://9147637. llvm-svn: 127913	2011-03-18 21:52:42 +00:00
Eli Friedman	59721e3238	Revert r127852; it's apparently causing an ICE on mingw. llvm-svn: 127909	2011-03-18 21:12:29 +00:00
Justin Holewinski	0984dcc077	PTX: Fix various codegen issues - Emit mad instead of mad.rn for shader model 1.0 - Emit explicit mov.u32 instructions for reading global variables - (most PTX instructions cannot take global variable immediates) llvm-svn: 127895	2011-03-18 19:24:28 +00:00
Che-Liang Chiou	b1df0fe1cc	ptx: fix parameter order that is reversed llvm-svn: 127874	2011-03-18 11:23:56 +00:00
Che-Liang Chiou	ff9d938e33	ptx: add unconditional and conditional branch llvm-svn: 127873	2011-03-18 11:08:52 +00:00
Eli Friedman	1a916a3c0c	Add a target-specific branchless method for double-width relational comparisons on x86. Essentially, the way this works is that SUB+SBB sets the relevant flags the same way a double-width CMP would. This is a substantial improvement over the generic lowering in LLVM. The output is also shorter than the gcc-generated output; I haven't done any detailed benchmarking, though. llvm-svn: 127852	2011-03-18 02:34:11 +00:00
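The claim that SUB+SBB sets the flags the way a double-width CMP would can be illustrated with a small model. This is a sketch under assumptions: it treats a 64-bit value as two 32-bit halves, and `sub_sbb_flags` is a hypothetical helper modelling x86-style CF, SF and OF, not code from the patch.

```python
M32 = 0xFFFFFFFF

def sub_sbb_flags(a, b):
    """Flags after SUB on the low halves and SBB on the high halves of 64-bit a, b."""
    a_lo, a_hi = a & M32, (a >> 32) & M32
    b_lo, b_hi = b & M32, (b >> 32) & M32
    borrow = 1 if a_lo < b_lo else 0            # CF produced by the SUB
    raw = a_hi - b_hi - borrow                  # SBB consumes that borrow
    res = raw & M32
    cf = raw < 0                                # unsigned borrow out of the high half
    sf = bool(res >> 31)                        # sign of the high result
    of = bool(((a_hi ^ b_hi) & (a_hi ^ res)) >> 31)  # signed overflow
    return cf, sf, of

def to_signed64(x):
    return x - (1 << 64) if x >> 63 else x

def ult(a, b):
    """Unsigned a < b: the carry flag alone decides."""
    return sub_sbb_flags(a, b)[0]

def slt(a, b):
    """Signed a < b: SF != OF, the same condition a single-width CMP would use."""
    _, sf, of = sub_sbb_flags(a, b)
    return sf != of

samples = [0, 1, 0xFFFFFFFF, 0x100000000, 0x7FFFFFFFFFFFFFFF,
           0x8000000000000000, 0xFFFFFFFFFFFFFFFF]
for a in samples:
    for b in samples:
        assert ult(a, b) == (a < b)
        assert slt(a, b) == (to_signed64(a) < to_signed64(b))
```

Equality is not covered here; the zero flag after SBB reflects only the high half, so an equality test would need separate handling.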
Benjamin Kramer	cfcea12fe2	BuildUDIV: If the divisor is even we can simplify the fixup of the multiplied value by introducing an early shift. This allows us to compile "unsigned foo(unsigned x) { return x/28; }" into shrl $2, %edi imulq $613566757, %rdi, %rax shrq $32, %rax ret instead of movl %edi, %eax imulq $613566757, %rax, %rcx shrq $32, %rcx subl %ecx, %eax shrl %eax addl %ecx, %eax shrl $4, %eax on x86_64 llvm-svn: 127829	2011-03-17 20:39:14 +00:00
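The shorter sequence above amounts to shifting out the factor of 4 first and then dividing by 7 with a multiply-and-shift. A quick check of that arithmetic, as a sketch: `udiv28` is a hypothetical name, and the constant is the one shown in the commit message.

```python
MAGIC = 613566757   # ceil(2**32 / 7), the imulq constant from the message

def udiv28(x):
    """x / 28 for a 32-bit unsigned x: early shift by 2, multiply, keep the high 32 bits."""
    assert 0 <= x <= 0xFFFFFFFF
    return ((x >> 2) * MAGIC) >> 32     # shrl $2; imulq $613566757; shrq $32

# Spot-check against ordinary integer division, including the boundary values.
for x in [0, 1, 27, 28, 29, 55, 56, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]:
    assert udiv28(x) == x // 28
```

The early shift keeps the multiplicand below 2^30, which is small enough that the rounding error in the magic constant never reaches the quotient, so the subtract-and-add fixup of the longer sequence is not needed.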
Richard Osborne	6120962d7d	Add XCore intrinsic for setpsc. llvm-svn: 127821	2011-03-17 18:42:05 +00:00
NAKAMURA Takumi	bf9ff6f63b	test/CodeGen/X86/h-registers-1.ll: Add explicit -mtriple=x86_64-linux. It does not need to be checked on x86_64-win32 (aka Win64). llvm-svn: 127800	2011-03-17 04:24:40 +00:00
NAKAMURA Takumi	5b6198dfb9	test/CodeGen/X86/constant-pool-remat-0.ll: FileCheck-ize and add explicit -mtriple=x86_64-linux. llvm-svn: 127775	2011-03-16 23:01:31 +00:00

4742 Commits