Commit Graph

4742 Commits

Author SHA1 Message Date
Cameron Zwarich 3088e0a179 Make tTAILJMPr/tTAILJMPrND emit a tBX without a preceding MOV of PC to LR. This
fixes <rdar://problem/9495913>

llvm-svn: 132042
2011-05-25 04:45:27 +00:00
Rafael Espindola fc9bae6f8b Replace the -unwind-tables option with a per function flag. This is more
LTO friendly as we can now correctly merge files compiled with or without
-fasynchronous-unwind-tables.

llvm-svn: 132033
2011-05-25 03:44:17 +00:00
Akira Hatanaka aac670c1c8 Fix lowering of DYNAMIC_STACKALLOC nodes.
llvm-svn: 132030
2011-05-25 02:20:00 +00:00
Eric Christopher 1b724948e9 Implement the arm 'L' asm modifier.
Part of rdar://9119939

llvm-svn: 132024
2011-05-24 23:27:13 +00:00
Eric Christopher b1dda56ac2 Implement the immediate part of the 'B' modifier.
Part of rdar://9119939

llvm-svn: 132023
2011-05-24 23:15:43 +00:00
Eric Christopher 7617883ce3 Add support for the arm 'y' asm modifier.
Fixes part of rdar://9444657

llvm-svn: 132011
2011-05-24 22:10:34 +00:00
Akira Hatanaka 2486729839 Test case for r132003.
llvm-svn: 132005
2011-05-24 21:28:18 +00:00
Akira Hatanaka ce4037ebcf Fix test case.
llvm-svn: 131988
2011-05-24 19:37:15 +00:00
Akira Hatanaka 0f30561bae Revision 131986 test case.
llvm-svn: 131987
2011-05-24 19:29:37 +00:00
Rafael Espindola 0f33be1b87 Fix the defaults for .eh_frame. We were marking it as writable.
llvm-svn: 131951
2011-05-24 02:50:20 +00:00
Evan Cheng 88f9137fd7 - Teach SelectionDAG::isKnownNeverZero to return true (op x, c) when c is
non-zero.
- Teach X86 cmov optimization to eliminate the cmov from ctlz, cttz extension
  when the source of X86ISD::BSR / X86ISD::BSF is proven to be non-zero.

rdar://9490949

llvm-svn: 131948
2011-05-24 01:48:22 +00:00
Akira Hatanaka 6af5bd2537 Add pattern for double-to-integer conversion. Patch by Sasa Stankovic.
llvm-svn: 131927
2011-05-23 22:16:43 +00:00
Dan Gohman 6c4a319088 When checking for signed multiplication overflow, watch out for INT_MIN and -1.
This fixes PR9845.

llvm-svn: 131919
2011-05-23 21:07:39 +00:00
Akira Hatanaka f9e5750fc8 Change StackDirection from StackGrowsUp to StackGrowsDown.
The following improvements are accomplished as a result of applying this patch:
- Fixed frame objects' offsets (relative to either the virtual frame pointer or
  the stack pointer) are set before instruction selection is completed. There is
  no need to wait until Prologue/Epilogue Insertion is run to set them.
- Calculation of final offsets of fixed frame objects is straightforward. It is
  no longer necessary to assign negative offsets to fixed objects for incoming
  arguments in order to distinguish them from the others.
- Since a fixed object has its relative offset set during instruction
  selection, there is no need to conservatively set its alignment to 4.
- It is no longer necessary to reorder non-fixed frame objects in 
  MipsFrameLowering::adjustMipsStackFrame.

llvm-svn: 131915
2011-05-23 20:16:59 +00:00
Devang Patel 9987d3098b Test case for r131908.
llvm-svn: 131909
2011-05-23 17:49:29 +00:00
Devang Patel c4d9a84159 While replacing all uses of a SDValue with another value, do not forget to transfer SDDbgValue.
llvm-svn: 131907
2011-05-23 17:35:08 +00:00
Cameron Zwarich bc90690b24 Fix <rdar://problem/9476260> by having tail calls always generate 32-bit branches
in Darwin Thumb2 code. Tail calls are already disabled on Thumb1.

llvm-svn: 131894
2011-05-23 01:57:17 +00:00
Renato Golin 4cd5187f5b RTABI chapter 4.3.4 specifies __eabi_mem* calls. Specifically, __eabi_memset accepts parameters (ptr, size, value) in a different order than GNU's memset (ptr, value, size), therefore the special lowering in AAPCS mode. Implementation by Evzen Muller.
llvm-svn: 131868
2011-05-22 21:41:23 +00:00
Benjamin Kramer 2fd48f2730 Implement mulo x, 2 -> addo x, x in DAGCombiner.
llvm-svn: 131800
2011-05-21 18:31:55 +00:00
Benjamin Kramer e08fb1dce9 Merge and FileCheckize test cases.
llvm-svn: 131799
2011-05-21 18:31:48 +00:00
Eli Friedman 60afcc2a6f Add fast-isel support for byval calls on x86.
llvm-svn: 131764
2011-05-20 22:21:04 +00:00
Stuart Hastings 91f1d24736 Re-commit 131641 with fixes; de-pseudoize MOVSX16rr8 and friends.
rdar://problem/8614450

llvm-svn: 131746
2011-05-20 19:04:40 +00:00
Akira Hatanaka 43407fe633 Make $fp and $ra callee-saved registers and let PrologEpilogInserter handle
saving and restoring them.

llvm-svn: 131745
2011-05-20 18:39:33 +00:00
Chad Rosier ad00f3d0b9 Fixed regression due to commit 131709, which disables vararg tail call optimizations on Win64
llvm-svn: 131740
2011-05-20 17:49:39 +00:00
Benjamin Kramer 0bf26746d9 Rename the "sandybridge" subtarget to "corei7-avx", for GCC compatibility.
llvm-svn: 131730
2011-05-20 15:11:26 +00:00
Cameron Zwarich e0a52df6e5 Fix PR9960 by teaching SimpleRegisterCoalescing::AdjustCopiesBackFrom() to preserve
the phikill flag.

llvm-svn: 131717
2011-05-20 03:54:04 +00:00
Akira Hatanaka fe4f9d5977 Fix bug in which nodes that write to argument registers do not get glued with the JALR node. Patch by Sasa Stankovic
llvm-svn: 131714
2011-05-20 02:30:51 +00:00
Chad Rosier 552f8c4819 Don't attempt to tail call optimize for Win64.
llvm-svn: 131709
2011-05-20 00:59:28 +00:00
Evan Cheng e8d2e9eb35 Revert r131664 and fix it in instcombine instead. rdar://9467055
llvm-svn: 131708
2011-05-20 00:54:37 +00:00
Eli Friedman 22da799428 Add fast-isel support for zeroext and signext ret instructions on x86.
llvm-svn: 131689
2011-05-19 22:16:13 +00:00
Eric Christopher 4014e5e208 Oddly people want to use the 'r' constraint for fp constants on x86.
Fixes rdar://9218925
Fixes PR9601

llvm-svn: 131682
2011-05-19 21:33:47 +00:00
Eli Friedman e53a77d3a6 Fix up this test to use explicit triples (Win64 passes a different number of arguments in registers).
llvm-svn: 131676
2011-05-19 21:13:08 +00:00
Akira Hatanaka 9e6a8cca5d Align i64 arguments to 64 bit boundaries.
llvm-svn: 131668
2011-05-19 20:29:48 +00:00
Evan Cheng 2b9bd38678 crc32 with 64-bit output zeros upper 32-bits. rdar://9467055
llvm-svn: 131664
2011-05-19 18:57:12 +00:00
Stuart Hastings ae012a7525 Move test to Transforms/InstCombine.
llvm-svn: 131634
2011-05-19 05:53:22 +00:00
Tanya Lattner 1d11720ae4 Handle perfect shuffle case that generates a vrev for vectors of floats.
Add test case.

llvm-svn: 131582
2011-05-18 21:44:54 +00:00
Chad Rosier f4e832b14e Enables vararg functions that pass all arguments via registers to be optimized into tail-calls when possible.
llvm-svn: 131560
2011-05-18 19:59:50 +00:00
Stuart Hastings 51d696766c An imminent fix to the x86_64 byval logic will expose a flaw in the
x86_64 sibcall logic.  I've filed PR9943 for the sibcall problem, and
this patch alters the testcase to work around the flaw.  When PR9943
is fixed, this patch should be reverted.

llvm-svn: 131557
2011-05-18 19:19:17 +00:00
Eli Friedman 3f46c3e702 Force a triple on a couple of tests; we don't support fast-isel of ret on Win64.
llvm-svn: 131540
2011-05-18 17:16:37 +00:00
Stuart Hastings 38849debb5 Merge pmovzx test case into existing file.
llvm-svn: 131539
2011-05-18 17:02:04 +00:00
Justin Holewinski bbdcd17d44 PTX: add flag to disable mad/fma selection
Patch by Dan Bailey

llvm-svn: 131537
2011-05-18 15:42:23 +00:00
Tanya Lattner 48b182c3a4 In r131488 I misunderstood how VREV works. It splits the vector in half and splits each half. Therefore, the real problem was that we were using a VREV64 for a 4xi16, when we should have been using a VREV32.
Updated test case and reverted change to the PerfectShuffle Table.

llvm-svn: 131529
2011-05-18 06:42:21 +00:00
Eli Friedman 7d7ad8374f Make some of the fast-isel tests actually test fast-isel (and fix test failures).
llvm-svn: 131510
2011-05-18 00:00:10 +00:00
Stuart Hastings 5bd18b6638 X86 pmovsx/pmovzx ignore the upper half of their inputs.
rdar://problem/6945110

llvm-svn: 131493
2011-05-17 22:13:31 +00:00
Tanya Lattner c7e291b354 vrev is incorrectly defined in the perfect shuffle table. The ordering is backwards (should be 0x3210 versus 0x1032) which exposed a bug when doing a shuffle on a 4xi16. I've attached a test case.
llvm-svn: 131488
2011-05-17 20:48:40 +00:00
Galina Kistanova dd45646a47 Move test for appropriate directory.
llvm-svn: 131477
2011-05-17 19:06:43 +00:00
Eli Friedman 7b27942fe7 Add x86 fast-isel for calls returning first-class aggregates. rdar://9435872.
This is r131438 with a couple small fixes.

llvm-svn: 131474
2011-05-17 18:29:03 +00:00
Eli Friedman 7335e8a720 Back out r131444 and r131438; they're breaking nightly tests. I'll look into
it more tomorrow.

llvm-svn: 131451
2011-05-17 02:36:59 +00:00
Eli Friedman e5f7f26df0 Fix test.
llvm-svn: 131444
2011-05-17 00:39:14 +00:00
Evan Cheng 54459240e3 Add target triple so test doesn't fail on Windows machines.
llvm-svn: 131439
2011-05-17 00:15:58 +00:00
Eli Friedman 83ba150f3a Add x86 fast-isel for calls returning first-class aggregates. rdar://9435872.
llvm-svn: 131438
2011-05-17 00:13:47 +00:00
Jakob Stoklund Olesen 4edf17d91f Teach LiveInterval::isZeroLength about null SlotIndexes.
When instructions are deleted, they leave tombstone SlotIndex entries.
The isZeroLength method should ignore these null indexes.

This causes RABasic to sometimes spill a callee-saved register in the
abi-isel.ll test, so don't run that test with -regalloc=basic.  Prioritizing
register allocation according to spill weight can cause more registers to be
used.

llvm-svn: 131436
2011-05-16 23:50:05 +00:00
Eli Friedman d4a3609d30 Remove dead code. Fix associated test to use FileCheck.
llvm-svn: 131424
2011-05-16 21:28:22 +00:00
Eli Friedman a4d4a0162d Make fast-isel work correctly s/uadd.with.overflow intrinsics.
llvm-svn: 131420
2011-05-16 21:06:17 +00:00
Eli Friedman 9ac944774f Basic fast-isel of extractvalue. Not too helpful on its own, given the IR clang generates for cases like this, but it should become more useful soon.
llvm-svn: 131417
2011-05-16 20:27:46 +00:00
Rafael Espindola df9db7ed92 Don't produce a vmovntdq if we don't have AVX support.
llvm-svn: 131330
2011-05-14 00:30:01 +00:00
Rafael Espindola e53b7d1a11 Make codegen able to handle values of empty types. This is one way
to fix PR9900. I will keep it open until sable is able to comment on it.

llvm-svn: 131294
2011-05-13 15:18:06 +00:00
Stuart Hastings aa02c0847d Since I can't reproduce the failures from 131261, re-trying with a
simplified version.  <rdar://problem/9298790>

llvm-svn: 131274
2011-05-13 00:51:54 +00:00
Stuart Hastings 8d57d8ea64 Revert 131266 and 131261 due to buildbot complaints.
rdar://problem/9298790

llvm-svn: 131269
2011-05-13 00:15:17 +00:00
Stuart Hastings ef4940254f Tweak 131261 (thumb2-cbnz.ll) to generate the intended cbnz.
rdar://problem/9298790

llvm-svn: 131266
2011-05-13 00:10:03 +00:00
Stuart Hastings 89f1b47e3a Non-fast-isel followup to 129634; correctly handle branches controlled
by non-CMP expressions.  The executable test case (129821) would test
this as well, if we had an "-O0 -disable-arm-fast-isel" LLVM-GCC
tester.  Alas, the ARM assembly would be very difficult to check with
FileCheck.

The thumb2-cbnz.ll test is affected; it generates larger code (tst.w
vs. cmp #0), but I believe the new version is correct.
rdar://problem/9298790

llvm-svn: 131261
2011-05-12 23:36:41 +00:00
Galina Kistanova 9e56e51fab Correction. Use explicit target triple in the test.
llvm-svn: 131252
2011-05-12 21:55:34 +00:00
Evan Cheng 43054e6159 Re-enable branchfolding common code hoisting optimization. Fixed a liveness test bug and also taught it to update liveins.
llvm-svn: 131241
2011-05-12 20:30:01 +00:00
Stuart Hastings 114ecbd0f4 Move this test to CodeGen/Thumb. rdar://problem/9416774
llvm-svn: 131196
2011-05-11 19:41:28 +00:00
Devang Patel 34a6620748 Identify end of prologue (and beginning of function body) using DW_LNS_set_prologue_end line table opcode.
llvm-svn: 131194
2011-05-11 19:22:19 +00:00
Nadav Rotem 8a7beb80f0 Fixes a bug in the DAGCombiner. LoadSDNodes have two values (data, chain).
If there is a store after the load node, then there is a chain, which means
that there is another user. Thus, asking hasOneUser would fail. Instead we
ask hasNUsesOfValue on the 'data' value.

llvm-svn: 131183
2011-05-11 14:40:50 +00:00
Nadav Rotem 8f971c27fb Add custom lowering of X86 vector SRA/SRL/SHL when the shift amount is a splat vector.
llvm-svn: 131179
2011-05-11 08:12:09 +00:00
Rafael Espindola 2a09d65979 Revert 131172 as it is causing clang to miscompile itself. I will try
to provide a reduced testcase.

llvm-svn: 131176
2011-05-11 03:27:17 +00:00
Evan Cheng 05fc35e275 Add a late optimization to BranchFolding that hoist common instruction sequences
at the start of basic blocks to their common predecessor. It's actually quite
common (e.g. about 50 times in JM/lencod) and has shown to be a nice code size
benefit. e.g.

        pushq   %rax
        testl   %edi, %edi
        jne     LBB0_2
## BB#1:
        xorb    %al, %al
        popq    %rdx
        ret
LBB0_2:
        xorb    %al, %al
        callq   _foo
        popq    %rdx
        ret

=>

        pushq   %rax
        xorb    %al, %al
        testl   %edi, %edi
        je      LBB0_2
## BB#1:
        callq   _foo
LBB0_2:
        popq    %rdx
        ret

rdar://9145558

llvm-svn: 131172
2011-05-11 01:03:01 +00:00
Rafael Espindola 19c1a56287 Produce a __debug_frame section on darwin ARM when appropriate.
llvm-svn: 131151
2011-05-10 21:04:45 +00:00
Justin Holewinski 3c0447259c PTX: add test cases for cvt, fneg, and selp
Patch by Dan Bailey

llvm-svn: 131128
2011-05-10 14:53:13 +00:00
Benjamin Kramer d724a590e5 X86: Add a bunch of peeps for add and sub of SETB.
"b + ((a < b) ? 1 : 0)" compiles into
	cmpl	%esi, %edi
	adcl	$0, %esi
instead of
	cmpl	%esi, %edi
	sbbl	%eax, %eax
	andl	$1, %eax
	addl	%esi, %eax

This saves a register, a false dependency on %eax
(Intel's CPUs still don't ignore it) and it's shorter.

llvm-svn: 131070
2011-05-08 18:36:07 +00:00
Jakob Stoklund Olesen a5c889982a Emit a proper error message when register allocators run out of registers.
This can't be just an assertion, users can always write impossible inline
assembly. Such an assembly statement should be included in the error message.

llvm-svn: 131024
2011-05-06 21:58:30 +00:00
Justin Holewinski 11d70b6b32 PTX: add PTX 2.3 language target
Patch by Wei-Ren Chen

llvm-svn: 130980
2011-05-06 11:40:36 +00:00
Eli Friedman 5401962643 Re-revert r130877; it's apparently causing a regression on 197.parser,
possibly related to cbnz formation.

llvm-svn: 130977
2011-05-06 05:23:07 +00:00
Rafael Espindola a4982bddf3 Don't produce a __debug_frame.
I tested both gdb on a bootstrapped clang and and the gdb testsuite on OS X (snow leopard)
and both are happy using __eh_frame.

llvm-svn: 130937
2011-05-05 18:43:39 +00:00
Eli Friedman 441a01a2b8 Avoid extra vreg copies for arguments passed in registers. Specifically, this can make MachineCSE more effective in some cases (especially in small functions). PR8361 / part of rdar://problem/8259436 .
llvm-svn: 130928
2011-05-05 16:53:34 +00:00
Jakob Stoklund Olesen 17d4f9bbcc Prepare remaining tests for -join-physreg going away.
llvm-svn: 130893
2011-05-04 23:54:59 +00:00
Jakob Stoklund Olesen 369bddf5ad Fix a batch of x86 tests to be coalescer independent.
Most of these tests require a single mov instruction that can come either before
or after a 2-addr instruction. -join-physregs changes the behavior, but the
results are equivalent.

llvm-svn: 130891
2011-05-04 23:54:51 +00:00
Dan Gohman dd550305e6 Give this test an explicit register allocator, so that it can work even if
the default register allocator is changed.

llvm-svn: 130883
2011-05-04 23:14:02 +00:00
Bill Wendling 2a40131f6b SjLj EH could produce a machine basic block that legitimately has more than one
landing pad as its successor.

SjLj exception handling jumps to the correct landing pad via a switch statement
that's generated right before code-gen. Loosen the constraint in the machine
instruction verifier to allow for this. Note, this isn't the most rigorous check
since we cannot determine where that switch statement came from. But it's
marginally better than turning this check off when SjLj exceptions are used.
<rdar://problem/9187612>

llvm-svn: 130881
2011-05-04 22:54:05 +00:00
Eli Friedman 0fe4608af2 Re-commit r130862 with a minor change to avoid an iterator running off the edge in some cases.
Original message:

Teach MachineCSE how to do simple cross-block CSE involving physregs.  This allows, for example, eliminating duplicate cmpl's on x86. Part of rdar://problem/8259436 .

llvm-svn: 130877
2011-05-04 22:10:36 +00:00
Galina Kistanova e53ae508ec This test fails on ARM. The test shouldn't explicitly specify alignment (and alignment 4 is wrong) and requires hard-float.
llvm-svn: 130875
2011-05-04 21:57:44 +00:00
Eli Friedman 3bd79ba856 Back out r130862; it appears to be breaking bootstrap.
llvm-svn: 130867
2011-05-04 20:48:42 +00:00
Eli Friedman a16fc2fec0 Teach MachineCSE how to do simple cross-block CSE involving physregs. This allows, for example, eliminating duplicate cmpl's on x86. Part of rdar://problem/8259436 .
llvm-svn: 130862
2011-05-04 19:54:24 +00:00
Jakob Stoklund Olesen 28a93a49bb Fix more register and coalescing dependencies.
llvm-svn: 130859
2011-05-04 19:02:11 +00:00
Jakob Stoklund Olesen d7fd7bfc31 Explicitly request physreg coalesing for a bunch of Thumb2 unit tests.
These tests all follow the same pattern:

	mov	r2, r0
	movs	r0, #0
	$CMP	r2, r1
	it	eq
	moveq	r0, #1
	bx	lr

The first 'mov' can be eliminated by rematerializing 'movs r0, #0' below the
test instruction:

	$CMP	r0, r1
	mov.w	r0, #0
	it	eq
	moveq	r0, #1
	bx	lr

So far, only physreg coalescing can do that. The register allocators won't yet
split live ranges just to eliminate copies. They can learn, but this particular
problem is not likely to show up in real code. It only appears because r0 is
used for both the function argument and return value.

llvm-svn: 130858
2011-05-04 19:02:07 +00:00
Jakob Stoklund Olesen e7528c45ea FileCheckize and break dependence on coalescing order.
llvm-svn: 130856
2011-05-04 19:02:01 +00:00
Jakob Stoklund Olesen 067ba3c23c Explicitly request -join-physregs for some tests that depend on it.
llvm-svn: 130855
2011-05-04 19:01:59 +00:00
Devang Patel 39ecf816c5 Do not emit location expression size twice.
llvm-svn: 130854
2011-05-04 19:00:57 +00:00
Akira Hatanaka 3bace5d223 Remove LLVM IR metadata in test case committed in r130847.
llvm-svn: 130849
2011-05-04 18:28:36 +00:00
Akira Hatanaka 23e8ecf125 Prevent instructions using $gp from being placed between a jalr and the instruction that restores the clobbered $gp.
llvm-svn: 130847
2011-05-04 17:54:27 +00:00
Jakob Stoklund Olesen f1b401800a Don't depend on the physreg coalescing order.
llvm-svn: 130818
2011-05-04 01:01:47 +00:00
Jakob Stoklund Olesen 5b5abb4ea1 Don't run this test through -regalloc=basic.
The basic allocator is really bad about hinting, so it doesn't eliminate all
copies when physreg joining is disabled.

llvm-svn: 130817
2011-05-04 01:01:44 +00:00
Jakob Stoklund Olesen d3b2f44c9d Fix register-dependent XCore tests
llvm-svn: 130816
2011-05-04 01:01:41 +00:00
Jakob Stoklund Olesen 7f7fc82141 Fix register-dependent test in MSP430.
llvm-svn: 130815
2011-05-04 01:01:39 +00:00
Jakob Stoklund Olesen 51b35f7bb1 Fix a bunch of ARM tests to be register allocation independent.
llvm-svn: 130800
2011-05-03 22:31:21 +00:00
Bill Wendling db0996c822 Replace the "movnt" intrinsics with a native store + nontemporal metadata bit.
<rdar://problem/8460511>

llvm-svn: 130791
2011-05-03 21:11:17 +00:00
Evan Cheng 93b5cdc5ab Make the test less likely to fail with minor changes.
llvm-svn: 130778
2011-05-03 19:09:32 +00:00
Bob Wilson c5242b0e78 Remove test for iOS divmod function, since that is disabled for now.
llvm-svn: 130769
2011-05-03 17:54:49 +00:00
Bruno Cardoso Lopes 168c9005b5 Add a few ARM coprocessor intrinsics. Testcases included
llvm-svn: 130763
2011-05-03 17:29:22 +00:00
Dan Gohman 6136e94897 Add an unfolded offset field to LSR's Formula record. This is used to
model constants which can be added to base registers via add-immediate
instructions which don't require an additional register to materialize
the immediate.

llvm-svn: 130743
2011-05-03 00:46:49 +00:00
Rafael Espindola 5164e6e8b2 Add 130690 back.
llvm-svn: 130693
2011-05-02 15:58:16 +00:00
Rafael Espindola a392475865 Revert while I debug the tests that use march but not mtriple.
llvm-svn: 130691
2011-05-02 15:42:31 +00:00
Rafael Espindola c2aad4f2a3 Move ppc OS X to cfi too. I am building it on an old ppc mini, but it will take some time.
llvm-svn: 130690
2011-05-02 15:00:52 +00:00
Rafael Espindola fc8223670a Add r130623 back now that ELF has been fixed to work with -fno-dwarf2-cfi-asm.
llvm-svn: 130658
2011-05-01 15:44:13 +00:00
Rafael Espindola 750cb61553 GCC uses a different encoding of pointers in the FDE when using
-fno-dwarf2-cfi-asm. Implement the same behavior.

llvm-svn: 130637
2011-05-01 04:49:54 +00:00
Rafael Espindola b7c2286055 Revert the previous patch while I figure out how to make llvm-gcc
less agressive about disabling cfi on linux :-(

llvm-svn: 130626
2011-04-30 23:03:44 +00:00
Rafael Espindola 5265bc483e Enable CFI on OS X.
Currently the output should be almost identical to the one produced by CodeGen
to make the transition easier.

The only two differences I know of are:

* Some files get an extra advance loc of size 0. This will be fixed when
relaxations are enabled.
* The optimization of declaring an EH symbol as an external variable is not
implemented. This is a subset of adding the nounwind attribute, so we if really
this at -O0 we should probably do it at the IL level.

llvm-svn: 130623
2011-04-30 22:29:54 +00:00
Jakob Stoklund Olesen f5eaa8dc62 Allow folded spills in test.
llvm-svn: 130599
2011-04-30 08:00:50 +00:00
Jakob Stoklund Olesen edfabc9aad Weekly fix of register allocation dependent unit tests.
llvm-svn: 130567
2011-04-30 01:37:52 +00:00
Eli Friedman 4105ed1523 Make FastEmit_ri_ try a bit harder to succeed for supported operations; FastEmit_i can fail for non-Thumb2 ARM. Makes ARMSimplifyAddress work correctly, and reduces the number of fast-isel bailouts on non-Thumb ARM.
llvm-svn: 130560
2011-04-29 23:34:52 +00:00
Eli Friedman 328bad02fa Switch to ImmLeaf (which can be used by FastISel) for a few more common ARM/Thumb2 patterns.
llvm-svn: 130552
2011-04-29 22:48:03 +00:00
Eli Friedman dd937843d3 Fix run-line, again. :(
llvm-svn: 130540
2011-04-29 21:33:03 +00:00
Eli Friedman 86caced370 Re-committing r130454, which does not in fact break anything.
Fix a rather obscure crash caused by ARM fast-isel generating code which redefines a register.
rdar://problem/9338332 .

llvm-svn: 130539
2011-04-29 21:22:56 +00:00
Eric Christopher 8d46b47787 Add trunc->branch support, this won't help with clang's i8->i1 truncations
for bools, but is a start.

llvm-svn: 130534
2011-04-29 20:02:39 +00:00
Rafael Espindola 697edc89a5 Change DwarfCFIException's member variables to track what it actually
emmits: .cfi_personality, .cfi_lsda and the moves.

llvm-svn: 130503
2011-04-29 14:48:51 +00:00
Andrew Trick e794e17524 Teach Thumb2 isel to fold and->rotr ==> ROR.
Generalization of Nate Begeman's patch!

llvm-svn: 130502
2011-04-29 14:18:15 +00:00
Andrew Trick 65266ed4d7 Combine thumb2-ror tests.
llvm-svn: 130498
2011-04-29 14:02:41 +00:00
Eli Friedman 517728b1ae Revert r130454; apparently this doesn't actually work.
llvm-svn: 130462
2011-04-28 23:55:14 +00:00
Eli Friedman 37b9ede969 Fix runline.
llvm-svn: 130455
2011-04-28 23:12:24 +00:00
Eli Friedman e4ecd42926 Fix a rather obscure crash caused by ARM fast-isel generating code which redefines a register.
rdar://problem/9338332 .

llvm-svn: 130454
2011-04-28 23:03:25 +00:00
Eli Friedman 7cd5101ad3 fast-isel sret calls, try 2. We actually do need to do something on x86-32. rdar://problem/9303592 .
llvm-svn: 130429
2011-04-28 20:19:12 +00:00
Eli Friedman 3cf6d4032a Actually revert r130348 correctly.
llvm-svn: 130418
2011-04-28 18:20:24 +00:00
Eli Friedman d5a80ca3c8 Revert r130348; causing buildbot issues on x86-32.
llvm-svn: 130412
2011-04-28 18:06:10 +00:00
Devang Patel 3e021533cd Teach dwarf writer to handle complex address expression for .debug_loc entries.
This fixes clang generated blocks' variables' debug info.
Radar 9279956.

llvm-svn: 130373
2011-04-28 02:22:40 +00:00
Eli Friedman 33c133919a Fix a silly mistake in r130338.
llvm-svn: 130360
2011-04-28 00:42:03 +00:00
Justin Holewinski 18e6ac83ea PTX: support for bitwise operations on predicates
- selection of bitwise preds (AND, OR, XOR)
- new bitwise.ll test

Patch by Dan Bailey

llvm-svn: 130353
2011-04-28 00:19:51 +00:00
Eli Friedman 8bd572fc58 fast-isel sret. We actually don't need to do anything special on x86. :) rdar://problem/9303592 .
llvm-svn: 130348
2011-04-27 23:58:52 +00:00
Eli Friedman 406c471b69 Make the fast-isel code for literal 0.0 a bit shorter/faster, since 0.0 is common. rdar://problem/9303592 .
llvm-svn: 130338
2011-04-27 22:41:55 +00:00
Evan Cheng 9808d31b9e If converter was being too cute. It look for root BBs (which don't have
successors) and use inverse depth first search to traverse the BBs. However
that doesn't work when the CFG has infinite loops. Simply do a linear
traversal of all BBs work just fine.

rdar://9344645

llvm-svn: 130324
2011-04-27 19:32:43 +00:00
Jakob Stoklund Olesen 71d3b895ba Also add <imp-def> operands for defined and dead super-registers when rewriting.
We cannot rely on the <imp-def> operands added by LiveIntervals in all cases as
demonstrated by the test case.

llvm-svn: 130313
2011-04-27 17:42:31 +00:00
Eli Friedman 0eea0293d9 Fix an edge case involving branches in fast-isel on x86.
rdar://problem/9303306 .

llvm-svn: 130272
2011-04-27 01:34:27 +00:00
Evan Cheng 1355bbdd11 Be careful about scheduling nodes above previous calls. It increase usages of
more callee-saved registers and introduce copies. Only allows it if scheduling
a node above calls would end up lessen register pressure.

Call operands also has added ABI restrictions for register allocation, so be
extra careful with hoisting them above calls.

rdar://9329627

llvm-svn: 130245
2011-04-26 21:31:35 +00:00
Evan Cheng dbb86b8108 This test should be in MC. It breaks with changes to scheduling / register allocation so it's being removed.
llvm-svn: 130243
2011-04-26 21:09:04 +00:00
Benjamin Kramer 1d4c835089 Force a triple on this test to unbreak windows buildbots.
llvm-svn: 130226
2011-04-26 18:47:43 +00:00
Dan Gohman 7da91aee83 Fast-isel support for simple inline asms.
llvm-svn: 130205
2011-04-26 17:18:34 +00:00
Rafael Espindola 580eebaa20 Add test for PR9743.
llvm-svn: 130198
2011-04-26 14:17:42 +00:00
Chris Lattner 189ca1498f don't emit the symbol name twice for local bss and common
symbols.  For example, don't emit:
        .comm   _i,4,2                  ## @i
                                        ## @i

instead emit:
        .comm   _i,4,2                  ## @i

llvm-svn: 130192
2011-04-26 06:14:13 +00:00
Eric Christopher 238a21f2d5 Make this test disable fast isel as it's not needed.
llvm-svn: 130165
2011-04-25 22:39:46 +00:00
Akira Hatanaka 0e7ee666b7 Lower BlockAddress node when relocation-model is static.
llvm-svn: 130131
2011-04-25 17:10:45 +00:00
Devang Patel 734f2218ac A dbg.declare may not be in entry block, even if it is referring to an incoming argument. However, It is appropriate to emit DBG_VALUE referring to this incoming argument in entry block in MachineFunction.
llvm-svn: 130129
2011-04-25 16:33:52 +00:00
Benjamin Kramer ba446cc12a Make tests more useful.
lit needs a linter ...

llvm-svn: 130126
2011-04-25 10:12:01 +00:00
Andrew Trick 76dca78cb4 Accidental function name mangling.
llvm-svn: 130050
2011-04-23 04:08:15 +00:00
Andrew Trick 0ed5778a1e Thumb2 and ARM add/subtract with carry fixes.
Fixes Thumb2 ADCS and SBCS lowering: <rdar://problem/9275821>.
t2ADCS/t2SBCS are now pseudo instructions, consistent with ARM, so the
assembly printer correctly prints the 's' suffix.

Fixes Thumb2 adde -> SBC matching to check for live/dead carry flags.

Fixes the internal ARM machine opcode mnemonic for ADCS/SBCS.
Fixes ARM SBC lowering to check for live carry (potential bug).

llvm-svn: 130048
2011-04-23 03:55:32 +00:00
Andrew Trick 1a1f8d4640 whitespace
llvm-svn: 130046
2011-04-23 03:24:11 +00:00
NAKAMURA Takumi 576273cf56 test/CodeGen/X86/shrink-compare.ll: Relax expressions for Win64.
llvm-svn: 130039
2011-04-23 00:15:45 +00:00
Chris Lattner 6d277517d1 Recommit the fix for rdar://9289512 with a couple tweaks to
fix bugs exposed by the gcc dejagnu testsuite:
1. The load may actually be used by a dead instruction, which
   would cause an assert.
2. The load may not be used by the current chain of instructions,
   and we could move it past a side-effecting instruction. Change
   how we process uses to define the problem away.

llvm-svn: 130018
2011-04-22 21:59:37 +00:00
Benjamin Kramer 341c11da3b DAGCombine: fold "(zext x) == C" into "x == (trunc C)" if the trunc is lossless.
On x86 this allows to fold a load into the cmp, greatly reducing register pressure.
  movzbl	(%rdi), %eax
  cmpl	$47, %eax
->
  cmpb	$47, (%rdi)

This shaves 8k off gcc.o on i386. I'll leave applying the patch in README.txt to Chris :)

llvm-svn: 130005
2011-04-22 18:47:44 +00:00
Benjamin Kramer 4c81624735 X86: Try to use a smaller encoding by transforming (X << C1) & C2 into (X & (C2 >> C1)) & C1. (Part of PR5039)
This tends to happen a lot with bitfield code generated by clang. A simple example for x86_64 is
uint64_t foo(uint64_t x) { return (x&1) << 42; }
which used to compile into bloated code:
	shlq	$42, %rdi               ## encoding: [0x48,0xc1,0xe7,0x2a]
	movabsq	$4398046511104, %rax    ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x00,0x04,0x00,0x00]
	andq	%rdi, %rax              ## encoding: [0x48,0x21,0xf8]
	ret                             ## encoding: [0xc3]

with this patch we can fold the immediate into the and:
	andq	$1, %rdi                ## encoding: [0x48,0x83,0xe7,0x01]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	shlq	$42, %rax               ## encoding: [0x48,0xc1,0xe0,0x2a]
	ret                             ## encoding: [0xc3]

It's possible to save another byte by using 'andl' instead of 'andq' but I currently see no way of doing
that without making this code even more complicated. See the TODOs in the code.

llvm-svn: 129990
2011-04-22 15:30:40 +00:00
Evan Cheng c0d2004e3c In Thumb2 mode, lower frame indix references to:
add <rd>, sp, #<imm8>
ldr <rd>, [sp, #<imm8>]
When the offset from sp is multiple of 4 and in range of 0-1020.
This saves code size by utilizing 16-bit instructions.

rdar://9321541

llvm-svn: 129971
2011-04-22 01:42:52 +00:00
Devang Patel 94ad6ac13c Fix DWARF description of Q registers.
llvm-svn: 129952
2011-04-21 23:22:35 +00:00
Devang Patel 3712c14be9 Fix DWARF description of S registers.
llvm-svn: 129947
2011-04-21 22:48:26 +00:00
Devang Patel be22131c28 Test case for r129922
llvm-svn: 129934
2011-04-21 20:16:43 +00:00
Daniel Dunbar 6309828206 Revert r1296656, "Fix rdar://9289512 - not folding load into compare at -O0...",
which broke a couple GCC test suite tests at -O0.

llvm-svn: 129914
2011-04-21 16:14:46 +00:00
Che-Liang Chiou 14c48e5d66 ptx: fix parameter ordering
This patch depends on the prior fix r129908 that changes to use std::find,
rather than std::binary_search, on unordered array.

Patch by Dan Bailey

llvm-svn: 129909
2011-04-21 10:56:58 +00:00
Evan Cheng 5f1ba4cd2d Remove -use-divmod-libcall. Let targets opt in when they are available.
llvm-svn: 129884
2011-04-20 22:20:12 +00:00
Stuart Hastings 1b06a10d62 Un-XFAIL this test for ARM. <rdar://problem/7662569>
llvm-svn: 129875
2011-04-20 21:47:45 +00:00
Justin Holewinski 7d8895e767 PTX: Add intrinsics to list of built-in intrinsics, which allows them to be
used by Clang.  To help Clang integration, the PTX target has been split
     into two targets: ptx32 and ptx64, depending on the desired pointer size.

- Add GCCBuiltin class to all intrinsics
- Split PTX target into ptx32 and ptx64

llvm-svn: 129851
2011-04-20 15:37:17 +00:00
Eric Christopher bcaedb5ce0 Rewrite the expander for umulo/smulo to remember to sign extend the input
manually and pass all (now) 4 arguments to the mul libcall. Add a new
ExpandLibCall for just this (copied gratuitously from type legalization).

Fixes rdar://9292577

llvm-svn: 129842
2011-04-20 01:19:45 +00:00
Daniel Dunbar ed3d5496dc llc: Eliminate a use of getDarwinMajorNumber().
- As before, there is a minor semantic change here (evidenced by the test
   change) for Darwin triples that have no version component. I debated changing
   the default behavior of isOSVersionLT, but decided it made more sense for
   triples to be explicit.

llvm-svn: 129805
2011-04-19 20:46:13 +00:00
Daniel Dunbar 4a7783b0c2 CodeGen: Eliminate a use of getDarwinMajorNumber().
- There is a minor semantic change here (evidenced by the test change) for
   Darwin triples that have no version component. I debated changing the default
   behavior of isOSVersionLT, but decided it made more sense for triples to be
   explicit.

llvm-svn: 129802
2011-04-19 20:32:39 +00:00
Bob Wilson 0858c3aaed This patch combines several changes from Evan Cheng for rdar://8659675.
Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
   additional pipeline stall. So it's frequently better to single codegen
   vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
   vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.

Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
   fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.

Enable these fp vmlx codegen changes for Cortex-A9.

llvm-svn: 129775
2011-04-19 18:11:57 +00:00
Bob Wilson d04a83f8f2 Add -mcpu=cortex-a9-mp. It's cortex-a9 with MP extension. rdar://8648637.
llvm-svn: 129774
2011-04-19 18:11:52 +00:00
Bob Wilson a2881ee8a4 Avoid some 's' 16-bit instruction which partially update CPSR
(and add false dependency) when it isn't dependent on last CPSR defining
instruction. rdar://8928208

llvm-svn: 129773
2011-04-19 18:11:49 +00:00
Bob Wilson df612ba006 Avoid write-after-write issue hazards for Cortex-A9.
Add a avoidWriteAfterWrite() target hook to identify register classes that
suffer from write-after-write hazards. For those register classes, try to avoid
writing the same register in two consecutive instructions.

This is currently disabled by default.  We should not spill to avoid hazards!
The command line flag -avoid-waw-hazard can be used to enable waw avoidance.

llvm-svn: 129772
2011-04-19 18:11:45 +00:00
Eli Friedman ee92a6b332 Add support for FastISel'ing varargs calls.
llvm-svn: 129765
2011-04-19 17:22:22 +00:00
Jakob Stoklund Olesen fb1249548f Tighten test case a bit.
Ideally, we would match an S-register to its containing D-register, but that
requires arithmetic (divide by 2).

llvm-svn: 129756
2011-04-19 06:14:45 +00:00
Chris Lattner 91328b317b Implement support for x86 fastisel of small fixed-sized memcpys, which are generated
en-mass for C++ PODs.  On my c++ test file, this cuts the fast isel rejects by 10x 
and shrinks the generated .s file by 5%

llvm-svn: 129755
2011-04-19 05:52:03 +00:00
Chris Lattner 5f4b783426 Implement support for fast isel of calls of i1 arguments, even though they are illegal,
when they are a truncate from something else.  This eliminates fully half of all the 
fastisel rejections on a test c++ file I'm working with, which should make a substantial
improvement for -O0 compile of c++ code.

This fixed rdar://9297003 - fast isel bails out on all functions taking bools

llvm-svn: 129752
2011-04-19 05:09:50 +00:00
Chris Lattner d7f7c93914 Handle i1/i8/i16 constant integer arguments to calls by prepromoting them.
Before we would bail out on i1 arguments all together, now we just bail on
non-constant ones.  Also, we used to emit extraneous code.  e.g. test12 was:

	movb	$0, %al
	movzbl	%al, %edi
	callq	_test12

and test13 was:
	movb	$0, %al
	xorl	%edi, %edi
	movb	%al, 7(%rsp)
	callq	_test13f

Now we get:

	movl	$0, %edi
	callq	_test12
and:
	movl	$0, %edi
	callq	_test13f

llvm-svn: 129751
2011-04-19 04:42:38 +00:00
Chris Lattner c59290a34c be layout aware, to produce:
testb	$1, %al
	je	LBB0_2
## BB#1:                                ## %if.then
	movb	$0, %al

instead of:

	testb	$1, %al
	jne	LBB0_1
	jmp	LBB0_2
LBB0_1:                                 ## %if.then
	movb	$0, %al

how 'bout that.

llvm-svn: 129749
2011-04-19 04:26:32 +00:00
Chris Lattner 2c8a4c3b1b fix rdar://9297006 - fast isel bails out on trunc to i1 -> bools cry,
a common cause of fast isel rejects on c++ code.

llvm-svn: 129748
2011-04-19 04:22:17 +00:00
Jakob Stoklund Olesen bf78618db6 Make tests register allocation independent again.
llvm-svn: 129739
2011-04-19 00:14:43 +00:00
Evan Cheng 4079133796 Do not lose mem_operands while lowering VLD / VST intrinsics.
llvm-svn: 129738
2011-04-19 00:04:03 +00:00
Eric Christopher c37aa0b26a Fix a bug where we were counting the alias sets as completely used
registers for fast allocation a different way. This has us updating
used registers only when we're using that exact register.

Fixes rdar://9207598

llvm-svn: 129711
2011-04-18 19:26:25 +00:00
Chris Lattner 48f75ad678 while we're at it, handle 'sdiv exact' of a power of 2 also,
this fixes a few rejects on c++ iterator loops.

llvm-svn: 129694
2011-04-18 07:00:40 +00:00
Chris Lattner 562d6e82bd fix rdar://9297011 - udiv by power of two causing fast-isel rejects
llvm-svn: 129693
2011-04-18 06:55:51 +00:00
Chris Lattner 07add49a4b Implement major new fastisel functionality: the matcher can now handle immediates with
value constraints on them (when defined as ImmLeaf's).  This is particularly important
for X86-64, where almost all reg/imm instructions take a i64immSExt32 immediate operand,
which has a value constraint.  Before this patch we ended up iseling the examples into
such amazing code as:

	movabsq	$7, %rax
	imulq	%rax, %rdi
	movq	%rdi, %rax
	ret

now we produce:

	imulq	$7, %rdi, %rax
	ret

This dramatically shrinks the generated code at -O0 on x86-64.

llvm-svn: 129691
2011-04-18 06:22:33 +00:00
Chris Lattner 353fda159d relax this test to just check that the lock prefix is encoded properly,
and to not rely on the register allocator's arbitrary operand choices.

llvm-svn: 129690
2011-04-18 06:15:35 +00:00
Chris Lattner b53ccb8e36 1. merge fast-isel-shift-imm.ll into fast-isel-x86-64.ll
2. implement rdar://9289501 - fast isel should fold trivial multiplies to shifts
3. teach tblgen to handle shift immediates that are different sizes than the 
   shifted operands, eliminating some code from the X86 fast isel backend.
4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function
   instead of FastEmit_ri to simplify code.

llvm-svn: 129666
2011-04-17 20:23:29 +00:00
Chris Lattner eb729d48ff fix an x86 fast isel issue where we'd completely give up on folding an address
when we have a global variable base an an index.  Instead, just give up on
folding the global variable.

Before we'd geenrate:

_test:                                  ## @test
## BB#0:
	movq	_rtx_length@GOTPCREL(%rip), %rax
	leaq	(%rax), %rax
	addq	%rdi, %rax
	movzbl	(%rax), %eax
	ret

now we generate:

_test:                                  ## @test
## BB#0:
	movq	_rtx_length@GOTPCREL(%rip), %rax
	movzbl	(%rax,%rdi), %eax
	ret

The difference is even more significant when there is a scale
involved.

This fixes rdar://9289558 - total fail with addr mode formation at -O0/x86-64

llvm-svn: 129664
2011-04-17 17:47:38 +00:00
Chris Lattner 4832660b4d fix an oversight which caused us to compile the testcase (and other
less trivial things) into a dummy lea.  Before we generated:

_test:                                  ## @test
	movq	_G@GOTPCREL(%rip), %rax
	leaq	(%rax), %rax
	ret

now we produce:

_test:                                  ## @test
	movq	_G@GOTPCREL(%rip), %rax
	ret

This is part of rdar://9289558

llvm-svn: 129662
2011-04-17 17:12:08 +00:00
Chris Lattner 045c43855c Fix rdar://9289512 - not folding load into compare at -O0
The basic issue here is that bottom-up isel is matching the branch
and compare, and was failing to fold the load into the branch/compare
combo.  Fixing this (by allowing folding into any instruction of a
sequence that is selected) allows us to produce things like:


cmpb    $0, 52(%rax)
je      LBB4_2

instead of:

movb    52(%rax), %cl
cmpb    $0, %cl
je      LBB4_2

This makes the generated -O0 code run a bit faster, but also speeds up
compile time by putting less pressure on the register allocator and 
generating less code.

This was one of the biggest classes of missing load folding.  Implementing
this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm)
line count.

llvm-svn: 129656
2011-04-17 06:35:44 +00:00
Eli Friedman 55f7bf3289 Remove working entry from README.
llvm-svn: 129654
2011-04-17 02:36:27 +00:00
Chris Lattner fba7ca63cc fix rdar://9289583 - fast isel should handle non-canonical commutative binops
allowing us to fold the immediate into the 'and' in this case:

int test1(int i) {
  return 8&i;
}

llvm-svn: 129653
2011-04-17 01:16:47 +00:00
Eli Friedman 55b0acd624 PR9055: extend the fix to PR4050 (r70179) to apply to zext and anyext.
Returning a new node makes the code try to replace the old node, which
in the included testcase is killed by CSE.

llvm-svn: 129650
2011-04-16 23:25:34 +00:00
Evan Cheng b14ce09fca Fix divmod libcall lowering. Convert to {S|U}DIVREM first and then expand the node to a libcall. rdar://9280991
llvm-svn: 129633
2011-04-16 03:08:26 +00:00
Akira Hatanaka 2cb3aa30dd Re-enable test o32_cc_vararg.ll.
llvm-svn: 129616
2011-04-15 22:23:09 +00:00
Cameron Zwarich 9c65e4d69c Add ORR and EOR to the CMP peephole optimizer. It's hard to get isel to generate
a case involving EOR, so I only added a test for ORR.

llvm-svn: 129610
2011-04-15 21:24:38 +00:00
Rafael Espindola 9fef721830 Add this test back for Darwin.
llvm-svn: 129607
2011-04-15 21:06:27 +00:00
Cameron Zwarich 0829b3065a The AND instruction leaves the V flag unmodified, so it falls victim to the same
problem as all of the other instructions we fold with CMPs.

llvm-svn: 129602
2011-04-15 20:45:00 +00:00
Cameron Zwarich 93eae1571c Add missing register forms of instructions to the ARM CMP-folding code. This
fixes <rdar://problem/9287901>.

llvm-svn: 129599
2011-04-15 20:28:28 +00:00
Akira Hatanaka 279169771b Add pass that expands pseudo instructions into target instructions after register allocation. Define pseudos that get expanded into mtc1 or mfc1 instructions.
llvm-svn: 129594
2011-04-15 19:52:08 +00:00
Rafael Espindola a01cdb0e37 Add 129518 back with a fix for when we are producing eh just because of debug info.
Change ELF systems to use CFI for producing the EH tables. This reduces the
size of the clang binary in Debug builds from 690MB to 679MB.

llvm-svn: 129571
2011-04-15 15:11:06 +00:00
NAKAMURA Takumi b5e3e9dd27 Revert r129518, "Change ELF systems to use CFI for producing the EH tables. This reduces the"
It broke several builds.

llvm-svn: 129557
2011-04-15 03:35:57 +00:00
Evan Cheng 12bb05b75b Fix another fcopysign lowering bug. If src is f64 and destination is f32, don't
forget to right shift the source by 32 first. rdar://9287902

llvm-svn: 129556
2011-04-15 01:31:00 +00:00
Michael J. Spencer 30088ba110 Add 3DNow! intrinsics.
llvm-svn: 129551
2011-04-15 00:32:41 +00:00
Evan Cheng 44887f9c7e Follow up on r127913. Fix Thumb revsh isel. rdar://9286766
llvm-svn: 129548
2011-04-14 23:27:44 +00:00
Rafael Espindola aa2a7cd828 Change ELF systems to use CFI for producing the EH tables. This reduces the
size of the clang binary in Debug builds from 690MB to 679MB.

llvm-svn: 129518
2011-04-14 15:18:53 +00:00
Andrew Trick bfbd972b1f In the pre-RA scheduler, maintain cmp+br proximity.
This is done by pushing physical register definitions close to their
use, which happens to handle flag definitions if they're not glued to
the branch. This seems to be generally a good thing though, so I
didn't need to add a target hook yet.

The primary motivation is to generate code closer to what people
expect and rule out missed opportunity from enabling macro-op
fusion. As a side benefit, we get several 2-5% gains on x86
benchmarks. There is one regression:
SingleSource/Benchmarks/Shootout/lists slows down be -10%. But this is
an independent scheduler bug that will be tracked separately.
See rdar://problem/9283108.

Incidentally, pre-RA scheduling is only half the solution. Fixing the
later passes is tracked by:
<rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump

Fixes:
<rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion

llvm-svn: 129508
2011-04-14 05:15:06 +00:00
Bill Wendling 410ec4aad1 As Dan pointed out, movzbl, movsbl, and friends are nicer than their alias
(movzx/movsx) because they give more information. Revert that part of the patch.

llvm-svn: 129498
2011-04-14 01:46:37 +00:00
Bill Wendling 7e07d6fb69 Have the X86 back-end emit the alias instead of what's being aliased. In most
cases, it's much nicer and more informative reading the alias.

llvm-svn: 129497
2011-04-14 01:11:51 +00:00
Cameron Zwarich 415b5e8341 Fix a typo in an ARM-specific DAG combine. This fixes <rdar://problem/9278274>.
llvm-svn: 129468
2011-04-13 21:01:19 +00:00
Cameron Zwarich 9398197ef1 Fix a regression caused by r102515 where explicit alignment on globals is
ignored. There was a test to catch this, but it was just blindly updated in
a large change. This fixes another part of <rdar://problem/9275290>.

llvm-svn: 129466
2011-04-13 20:36:04 +00:00
Cameron Zwarich 70be27e913 Fix an obvious problem with an alignment computation. AsmPrinter actually does
the max itself, so it is not easy to write a test case for this, but I added a
test case that would fail if the code in AsmPrinter were removed.

llvm-svn: 129432
2011-04-13 09:02:43 +00:00
Cameron Zwarich cdf59f7016 If a global variable has a specified alignment that is less than the preferred
alignment for its type, use the minimum of the specified alignment and the ABI
alignment. This fixes <rdar://problem/9275290>.

llvm-svn: 129428
2011-04-13 06:03:16 +00:00
Andrew Trick b53a00d2cb Recommit r129383. PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency.
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handle node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero cycle stage or it is a TokenFactor chain.

Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the ndoe latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.

llvm-svn: 129421
2011-04-13 00:38:32 +00:00
Bill Wendling b902f1dd88 Reapply r129401 with patch for clang.
llvm-svn: 129419
2011-04-13 00:36:11 +00:00
Eric Christopher 28f4c729f7 Temporarily revert r129408 to see if it brings the bots back.
llvm-svn: 129417
2011-04-13 00:20:59 +00:00
Eric Christopher d829f43c06 Fix a bug where we were counting the alias sets as completely used
registers for fast allocation.

Fixes rdar://9207598

llvm-svn: 129408
2011-04-12 23:23:14 +00:00
Bill Wendling dbfde42468 Revert r129401 for now. Clang is using the old way of doing things.
llvm-svn: 129403
2011-04-12 22:59:27 +00:00
Bill Wendling 47c24875a1 Remove the unaligned load intrinsics in favor of using native unaligned loads.
Now that we have a first-class way to represent unaligned loads, the unaligned
load intrinsics are superfluous.

First part of <rdar://problem/8460511>.

llvm-svn: 129401
2011-04-12 22:46:31 +00:00
Andrew Trick 1b60ad6644 Revert 129383. It causes some targets to hit a scheduler assert.
llvm-svn: 129385
2011-04-12 20:14:07 +00:00
Andrew Trick c5dd24a542 PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency.
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make these heuristic adjustments to node latency work,
I also needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.

llvm-svn: 129383
2011-04-12 19:54:36 +00:00
Cameron Zwarich fbcd69b96a Split a store of a VMOVDRR into two integer stores to avoid mixing NEON and ARM
stores of arguments in the same cache line. This fixes the second half of
<rdar://problem/8674845>.

llvm-svn: 129345
2011-04-12 02:24:17 +00:00
Wesley Peck 1914c39bd4 Add scheduling information for the MBlaze backend.
llvm-svn: 129311
2011-04-11 22:31:52 +00:00
Evan Cheng ef42bea704 Look pass copies when determining whether hoisting would end up inserting more copies. rdar://9266679
llvm-svn: 129297
2011-04-11 21:09:18 +00:00
Chris Lattner 214f114aa7 look for the verboten argument slot access in any order, thanks to Frits
for pointing this out

llvm-svn: 129217
2011-04-09 17:00:34 +00:00
Chris Lattner af1bccec68 Fix a bug where RecursivelyDeleteTriviallyDeadInstructions could
delete the instruction pointed to by CGP's current instruction
iterator, leading to a crash on the testcase.  This fixes PR9578.

llvm-svn: 129200
2011-04-09 07:05:44 +00:00
Chris Lattner 418b1037b0 fix two completely broken tests, which were matching due to PR9629.
llvm-svn: 129195
2011-04-09 06:34:38 +00:00
Chris Lattner ea6afab4b0 remove a bunch of CHECK lines that aren't checking what
they thought they were, because alternation was expanding
wrong in {{}}'s.

llvm-svn: 129194
2011-04-09 06:31:06 +00:00
Chris Lattner 41c80e89f3 have dag combine zap "store undef", which can be formed during call lowering
with undef arguments.

llvm-svn: 129185
2011-04-09 02:32:02 +00:00
Chris Lattner 1c42a4d159 don't test for codegen of 'store undef'
llvm-svn: 129184
2011-04-09 02:31:26 +00:00
Evan Cheng 74d92c1924 Change -arm-trap-func= into a non-arm specific option. Now Intrinsic::trap is lowered into a call to the specified trap function at sdisel time.
llvm-svn: 129152
2011-04-08 21:37:21 +00:00
Evan Cheng 9a3f2772f0 Add option to emit @llvm.trap as a function call instead of a trap instruction. rdar://9249183.
llvm-svn: 129107
2011-04-07 20:31:12 +00:00
Andrew Trick 2ad0b37318 Added a check in the preRA scheduler for potential interference on a
induction variable. The preRA scheduler is unaware of induction vars,
so we look for potential "virtual register cycles" instead.

Fixes <rdar://problem/8946719> Bad scheduling prevents coalescing

llvm-svn: 129100
2011-04-07 19:54:57 +00:00
Akira Hatanaka d6f1c58914 Fix handling of functions with internal linkage.
llvm-svn: 129099
2011-04-07 19:51:44 +00:00
Tanya Lattner 266792a55a Prevent ARM DAG Combiner from doing an AND or OR combine on an illegal vector type (vectors of size 3). Also included test cases.
llvm-svn: 129074
2011-04-07 15:24:20 +00:00
Evan Cheng a7c7b54dde Change -arm-divmod-libcall to a target neutral option.
llvm-svn: 129045
2011-04-07 00:58:44 +00:00
Owen Anderson bdff1c997a Teach the ARM peephole optimizer that RSB, RSC, ADC, and SBC can be used for folded comparisons, just like ADD and SUB.
llvm-svn: 129038
2011-04-06 23:35:59 +00:00
Jakob Stoklund Olesen 1ec41e2bd9 These tests no longer require linear scan because reserved register coalescing is now universal.
llvm-svn: 128936
2011-04-05 21:40:41 +00:00
Jakob Stoklund Olesen 6aa0fbf4c0 Run LiveDebugVariables in RegAllocBasic and RegAllocGreedy.
llvm-svn: 128935
2011-04-05 21:40:37 +00:00
Jakob Stoklund Olesen e20fec7732 Fix one more batch of X86 tests to be register allocation dependent.
llvm-svn: 128919
2011-04-05 20:20:30 +00:00
Jakob Stoklund Olesen 18fd84c79a When dead code elimination removes all but one use, try to fold the single def into the remaining use.
Rematerialization can leave single-use loads behind that we might as well fold whenever possible.

llvm-svn: 128918
2011-04-05 20:20:26 +00:00
Johnny Chen 293875ef55 Fix test-llvm failures.
llvm-svn: 128906
2011-04-05 18:41:40 +00:00
Stuart Hastings 345094777f ARM doesn't support byval yet. XFAIL this test until it does.
llvm-svn: 128891
2011-04-05 17:16:21 +00:00
Jakob Stoklund Olesen 76ad3debab Ensure all defs referring to a virtual register are marked dead by addRegisterDead().
There can be multiple defs for a single virtual register when they are defining
sub-registers.

The missing <dead> flag was stopping the inline spiller from eliminating dead
code after rematerialization.

llvm-svn: 128888
2011-04-05 16:53:50 +00:00
Rafael Espindola 7dd4d6e2e8 Print visibility info for external variables.
llvm-svn: 128887
2011-04-05 15:51:32 +00:00
Eric Christopher f392a69ff7 Fix up testcase for previous commit.
llvm-svn: 128870
2011-04-05 00:56:01 +00:00
Jakob Stoklund Olesen bd09d45489 Fix register-dependent X86 tests.
llvm-svn: 128867
2011-04-05 00:32:44 +00:00
Jakob Stoklund Olesen 2e85396509 Allow coalescing with reserved physregs in certain cases:
When a virtual register has a single value that is defined as a copy of a
reserved register, permit that copy to be joined. These virtual register are
usually copies of the stack pointer:

  %vreg75<def> = COPY %ESP; GR32:%vreg75
  MOV32mr %vreg75, 1, %noreg, 0, %noreg, %vreg74<kill>
  MOV32mi %vreg75, 1, %noreg, 8, %noreg, 0
  MOV32mi %vreg75<kill>, 1, %noreg, 4, %noreg, 0
  CALLpcrel32 ...

Coalescing these virtual registers early decreases register pressure.
Previously, they were coalesced by RALinScan::attemptTrivialCoalescing after
register allocation was completed.

The lower register pressure causes the mcinst-lowering-cmp0.ll test case to fail
because it depends on linear scan spilling a particular register.

I am deleting 2008-08-05-SpillerBug.ll because it is counting the number of
instructions emitted, and its revision history shows the 'correct' count being
edited many times.

llvm-svn: 128845
2011-04-04 21:00:03 +00:00
Jakob Stoklund Olesen 8296e30627 Disable the PowerPC/Atomics-64 test.
The code inserted by PPCTargetLowering::EmitInstrWithCustomInserter for ppc64 is
wrong, and I don't know how to fix it. It seems to be using the correct register
classes for pointers, but it inserts all 32-bit instructions.

llvm-svn: 128835
2011-04-04 17:57:26 +00:00
Jakob Stoklund Olesen 218661346a Fix PowerPC tests to be register allocator independent.
llvm-svn: 128827
2011-04-04 17:07:03 +00:00
Che-Liang Chiou e34b271718 ptx: support setp's 4-operand format
llvm-svn: 128767
2011-04-02 08:51:39 +00:00
Cameron Zwarich 6fe5c29430 Do some peephole optimizations to remove pointless VMOVs from Neon to integer
registers that arise from argument shuffling with the soft float ABI. These
instructions are particularly slow on Cortex A8. This fixes one half of
<rdar://problem/8674845>.

llvm-svn: 128759
2011-04-02 02:40:43 +00:00
Jim Grosbach 360c369967 LDRD/STRD instructions should print both Rt and Rt2 in the asm string.
llvm-svn: 128736
2011-04-01 20:26:57 +00:00
Akira Hatanaka 93f898f643 Add code for analyzing FP branches. Clean up branch Analysis functions.
llvm-svn: 128718
2011-04-01 17:39:08 +00:00
Evan Cheng a6a992a662 Add test case.
llvm-svn: 128707
2011-04-01 06:27:25 +00:00
Evan Cheng 0f86d6de50 FileCheck'ify test.
llvm-svn: 128706
2011-04-01 03:36:33 +00:00
Jakob Stoklund Olesen 100f53fd25 Fix Thumb and Thumb2 tests to be register allocator independent.
llvm-svn: 128690
2011-03-31 23:31:50 +00:00
Jakob Stoklund Olesen 0709342652 Provide a legal pointer register class when targeting thumb1.
The LocalStackSlotAllocation pass was creating illegal registers.

llvm-svn: 128687
2011-03-31 23:02:15 +00:00
Jakob Stoklund Olesen 903baeac27 Fix SystemZ tests
llvm-svn: 128686
2011-03-31 23:02:12 +00:00
Jakob Stoklund Olesen 0888bcf542 Fix ARM tests to be register allocator independent.
llvm-svn: 128680
2011-03-31 22:14:03 +00:00
Evan Cheng 38bf5adcea Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier
accumulator forwarding:
vadd d3, d0, d1
vmul d3, d3, d2
=>
vmul d3, d0, d2
vmla d3, d1, d2

llvm-svn: 128665
2011-03-31 19:38:48 +00:00
Jakob Stoklund Olesen f4c9754d5c Fix Mips, Sparc, and XCore tests that were dependent on register allocation.
Add an extra run with -regalloc=basic to keep them honest.

llvm-svn: 128654
2011-03-31 18:42:43 +00:00
Akira Hatanaka a535270d91 Added support for FP conditional move instructions and fixed bugs in handling of FP comparisons.
llvm-svn: 128650
2011-03-31 18:26:17 +00:00
Jakob Stoklund Olesen e6e6750670 Don't completely eliminate identity copies that also modify super register liveness.
Turn them into noop KILL instructions instead. This lets the scavenger know when
super-registers are killed and defined.

llvm-svn: 128645
2011-03-31 17:55:25 +00:00
Jakob Stoklund Olesen 9a78835414 Mark all uses as <undef> when joining a copy.
This way, shrinkToUses() will ignore the instruction that is about to be
deleted, and we avoid leaving invalid live ranges that SplitKit doesn't like.

Fix a misunderstanding in MachineVerifier about <def,undef> operands. The
<undef> flag is valid on def operands where it has the same meaning as <undef>
on a use operand. It only applies to sub-register defines which also read the
full register.

llvm-svn: 128642
2011-03-31 17:23:25 +00:00
Richard Osborne 9a827b30ab Add XCore intrinsics for initializing / starting / synchronizing threads.
llvm-svn: 128633
2011-03-31 15:13:13 +00:00
Jakob Stoklund Olesen ae044c06bf Pick a conservative register class when creating a small live range for remat.
The rematerialized instruction may require a more constrained register class
than the register being spilled. In the test case, the spilled register has been
inflated to the DPR register class, but we are rematerializing a load of the
ssub_0 sub-register which only exists for DPR_VFP2 registers.

The register class is reinflated after spilling, so the conservative choice is
only temporary.

llvm-svn: 128610
2011-03-31 03:54:44 +00:00
Evan Cheng ee9d45dd55 Don't try to create zero-sized stack objects.
llvm-svn: 128586
2011-03-30 23:44:13 +00:00
Cameron Zwarich 53dd03d537 Add a ARM-specific SD node for VBSL so that forms with a constant first operand
can be recognized. This fixes <rdar://problem/9183078>.

llvm-svn: 128584
2011-03-30 23:01:21 +00:00
Evan Cheng 18381b4257 Add intrinsics @llvm.arm.neon.vmulls and @llvm.arm.neon.vmullu.* back. Frontends
was lowering them to sext / uxt + mul instructions. Unfortunately the
optimization passes may hoist the extensions out of the loop and separate them.
When that happens, the long multiplication instructions can be broken into
several scalar instructions, causing significant performance issue.

Note the vmla and vmls intrinsics are not added back. Frontend will codegen them
as intrinsics vmull* + add / sub. Also note the isel optimizations for catching
mul + sext / zext are not changed either.

First part of rdar://8832507, rdar://9203134

llvm-svn: 128502
2011-03-29 23:06:19 +00:00
Cameron Zwarich 143f9aea2b Add Neon SINT_TO_FP and UINT_TO_FP lowering from v4i16 to v4f32. Fixes
<rdar://problem/8875309> and <rdar://problem/9057191>.

llvm-svn: 128492
2011-03-29 21:41:55 +00:00
Rafael Espindola 6b2fac21ca Reduce test case.
llvm-svn: 128445
2011-03-29 02:18:54 +00:00
Evan Cheng e2086e740f Optimizing (zext A + zext B) * C, to (VMULL A, C) + (VMULL B, C) during
isel lowering to fold the zero-extend's and take advantage of no-stall
back to back vmul + vmla:
 vmull q0, d4, d6
 vmlal q0, d5, d6
is faster than
 vaddl q0, d4, d5
 vmovl q1, d6                                                                                                                                                                             
 vmul  q0, q0, q1

This allows us to vmull + vmlal for:
    f = vmull_u8(   vget_high_u8(s), c);
    f = vmlal_u8(f, vget_low_u8(s),  c);

rdar://9197392

llvm-svn: 128444
2011-03-29 01:56:09 +00:00
Bill Wendling 96f962fdff In some cases, the "fail BB dominator" may be null after the BB was split (and
becomes reachable when before it wasn't). Check to make sure that it's not null
before trying to use it.

llvm-svn: 128434
2011-03-28 23:02:18 +00:00
Jakob Stoklund Olesen 9a624fa993 Collect and coalesce DBG_VALUE instructions before emitting the function.
Correctly terminate the range of register DBG_VALUEs when the register is
clobbered or when the basic block ends.

The code is now ready to deal with variables that are sometimes in a register
and sometimes on the stack. We just need to teach emitDebugLoc to say 'stack
slot'.

llvm-svn: 128327
2011-03-26 02:19:36 +00:00
Eric Christopher d553096688 Fix the bfi handling for or (and a mask) (and b mask). We need the two
masks to match inversely for the code as is to work. For the example given
we actually want:

bfi r0, r2, #1, #1

not #0, however, given the way the pattern is written it's not possible
at the moment.

Fixes rdar://9177502

llvm-svn: 128320
2011-03-26 01:21:03 +00:00
Jakob Stoklund Olesen 1886a4c823 Emit less labels for debug info and stop emitting .loc directives for DBG_VALUEs.
The .dot directives don't need labels, that is a leftover from when we created
line number info manually.

Instructions following a DBG_VALUE can share its label since the DBG_VALUE
doesn't produce any code.

llvm-svn: 128284
2011-03-25 17:20:59 +00:00
Devang Patel 71536de752 Move test in x86 specific area.
llvm-svn: 128245
2011-03-24 22:39:09 +00:00
Devang Patel e01b75cb89 Keep track of directory namd and fIx regression caused by Rafael's patch r119613.
A better approach would be to move source id handling inside MC.

llvm-svn: 128233
2011-03-24 20:30:50 +00:00
NAKAMURA Takumi 521eb7c11e Target/X86: [PR8777][PR8778] Tweak alloca/chkstk for Windows targets.
FIXME: Some cleanups would be needed.
llvm-svn: 128206
2011-03-24 07:07:00 +00:00
Cameron Zwarich 4649f17db1 Do early taildup of ret in CodeGenPrepare for potential tail calls that have a
void return type. This fixes PR9487.

llvm-svn: 128197
2011-03-24 04:52:10 +00:00
Devang Patel abc77347a7 Enable GlobalMerge on darwin.
llvm-svn: 128183
2011-03-23 23:34:19 +00:00
Andrew Trick 4ab9a16569 Revert r128175.
I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix.

llvm-svn: 128181
2011-03-23 23:11:02 +00:00
Evan Cheng 425489d397 Cmp peephole optimization isn't always safe for signed arithmetics.
int tries = INT_MAX;    
while (tries > 0) {
      tries--;
}

The check should be:
        subs    r4, #1
        cmp     r4, #0
        bgt     LBB0_1

The subs can set the overflow V bit when r4 is INT_MAX+1 (which loop
canonicalization apparently does in this case). cmp #0 would have cleared
it while not changing the N and Z bits. Since BGT is dependent on the V
bit, i.e. (N == V) && !Z, it is not safe to eliminate the cmp #0.

rdar://9172742

llvm-svn: 128179
2011-03-23 22:52:04 +00:00
Eli Friedman 4c192305bf PR9535: add support for splitting and scalarizing vector ISD::FP_ROUND.
Also cleaning up some duplicated code while I'm here.

llvm-svn: 128176
2011-03-23 22:18:48 +00:00
Andrew Trick 4046a0de91 Reapply Eli's r127852 now that the pre-RA scheduler can spill EFLAGS.
(target-specific branchless method for double-width relational comparisons on x86)

llvm-svn: 128175
2011-03-23 22:16:02 +00:00
Jakob Stoklund Olesen ec0ac3ca40 Reapply r128045 and r128051 with fixes.
This will extend the ranges of debug info variables in registers until they are
clobbered.

Fix 1: Don't mistake DBG_VALUE instructions referring to incoming arguments on
the stack with DBG_VALUE instructions referring to variables in the frame
pointer. This fixes the gdb test-suite failure.

Fix 2: Don't trace through copies to physical registers setting up call
arguments. These registers are call clobbered, and the source register is more
likely to be a callee-saved register that can be extended through the call
instruction.

llvm-svn: 128114
2011-03-22 22:33:08 +00:00
Andrew Trick b0f98bb5e9 Revert r128045 and r128051, debug info enhancements.
Temporarily reverting these to see if we can get llvm-objdump to link. Hopefully this is not the problem.

llvm-svn: 128097
2011-03-22 19:18:42 +00:00
Che-Liang Chiou 7413080cea ptx: add analyze/insert/remove branch
llvm-svn: 128084
2011-03-22 14:12:00 +00:00
Jakob Stoklund Olesen 9c057ee440 Dont emit 'DBG_VALUE %noreg, ...' to terminate user variable ranges.
These ranges get completely jumbled by the post-ra scheduler, and it is not
really reasonable to expect it to make sense of them.

Instead, teach DwarfDebug to notice when user variables in registers are
clobbered, and terminate the ranges there.

llvm-svn: 128045
2011-03-22 00:21:41 +00:00
Dan Gohman c1783b31a4 Fix fast-isel address mode folding to avoid folding instructions
outside of the current basic block. This fixes PR9500, rdar://9156159.

llvm-svn: 128041
2011-03-22 00:04:35 +00:00
Rafael Espindola 1557fd6d39 Write the section table and the section data in the same order that
gun as does. This makes it a lot easier to compare the output of both
as the addresses are now a lot closer.

llvm-svn: 127972
2011-03-20 18:44:20 +00:00
Daniel Dunbar 327cd36f74 Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors
to canonicalize IR", it broke a lot of things.

llvm-svn: 127954
2011-03-19 21:47:14 +00:00
Evan Cheng 824a711305 SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR
to have single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
  switch(x) {
  case 1: return f1();
  case 2: return f2();
  case 3: return f3();
  case 4: return f4();
  case 5: return f5();
  case 6: return f6();
  }
}

=>
LBB0_2:                                 ## %sw.bb
  callq   _f1
  popq    %rbp
  ret
LBB0_3:                                 ## %sw.bb1
  callq   _f2
  popq    %rbp
  ret
LBB0_4:                                 ## %sw.bb3
  callq   _f3
  popq    %rbp
  ret

This patch teaches codegenprep to duplicate returns when the return value
is a phi and where the phi operands are produced by tail calls followed by
an unconditional branch:

sw.bb7:                                           ; preds = %entry
  %call8 = tail call i32 @f5() nounwind
  br label %return
sw.bb9:                                           ; preds = %entry
  %call10 = tail call i32 @f6() nounwind
  br label %return
return:
  %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
  ret i32 %retval.0

This allows codegen to generate better code like this:

LBB0_2:                                 ## %sw.bb
        jmp     _f1                     ## TAILCALL
LBB0_3:                                 ## %sw.bb1
        jmp     _f2                     ## TAILCALL
LBB0_4:                                 ## %sw.bb3
        jmp     _f3                     ## TAILCALL

rdar://9147433

llvm-svn: 127953
2011-03-19 17:17:39 +00:00
Nadav Rotem e7a101ccab Add support for legalizing UINT_TO_FP of vectors on platforms which do
not have native support for this operation (such as X86).
The legalized code uses two vector INT_TO_FP operations and is faster
than scalarizing.

llvm-svn: 127951
2011-03-19 13:09:10 +00:00
Andrew Trick e7537a0187 FileCheckize a test.
(one-by-one until valgrind is happy)

llvm-svn: 127925
2011-03-19 00:41:39 +00:00
Evan Cheng dc1d626a3d Match a few more obvious patterns to revsh. rdar://9147637.
llvm-svn: 127913
2011-03-18 21:52:42 +00:00
Eli Friedman 59721e3238 Revert r127852; it's apparently causing an ICE on mingw.
llvm-svn: 127909
2011-03-18 21:12:29 +00:00
Justin Holewinski 0984dcc077 PTX: Fix various codegen issues
- Emit mad instead of mad.rn for shader model 1.0
- Emit explicit mov.u32 instructions for reading global variables
- (most PTX instructions cannot take global variable immediates)

llvm-svn: 127895
2011-03-18 19:24:28 +00:00
Che-Liang Chiou b1df0fe1cc ptx: fix parameter order that is reversed
llvm-svn: 127874
2011-03-18 11:23:56 +00:00
Che-Liang Chiou ff9d938e33 ptx: add unconditional and conditional branch
llvm-svn: 127873
2011-03-18 11:08:52 +00:00
Eli Friedman 1a916a3c0c Add a target-specific branchless method for double-width relational
comparisons on x86.  Essentially, the way this works is that SUB+SBB sets
the relevant flags the same way a double-width CMP would.

This is a substantial improvement over the generic lowering in LLVM. The output
is also shorter than the gcc-generated output; I haven't done any detailed
benchmarking, though.

llvm-svn: 127852
2011-03-18 02:34:11 +00:00
Benjamin Kramer cfcea12fe2 BuildUDIV: If the divisor is even we can simplify the fixup of the multiplied value by introducing an early shift.
This allows us to compile "unsigned foo(unsigned x) { return x/28; }" into
	shrl	$2, %edi
	imulq	$613566757, %rdi, %rax
	shrq	$32, %rax
	ret

instead of
	movl    %edi, %eax
	imulq   $613566757, %rax, %rcx
	shrq    $32, %rcx
	subl    %ecx, %eax
	shrl    %eax
	addl    %ecx, %eax
	shrl    $4, %eax

on x86_64

llvm-svn: 127829
2011-03-17 20:39:14 +00:00
Richard Osborne 6120962d7d Add XCore intrinsic for setpsc.
llvm-svn: 127821
2011-03-17 18:42:05 +00:00
NAKAMURA Takumi bf9ff6f63b test/CodeGen/X86/h-registers-1.ll: Add explicit -mtriple=x86_64-linux. It does not need to be checked on x86_64-win32 (aka Win64).
llvm-svn: 127800
2011-03-17 04:24:40 +00:00
NAKAMURA Takumi 5b6198dfb9 test/CodeGen/X86/constant-pool-remat-0.ll: FileCheck-ize and add explicit -mtriple=x86_64-linux.
llvm-svn: 127775
2011-03-16 23:01:31 +00:00