Commit Graph

2286 Commits

Author SHA1 Message Date
Dale Johannesen ab60ae3cf3 Disable these tests for now; it's not obvious why they fail on Linux.
llvm-svn: 115257
2010-10-01 00:59:21 +00:00
Dale Johannesen c6f17f7420 Make test not sensitive to register choice.
llvm-svn: 115250
2010-10-01 00:16:17 +00:00
Dale Johannesen dd224d2333 Massive rewrite of MMX:
The x86_mmx type is used for MMX intrinsics, parameters and
return values where these use MMX registers, and is also
supported in load, store, and bitcast.

Only the above operations generate MMX instructions, and optimizations
do not operate on or produce MMX intrinsics. 

MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into
smaller pieces.  Optimizations may occur on these forms and the
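result cast back to x86_mmx, provided the result feeds into a
previously existing x86_mmx operation.

The point of all this is to prevent optimizations from introducing
MMX operations, which is unsafe due to the EMMS problem (MMX registers
alias the x87 floating-point stack, so MMX code must be followed by EMMS
before any x87 code runs).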

llvm-svn: 115243
2010-09-30 23:57:10 +00:00
NAKAMURA Takumi bb995ae261 test/CodeGen/X86/sibcall.ll: Add explicit triples and remove XFAIL: apple-darwin8.
llvm-svn: 115215
2010-09-30 22:02:06 +00:00
Jakob Stoklund Olesen eb12f49fb7 Try again to disable critical edge splitting in CodeGenPrepare.
The bug that broke i386 linux has been fixed in r115191.

llvm-svn: 115204
2010-09-30 20:51:52 +00:00
Jakob Stoklund Olesen 665aa6efcc When isel is emitting instructions for an x86 target without CMOV, the CFG is
edited during emission.

If the basic block ends in a switch that gets lowered to a jump table, any
phis at the default edge were getting updated wrong. The jump table data
structure keeps a pointer to the header blocks that wasn't getting updated
after the MBB is split.

This bug was exposed on 32-bit Linux when disabling critical edge splitting in
codegen prepare.

The fix is to update stale MBB pointers whenever a block is split during
emission.

llvm-svn: 115191
2010-09-30 19:44:31 +00:00
Bill Wendling cc91601211 And remove r114997's test.
llvm-svn: 115003
2010-09-28 23:24:18 +00:00
Bill Wendling b0b2c57149 Revert r114997. It was causing a failure on darwin10-selfhost.
llvm-svn: 115002
2010-09-28 23:11:55 +00:00
Bill Wendling d848beb1e5 Fix a FIXME. _foo.eh symbols are currently always exported so that the linker
knows about them. This is not necessary on 10.6 and later.

llvm-svn: 114997
2010-09-28 22:36:56 +00:00
Jakob Stoklund Olesen 415a7a6fec Revert "Disable codegen prepare critical edge splitting. Machine instruction passes now"
This reverts revision 114633. It was breaking llvm-gcc-i386-linux-selfhost.

It seems there is a downstream bug that is exposed by
-cgp-critical-edge-splitting=0. When that bug is fixed, this patch can go back
in.

Note that the changes to tailcallfp2.ll are not reverted. They were good and are
required.

llvm-svn: 114859
2010-09-27 18:43:48 +00:00
Evan Cheng 794aaa79e2 Disable codegen prepare critical edge splitting. Machine instruction passes now
break critical edges on demand.

llvm-svn: 114633
2010-09-23 06:55:34 +00:00
Owen Anderson 3231d13ddd A select between a constant and zero, when fed by a bit test, can be efficiently
lowered using a series of shifts.
Fixes <rdar://problem/8285015>.

llvm-svn: 114599
2010-09-22 22:58:22 +00:00
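The bit-test select in 3231d13ddd can be modeled in C roughly as below. This is a sketch, not the DAG lowering itself: the function names are hypothetical, it assumes a 32-bit value with one tested bit, and it relies on two's complement and arithmetic right shifts of signed values (implementation-defined in C, but what GCC and Clang do on x86).

```c
#include <assert.h>
#include <stdint.h>

/* Branchy form: a select between c and 0, fed by a bit test. */
static uint32_t select_branchy(uint32_t x, unsigned bit, uint32_t c) {
    return (x & (UINT32_C(1) << bit)) ? c : 0u;
}

/* Branch-free form: move the tested bit into the sign position, then an
 * arithmetic right shift smears it across the word, giving 0 or ~0. */
static uint32_t select_shifts(uint32_t x, unsigned bit, uint32_t c) {
    assert(bit < 32);
    int32_t top = (int32_t)(x << (31 - bit)); /* tested bit is now bit 31 */
    uint32_t mask = (uint32_t)(top >> 31);    /* 0 or 0xFFFFFFFF */
    return mask & c;
}
```

The select becomes a left shift, a right shift, and an AND, with no branch and no flag dependency.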
Cameron Esfahani bbb9287080 Fix PR8201: Update the code to call via X86::CALL64pcrel32 in the 64-bit case.
llvm-svn: 114597
2010-09-22 22:35:21 +00:00
Chris Lattner bd85725341 Fix an inconsistency in the x86 backend that led it to reject "calll foo" on
x86-32: 32-bit calls were named "call" not "calll".  64-bit calls were correctly
named "callq", so this only impacted x86-32.

This fixes rdar://8456370 - llvm-mc rejects 'calll'

This also exposes that mingw/64 is generating a 32-bit call instead of a 64-bit call,
I will file a bugzilla.

llvm-svn: 114534
2010-09-22 05:49:14 +00:00
Chris Lattner 8a236b63d8 reimplement elf TLS support in terms of addressing modes, eliminating SegmentBaseAddress.
llvm-svn: 114529
2010-09-22 04:39:11 +00:00
Chris Lattner 505af598d0 linux has a different stack alignment than the mac, relax this a bit.
llvm-svn: 114519
2010-09-22 00:46:26 +00:00
Chris Lattner 54e5329545 give VZEXT_LOAD a memory operand, it now works with segment registers.
llvm-svn: 114515
2010-09-22 00:34:38 +00:00
Chris Lattner 07827ba978 revert r114386 now that address modes work correctly, we get a nice
call through gs-relative memory now.

llvm-svn: 114510
2010-09-22 00:11:31 +00:00
Chris Lattner e479e9643b give LCMPXCHG_DAG[8] a memory operand, allowing it to work with addrspace 256/257
llvm-svn: 114508
2010-09-21 23:59:42 +00:00
Chris Lattner 0cefa51114 filecheckize
llvm-svn: 114507
2010-09-21 23:57:27 +00:00
Devang Patel d92f42d1d0 Use FileCheck
llvm-svn: 114475
2010-09-21 20:50:32 +00:00
Owen Anderson f4b1a5bdc4 When adding the carry bit to another value on X86, exploit the fact that the carry-materialization
(sbbl x, x) sets the registers to 0 or ~0.  Combined with two's complement arithmetic, we can fold
the intermediate AND and the ADD into a single SUB.

This fixes <rdar://problem/8449754>.

llvm-svn: 114460
2010-09-21 18:41:19 +00:00
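The identity behind f4b1a5bdc4 can be checked with a small C model. The helper names are hypothetical, and `sbb_mask` only mimics the 0 or ~0 register result of `sbbl x, x`, not its effect on the flags.

```c
#include <stdint.h>

/* What `sbbl %eax, %eax` leaves in the register: 0 if the carry flag is
 * clear, 0xFFFFFFFF (~0) if it is set. */
static uint32_t sbb_mask(int carry) {
    return carry ? UINT32_MAX : 0u;
}

/* Before: isolate the carry bit with an AND, then ADD it. */
static uint32_t add_carry_and_add(uint32_t y, uint32_t mask) {
    return y + (mask & 1u);
}

/* After: mask is 0 or -1 in two's complement, so (mask & 1) == -mask and
 * y + (mask & 1) == y - mask. One SUB replaces the AND + ADD pair. */
static uint32_t add_carry_sub(uint32_t y, uint32_t mask) {
    return y - mask;
}
```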
Chris Lattner bb0a1c44bf fix rdar://8453210, a crash handling a call through a GS relative load.
For now, just disable folding the load into the call.

llvm-svn: 114386
2010-09-21 03:37:00 +00:00
Evan Cheng f3e9a48584 Enable machine sinking critical edge splitting. e.g.
define double @foo(double %x, double %y, i1 %c) nounwind {
  %a = fdiv double %x, 3.2
  %z = select i1 %c, double %a, double %y
  ret double %z
}

Was:
_foo:
        divsd   LCPI0_0(%rip), %xmm0
        testb   $1, %dil
        jne     LBB0_2
        movaps  %xmm1, %xmm0
LBB0_2:
        ret

Now:
_foo:
        testb   $1, %dil
        je      LBB0_2
        divsd   LCPI0_0(%rip), %xmm0
        ret
LBB0_2:
        movaps  %xmm1, %xmm0
        ret

This avoids the divsd when early exit is taken.
rdar://8454886

llvm-svn: 114372
2010-09-20 22:52:00 +00:00
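A critical edge, in the usual sense, runs from a block with more than one successor to a block with more than one predecessor. In the @foo example, the edge from the entry block (which ends in the branch on %c) directly to the block holding `ret` is critical, so the divsd can only be sunk onto that path once the edge is split into its own block. A minimal C predicate, with a hypothetical `Block` type standing in for the real CFG:

```c
#include <stdbool.h>

/* Hypothetical minimal CFG node: only the edge counts matter here. */
typedef struct {
    int num_preds;
    int num_succs;
} Block;

/* Code placed on a critical edge cannot simply go at the end of `from`
 * or the start of `to` without also running on some other path, so the
 * edge must be split (a new block inserted) before sinking onto it. */
static bool is_critical_edge(const Block *from, const Block *to) {
    return from->num_succs > 1 && to->num_preds > 1;
}
```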
Owen Anderson 272ff94916 When TCO is turned on, it is possible to end up with aliasing FrameIndex's. Therefore,
CombinerAA cannot assume that different FrameIndex's never alias, but can instead use
MachineFrameInfo to get the actual offsets of these slots and check for actual aliasing.

This fixes CodeGen/X86/2010-02-19-TailCallRetAddrBug.ll and CodeGen/X86/tailcallstack64.ll
when CombinerAA is enabled, modulo a different register allocation sequence.

llvm-svn: 114348
2010-09-20 20:39:59 +00:00
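The overlap test this relies on is ordinary interval intersection. A sketch in C, assuming a hypothetical `FrameSlot` that carries the final offset and size a pass would read from MachineFrameInfo:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stack slot description once its final offset is known. */
typedef struct {
    int64_t offset; /* byte offset from the frame base */
    int64_t size;   /* size in bytes */
} FrameSlot;

/* Two slots alias exactly when their byte ranges [offset, offset + size)
 * intersect; having different frame indexes proves nothing on its own. */
static bool slots_may_alias(FrameSlot a, FrameSlot b) {
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```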
NAKAMURA Takumi b912c27fc9 test/CodeGen/X86: Add explicit triple -mtriple=i686-linux to 3 tests incompatible with Win32 codegen.
r114297 caused 3 failures. They might also fail on mingw.

llvm-svn: 114317
2010-09-19 21:58:55 +00:00
Owen Anderson b92b13d8a0 Invert the logic of reachesChainWithoutSideEffects(). What we want to check is that there is
NO path to the destination containing side effects, not that SOME path contains no side effects.
In practice, this only manifests with CombinerAA enabled, because otherwise the chain has little
to no branching, so "any" is effectively equivalent to "all".

llvm-svn: 114268
2010-09-18 04:45:14 +00:00
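A simplified C model of the corrected check. The node kinds, the fixed operand array, and the depth limit are assumptions for illustration; it is loosely shaped like TokenFactor and load handling in a SelectionDAG, not a copy of LLVM's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified chain node; not LLVM's SDNode. */
typedef enum { NODE_ENTRY, NODE_TOKEN_FACTOR, NODE_LOAD, NODE_STORE } NodeKind;

typedef struct Node {
    NodeKind kind;
    size_t num_chains;             /* number of incoming chain operands */
    const struct Node *chains[4];
} Node;

/* True only if EVERY chain path from n back to dest passes through nodes
 * without side effects (here: token factors and loads). */
static bool reaches_without_side_effects(const Node *n, const Node *dest,
                                         unsigned depth) {
    if (n == dest)
        return true;
    if (depth == 0)
        return false;              /* give up conservatively */
    switch (n->kind) {
    case NODE_TOKEN_FACTOR:
        /* "All operands", not "any operand". */
        for (size_t i = 0; i < n->num_chains; ++i)
            if (!reaches_without_side_effects(n->chains[i], dest, depth - 1))
                return false;
        return n->num_chains > 0;
    case NODE_LOAD:
        return n->num_chains == 1 &&
               reaches_without_side_effects(n->chains[0], dest, depth - 1);
    default:
        return false;              /* stores and others have side effects */
    }
}
```

For a TokenFactor whose operands are a load and a store, both chained to the destination, the old "some path" logic would accept it through the load; requiring every operand to qualify rejects it because of the store.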
Evan Cheng e53ab6dffc Teach machine sink to
1) Do forward copy propagation. This makes it easier to estimate the cost of the
   instruction being sunk.
2) Break critical edges on demand, including cases where the value is used by
   PHI nodes.
Critical edge splitting is not yet enabled by default.

llvm-svn: 114227
2010-09-17 22:28:18 +00:00
Dan Gohman 534db8a5c8 Avoid emitting a PIC base register if no PIC addresses are needed.
This fixes rdar://8396318.

llvm-svn: 114201
2010-09-17 20:24:24 +00:00
Dale Johannesen f95f59a0c2 When substituting sunkaddrs into indirect arguments an asm, we were
walking the asm arguments once and stashing their Values.  This is
wrong because the same memory location can be in the list twice, and
if the first one has a sunkaddr substituted, the stashed value for the
second one will be wrong (use-after-free).  PR 8154.

llvm-svn: 114104
2010-09-16 18:30:55 +00:00
Bruno Cardoso Lopes e8501a468c Add one more pattern to fallback movddup
llvm-svn: 113522
2010-09-09 18:48:34 +00:00
Devang Patel 3f4abf397c remove these tests for now.
llvm-svn: 113293
2010-09-07 22:03:44 +00:00
Devang Patel b0af23a1f6 There is no need to force target if the test is going to run on other x86 platforms.
llvm-svn: 113285
2010-09-07 20:59:09 +00:00
Devang Patel e50b23e223 Fix command line used to link these test cases.
llvm-svn: 113237
2010-09-07 18:17:56 +00:00
Devang Patel 9dc0e5be58 Reintroduce dbg-declare tests.
llvm-svn: 113232
2010-09-07 18:01:49 +00:00
Devang Patel 688338eec3 Remove last three tests. I need to make them independent of my setup.
llvm-svn: 113213
2010-09-07 17:08:57 +00:00
Devang Patel 55a3bab0d2 Add a test case to check handling of dbg-declare during hybrid mode where we begin using fast-isel but switch back to DAG building at some point.
llvm-svn: 113210
2010-09-07 17:03:44 +00:00
Devang Patel 29a775adf1 Add a test case to check handling of dbg-declare by selection DAG builder.
llvm-svn: 113209
2010-09-07 16:56:35 +00:00
Devang Patel 184c81c3e2 Add a test case to check handling of dbg-declare by fast-isel.
llvm-svn: 113208
2010-09-07 16:40:53 +00:00
Chris Lattner eeba0c73e5 implement rdar://6653118 - fastisel should fold loads where possible.
Since mem2reg isn't run at -O0, we get a ton of reloads from the stack,
for example, before, this code:

int foo(int x, int y, int z) {
  return x+y+z;
}

used to compile into:

_foo:                                   ## @foo
	subq	$12, %rsp
	movl	%edi, 8(%rsp)
	movl	%esi, 4(%rsp)
	movl	%edx, (%rsp)
	movl	8(%rsp), %edx
	movl	4(%rsp), %esi
	addl	%edx, %esi
	movl	(%rsp), %edx
	addl	%esi, %edx
	movl	%edx, %eax
	addq	$12, %rsp
	ret

Now we produce:

_foo:                                   ## @foo
	subq	$12, %rsp
	movl	%edi, 8(%rsp)
	movl	%esi, 4(%rsp)
	movl	%edx, (%rsp)
	movl	8(%rsp), %edx
	addl	4(%rsp), %edx    ## Folded load
	addl	(%rsp), %edx     ## Folded load
	movl	%edx, %eax
	addq	$12, %rsp
	ret

Fewer instructions and less register use = faster compiles.

llvm-svn: 113102
2010-09-05 02:18:34 +00:00
Dale Johannesen 367afb5a00 Remove the rest of the nonexistent 64-bit AVX instructions.
Bruno, please review.

llvm-svn: 113014
2010-09-03 21:23:00 +00:00
NAKAMURA Takumi 24d039ebe3 test/CodeGen/X86: Add explicit -mtriple=(i686|x86_64)-linux for Win32 host.
llvm-svn: 112947
2010-09-03 03:24:08 +00:00
Bruno Cardoso Lopes d6634a5b2e AVX doesn't support mm operations neither its instrinsics.
The AVX versions of PALIGN and PABS* should only exist for
128-bit. Remove the unnecessary stuff.

llvm-svn: 112944
2010-09-03 02:08:45 +00:00
Anton Korobeynikov a5a645559c Properly emit __chkstk call instead of __alloca on non-mingw windows targets.
Patch by Cameron Esfahani!

llvm-svn: 112902
2010-09-02 23:03:46 +00:00
Dan Gohman 3c9b5f394b Don't narrow the load and store in a load+twiddle+store sequence unless
there are clearly no stores between the load and the store. This fixes
this miscompile reported as PR7833.

This breaks the test/CodeGen/X86/narrow_op-2.ll optimization, which is
safe, but awkward to prove safe. Move it to X86's README.txt.

llvm-svn: 112861
2010-09-02 21:18:42 +00:00
NAKAMURA Takumi a224e5563e test/loop-strength-reduce4: Add explicit triple for Win32 host.
llvm-svn: 112802
2010-09-02 03:45:58 +00:00
NAKAMURA Takumi 54ce546865 test/twoaddr-coalesce: Do not use @main.
Win32 codegen inserts an implicit call to __main into @main, which makes the test fail.

llvm-svn: 112801
2010-09-02 03:45:51 +00:00
Bruno Cardoso Lopes fea81b4831 Using target specific nodes for shuffle nodes makes the mask
check more strict, breaking some cases not checked in the
testsuite, but also exposes some foldings not done before,
as this example:

  movaps  (%rdi), %xmm0
  movaps  (%rax), %xmm1
  movaps  %xmm0, %xmm2
  movss %xmm1, %xmm2
  shufps  $36, %xmm2, %xmm0

is now generated as:

  movaps  (%rdi), %xmm0
  movaps  %xmm0, %xmm1
  movlps  (%rax), %xmm1
  shufps  $36, %xmm1, %xmm0

llvm-svn: 112753
2010-09-01 22:33:20 +00:00
Jakob Stoklund Olesen 4b6fd48bba Teach RemoveCopyByCommutingDef to check all aliases, not just subregisters.
This caused a miscompilation in WebKit where %RAX had conflicting defs when
RemoveCopyByCommutingDef was commuting a %EAX use.

llvm-svn: 112751
2010-09-01 22:15:35 +00:00
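The difference between "sub-register" and "alias" can be illustrated with a hypothetical model in which each register is a bitmask of the pieces it covers (the masks below are made up and do not match LLVM's actual register units). %RAX is not a sub-register of %EAX, yet the two alias, so a check that only walks %EAX's sub-registers misses a %RAX def.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical coverage masks: AX is piece 0, EAX adds piece 1,
 * RAX adds pieces 2 and 3. */
enum { UNITS_AX = 0x1, UNITS_EAX = 0x3, UNITS_RAX = 0xF };

/* Two registers alias when they share at least one piece. */
static bool regs_alias(uint32_t a, uint32_t b) {
    return (a & b) != 0;
}

/* Being a proper sub-register is a stricter, one-directional relation. */
static bool is_subreg(uint32_t sub, uint32_t super) {
    return (sub & ~super) == 0 && sub != super;
}
```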
Dan Gohman 110ed64fbb Revert 112442 and 112440 until the compile time problems introduced
by 112440 are resolved.

llvm-svn: 112692
2010-09-01 01:45:53 +00:00
Chris Lattner 34bfab0ad5 two changes:
1) nuke ConstDataCoalSection, which is dead.
2) revise my previous patch for rdar://8018335,
  which was completely wrong.  Specifically, it doesn't 
  make sense to mark __TEXT,__const_coal as PURE_INSTRUCTIONS,
  because it is for readonly data.  templates (it turns out)
  go to const_coal_nt.  The real fix for rdar://8018335 was
  to give ConstTextCoalSection a section kind of ReadOnly 
  instead of Text.

llvm-svn: 112496
2010-08-30 18:12:35 +00:00
Duncan Sands 68c30907cc Correct bogus module triple specifications.
llvm-svn: 112469
2010-08-30 10:48:29 +00:00
Dan Gohman 3a08ed7904 Make IVUsers iterative instead of recursive.
This has the side effect of reversing the order of most of
IVUsers' results.

llvm-svn: 112442
2010-08-29 16:40:03 +00:00
Dan Gohman 6665550bca Make this test less dependent on register allocation choices.
llvm-svn: 112426
2010-08-29 14:49:42 +00:00
Chris Lattner c2887bc283 merge a bunch of shuffle tests into sse2.ll
llvm-svn: 112398
2010-08-29 03:19:04 +00:00
Chris Lattner b1ff978406 add some nounwind's
llvm-svn: 112396
2010-08-29 03:07:47 +00:00
Chris Lattner 94656b1c8c fix the buildvector->insertp[sd] logic to not always create a redundant
insertp[sd] $0, which is a noop.  Before:

_f32:                                   ## @f32
	pshufd	$1, %xmm1, %xmm2
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm2, %xmm3
	addss	%xmm1, %xmm0
                                        ## kill: XMM0<def> XMM0<kill> XMM0<def>
	insertps	$0, %xmm0, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

after:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movdqa	%xmm2, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

The extra movs are due to a random (poor) scheduling decision.

llvm-svn: 112379
2010-08-28 17:59:08 +00:00
Chris Lattner bcb6090ad0 fix the BuildVector -> unpcklps logic to not do pointless shuffles
when the top elements of a vector are undefined.  This happens all
the time for X86-64 ABI stuff because only the low 2 elements of
a 4 element vector are defined.  For example, on:

_Complex float f32(_Complex float A, _Complex float B) {
  return A+B;
}

We used to produce (with SSE2, SSE4.1+ uses insertps):

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$16, %xmm2, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm0
	addss	%xmm1, %xmm0
	pshufd	$16, %xmm0, %xmm1
	movdqa	%xmm2, %xmm0
	unpcklps	%xmm1, %xmm0
	ret

We now produce:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movaps	%xmm2, %xmm0
	unpcklps	%xmm3, %xmm0
	ret

This implements rdar://8368414

llvm-svn: 112378
2010-08-28 17:28:30 +00:00
Dan Gohman e06905d1f0 Completely disable tail calls when fast-isel is enabled, as fast-isel
doesn't currently support dealing with this.

llvm-svn: 112341
2010-08-28 00:51:03 +00:00
Chris Lattner 7413e87b6d get this test passing on linux builders.
llvm-svn: 112280
2010-08-27 18:49:08 +00:00
Daniel Dunbar 1844a71e66 X86: Fix an encoding issue with LOCK_ADD64mr, which could lead to very hard to find miscompiles with the integrated assembler.
llvm-svn: 112250
2010-08-27 01:30:14 +00:00
Chris Lattner af23e9a798 Add a hackaround for PR7993 which is causing failures on x86 builders that lack sse2.
llvm-svn: 112175
2010-08-26 06:57:07 +00:00
Chris Lattner 66afba7aa4 I think enough general codegen bugs are fixed to allow this to work
on random hosts; let's see!

llvm-svn: 112172
2010-08-26 05:52:42 +00:00
Chris Lattner eb2cc0ce0e implement SplitVecOp_CONCAT_VECTORS, fixing the included testcase with SSE1.
llvm-svn: 112171
2010-08-26 05:51:22 +00:00
Chris Lattner 825294b85f Make sure this forces the x86 targets
llvm-svn: 112169
2010-08-26 05:25:05 +00:00
Chris Lattner cc60609cb4 fix sse1 only codegen in x86-64 mode, which is something we
apparently try to support.

llvm-svn: 112168
2010-08-26 05:24:29 +00:00
Chris Lattner c7fb446a9d temporarily disable this, which started failing on the llvm-i686-linux
builder.  I will investigate tonight.

llvm-svn: 112113
2010-08-25 23:43:14 +00:00
Chris Lattner 75ff053497 Change handling of illegal vector types to widen when possible instead of
expanding: e.g. <2 x float> -> <4 x float> instead of -> 2 floats.  This
affects two places in the code: handling cross block values and handling
function return and arguments.  Since vectors are already widened by 
legalizetypes, this gives us much better code and unblocks x86-64 abi
and SPU abi work.

For example, this (which is a silly example of a cross-block value):
define <4 x float> @test2(<4 x float> %A) nounwind {
 %B = shufflevector <4 x float> %A, <4 x float> undef, <2 x i32> <i32 0, i32 1>
 %C = fadd <2 x float> %B, %B
  br label %BB
BB:
 %D = fadd <2 x float> %C, %C
 %E = shufflevector <2 x float> %D, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ret <4 x float> %E
}

Now compiles into:

_test2:                                 ## @test2
## BB#0:
 addps %xmm0, %xmm0
 addps %xmm0, %xmm0
 ret

previously it compiled into:

_test2:                                 ## @test2
## BB#0:
 addps %xmm0, %xmm0
 pshufd $1, %xmm0, %xmm1
                                        ## kill: XMM0<def> XMM0<kill> XMM0<def>
 insertps $0, %xmm0, %xmm0
 insertps $16, %xmm1, %xmm0
 addps %xmm0, %xmm0
 ret

This implements rdar://8230384

llvm-svn: 112101
2010-08-25 22:49:25 +00:00
Bruno Cardoso Lopes 0bc919fa35 Convert test to use filecheck and make it more specific
llvm-svn: 112016
2010-08-25 01:47:16 +00:00
Dan Gohman c88fda477a Fix X86's isLegalAddressingMode to recognize that static addresses
need not be RIP-relative in small mode.

llvm-svn: 111917
2010-08-24 15:55:12 +00:00
Chris Lattner 58bd73a5a7 Add a new llvm.x86.int intrinsic, allowing access to the
x86 int and int3 instructions.  Patch by Peter Housel!

llvm-svn: 111831
2010-08-23 19:39:25 +00:00
Dan Gohman 42ef669d81 Fix x86 fast-isel's cmp+branch folding to avoid folding when the
comparison is in a different basic block from the branch. In such
cases, the comparison's operands may not have initialized virtual
registers available.

llvm-svn: 111709
2010-08-21 02:32:36 +00:00
Evan Cheng 361b9be7c6 It's possible to sink a def if its local uses are PHI's.
llvm-svn: 111537
2010-08-19 18:33:29 +00:00
Dan Gohman 2470818942 When sending stats output to stdout for grepping, don't emit normal
output to standard output also.

llvm-svn: 111401
2010-08-18 20:32:46 +00:00
Dan Gohman ed2b005842 Tweak IVUsers' concept of "interesting" to exclude add recurrences
where the step value is an induction variable from an outer loop, to
avoid trouble trying to re-expand such expressions. This effectively
hides such expressions from indvars and lsr, which prevents them
from getting into trouble.

llvm-svn: 111317
2010-08-17 22:50:37 +00:00
Evan Cheng efdc74ea59 Add nounwind.
llvm-svn: 111312
2010-08-17 22:35:20 +00:00
Dale Johannesen 16f96445c3 Make fast scheduler handle asm clobbers correctly.
PR 7882.  Follows suggestion by Amaury Pouly, thanks.

llvm-svn: 111306
2010-08-17 22:17:24 +00:00
Evan Cheng f259efde47 PHI elimination should not break back edge. It can cause some significant code placement issues. rdar://8263994
good:
LBB0_2:
  mov     r2, r0
  . . .
  mov     r1, r2
  bne     LBB0_2

bad:
LBB0_2:
  mov     r2, r0
  . . .
@ BB#3:
  mov     r1, r2
  b       LBB0_2

llvm-svn: 111221
2010-08-17 01:20:36 +00:00
Benjamin Kramer cbc55d9dc0 Test expects SSE, give it SSE.
llvm-svn: 111115
2010-08-15 23:32:03 +00:00
Benjamin Kramer 4566466b7f Restore arch on these tests; they fail on ARM.
llvm-svn: 111109
2010-08-15 20:42:56 +00:00
Dale Johannesen 339423c460 Mark as XFAIL on darwin 8. PR 7886.
llvm-svn: 111108
2010-08-15 19:40:29 +00:00
Dale Johannesen 8d3c89e765 Revert 110491. While not wrong, it was based on a
misanalysis and is undesirable.

llvm-svn: 111028
2010-08-13 18:43:45 +00:00
Bruno Cardoso Lopes 7f704b31a9 - Teach SSEDomainFix to switch between different levels of AVX instructions. Here we guess that AVX will have domain issues, so just implement them for consistency and in the future we remove if it's unnecessary.
- Make foldMemoryOperandImpl aware of 256-bit zero vectors folding and support the 128-bit counterparts of AVX too.
- Make sure MOV[AU]PS instructions are only selected when SSE1 is enabled, and duplicate the patterns to match AVX.
- Add a testcase for a simple 128-bit zero vector creation.

llvm-svn: 110946
2010-08-12 20:20:53 +00:00
Bruno Cardoso Lopes 7306c86886 Begin to support some vector operations for AVX 256-bit instructions. The long
term goal here is to be able to match enough of vector_shuffle and build_vector
so all avx intrinsics which aren't mapped to their own built-ins but to
shufflevector calls can be codegen'd. This is the first (baby) step, support
building zeroed vectors.

llvm-svn: 110897
2010-08-12 02:06:36 +00:00
Devang Patel 48595bf2bc This is an x86-only test.
llvm-svn: 110887
2010-08-12 00:17:38 +00:00
Bruno Cardoso Lopes 1675ee7a02 Add testcases for all AVX 256-bit intrinsics added in the last couple days
llvm-svn: 110854
2010-08-11 21:12:09 +00:00
Bruno Cardoso Lopes 29c8818ad9 Reapply r109881 using a more strict command line for llc.
llvm-svn: 110833
2010-08-11 17:39:23 +00:00
Jakob Stoklund Olesen 5730846c2f Fix test for more architectures. Patch by Tobias Grosser.
llvm-svn: 110685
2010-08-10 16:48:24 +00:00
Tobias Grosser fedeff8015 Fix failing testcase.
Those look like typos to me.

llvm-svn: 110664
2010-08-10 09:54:29 +00:00
Devang Patel b219746c80 Handle TAG_constant for integers.
llvm-svn: 110656
2010-08-10 07:11:13 +00:00
Dale Johannesen a3bd31a923 Use sdmem and sse_load_f64 (etc.) for the vector
form of CMPSD (etc.)  Matching a 128-bit memory
operand is wrong, the instruction uses only 64 bits
(same as ADDSD etc.)  8193553.

llvm-svn: 110491
2010-08-07 00:33:42 +00:00
Eric Christopher e1fb772aa5 Add an option to always emit realignment code for a particular module.
llvm-svn: 110404
2010-08-05 23:57:43 +00:00
Devang Patel cc3f3b341d Move x86 specific tests into test/CodeGen/X86.
llvm-svn: 110372
2010-08-05 20:25:37 +00:00
Dan Gohman c53ee449a5 Move x86-specific tests out of test/Transforms/LoopStrengthReduce and
into test/CodeGen/X86, so that they aren't run when the x86 target is
not enabled.

Fix uglygep.ll to not be x86-specific.

llvm-svn: 110343
2010-08-05 17:04:15 +00:00
Daniel Dunbar e62e664656 tests: CodeGen/X86/GC tests require X86.
llvm-svn: 110338
2010-08-05 15:45:33 +00:00
Bill Wendling ca1cb13646 The lower invoke pass needs to have unreachable code elimination run after it
because it could create such things. This fixes a MingW buildbot test failure.

llvm-svn: 110279
2010-08-04 23:36:02 +00:00
Eli Friedman 39d0f57cab PR7814: Truncates cannot be ignored for signed comparisons.
llvm-svn: 110268
2010-08-04 22:40:58 +00:00
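The point of PR7814 can be shown with a C model, assuming 32-bit values truncated to 8 bits. The helper names are hypothetical, and the narrowing conversion of 128 to int8_t is implementation-defined before C23, though GCC and Clang wrap it to -128.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed compare of the truncated low bytes: what the IR asks for. */
static bool slt_truncated(int32_t a, int32_t b) {
    return (int8_t)a < (int8_t)b;
}

/* What you get if the truncates are (wrongly) ignored. */
static bool slt_wide(int32_t a, int32_t b) {
    return a < b;
}
```

For a = 128 and b = 0, the wide compare says 128 > 0, but after truncation to i8 the left side is -128, so the truncated compare says "less than". Dropping the truncates flips the result.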
Stuart Hastings cba0d06b7c call-imm.ll test case regex fix. Patch by Dimitry Andric!
llvm-svn: 110199
2010-08-04 15:31:35 +00:00
Jakob Stoklund Olesen 011ff9bec9 OK, that's it. This test is going away now. But don't worry, I am taking it to a
nice farm in the country where it can play with other tests. And bunnies.

It is not clear what is being tested, and the revision history shows a bunch of
random changes to the expected instruction count. Clearly, we are just fudging
it to pass whenever it fails.

llvm-svn: 110118
2010-08-03 17:21:14 +00:00
Bob Wilson 66161f5eb4 Revert new AVX intrinsic tests. They are breaking buildbots and Bruno is
away from a computer now.
--- Reverse-merging r109881 into '.':
D    test/CodeGen/X86/avx-intrinsics-x86.ll
D    test/CodeGen/X86/avx-intrinsics-x86_64.ll

llvm-svn: 109959
2010-07-31 22:36:03 +00:00
Bruno Cardoso Lopes 92941fdb26 A *bunch* of tests for AVX intrinsics
llvm-svn: 109881
2010-07-30 19:57:56 +00:00
Eli Friedman ffe64c06ef Fix for bug reported by Evzen Muller on llvm-commits: make sure to correctly
check the range of the constant when optimizing a comparison between a
constant and a sign_extend_inreg node.

llvm-svn: 109854
2010-07-30 06:44:31 +00:00
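One way to read the ffe64c06ef fix, sketched in C: sign_extend_inreg(x, i8) always lies in [-128, 127], so a comparison against a constant can only be reduced to the low byte when the constant is in that range too. The helpers, the 8-bit width, and the choice of an equality compare are assumptions for illustration, not the DAGCombiner code; narrowing conversions rely on GCC/Clang's wrapping behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if c is representable as a signed integer of the given width,
 * i.e. c lies in [-2^(bits-1), 2^(bits-1) - 1]. Assumes 1 <= bits <= 63. */
static bool fits_signed(int64_t c, unsigned bits) {
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    return c >= lo && c <= hi;
}

/* Model of sign_extend_inreg(x, 8) == c. Comparing low bytes is only
 * valid when c fits in a signed byte; otherwise the result is always
 * false, because the left side is confined to [-128, 127]. */
static bool sext8_eq(int32_t x, int32_t c) {
    if (!fits_signed(c, 8))
        return false;
    return (int8_t)x == (int8_t)c;
}
```

Without the range check, x = 0x80 and c = 128 would compare equal on their low bytes, even though sign_extend_inreg(0x80, i8) is -128.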
Nate Begeman 53afc8f06a Implement a vectorized algorithm for <16 x i8> << <16 x i8>
This is about 4x faster and smaller than the existing scalarization.

llvm-svn: 109566
2010-07-28 00:21:48 +00:00
Nate Begeman 269a6da023 ~40% faster vector shl <4 x i32> on SSE 4.1. Larger improvements for smaller types are coming in future patches.
For:

define <2 x i64> @shl(<4 x i32> %r, <4 x i32> %a) nounwind readnone ssp {
entry:
  %shl = shl <4 x i32> %r, %a                     ; <<4 x i32>> [#uses=1]
  %tmp2 = bitcast <4 x i32> %shl to <2 x i64>     ; <<2 x i64>> [#uses=1]
  ret <2 x i64> %tmp2
}

We get:

_shl:                                   ## @shl
	pslld	$23, %xmm1
	paddd	LCPI0_0, %xmm1
	cvttps2dq	%xmm1, %xmm1
	pmulld	%xmm1, %xmm0
	ret

Instead of:

_shl:                                   ## @shl
	pshufd	$3, %xmm0, %xmm2
	movd	%xmm2, %eax
	pshufd	$3, %xmm1, %xmm2
	movd	%xmm2, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm2
	pshufd	$1, %xmm0, %xmm3
	movd	%xmm3, %eax
	pshufd	$1, %xmm1, %xmm3
	movd	%xmm3, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm3
	punpckldq	%xmm2, %xmm3
	movd	%xmm0, %eax
	movd	%xmm1, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm2
	movhlps	%xmm0, %xmm0
	movd	%xmm0, %eax
	movhlps	%xmm1, %xmm1
	movd	%xmm1, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm0
	punpckldq	%xmm0, %xmm2
	movdqa	%xmm2, %xmm0
	punpckldq	%xmm3, %xmm0
	ret

llvm-svn: 109549
2010-07-27 22:37:06 +00:00
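One lane of the SSE4.1 sequence above can be modeled in scalar C. This assumes that LCPI0_0 holds 0x3F800000 (the bit pattern of 1.0f; the listing does not show the constant) and that `float` is IEEE-754 single precision: adding a << 23 to that pattern raises the exponent by a, producing the float 2^a, and multiplying x by 2^a equals x << a modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of one lane: pslld $23; paddd 1.0f; cvttps2dq; pmulld. */
static uint32_t shl_via_float(uint32_t x, uint32_t a) {
    assert(a < 32);
    uint32_t bits = (a << 23) + 0x3F800000u; /* exponent field becomes 127 + a */
    float pow2;
    memcpy(&pow2, &bits, sizeof pow2);       /* reinterpret as float: 2^a */
    uint32_t scale = (uint32_t)pow2;         /* exact conversion to integer 2^a */
    return x * scale;                        /* low 32 bits of x * 2^a */
}
```

The trick trades four scalar shifts and a pile of shuffles for one shift, one add, one conversion, and one multiply per vector.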
Dan Gohman 55e244698a Use the proper type for shift counts. This fixes a bootstrap error.
llvm-svn: 109265
2010-07-23 21:08:12 +00:00
Dan Gohman 0818684a70 DAGCombine (shl (anyext x, c)) to (anyext (shl x, c)) if the high bits
are not demanded. This often allows the anyext to be folded away.

llvm-svn: 109242
2010-07-23 18:03:30 +00:00
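Why the 0818684a70 combine is safe can be checked in C. In this sketch (hypothetical helper names, an 8-bit x widened to 32 bits, shift amounts below 8), the unspecified high bits of anyext are modeled by an arbitrary `junk` value; since a left shift only moves bits upward, junk never reaches the demanded low byte.

```c
#include <stdint.h>

/* (shl (anyext x), c): anyext leaves the high 24 bits unspecified,
 * modeled by `junk` placed above the low byte. */
static uint8_t shl_after_anyext(uint8_t x, uint32_t junk, unsigned c) {
    uint32_t wide = (uint32_t)x | (junk << 8);
    return (uint8_t)(wide << c);    /* keep only the demanded low 8 bits */
}

/* (anyext (shl x, c)): shift in the narrow type first. */
static uint8_t shl_before_anyext(uint8_t x, unsigned c) {
    return (uint8_t)((unsigned)x << c);
}
```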
Eric Christopher 9a77382685 Custom lower the memory barrier instructions and add support
for lowering without sse2.  Add a couple of new testcases.

Fixes a few libgomp tests and latent bugs.  Remove a few todos.

llvm-svn: 109078
2010-07-22 02:48:34 +00:00
Dan Gohman 625fd2292d Fix SCEV denormalization of expressions where the exit value from
one loop is involved in the increment of an addrec for another
loop. This fixes rdar://8168938.

llvm-svn: 108863
2010-07-20 17:06:20 +00:00
Duncan Sands 2e839de377 The same problem was being tracked in PR7652.
llvm-svn: 108843
2010-07-20 15:52:32 +00:00
Dan Gohman b5e918dc05 After a custom inserter, in a block which has constant instructions,
update the current basic block in addition to the current insert
position, so that they remain consistent. This fixes rdar://8204072.

llvm-svn: 108765
2010-07-19 22:48:56 +00:00
Owen Anderson 9c271e2835 Remove r108639 now that it is handled by InstCombine instead.
llvm-svn: 108688
2010-07-19 08:10:24 +00:00
Owen Anderson 41670a11a8 Add a testcase for r108639.
llvm-svn: 108640
2010-07-18 08:57:19 +00:00
Bill Wendling bf8370ff36 Consider this function:
void foo() { __builtin_unreachable(); }

It will output the following on Darwin X86:

_func1:
Leh_func_begin0:
        pushq %rbp
Ltmp0:
        movq %rsp, %rbp
Ltmp1:
Leh_func_end0:

This prolog adds a new Call Frame Information (CFI) row to the FDE with an
address that is not within the address range of the code it describes -- part is
equal to the end of the function -- and therefore results in an invalid EH
frame. If we emit a nop in this situation, then the CFI row is now within the
address range.

llvm-svn: 108568
2010-07-16 22:51:10 +00:00
Jakob Stoklund Olesen c30b4ddc58 Remove the X86::FP_REG_KILL pseudo-instruction and the X86FloatingPointRegKill
pass that inserted it.

It is no longer necessary to limit the live ranges of FP registers to a single
basic block.

llvm-svn: 108536
2010-07-16 17:41:44 +00:00
Jakob Stoklund Olesen b1671271ab Add forgotten test case.
llvm-svn: 108506
2010-07-16 04:45:35 +00:00
Dan Gohman 103c4ebea5 Use the source-order scheduler instead of the "fast" scheduler at -O0,
because it's more likely to keep debug line information in its original
order.

llvm-svn: 108496
2010-07-16 02:01:19 +00:00
Bill Wendling 4bda1c8e68 Revert. This isn't the correct way to go.
llvm-svn: 108478
2010-07-15 23:42:21 +00:00
Bill Wendling 973dc3b1d8 Handle code gen for the unreachable instruction if it's the only instruction in
the function. We'll just turn it into a "trap" instruction instead.

The problem with not handling this is that it might generate a prologue without
the equivalent epilogue to go with it:

$ cat t.ll
define void @foo() {
entry:
  unreachable
}
$ llc -o - t.ll -relocation-model=pic -disable-fp-elim -unwind-tables
        .section        __TEXT,__text,regular,pure_instructions
        .globl  _foo
        .align  4, 0x90
_foo:                                   ## @foo
Leh_func_begin0:
## BB#0:                                ## %entry
        pushq   %rbp
Ltmp0:
        movq    %rsp, %rbp
Ltmp1:
Leh_func_end0:
...

The unwind tables then have bad data in them causing all sorts of problems.

Fixes <rdar://problem/8096481>.

llvm-svn: 108473
2010-07-15 23:32:40 +00:00
Evan Cheng 55f0c6b9fc Split -enable-finite-only-fp-math to two options:
-enable-no-nans-fp-math and -enable-no-infs-fp-math. All of the current codegen fp math optimizations only care whether the arguments and results of fp arithmetic can never be NaN.

llvm-svn: 108465
2010-07-15 22:07:12 +00:00
Chris Lattner 60b131654b fix the definitions of ConstTextCoalSection/ConstDataCoalSection
to keep "Text" in sync with the "pure instructions" section attribute.
Lack of this attribute was preventing the assembler from emitting
multibyte nop instructions for templates (and inlines, and other
coalesced stuff) and was causing the assembler to mismatch .o files.

This fixes rdar://8018335

llvm-svn: 108461
2010-07-15 21:22:00 +00:00
Devang Patel df09db62e2 Fix crash reported in PR7653.
llvm-svn: 108441
2010-07-15 18:45:27 +00:00
Dan Gohman 4afd412d6b Watch out for a constant offset cancelling out a base register, forming
a zero. This situation arises in Fortran code with induction variables
that start at 1 instead of 0. This fixes PR7651.

llvm-svn: 108424
2010-07-15 15:14:45 +00:00
Devang Patel 29168baf4b Make it a .ll test case.
llvm-svn: 108370
2010-07-14 23:12:52 +00:00
Dan Gohman 042523340b Delete fast-isel's trivial load optimization; it breaks debugging because
it can look past points where a debugger might modify user variables.

llvm-svn: 108336
2010-07-14 17:25:37 +00:00
Evan Cheng a8e8874552 Fix for PR7193 was overly conservative. The only case where sibcall callee
address cannot be allocated a register is in 32-bit mode where the first
three arguments are marked inreg. In that case EAX, EDX, and ECX will be
used for argument passing.

This fixes PR7610.

llvm-svn: 108327
2010-07-14 06:44:01 +00:00
Evan Cheng c893115312 Re-enable the test with fix.
llvm-svn: 108319
2010-07-14 05:49:23 +00:00
Chris Lattner 711338fb04 temporarily disable the test to fix buildbots.
llvm-svn: 108310
2010-07-14 02:21:59 +00:00
Evan Cheng d542414945 Teach ProcessImplicitDefs to transform more COPY instructions into IMPLICIT_DEF (and subsequently eliminate them). This allows machine LICM to hoist IMPLICIT_DEF's. PR7620.
llvm-svn: 108304
2010-07-14 01:22:19 +00:00
Dale Johannesen caca5488dc In inline asm treat indirect 'X' constraint as 'm'.
This may not be right in all cases, but it's better
than asserting, which it was doing before.  PR 7528.

llvm-svn: 108268
2010-07-13 20:17:05 +00:00
Evan Cheng f43961007c -enable-unsafe-fp-math should not imply -enable-finite-only-fp-math.
llvm-svn: 108254
2010-07-13 18:46:14 +00:00
Dale Johannesen f241d4626c Fix PR number.
llvm-svn: 108251
2010-07-13 18:14:47 +00:00
Dan Gohman 51e6d9bbf6 Apply the SSE dependence idiom for SSE unary operations to
SD instructions too, in addition to SS instructions. And
add a comment about it.

llvm-svn: 108191
2010-07-12 20:46:04 +00:00
Dan Gohman 79be2b9be5 Fix this test.
llvm-svn: 108059
2010-07-10 22:42:12 +00:00
Jakob Stoklund Olesen c4b3bcc051 FileCheckize inline asm FP stack tests
llvm-svn: 108046
2010-07-10 16:30:25 +00:00
Dan Gohman d7b5ce3312 Reapply bottom-up fast-isel, with several fixes for x86-32:
- Check getBytesToPopOnReturn().
- Eschew ST0 and ST1 for return values.
- Fix the PIC base register initialization so that it doesn't ever
  fail to end up at the top of the entry block.

llvm-svn: 108039
2010-07-10 09:00:22 +00:00
Jakob Stoklund Olesen 51702ec46b Fix a few tests
llvm-svn: 108011
2010-07-09 20:43:09 +00:00
Dan Gohman ea9ae3e6ed Add a target triple.
llvm-svn: 108003
2010-07-09 19:17:36 +00:00
Dan Gohman 7929c448fc Fix MachineLICM to actually visit inner loops.
llvm-svn: 108001
2010-07-09 18:49:45 +00:00
Bob Wilson 6586e9b203 --- Reverse-merging r107947 into '.':
U    utils/TableGen/FastISelEmitter.cpp
--- Reverse-merging r107943 into '.':
U    test/CodeGen/X86/fast-isel.ll
U    test/CodeGen/X86/fast-isel-loads.ll
U    include/llvm/Target/TargetLowering.h
U    include/llvm/Support/PassNameParser.h
U    include/llvm/CodeGen/FunctionLoweringInfo.h
U    include/llvm/CodeGen/CallingConvLower.h
U    include/llvm/CodeGen/FastISel.h
U    include/llvm/CodeGen/SelectionDAGISel.h
U    lib/CodeGen/LLVMTargetMachine.cpp
U    lib/CodeGen/CallingConvLower.cpp
U    lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
U    lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
U    lib/CodeGen/SelectionDAG/FastISel.cpp
U    lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
U    lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
U    lib/CodeGen/SelectionDAG/InstrEmitter.cpp
U    lib/CodeGen/SelectionDAG/TargetLowering.cpp
U    lib/Target/XCore/XCoreISelLowering.cpp
U    lib/Target/XCore/XCoreISelLowering.h
U    lib/Target/X86/X86ISelLowering.cpp
U    lib/Target/X86/X86FastISel.cpp
U    lib/Target/X86/X86ISelLowering.h

llvm-svn: 107987
2010-07-09 16:37:18 +00:00
Dan Gohman 0b5aa1cdd3 Re-apply bottom-up fast-isel, with fixes. Be very careful to avoid emitting
a DBG_VALUE after a terminator, or emitting any instructions before an EH_LABEL.

llvm-svn: 107943
2010-07-09 00:39:23 +00:00
Bill Wendling a992445ff2 Extension of r107506. Make sure that we don't mark a function as having a call
if the inline ASM doesn't need a stack frame.

llvm-svn: 107922
2010-07-08 22:38:02 +00:00
Eric Christopher e796253217 A slight reworking of the custom patterns for x86-64 tpoff codegen and
correct the testcase for valid assembly.

Needs more tests.

llvm-svn: 107860
2010-07-08 07:36:46 +00:00
Dan Gohman e75704369d Revert 107840 107839 107813 107804 107800 107797 107791.
Debug info intrinsics win for now.

llvm-svn: 107850
2010-07-08 01:00:56 +00:00
Jakob Stoklund Olesen ddaf0099a5 Allow copies between GR8_ABCD_L and GR8_ABCD_H.
This fixes PR7540.

llvm-svn: 107809
2010-07-07 20:33:27 +00:00
Dan Gohman e7ccc51cc1 Implement bottom-up fast-isel. This has the advantage of not requiring
a separate DCE pass over MachineInstrs.

llvm-svn: 107804
2010-07-07 19:20:32 +00:00
Dan Gohman 2d4d01d0de Add X86FastISel support for return statements. This entails refactoring
a bunch of stuff, to allow the target-independent calling convention
logic to be employed.

llvm-svn: 107800
2010-07-07 18:32:53 +00:00
Dale Johannesen ce65663330 Accept RIP-relative symbols with 'i' constraint, and
print the (%rip) only if the 'a' modifier is present.
PR 7528.

llvm-svn: 107727
2010-07-06 23:27:00 +00:00
Dale Johannesen 6f01541ae6 Make test not hang waiting for input.
llvm-svn: 107721
2010-07-06 23:06:58 +00:00
Jakob Stoklund Olesen a64c0a3d22 Be more forgiving when calculating alias interference for physreg coalescing.
It is OK for an alias live range to overlap if there is a copy to or from the
physical register. CoalescerPair can work out if the copy is coalescable
independently of the alias.

This means that we can join with the actual destination interval instead of
using the getOrigDstReg() hack. It is no longer necessary to merge clobber
ranges into subregisters.

llvm-svn: 107695
2010-07-06 20:31:51 +00:00
Devang Patel 23a7593534 Fix PR7545 crash.
llvm-svn: 107678
2010-07-06 18:18:32 +00:00
Eric Christopher 8f06b4a294 Remove mistakenly added test.
llvm-svn: 107641
2010-07-06 05:20:13 +00:00
Eric Christopher 2ad0c779c3 Fix up -fstack-protector on linux to use the segment
registers.  Split out testcases per architecture and os
now.

Patch from Nelson Elhage.

llvm-svn: 107640
2010-07-06 05:18:56 +00:00
Chris Lattner 60db4557cd another v2f32 case, in this case showing poor codegen.
llvm-svn: 107614
2010-07-05 05:52:56 +00:00
Chris Lattner 431e81f2fb fix test on non-x86 hosts.
llvm-svn: 107608
2010-07-05 03:56:55 +00:00
Chris Lattner 45cc4d74a3 Just rip v2f32 support completely out of the X86 backend. In
the example in the testcase, we now generate:

_test1:                                 ## @test1
	movss	4(%esp), %xmm0
	addss	8(%esp), %xmm0
	movl	12(%esp), %eax
	movss	%xmm0, (%eax)
	ret

instead of:

_test1:                                                     ## @test1
	subl	$20, %esp
	movl	24(%esp), %eax
	movq	%mm0, (%esp)
	movq	%mm0, 8(%esp)
	movss	(%esp), %xmm0
	addss	12(%esp), %xmm0
	movss	%xmm0, (%eax)
	addl	$20, %esp
	ret

v2f32 support did not work reliably because most of the X86
backend didn't know it was legal.  It was apparently only added
to support returning source-level v2f32 values in MMX registers
in x86-32 mode.  If ABI compatibility is important on this
GCC-extended-vector type for some reason, then the frontend
should generate IR that returns v2i32 instead of v2f32.  However,
we generally don't try very hard to be abi compatible on gcc
extended vectors. 

llvm-svn: 107601
2010-07-04 23:07:25 +00:00
Chris Lattner 681b926d54 fix PR7518 - terrible codegen of <2 x float>, by only marking
v2f32 as legal in 32-bit mode.  It is just as terrible there,
but I just care about x86-64 and no one claims it is valuable
in 64-bit mode.

llvm-svn: 107600
2010-07-04 22:57:10 +00:00
Evan Cheng 0ce84486c3 - Two-address pass should not assume unfolding is always successful.
- X86 unfolding should check if the instructions being unfolded have memoperands.
  If there are no memoperands, then it must assume conservative alignment. If this
  would introduce an expensive sse unaligned load / store, then unfoldMemoryOperand
  etc. should not unfold the instruction.
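The alignment rule in the second item can be sketched as a small decision function. This is a hypothetical Python model, not the X86 backend's code: `mem_align` stands in for the alignment recorded in a memoperand (None when there are no memoperands), and the 16-byte threshold assumes an aligned SSE access such as `movaps`.

```python
from typing import Optional

SSE_VECTOR_BYTES = 16  # aligned SSE loads/stores (e.g. movaps) need 16-byte alignment


def should_unfold(mem_align: Optional[int], is_sse_vector_access: bool) -> bool:
    """Decide whether to unfold a folded memory operand (hypothetical model)."""
    # Without memoperands nothing is known, so assume the weakest alignment.
    align = mem_align if mem_align is not None else 1
    if is_sse_vector_access and align < SSE_VECTOR_BYTES:
        return False  # unfolding would need an expensive unaligned SSE load/store
    return True
```

Scalar accesses are unaffected; only vector accesses with unknown or insufficient alignment are left folded.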

llvm-svn: 107509
2010-07-02 20:36:18 +00:00
Dale Johannesen 4d887f7ca7 Propagate the AlignStack bit in InlineAsm's to the
PrologEpilog code, and use it to determine whether
the asm forces stack alignment or not.  gcc consistently
does not do this for GCC-style asms; Apple gcc inconsistently
sometimes does it for asm blocks.  There is no
convenient place to put a bit in either the SDNode or
the MachineInstr form, so I've added an extra operand
to each; unlovely, but it does allow for expansion for
more bits, should we need it.  PR 5125.  Some
existing testcases are affected.
The operand lists of the SDNode and MachineInstr forms
are indexed with awesome mnemonics, like "2"; I may
fix this someday, but not now.  I'm not making it any
worse.  If anyone is inspired I think you can find all
the right places from this patch.

llvm-svn: 107506
2010-07-02 20:16:09 +00:00
Bill Wendling 03bcd6ecc8 Implement the "linker_private_weak" linkage type. This will be used for
Objective-C metadata types which should be marked as "weak", but which the
linker will remove upon final linkage. However, this linkage isn't specific to
Objective-C.

For example, the "objc_msgSend_fixup_alloc" symbol is defined like this:

      .globl l_objc_msgSend_fixup_alloc
      .weak_definition l_objc_msgSend_fixup_alloc
      .section __DATA, __objc_msgrefs, coalesced
      .align 3
l_objc_msgSend_fixup_alloc:
       .quad   _objc_msgSend_fixup
       .quad   L_OBJC_METH_VAR_NAME_1

This is different from the "linker_private" linkage type, because it can't have
the metadata defined with ".weak_definition".

Currently only supported on Darwin platforms.

llvm-svn: 107433
2010-07-01 21:55:59 +00:00
Dan Gohman d2965c10a1 Temporarily disable on-demand fast-isel.
llvm-svn: 107393
2010-07-01 12:15:30 +00:00
Dan Gohman aef3d140b7 Teach fast-isel to avoid loading a value from memory when it's already
available in a register. This is pretty primitive, but it reduces the
number of instructions in common testcases by 4%.

llvm-svn: 107380
2010-07-01 03:49:38 +00:00
Dan Gohman 722f5fc567 Enable on-demand fast-isel.
llvm-svn: 107377
2010-07-01 02:58:57 +00:00
Dan Gohman 7937d5606d Teach X86FastISel to fold constant offsets and scaled indices in
the same address.

llvm-svn: 107373
2010-07-01 02:27:15 +00:00
Dale Johannesen 17feb07c53 In asm's, output operands with matching input constraints
have to be registers, per gcc documentation.  This affects
the logic for determining what "g" should lower to.  PR 7393.
A couple of existing testcases are affected.

llvm-svn: 107079
2010-06-28 22:09:45 +00:00
Jakob Stoklund Olesen fde9c348e9 Don't write temporary files in test directory
llvm-svn: 107049
2010-06-28 20:01:15 +00:00
Jakob Stoklund Olesen 0117091c16 Add a triple so test runs on Linux as well.
llvm-svn: 107045
2010-06-28 19:31:15 +00:00
Jakob Stoklund Olesen 0d94d7af78 Add more special treatment for inline asm in RegAllocFast.
When an instruction has tied operands and physreg defines, we must take extra
care that the tied operands conflict with neither physreg defs nor uses.

The special treatment is given to inline asm and instructions with tied operands
/ early clobbers and physreg defines.

This fixes PR7509.

llvm-svn: 107043
2010-06-28 18:34:34 +00:00
Benjamin Kramer 3bbc52ce3e Fix some tests that didn't test anything.
llvm-svn: 106954
2010-06-26 20:05:06 +00:00
Jakob Stoklund Olesen d7d0d4e882 When creating X86 MUL8 and DIV8 instructions, make sure we don't produce
CopyFromReg nodes for aliasing registers (AX and AL). This confuses the fast
register allocator.

Instead of CopyFromReg(AL), use ExtractSubReg(CopyFromReg(AX), sub_8bit).

This fixes PR7312.

llvm-svn: 106934
2010-06-26 00:39:23 +00:00
Dale Johannesen ce97d55ad9 The hasMemory argument is irrelevant to how the argument
for an "i" constraint should get lowered; PR 6309.  While
this argument was passed around a lot, this is the only
place it was used, so it goes away from a lot of other
places.

llvm-svn: 106893
2010-06-25 21:55:36 +00:00
Dan Gohman 8de1fe3ccf pcmpeqd and friends are Commutable.
llvm-svn: 106886
2010-06-25 21:05:35 +00:00
Bill Wendling e41e40f689 - Reapply r106066 now that the bzip2 build regression has been fixed.
- 2010-06-25-CoalescerSubRegDefDead.ll is the testcase for r106878.

llvm-svn: 106880
2010-06-25 20:48:10 +00:00
Dan Gohman 600658a4ba Don't write an output file to cwd, and put an rdar prefix on
an rdar number.

llvm-svn: 106810
2010-06-24 23:45:15 +00:00
Dan Gohman 9a2f0473b2 Teach EmitLiveInCopies to omit copies for unused virtual registers,
and to clean up unused incoming physregs from the live-in list.

llvm-svn: 106805
2010-06-24 22:23:02 +00:00
Dale Johannesen 5ad5226c58 Disallow matching "i" constraint to symbol addresses when
address requires a register or secondary load to compute
(most PIC modes).  This improves "g" constraint handling.  8015842.

The test from 2007 is attempting to test the fix for PR1761,
but since -relocation-model=static doesn't work on Darwin
x86-64, it was not testing what it was supposed to be testing
and was passing erroneously.  Fixed to use Linux x86-64.

llvm-svn: 106779
2010-06-24 20:14:51 +00:00
Dan Gohman 463f26b4be Eliminate the other half of the BRCOND optimization, and update
as many tests as possible.

llvm-svn: 106749
2010-06-24 15:24:03 +00:00
Dan Gohman df6b33e778 Eliminate the first have of the optimization which eliminates BRCOND
when the condition is constant. This optimization shouldn't be
necessary, because codegen shouldn't be able to find dead control
paths that the IR-level optimizer can't find. And it's undesirable,
because it encourages bugpoint to leave "br i1 false" branches
in its output. And it wasn't updating the CFG.

I updated all the tests I could, but some tests are too reduced
and I wasn't able to meaningfully preserve them.

llvm-svn: 106748
2010-06-24 15:04:11 +00:00
Dan Gohman 600f62b3ba Reapply r106634, now that the bug it exposed is fixed.
llvm-svn: 106746
2010-06-24 14:30:44 +00:00
Dan Gohman 0695e09b09 Optimize the "bit test" code path for switch lowering in the
case where the bit mask has exactly one bit.
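The bit-test path checks whether bit `x` of a mask of case values is set. When the mask has exactly one bit, that test is equivalent to a single equality compare against the bit's index. A minimal Python sketch of the equivalence, assuming `x` has already been range-checked to [0, 64); the function names are hypothetical:

```python
def bit_test(x: int, mask: int) -> bool:
    """General bit-test lowering: is bit x set in the case mask?"""
    return ((1 << x) & mask) != 0


def single_bit_test(x: int, mask: int) -> bool:
    """Specialized form for a one-bit mask: compare x with the bit's index."""
    assert mask != 0 and mask & (mask - 1) == 0, "mask must have exactly one bit set"
    return x == mask.bit_length() - 1


# The two forms agree for every in-range switch value.
for value in range(64):
    assert bit_test(value, 1 << 5) == single_bit_test(value, 1 << 5)
```

The specialized form avoids materializing `1 << x` and the AND; it is just one compare.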

llvm-svn: 106716
2010-06-24 02:06:24 +00:00
Bill Wendling a136521a17 MorphNodeTo doesn't preserve the memory operands. Because we're morphing a node
into the same node, but with different non-memory operands, we need to replace
the memory operands after it's finished morphing.

llvm-svn: 106643
2010-06-23 18:16:24 +00:00
Daniel Dunbar 4df321b7ad Revert r106263, "Fold the ShrinkDemandedOps pass into the regular DAGCombiner pass,"... it was causing both 'file' (with clang) and 176.gcc (with llvm-gcc) to be miscompiled.
llvm-svn: 106634
2010-06-23 17:09:26 +00:00
Daniel Dunbar ef5a4383ad Revert r106066, "Create a more targeted fix for not sinking instructions into a range where it"... it causes bzip2 to be miscompiled by Clang.
Conflicts:

	lib/CodeGen/MachineSink.cpp

llvm-svn: 106614
2010-06-23 00:48:25 +00:00
Dan Gohman f1cf963c64 Loosen up this test so that it doesn't depend as much on register
allocation details.

llvm-svn: 106599
2010-06-22 23:32:47 +00:00
Dan Gohman 1081f1a0f5 Fix OptimizeMax to handle an odd case where one of the max operands
is another max which folds. This fixes PR7454.

llvm-svn: 106594
2010-06-22 23:07:13 +00:00
Dale Johannesen 6d4802ba6c Add SSE so these actually pass on non-X86 hosts.
llvm-svn: 106575
2010-06-22 20:54:03 +00:00
Mon P Wang 825639e849 Move v-binop-widen tests to X86 since they don't work on all platforms
llvm-svn: 106562
2010-06-22 19:40:50 +00:00
Jakob Stoklund Olesen 9c47dac677 Remove the SimpleJoin optimization from SimpleRegisterCoalescing.
Measurements show that it does not speed up coalescing, so there is no reason
to keep the added complexity around.

Also clean out some unused methods and static functions.

llvm-svn: 106548
2010-06-22 16:13:57 +00:00
Dan Gohman 3c1b3c61e9 Teach two-address lowering how to unfold a load to open up commuting
opportunities. For example, this lets it emit this:

   movq (%rax), %rcx
   addq %rdx, %rcx

instead of this:

   movq %rdx, %rcx
   addq (%rax), %rcx

in the case where %rdx has subsequent uses. It's the same number
of instructions, and usually the same encoding size on x86, but
it appears faster, and in general, it may allow better scheduling
for the load.

llvm-svn: 106493
2010-06-21 22:17:20 +00:00
Dan Gohman 2dd1d3d182 Make this test more robust in case LLVM ever decides to align the global
variable differently.

llvm-svn: 106454
2010-06-21 19:56:27 +00:00
Eric Christopher bf572c7cea Add some codegen patterns for x86_64-linux-gnu tls codegen matching.
Based on a patch by Patrick Marlier!

llvm-svn: 106433
2010-06-21 18:21:27 +00:00
Dan Gohman 51d00092b6 Include the use kind along with the expression in the key of the
use sharing map. The reconcileNewOffset logic already forces a
separate use if the kinds differ, so incorporating the kind in the
key means we can track more sharing opportunities.

More sharing means fewer total uses to track, which means smaller
problem sizes, which means the conservative throttles don't kick
in as often.
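A toy Python model of why adding the kind to the key helps. It is not LSR's actual data structure: the names, the string "kinds", and the sample expression are illustrative assumptions. A fixup shares an existing use only if the entry found for its key has the same kind; otherwise a separate use is forced, and only the first use per key is remembered.

```python
def count_uses(fixups, key_includes_kind):
    """Count uses created for a sequence of (expr, kind) fixups (toy model)."""
    table = {}
    num_uses = 0
    for expr, kind in fixups:
        key = (expr, kind) if key_includes_kind else expr
        existing_kind = table.get(key)
        if existing_kind == kind:
            continue  # same key, same kind: share the existing use
        num_uses += 1  # no entry yet, or kinds differ: a separate use is forced
        if existing_kind is None:
            table[key] = kind  # only the first use for a key is remembered
    return num_uses


FIXUPS = [("{0,+,4}", "Address"), ("{0,+,4}", "Basic"),
          ("{0,+,4}", "Basic"), ("{0,+,4}", "Basic")]
assert count_uses(FIXUPS, key_includes_kind=False) == 4
assert count_uses(FIXUPS, key_includes_kind=True) == 2
```

Keyed by expression alone, every "Basic" fixup after the first "Address" one gets its own use; keyed by (expression, kind), the "Basic" fixups share one.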

llvm-svn: 106396
2010-06-19 21:29:59 +00:00
Dan Gohman 99ba4dac59 Don't maintain a set of deleted nodes; instead, use a HandleSDNode
to track a node over CSE events. This fixes PR7368.

llvm-svn: 106266
2010-06-18 01:24:29 +00:00
Dan Gohman b92156d5e4 Fold the ShrinkDemandedOps pass into the regular DAGCombiner pass,
which is faster, simpler, and less surprising.

llvm-svn: 106263
2010-06-18 01:05:21 +00:00
Dan Gohman 30d7a51d6c Make this test less fragile.
llvm-svn: 106255
2010-06-18 00:06:03 +00:00
Bill Wendling 8c0cf0994d Create a more targeted fix for not sinking instructions into a range where it
will conflict with another live range. The place which creates this scenario is
the code in X86 that lowers a select instruction by splitting the MBBs. This
eliminates the need to check from the bottom up in an MBB for live pregs.

llvm-svn: 106066
2010-06-15 23:46:31 +00:00
Jakob Stoklund Olesen ec2e964fd6 Remove the local register allocator.
Please use the fast allocator instead.

llvm-svn: 106051
2010-06-15 21:58:33 +00:00
Chris Lattner 874c92bd47 fix fastisel to handle GS and FS relative pointers. Patch by
Nelson Elhage!

llvm-svn: 106031
2010-06-15 19:08:40 +00:00
Jakob Stoklund Olesen 246e9a07a2 Avoid processing early clobbers twice in RegAllocFast.
Early clobbers defining a virtual register were first allocated to a physreg and
then processed as a physreg EC, spilling the virtreg.

This fixes PR7382.

llvm-svn: 105998
2010-06-15 16:20:57 +00:00
Chris Lattner 00ab615406 apparently lots of dupes.
llvm-svn: 105956
2010-06-14 20:19:03 +00:00
Chris Lattner faa7bdccbf fix a nasty bug where we were not treating available_externally
symbols as declarations in the X86 backend.  This would manifest
on darwin x86-32 as errors like this with -fvisibility=hidden:

symbol '__ZNSbIcED1Ev' can not be undefined in a subtraction expression

This fixes PR7353.

llvm-svn: 105954
2010-06-14 20:11:56 +00:00
Chris Lattner bbb798c7d1 remove old test.
llvm-svn: 105953
2010-06-14 20:07:43 +00:00
Chris Lattner b30f87b74e rename test
llvm-svn: 105952
2010-06-14 20:07:34 +00:00
Bill Wendling d53a2cb4ac Testcase for r105741.
llvm-svn: 105750
2010-06-09 20:30:22 +00:00
Jakob Stoklund Olesen 8bc5eca331 Mark physregs defined by inline asm as implicit.
This is a bit of a hack to make inline asm look more like call instructions.
It would be better to produce correct dead flags during isel.

llvm-svn: 105749
2010-06-09 20:05:00 +00:00
Dan Gohman bbfb6aca92 LSR needs to remember inserted instructions even in postinc mode, because
there could be multiple subexpressions within a single expansion which
require insert point adjustment. This fixes PR7306.

llvm-svn: 105510
2010-06-05 00:33:07 +00:00
Dan Gohman 538b413ccb Fix normalization and de-normalization of non-affine SCEVs.
llvm-svn: 105480
2010-06-04 19:16:34 +00:00
Mon P Wang 622cdd2297 Fixed a bug during widening where we would avoid legalizing a node. When we
replace an OpA with a widened OpB, it is possible to get new uses of OpA due to CSE
when recursively updating nodes.  Since OpA has been processed, the new uses are
not examined again.  The patch checks if this occurred and, if it did, updates the
new uses of OpA to use OpB.

llvm-svn: 105453
2010-06-04 01:20:10 +00:00
Dan Gohman 8fdda8a655 This test doesn't need the ssp attribute.
llvm-svn: 105440
2010-06-04 00:14:48 +00:00
Dan Gohman d83e3e7750 Fix SimplifyDemandedBits' AssertZext logic to demand all the bits. It
needs to demand the high bits because it's asserting that they're zero.
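AssertZext(X, iN) claims that every bit of X above the low N bits is zero. A sketch of the corrected demanded mask, modeled on plain Python integers rather than LLVM's `APInt` (the function name is hypothetical): whatever the user demands, the high bits must also be demanded from the operand, because the assertion is only valid if they really are zero.

```python
def assert_zext_operand_demanded(demanded: int, bit_width: int, from_bits: int) -> int:
    """Bits to demand from the operand of AssertZext (a sketch)."""
    all_ones = (1 << bit_width) - 1
    low = (1 << from_bits) - 1   # bits that may be nonzero
    high = all_ones & ~low       # bits AssertZext claims are zero
    return (demanded | high) & all_ones
```

For example, demanding only the low nibble of an i32 AssertZext from i8 still demands bits 8 through 31 of the operand.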

llvm-svn: 105406
2010-06-03 20:21:33 +00:00
Bill Wendling f82aea634c Machine sink could potentially sink instructions into a block where the physical
registers it defines then interfere with an existing preg live range.

For instance, if we had something like these machine instructions:

BB#0
  ... = imul ... EFLAGS<imp-def,dead>
  test ..., EFLAGS<imp-def>
  jcc BB#2 EFLAGS<imp-use>

BB#1
  ... ; fallthrough to BB#2

BB#2
  ... ; No code that defines EFLAGS
  jcc ... EFLAGS<imp-use>

Machine sink will come along, see that imul implicitly defines EFLAGS, but
because it's "dead", it assumes that it can move imul into BB#2. But when it
does, imul's "dead" imp-def of EFLAGS is raised from the dead (a zombie) and
messes up the condition code for the jump (and pretty much anything else which
relies upon it being correct).

The solution is to know which pregs are live going into a basic block. However,
that information isn't calculated at this point. Nor does the LiveVariables pass
take into account non-allocatable physical registers. In lieu of this, we do a
*very* conservative pass through the basic block to determine if a preg is live
coming out of it.
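The hazard in the example can be expressed as a simple check on the destination block. This is a hypothetical Python model, not MachineSink's code. It assumes the sunk instruction lands at the top of the destination and that each instruction is described by the pregs it writes (dead imp-defs included) and reads. It only catches a preg that is read before being written in the destination, as EFLAGS is in BB#2; a preg that is live straight through the block would need real live-in information.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    defs: set[str] = field(default_factory=set)  # pregs written, including dead imp-defs
    uses: set[str] = field(default_factory=set)  # pregs read


def upward_exposed_uses(block: list[Instr]) -> set[str]:
    """Pregs read in `block` before any write to them in the same block."""
    exposed: set[str] = set()
    written: set[str] = set()
    for mi in block:
        exposed |= mi.uses - written  # an instruction reads before it writes
        written |= mi.defs
    return exposed


def can_sink_to_top(mi: Instr, dest: list[Instr]) -> bool:
    """Refuse the sink if mi writes a preg that dest reads on entry."""
    return not (mi.defs & upward_exposed_uses(dest))
```

With `imul` modeled as defining EFLAGS and BB#2 ending in a `jcc` that reads EFLAGS, `can_sink_to_top` returns False.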

llvm-svn: 105387
2010-06-03 07:54:20 +00:00
Eric Christopher f67fe3b1e8 One underscore, not two.
llvm-svn: 105379
2010-06-03 04:02:59 +00:00
Dan Gohman b782caa393 Fill in missing support for ISD::FEXP, ISD::FPOWI, and friends.
llvm-svn: 105283
2010-06-01 18:35:14 +00:00
Chris Lattner 14c46517b5 fix PR6623: when optimizing for size, don't inline memcpy/memsets
that are too large.  This causes the freebsd bootloader to be too
large apparently.

It's unclear if this should be an -Os or -Oz thing.  Thoughts welcome.
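A minimal Python sketch of a size-sensitive inlining check, assuming the inline expansion is measured as a count of fixed-width stores. The function name and both limits are made-up values for illustration, not LLVM's per-target settings, and real lowering may mix store widths.

```python
def should_inline_memcpy(size: int, store_width: int, opt_for_size: bool,
                         max_stores: int = 16, max_stores_opt_size: int = 4) -> bool:
    """Inline a constant-size memcpy only if it needs few enough stores."""
    num_stores = -(-size // store_width)  # ceiling division
    limit = max_stores_opt_size if opt_for_size else max_stores
    return num_stores <= limit
```

A 64-byte copy with 8-byte stores (8 stores) is inlined normally but becomes a library call when optimizing for size under these example limits.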

llvm-svn: 105228
2010-05-31 17:30:14 +00:00
Chris Lattner 291a189cda upgrade and filecheckize this test.
llvm-svn: 105227
2010-05-31 17:27:17 +00:00
Evan Cheng 707b7cc429 Remove schedule-livein-copies. It's not being used.
llvm-svn: 105095
2010-05-29 02:23:39 +00:00
Evan Cheng 27c4933e02 Fix PR7193: if sibling call address can take a register, make sure there are enough registers available by counting inreg arguments.
llvm-svn: 105092
2010-05-29 01:35:22 +00:00
Jakob Stoklund Olesen 2085089c49 Fix more tests that depended on the default register allocator choice.
llvm-svn: 104961
2010-05-28 17:06:30 +00:00
Dan Gohman 2140a74979 Eliminate the restriction that the array size in an alloca must be i32.
This will help reduce the amount of casting required on 64-bit targets.

llvm-svn: 104911
2010-05-28 01:14:11 +00:00
Jakob Stoklund Olesen b613ae2c89 Add a -regalloc=default option that chooses a register allocator based on the -O
optimization level.

This only really affects llc for now because both the llvm-gcc and clang front
ends override the default register allocator. I intend to remove that code later.

llvm-svn: 104904
2010-05-27 23:57:25 +00:00
Devang Patel 6b9a9fe207 Simplify. Eliminate unneeded debug_loc entry.
llvm-svn: 104785
2010-05-26 23:55:23 +00:00
Devang Patel 1b08572a66 Update debug info when live-in reg is copied into a vreg.
llvm-svn: 104732
2010-05-26 20:18:50 +00:00
Dale Johannesen 053dd21c84 Testcase for 104624/104619/PR7191/8023512.
Reduced from one provided by Duncan Sands, thanks!

llvm-svn: 104710
2010-05-26 17:55:45 +00:00
Dale Johannesen cd4ba6caba Removing test; Chris thinks it's better to have the
bug go untested than have a testcase this large.  So be it.

llvm-svn: 104632
2010-05-25 20:40:10 +00:00
Dale Johannesen 60fe2cdc4f Fix another variant of PR 7191. Also add a testcase
Mon Ping provided; unfortunately bugpoint failed to
reduce it, but I think it's important to have a test for
this in the suite.  8023512.

llvm-svn: 104624
2010-05-25 18:47:23 +00:00
Eric Christopher 64087cd346 This test is darwin only. Make it so(tm).
llvm-svn: 104418
2010-05-22 00:55:55 +00:00
Eric Christopher 6fdea1bda8 Add full bss data support for darwin tls variables.
llvm-svn: 104414
2010-05-22 00:10:22 +00:00
Chris Lattner 0735ecfe17 now that fp reg kill insertion stuff happens as a separate
pass after isel instead of being interlaced with it, we can
trust that all the code for a function has been isel'd before
it is run.

The practical impact of this is that we can scan for machine
instr phis instead of doing a fuzzy match on the LLVM BB for
phi nodes.  Doing the fuzzy match required knowing when isel
would produce an fp reg stack phi which was gross.  It was
also wrong in cases where select got lowered to a branch
tree because cmovs aren't available (PR6828).

Just do the scan on machine phis which is simpler, faster
and more correct.  This fixes PR6828.

llvm-svn: 104333
2010-05-21 18:17:54 +00:00
Dale Johannesen b3b9c8ac48 Fix i64->f64 conversion, x86-64, -no-sse. A bit
tricky since there's a 3rd 64-bit type, MMX vectors.
PR 7135.

llvm-svn: 104308
2010-05-21 00:52:33 +00:00
Dan Gohman ee2fea3cd7 When canonicalizing icmp operand order to put the loop invariant
operand on the left, the interesting operand is on the right. This
fixes a bug where LSR was failing to recognize ICmpZero uses,
which led it to be unable to reverse the induction variable in the
attached testcase.

Delete test/CodeGen/X86/stack-color-with-reg-2.ll, because its test
is extremely fragile and hard to meaningfully update.

llvm-svn: 104262
2010-05-20 19:26:52 +00:00
Dan Gohman 887dd1cd31 When converting a test to a cmp to fold a load, use the cmp that has an
8-bit immediate field rather than one with a wider immediate field.
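On x86, `cmp r/m32, imm` has two encodings: opcode 0x81 /7 with a full 32-bit immediate, and 0x83 /7 with an 8-bit immediate that is sign-extended. A minimal Python sketch of the selection (the helper is hypothetical; the returned strings mirror LLVM's `CMP32mi`/`CMP32mi8` opcode names):

```python
def fits_in_simm8(imm: int) -> bool:
    """True if imm can be encoded as a sign-extended 8-bit immediate."""
    return -128 <= imm <= 127


def pick_cmp32_mem_imm(imm: int) -> str:
    """Pick the shorter 'cmp mem32, imm' form when the immediate allows it."""
    return "CMP32mi8" if fits_in_simm8(imm) else "CMP32mi"
```

A test converted to a compare against zero always fits, so the imm8 form saves three bytes of immediate.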

llvm-svn: 104064
2010-05-18 21:42:03 +00:00
Daniel Dunbar a4820fcc78 MC/X86: Implement custom lowering to make sure we match things like
  X86::ADC32ri $0, %eax
to
  X86::ADC32i32 $0

llvm-svn: 104030
2010-05-18 17:22:24 +00:00
Dale Johannesen f92c344167 Removing as part of previous reversion.
llvm-svn: 103915
2010-05-16 20:19:40 +00:00
Dale Johannesen 2ef974ee0e Revert 103911; it broke a test that expects bitconvert
<1xi64> -> i64 to work in MMX registers on hosts where -no-sse
is the default (not mine).  The right thing is
to accept this and make i64->f64 conversions go through memory,
but I don't have time right now.

llvm-svn: 103914
2010-05-16 20:19:04 +00:00
Dale Johannesen fc1492d71b Make x86-64 64-bit bitconvert work when SSE is not available.
(This worked as of about 6 months ago and I didn't track down
exactly what broke it; I think this fix is appropriate.)

llvm-svn: 103911
2010-05-16 18:22:38 +00:00
Anton Korobeynikov 8f35fabbc1 Add support for thiscall calling convention.
Patch by Charles Davis and Steven Watanabe!

llvm-svn: 103902
2010-05-16 09:08:45 +00:00
Jakob Stoklund Olesen 4d5c1061e3 Simplify the handling of physreg defs and uses in RegAllocFast.
This adds extra security against using clobbered physregs, and it adds kill
markers to physreg uses.

llvm-svn: 103784
2010-05-14 18:03:25 +00:00
Jakob Stoklund Olesen 0ba2e2a568 Take allocation hints from copy instructions to/from physregs.
This causes way more identity copies to be generated, ripe for coalescing.

llvm-svn: 103686
2010-05-13 00:19:43 +00:00
Jakob Stoklund Olesen 955a0e71e9 Make sure to add kill flags to the last use of a virtreg when it is redefined.
The X86 floating point stack pass and others depend on good kill flags.

llvm-svn: 103635
2010-05-12 18:46:03 +00:00
Jakob Stoklund Olesen e6e39dc310 Enable a bunch more -regalloc=fast tests
llvm-svn: 103531
2010-05-12 00:11:24 +00:00
Jakob Stoklund Olesen 84c881e593 One more -regalloc=fast test
llvm-svn: 103509
2010-05-11 20:51:07 +00:00
Jakob Stoklund Olesen 3f0241e0f9 Simplify the tracking of used physregs to a bulk bitor followed by a transitive
closure after allocating all blocks.

Add a few more test cases for -regalloc=fast.
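A Python sketch of the "bulk bitor followed by a transitive closure" idea, assuming that "closure" means closing the used set under the register alias relation. The register numbering and alias table are hypothetical; RegAllocFast's real data structures differ.

```python
def used_physregs(block_masks: list[int], aliases: dict[int, tuple[int, ...]]) -> int:
    """OR together per-block used-register bitmasks, then close over aliases."""
    used = 0
    for mask in block_masks:
        used |= mask  # one bulk bitwise OR per allocated block
    # Transitive closure: a register aliasing a used register is also used.
    worklist = [r for r in range(used.bit_length()) if used >> r & 1]
    while worklist:
        reg = worklist.pop()
        for alias in aliases.get(reg, ()):
            if not used >> alias & 1:
                used |= 1 << alias
                worklist.append(alias)
    return used


# Hypothetical numbering: 0 = AL, 1 = AX, 2 = EAX, 3 = BL.
ALIASES = {0: (1,), 1: (0, 2), 2: (1,), 3: ()}
```

Doing the closure once, after all blocks, avoids repeating alias walks for every individual use.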

llvm-svn: 103500
2010-05-11 20:30:28 +00:00
Jakob Stoklund Olesen f1b3029a54 Mostly rewrite RegAllocFast.
Sorry for the big change. The path leading up to this patch had some TableGen
changes that I didn't want to commit before I knew they were useful. They
weren't, and this version does not need them.

The fast register allocator now does no liveness calculations. Instead it relies
on kill flags provided by isel. (Currently those kill flags are also ignored due
to isel bugs). The allocation algorithm is supposed to work with any subset of
valid kill flags. More kill flags simply means fewer spills inserted.

Registers are allocated from a working set that contains no aliases. That means
most allocations can be done directly without expensive alias checks. When the
working set runs out of registers we do the full alias check to find new free
registers.

llvm-svn: 103488
2010-05-11 18:54:45 +00:00
Evan Cheng 02947a4551 Be careful with operand promotion. For a binary operation, the source operands may be the same. PR7018. rdar://7939869.
llvm-svn: 103419
2010-05-10 19:03:57 +00:00
Bill Wendling cd476b6760 Readd testcase.
llvm-svn: 103335
2010-05-08 04:47:54 +00:00
Dan Gohman d0800241d2 When pruning candidate formulae out of an LSRUse, update the
LSRUse's Regs set after all pruning is done, rather than trying
to do it on the fly, which can produce an incomplete result.

This fixes a case where heuristic pruning was stripping all
formulae from a use, which led the solver to enter an infinite
loop.

Also, add a few asserts to diagnose this kind of situation.
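A minimal Python sketch of "update Regs after all pruning is done", with each formula modeled as a frozenset of register names (the function name and the model are assumptions). Removing a pruned formula's registers on the fly would wrongly drop a register that a surviving formula still uses; rebuilding from the survivors avoids that. The assert mirrors the kind of diagnostic the message mentions.

```python
def prune_formulae(formulae, keep):
    """Drop formulae rejected by `keep`, then rebuild the register set."""
    kept = [f for f in formulae if keep(f)]
    assert kept, "pruning removed every formula from the use"
    regs = set().union(*kept)  # recomputed from the survivors only
    return kept, regs


FORMULAE = [frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"d"})]
kept, regs = prune_formulae(FORMULAE, keep=lambda f: "c" not in f)
# "b" survives because the first formula still uses it.
```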

llvm-svn: 103328
2010-05-07 23:36:59 +00:00
Bill Wendling 6b5897b4de Remove. Don't XFAIL.
llvm-svn: 103321
2010-05-07 23:09:17 +00:00
Bill Wendling 32d8981ec0 Temorarily revert r101984.
llvm-svn: 103314
2010-05-07 22:45:36 +00:00
Dale Johannesen 51c1695a0a Fix PR 7087, and probably other things, by extending
getConstantFP to accept the two supported long double
target types.  This was not the original intent, but
there are other places that assume this works and it's
easy enough to do.

llvm-svn: 103299
2010-05-07 21:35:53 +00:00
Duncan Sands ebf838274f Correct some bogus target triples.
llvm-svn: 103265
2010-05-07 17:03:48 +00:00
Nick Lewycky 45f530db39 Revert r103133 and add testcase from PR7066.
llvm-svn: 103233
2010-05-07 01:45:38 +00:00