Commit Graph

416 Commits

Author SHA1 Message Date
Evan Cheng cee351c410 Test case for PR2082.
llvm-svn: 47501
2008-02-22 20:38:49 +00:00
Evan Cheng 94ba37f8e3 Allow re-materialization of pic load (controlled by -remat-pic-load for now).
llvm-svn: 47476
2008-02-22 09:25:47 +00:00
Chris Lattner ab8bfc28c8 copy mmx values from/to memory with GPRs on x86-32
instead of with mmx registers.  This horribleness is apparently
done by gcc to avoid having to insert emms in places that really 
should have it.  This is the second half of rdar://5741668.

llvm-svn: 47474
2008-02-22 05:18:04 +00:00
Chris Lattner 997b3a65ca Start using GPR's to copy around mmx value instead of mmx regs.
GCC apparently does this, and code depends on not having to do
emms when this happens.  This is x86-64 only so far, second half
should handle x86-32.

rdar://5741668

llvm-svn: 47470
2008-02-22 02:09:43 +00:00
Chris Lattner bdd4c8b04d Treat clobber operands like early clobbers: if we have
any, we force sdisel to do all regalloc for an asm.  This
leads to gross but correct codegen.

This fixes the rest of PR2078.

llvm-svn: 47454
2008-02-21 19:43:13 +00:00
Tanya Lattner 3cdf542f5a Remove llvm-upgrade and update tests.
llvm-svn: 47432
2008-02-21 07:42:26 +00:00
Chris Lattner 83c93d5afd Fix a (harmless) but where vregs were added to the used reg lists for
inline asms.

Fix PR2078 by marking aliases of registers used when a register is 
marked used.  This prevents EAX from being allocated when AX is listed
in the clobber set for the asm.

llvm-svn: 47426
2008-02-21 04:55:52 +00:00
Evan Cheng 0aa9f2a7f3 XFAIL this for now.
llvm-svn: 47355
2008-02-20 02:38:58 +00:00
Chris Lattner a0b1cc41ef this test requires sse2
llvm-svn: 47331
2008-02-19 18:07:46 +00:00
Chris Lattner 97b9662f78 Don't fold and's into test instructions if they have multiple uses.
This compiles test-nofold.ll into:

_test:
	movl	$15, %ecx
	andl	4(%esp), %ecx
	testl	%ecx, %ecx
	movl	$42, %eax
	cmove	%ecx, %eax
	ret

instead of:
_test:
	movl	4(%esp), %eax
	movl	%eax, %ecx
	andl	$15, %ecx
	testl	$15, %eax
	movl	$42, %eax
	cmove	%ecx, %eax
	ret

llvm-svn: 47330
2008-02-19 17:37:35 +00:00
Chris Lattner 08162d9515 rename tests to avoid a test- prefix when they aren't related to the test instruction.
llvm-svn: 47329
2008-02-19 17:33:52 +00:00
Nick Lewycky 0e2e21b8b9 Don't spew stats to stderr.
llvm-svn: 47308
2008-02-19 03:11:47 +00:00
Nick Lewycky b54a803a2e Fix up the run line for this new test.
llc: for the -info-output-file option:  requires a value!

llvm-svn: 47306
2008-02-19 02:58:36 +00:00
Evan Cheng 634a8f9275 New test.
llvm-svn: 47302
2008-02-19 02:09:58 +00:00
Evan Cheng 6200c225e0 - When DAG combiner is folding a bit convert into a BUILD_VECTOR, it should check if it's essentially a SCALAR_TO_VECTOR. Avoid turning (v8i16) <10, u, u, u> to <10, 0, u, u, u, u, u, u>. Instead, simply convert it to a SCALAR_TO_VECTOR of the proper type.
- X86 now normalize SCALAR_TO_VECTOR to (BIT_CONVERT (v4i32 SCALAR_TO_VECTOR)). Get rid of X86ISD::S2VEC.

llvm-svn: 47290
2008-02-18 23:04:32 +00:00
Dan Gohman a589ee11bb Don't mark scalar integer multiplication as Expand on x86, since x86
has plain one-result scalar integer multiplication instructions.
This avoids expanding such instructions into MUL_LOHI sequences that
must be special-cased at isel time, and avoids the problem with that
code that provented memory operands from being folded.

This fixes PR1874, addressesing the most common case. The uncommon
cases of optimizing multiply-high operations will require work
in DAGCombiner.

llvm-svn: 47277
2008-02-18 17:55:26 +00:00
Andrew Lenharth 9b254eed32 llvm.memory.barrier, and impl for x86 and alpha
llvm-svn: 47204
2008-02-16 01:24:58 +00:00
Evan Cheng 6edbbe0c25 This test is not interesting.
llvm-svn: 47189
2008-02-15 23:06:21 +00:00
Chris Lattner 558a3ba17f Fix a miscompilation from Dan's recent apintification.
llvm-svn: 47128
2008-02-14 18:48:56 +00:00
Chris Lattner 3bd37f549a This readme entry is done, testcase here: CodeGen/X86/zero-remat.ll
llvm-svn: 47106
2008-02-14 05:39:46 +00:00
Evan Cheng a4621f04bb Fix test.
llvm-svn: 47102
2008-02-14 01:32:53 +00:00
Chris Lattner a08af08a88 In SDISel, for targets that support FORMAL_ARGUMENTS nodes, lower this
node as soon as we create it in SDISel.  Previously we would lower it in
legalize.  The problem with this is that it only exposes the argument
loads implied by FORMAL_ARGUMENTs after legalize, so that only dag combine 2
can hack on them.  This causes us to miss some optimizations because 
datatype expansion also happens here.

Exposing the loads early allows us to do optimizations on them.  For example
we now compile arg-cast.ll to:

_foo:
	movl	$2147483647, %eax
	andl	8(%esp), %eax
	ret

where we previously produced:

_foo:
	subl	$12, %esp
	movsd	16(%esp), %xmm0
	movsd	%xmm0, (%esp)
	movl	$2147483647, %eax
	andl	4(%esp), %eax
	addl	$12, %esp
	ret

It might also make sense to do this for ISD::CALL nodes, which have implicit
stores on many targets.

llvm-svn: 47054
2008-02-13 07:39:09 +00:00
Evan Cheng ea8530d82c New tests.
llvm-svn: 47047
2008-02-13 03:23:53 +00:00
Evan Cheng 724029151b Don't mask the isel bug.
llvm-svn: 47018
2008-02-12 19:11:29 +00:00
Evan Cheng 3069a26f63 This test assumes no SSE4.1.
llvm-svn: 47017
2008-02-12 19:11:08 +00:00
Evan Cheng b21301fbe7 Fix some test cases.
llvm-svn: 46998
2008-02-12 07:22:46 +00:00
Dale Johannesen 43a2ed8611 Alignment of struct containing vectors depends on
whether SSE is present, on Darwin anyway.  Make it
explicit.

llvm-svn: 46909
2008-02-09 19:04:25 +00:00
Evan Cheng 3b3286d4bc It's not always safe to fold movsd into xorpd, etc. Check the alignment of the load address first to make sure it's 16 byte aligned.
llvm-svn: 46893
2008-02-08 21:20:40 +00:00
Evan Cheng 8d59dd119b Added missing entries in X86 load / store folding tables.
llvm-svn: 46866
2008-02-08 00:12:56 +00:00
Evan Cheng a20a773654 Fix a x86-64 codegen deficiency. Allow gv + offset when using rip addressing mode.
Before:
_main:
        subq    $8, %rsp
        leaq    _X(%rip), %rax
        movsd   8(%rax), %xmm1
        movss   _X(%rip), %xmm0
        call    _t
        xorl    %ecx, %ecx
        movl    %ecx, %eax
        addq    $8, %rsp
        ret
Now:
_main:
        subq    $8, %rsp
        movsd   _X+8(%rip), %xmm1
        movss   _X(%rip), %xmm0
        call    _t
        xorl    %ecx, %ecx
        movl    %ecx, %eax
        addq    $8, %rsp
        ret

Notice there is another idiotic codegen issue that needs to be fixed asap:
xorl    %ecx, %ecx
movl    %ecx, %eax

llvm-svn: 46850
2008-02-07 08:53:49 +00:00
Evan Cheng 87fbd66f9f Fix PR1975: dag isel emitter produces patterns that isel wrong flag result.
llvm-svn: 46776
2008-02-05 22:50:29 +00:00
Chris Lattner f4e5e556fd Add target triples to these so they don't fail on linux.
llvm-svn: 46496
2008-01-29 06:26:07 +00:00
Chris Lattner 888560d62c Implement some dag combines that allow doing fneg/fabs/fcopysign in integer
registers if used by a bitconvert or using a bitconvert.  This allows us to
avoid constant pool loads and use cheaper integer instructions when the
values come from or end up in integer regs anyway.  For example, we now 
compile CodeGen/X86/fp-in-intregs.ll to:

_test1:
	movl	$2147483648, %eax
	xorl	4(%esp), %eax
	ret
_test2:
	movl	$1065353216, %eax
	orl	4(%esp), %eax
	andl	$3212836864, %eax
	ret

Instead of:
_test1:
	movss	4(%esp), %xmm0
	xorps	LCPI2_0, %xmm0
	movd	%xmm0, %eax
	ret
_test2:
	movss	4(%esp), %xmm0
	andps	LCPI3_0, %xmm0
	movss	LCPI3_1, %xmm1
	andps	LCPI3_2, %xmm1
	orps	%xmm0, %xmm1
	movd	%xmm1, %eax
	ret

bitconverts can happen due to various calling conventions that require
fp values to passed in integer regs in some cases, e.g. when returning
a complex.

llvm-svn: 46414
2008-01-27 17:42:27 +00:00
Chris Lattner 596704405f New test to verify that "merging 4 loads into a vec load" continues to work and
continues to infer alignment info.

llvm-svn: 46403
2008-01-26 20:06:45 +00:00
Chris Lattner e30e33af4f Infer alignment of loads and increase their alignment when we can tell they are
from the stack.  This allows us to compile stack-align.ll to:

_test:
	movsd	LCPI1_0, %xmm0
	movapd	%xmm0, %xmm1
***	andpd	4(%esp), %xmm1
	andpd	_G, %xmm0
	addsd	%xmm1, %xmm0
	movl	20(%esp), %eax
	movsd	%xmm0, (%eax)
	ret

instead of:

_test:
	movsd	LCPI1_0, %xmm0
**	movsd	4(%esp), %xmm1
**	andpd	%xmm0, %xmm1
	andpd	_G, %xmm0
	addsd	%xmm1, %xmm0
	movl	20(%esp), %eax
	movsd	%xmm0, (%eax)
	ret

llvm-svn: 46401
2008-01-26 19:45:50 +00:00
Chris Lattner 364963d41c remove a useless xfailed test.
llvm-svn: 46400
2008-01-26 19:35:46 +00:00
Bill Wendling 1a17ef02c8 If there's no instructions being emitted on X86 for a function, emit a
nop. Emit the nop directly for PPC.

llvm-svn: 46398
2008-01-26 09:03:52 +00:00
Chris Lattner 84ab724e06 Add target-specific dag combines for FAND(x,0) and FOR(x,0). This allows
us to compile:

double test(double X) {
  return copysign(0.0, X);
}

into:

_test:
	andpd	LCPI1_0(%rip), %xmm0
	ret

instead of:
_test:
	pxor	%xmm1, %xmm1
	andpd	LCPI1_0(%rip), %xmm1
	movapd	%xmm0, %xmm2
	andpd	LCPI1_1(%rip), %xmm2
	movapd	%xmm1, %xmm0
	orpd	%xmm2, %xmm0
	ret

llvm-svn: 46344
2008-01-25 05:46:26 +00:00
Chris Lattner a91f77eaac Significantly simplify and improve handling of FP function results on x86-32.
This case returns the value in ST(0) and then has to convert it to an SSE
register.  This causes significant codegen ugliness in some cases.  For 
example in the trivial fp-stack-direct-ret.ll testcase we used to generate:

_bar:
	subl	$28, %esp
	call	L_foo$stub
	fstpl	16(%esp)
	movsd	16(%esp), %xmm0
	movsd	%xmm0, 8(%esp)
	fldl	8(%esp)
	addl	$28, %esp
	ret

because we move the result of foo() into an XMM register, then have to
move it back for the return of bar.

Instead of hacking ever-more special cases into the call result lowering code
we take a much simpler approach: on x86-32, fp return is modeled as always 
returning into an f80 register which is then truncated to f32 or f64 as needed.
Similarly for a result, we model it as an extension to f80 + return.

This exposes the truncate and extensions to the dag combiner, allowing target
independent code to hack on them, eliminating them in this case.  This gives 
us this code for the example above:

_bar:
	subl	$12, %esp
	call	L_foo$stub
	addl	$12, %esp
	ret

The nasty aspect of this is that these conversions are not legal, but we want
the second pass of dag combiner (post-legalize) to be able to hack on them.
To handle this, we lie to legalize and say they are legal, then custom expand
them on entry to the isel pass (PreprocessForFPConvert).  This is gross, but
less gross than the code it is replacing :)

This also allows us to generate better code in several other cases.  For 
example on fp-stack-ret-conv.ll, we now generate:

_test:
	subl	$12, %esp
	call	L_foo$stub
	fstps	8(%esp)
	movl	16(%esp), %eax
	cvtss2sd	8(%esp), %xmm0
	movsd	%xmm0, (%eax)
	addl	$12, %esp
	ret

where before we produced (incidentally, the old bad code is identical to what
gcc produces):

_test:
	subl	$12, %esp
	call	L_foo$stub
	fstpl	(%esp)
	cvtsd2ss	(%esp), %xmm0
	cvtss2sd	%xmm0, %xmm0
	movl	16(%esp), %eax
	movsd	%xmm0, (%eax)
	addl	$12, %esp
	ret

Note that we generate slightly worse code on pr1505b.ll due to a scheduling 
deficiency that is unrelated to this patch.

llvm-svn: 46307
2008-01-24 08:07:48 +00:00
Chris Lattner 001d781c41 take these with a pr #
llvm-svn: 46303
2008-01-24 06:35:44 +00:00
Evan Cheng 35abd840a6 Let each target decide byval alignment. For X86, it's 4-byte unless the aggregare contains SSE vector(s). For x86-64, it's max of 8 or alignment of the type.
llvm-svn: 46286
2008-01-23 23:17:41 +00:00
Evan Cheng 1e0d4d2aa8 SSE varargs arguments are passed in memory.
llvm-svn: 46262
2008-01-22 23:26:53 +00:00
Dale Johannesen 4768c3c9b6 Test is correct again for the moment.
llvm-svn: 46172
2008-01-18 19:53:31 +00:00
Chris Lattner 1ea55cf816 This commit changes:
1. Legalize now always promotes truncstore of i1 to i8. 
2. Remove patterns and gunk related to truncstore i1 from targets.
3. Rename the StoreXAction stuff to TruncStoreAction in TLI.
4. Make the TLI TruncStoreAction table a 2d table to handle from/to conversions.
5. Mark a wide variety of invalid truncstores as such in various targets, e.g.
   X86 currently doesn't support truncstore of any of its integer types.
6. Add legalize support for truncstores with invalid value input types.
7. Add a dag combine transform to turn store(truncate) into truncstore when
   safe.

The later allows us to compile CodeGen/X86/storetrunc-fp.ll to:

_foo:
	fldt	20(%esp)
	fldt	4(%esp)
	faddp	%st(1)
	movl	36(%esp), %eax
	fstps	(%eax)
	ret

instead of:

_foo:
	subl	$4, %esp
	fldt	24(%esp)
	fldt	8(%esp)
	faddp	%st(1)
	fstps	(%esp)
	movl	40(%esp), %eax
	movss	(%esp), %xmm0
	movss	%xmm0, (%eax)
	addl	$4, %esp
	ret

llvm-svn: 46140
2008-01-17 19:59:44 +00:00
Evan Cheng 54c20b559e When a live virtual register is being clobbered by an implicit def, it is spilled
and the spill is its kill. However, if the local allocator has determined the
register has not been modified (possible when its value was reloaded), it would
not issue a restore. In that case, mark the last use of the virtual register as
kill.

llvm-svn: 46111
2008-01-17 02:08:17 +00:00
Evan Cheng 7be1528004 Fixes a nasty dag combiner bug that causes a bunch of tests to fail at -O0.
It's not safe to use the two value CombineTo variant to combine away a dead load.
e.g. 
v1, chain2 = load chain1, loc
v2, chain3 = load chain2, loc
v3         = add v2, c 
Now we replace use of v1 with undef, use of chain2 with chain1.
ReplaceAllUsesWith() will iterate through uses of the first load and update operands:
v1, chain2 = load chain1, loc
v2, chain3 = load chain1, loc
v3         = add v2, c 
Now the second load is the same as the first load, SelectionDAG cse will ensure
the use of second load is replaced with the first load.
v1, chain2 = load chain1, loc
v3         = add v1, c
Then v1 is replaced with undef and bad things happen.

llvm-svn: 46099
2008-01-16 23:11:54 +00:00
Duncan Sands 32b0ff6814 Trampoline support for x86-64. This looks like
it should work, but I have no machine to test
it on.  Committed because it will at least
cause no harm, and maybe someone can test it
for me!

llvm-svn: 46098
2008-01-16 22:55:25 +00:00
Chris Lattner 6e3379c07b make sure to use a cpu that has sse.
llvm-svn: 46060
2008-01-16 06:32:02 +00:00
Chris Lattner 8f7cec859e My previous commit had an incomplete message, it should have been:
make the 'fp return in ST(0)' optimization smart enough to
look through token factor nodes.  THis allows us to compile 
testcases like CodeGen/X86/fp-stack-retcopy.ll into:

_carg:
	subl	$12, %esp
	call	L_foo$stub
	fstpl	(%esp)
	fldl	(%esp)
	addl	$12, %esp
	ret

instead of:

_carg:
	subl	$28, %esp
	call	L_foo$stub
	fstpl	16(%esp)
	movsd	16(%esp), %xmm0
	movsd	%xmm0, 8(%esp)
	fldl	8(%esp)
	addl	$28, %esp
	ret

Still not optimal, but much better and this is a trivial patch.  Fixing 
the rest requires invasive surgery that is is not llvm 2.2 material.

llvm-svn: 46054
2008-01-16 05:56:59 +00:00
Chris Lattner 915ec14073 verify x86 generates ud2 for llvm.trap
llvm-svn: 46023
2008-01-15 22:22:02 +00:00
Dale Johannesen 04b99780cf Disable for now.
llvm-svn: 45881
2008-01-11 20:47:33 +00:00
Duncan Sands 53c954fa86 Output sinl for a long double FSIN node, not sin.
Likewise fix up a bunch of other libcalls.  While
there I remove NEG_F32 and NEG_F64 since they are
not used anywhere.  This fixes 9 Ada ACATS failures.

llvm-svn: 45833
2008-01-10 10:28:30 +00:00
Evan Cheng 0f8c7c4a73 Codegen improvement has reduced one spill.
llvm-svn: 45814
2008-01-10 02:54:40 +00:00
Evan Cheng 0e400d4cb7 Special copy SUnit's do not have SDNode's.
llvm-svn: 45787
2008-01-09 23:01:55 +00:00
Evan Cheng a31824a08e Fix sse2.psrl.w and sse2.psrl.q definitions.
llvm-svn: 45772
2008-01-09 02:16:44 +00:00
Chris Lattner 51b01bf8a5 Make load->store deletion a bit smarter. This allows us to compile this:
void test(long long *P) { *P ^= 1; }

into just:

_test:
	movl	4(%esp), %eax
	xorl	$1, (%eax)
	ret

instead of code like this:

_test:
	movl	4(%esp), %ecx
        xorl    $1, (%ecx)
	movl	4(%ecx), %edx
	movl	%edx, 4(%ecx)
	ret

llvm-svn: 45762
2008-01-08 23:08:06 +00:00
Duncan Sands 7b1460cca4 Crashes llc when using Chris's new legalization logic.
llvm-svn: 45758
2008-01-08 21:51:53 +00:00
Nate Begeman d3d49df3f1 Update test to catch recent x86 insert regression and improvements
llvm-svn: 45705
2008-01-07 17:49:23 +00:00
Chris Lattner 41e423a6f5 fix this to use a valid triple.
llvm-svn: 45509
2008-01-02 22:21:45 +00:00
Chris Lattner 5d998c5712 verify that aligned common support doesn't break.
llvm-svn: 45495
2008-01-02 19:48:24 +00:00
Chris Lattner d2b8a36f0e One readme entry is done, one is really easy (Evan, want to investigate
eliminating the llvm.x86.sse2.loadl.pd intrinsic?), one shuffle optzn
may be done (if shufps is better than pinsw, Evan, please review), and
we already know about LICM of simple instructions.

llvm-svn: 45407
2007-12-29 19:31:47 +00:00
Chris Lattner 0d90c8f016 upgrade this test
llvm-svn: 45406
2007-12-29 19:24:06 +00:00
Chris Lattner 3b6a82118b Fold comparisons against a constant nan, and optimize ORD/UNORD
comparisons with a constant.  This allows us to compile isnan to:

_foo:
	fcmpu cr7, f1, f1
	mfcr r2
	rlwinm r3, r2, 0, 31, 31
	blr 

instead of:

LCPI1_0:					;  float
	.space	4
_foo:
	lis r2, ha16(LCPI1_0)
	lfs f0, lo16(LCPI1_0)(r2)
	fcmpu cr7, f1, f0
	mfcr r2
	rlwinm r3, r2, 0, 31, 31
	blr 

llvm-svn: 45405
2007-12-29 08:37:08 +00:00
Chris Lattner 33de0c6e92 this xform is implemented.
llvm-svn: 45404
2007-12-29 08:19:39 +00:00
Chris Lattner 07ccbfa64a Codegen:
as:

_bar:
	pushl	%esi
	subl	$8, %esp
	movl	16(%esp), %esi
	call	L_foo$stub
	fstps	(%esi)
	addl	$8, %esp
	popl	%esi
	#FP_REG_KILL
	ret

instead of:

_bar:
	pushl	%esi
	subl	$8, %esp
	movl	16(%esp), %esi
	call	L_foo$stub
	fstpl	(%esi)
	cvtsd2ss	(%esi), %xmm0
	movss	%xmm0, (%esi)
	addl	$8, %esp
	popl	%esi
	#FP_REG_KILL
	ret

llvm-svn: 45401
2007-12-29 06:57:38 +00:00
Chris Lattner 8013bd339b avoid going through a stack slot to convert from fpstack to xmm reg
if we are just going to store it back anyway.  This improves things 
like:
double foo();
void bar(double *P) { *P = foo(); }

llvm-svn: 45399
2007-12-29 06:41:28 +00:00
Chris Lattner bc13df19a8 one fewer uncond branch with my codegenprepare hack for single-mbb backedges.
llvm-svn: 45360
2007-12-26 17:23:47 +00:00
Evan Cheng 483a969ece Fix PR1872: SrcValue and SrcValueOffset should not be used to compute load / store node id.
llvm-svn: 45167
2007-12-18 19:38:14 +00:00
Evan Cheng 91e0fc9cb4 FIX for PR1799: When a load is unfolded from an instruction, check if it is a new node. If not, do not create a new SUnit.
llvm-svn: 45157
2007-12-18 08:42:10 +00:00
Evan Cheng 23d2d4dc6c Make better use of instructions that clear high bits; fix various 2-wide shuffle bugs.
llvm-svn: 45058
2007-12-15 03:00:47 +00:00
Evan Cheng 0e6408124e Fix ctlz and cttz. llvm definition requires them to return number of bits in of the src type when value is zero.
llvm-svn: 45029
2007-12-14 08:30:15 +00:00
Evan Cheng e9fbc3f014 Implement ctlz and cttz with bsr and bsf.
llvm-svn: 45024
2007-12-14 02:13:44 +00:00
Evan Cheng 37c36ed79a Be extra careful with extension use optimation. Now turned on by default.
llvm-svn: 44981
2007-12-13 03:32:53 +00:00
Evan Cheng 827d30db19 Fold some and + shift in x86 addressing mode.
llvm-svn: 44970
2007-12-13 00:43:27 +00:00
Evan Cheng 6e68381e02 Implicit def instructions, e.g. X86::IMPLICIT_DEF_GR32, are always re-materializable and they should not be spilled.
llvm-svn: 44960
2007-12-12 23:12:09 +00:00
Dan Gohman 7a7742c2fe Allow vector integer constants to be created with
SelectionDAG::getConstant, in the same way as vector floating-point
constants. This allows the legalize expansion code for @llvm.ctpop and
friends to be usable with vector types.

llvm-svn: 44954
2007-12-12 22:21:26 +00:00
Evan Cheng 0f42730722 Use shuffles to implement insert_vector_elt for i32, i64, f32, and f64.
llvm-svn: 44929
2007-12-12 07:55:34 +00:00
Evan Cheng 0a1254f634 Add a test case for -optimize-ext-uses.
llvm-svn: 44928
2007-12-12 07:54:08 +00:00
Evan Cheng 2a98956796 Lower a build_vector with all constants into a constpool load unless it can be done with a move to low part.
llvm-svn: 44921
2007-12-12 06:45:40 +00:00
Evan Cheng 4fbf459549 - Improved v8i16 shuffle lowering. It now uses pshuflw and pshufhw as much as
possible before resorting to pextrw and pinsrw.
- Better codegen for v4i32 shuffles masquerading as v8i16 or v16i8 shuffles.
- Improves (i16 extract_vector_element 0) codegen by recognizing
  (i32 extract_vector_element 0) does not require a pextrw.

llvm-svn: 44836
2007-12-11 01:46:18 +00:00
Christopher Lamb d202e03fe5 Improve branch folding by recgonizing that explict successor relationships impact the value of fall-through choices.
llvm-svn: 44785
2007-12-10 07:24:06 +00:00
Evan Cheng bfd373a53e Much improved v8i16 shuffles. (Step 1).
llvm-svn: 44676
2007-12-07 08:07:39 +00:00
Evan Cheng 26593a04db New test case.
llvm-svn: 44672
2007-12-07 01:48:46 +00:00
Evan Cheng 5cb41390ab Fix a bogus test case.
llvm-svn: 44668
2007-12-06 22:12:45 +00:00
Evan Cheng 8393dc7378 Turning simple splitting on. Start testing new coalescer heuristics as new llcbeta.
llvm-svn: 44660
2007-12-06 08:54:31 +00:00
Chris Lattner eedaf92fcf third time around: instead of disabling this completely,
only disable it if we don't know it will be obviously profitable.
Still fixme, but less so. :)

llvm-svn: 44658
2007-12-06 07:47:55 +00:00
Chris Lattner b5fdfb9612 Actually, disable this code for now. More analysis and improvements to
the X86 backend are needed before this should be enabled by default.

llvm-svn: 44657
2007-12-06 07:44:31 +00:00
Chris Lattner 7c709a5d08 implement a readme entry, compiling the code into:
_foo:
	movl	$12, %eax
	andl	4(%esp), %eax
	movl	_array(%eax), %eax
	ret

instead of:

_foo:
	movl	4(%esp), %eax
	shrl	$2, %eax
	andl	$3, %eax
	movl	_array(,%eax,4), %eax
	ret

As it turns out, this triggers all the time, in a wide variety of
situations, for example, I see diffs like this in various programs:

-       movl    8(%eax), %eax
-       shll    $2, %eax
-       andl    $1020, %eax
-       movl    (%esi,%eax), %eax
+       movzbl  8(%eax), %eax
+       movl    (%esi,%eax,4), %eax


-       shll    $2, %edx
-       andl    $1020, %edx
-       movl    (%edi,%edx), %edx
+       andl    $255, %edx
+       movl    (%edi,%edx,4), %edx

Unfortunately, I also see stuff like this, which can be fixed in the
X86 backend:

-       andl    $85, %ebx
-       addl    _bit_count(,%ebx,4), %ebp
+       shll    $2, %ebx
+       andl    $340, %ebx
+       addl    _bit_count(%ebx), %ebp

llvm-svn: 44656
2007-12-06 07:33:36 +00:00
Chris Lattner dfa39289a5 fix this when run on non x86 hosts.
llvm-svn: 44645
2007-12-06 01:05:52 +00:00
Evan Cheng 69fda0a716 Allow some reloads to be folded in multi-use cases. Specifically testl r, r -> cmpl [mem], 0.
llvm-svn: 44479
2007-12-01 02:07:52 +00:00
Evan Cheng b10dc27b20 Do not fold reload into an instruction with multiple uses. It issues one extra load.
llvm-svn: 44467
2007-11-30 21:23:43 +00:00
Dan Gohman f151c8e760 Remove unnecessary && from the RUN lines of this test.
llvm-svn: 44342
2007-11-27 00:03:38 +00:00
Dan Gohman 9a69341725 Don't lower srem/urem X%C to X-X/C*C unless the division is actually
optimized. This avoids creating illegal divisions when the combiner is
running after legalize; this fixes PR1815. Also, it produces better
code in the included testcase by avoiding the subtract and multiply
when the division isn't optimized.

llvm-svn: 44341
2007-11-26 23:46:11 +00:00
Chris Lattner 5728bdd4db Fix a long standing deficiency in the X86 backend: we would
sometimes emit "zero" and "all one" vectors multiple times,
for example:

_test2:
	pcmpeqd	%mm0, %mm0
	movq	%mm0, _M1
	pcmpeqd	%mm0, %mm0
	movq	%mm0, _M2
	ret

instead of:

_test2:
	pcmpeqd	%mm0, %mm0
	movq	%mm0, _M1
	movq	%mm0, _M2
	ret

This patch fixes this by always arranging for zero/one vectors
to be defined as v4i32 or v2i32 (SSE/MMX) instead of letting them be
any random type.  This ensures they get trivially CSE'd on the dag.
This fix is also important for LegalizeDAGTypes, as it gets unhappy
when the x86 backend wants BUILD_VECTOR(i64 0) to be legal even when
'i64' isn't legal.

This patch makes the following changes:

1) X86TargetLowering::LowerBUILD_VECTOR now lowers 0/1 vectors into
   their canonical types.
2) The now-dead patterns are removed from the SSE/MMX .td files.
3) All the patterns in the .td file that referred to immAllOnesV or
   immAllZerosV in the wrong form now use *_bc to match them with a
   bitcast wrapped around them.
4) X86DAGToDAGISel::SelectScalarSSELoad is generalized to handle 
   bitcast'd zero vectors, which simplifies the code actually.
5) getShuffleVectorZeroOrUndef is updated to generate a shuffle that
   is legal, instead of generating one that is illegal and expecting
   a later legalize pass to clean it up.
6) isZeroShuffle is generalized to handle bitcast of zeros.
7) several other minor tweaks.

This patch is definite goodness, but has the potential to cause random
code quality regressions.  Please be on the lookout for these and let 
me know if they happen.

llvm-svn: 44310
2007-11-25 00:24:49 +00:00
Chris Lattner f5dfd15e98 upgrade this test
llvm-svn: 44298
2007-11-24 05:39:29 +00:00
Dan Gohman 36347a26f9 Add support in SplitVectorOp for remainder operators.
llvm-svn: 44233
2007-11-19 15:15:03 +00:00
Chris Lattner 861302e264 fix bogus test that the more strict lexer is finding.
llvm-svn: 44216
2007-11-18 18:26:45 +00:00
Evan Cheng 13e8b022f5 Typo.
llvm-svn: 44196
2007-11-16 23:55:08 +00:00
Evan Cheng 2c1a50455c Fix a thinko in post-allocation coalescer.
llvm-svn: 44166
2007-11-15 08:13:29 +00:00
Anton Korobeynikov 2c6387803e Fix PIC jump table codegen on x86-32/linux. In fact, such thing should be applied
to all targets uses GOT-relative offsets for PIC (Alpha?)

llvm-svn: 44108
2007-11-14 09:18:41 +00:00