Commit Graph

331 Commits

Author SHA1 Message Date
Chris Lattner f4b0c99d63 add some missing flags.
llvm-svn: 45859
2008-01-11 06:59:07 +00:00
Chris Lattner 317332fc2a Start inferring side effect information more aggressively, and fix many bugs in the
x86 backend where instructions were not marked maystore/mayload, and perf issues where
instructions were not marked neverHasSideEffects.  It would be really nice if we could
write patterns for copy instructions.

I have audited all the x86 instructions down to MOVDQAmr.  The flags on others and on
other targets are probably not right in all cases, but no clients currently use this
info that are enabled by default.

llvm-svn: 45829
2008-01-10 07:59:24 +00:00
Chris Lattner aca7ca3730 remove explicit sets of 'neverHasSideEffects' that can now be
inferred from the instr patterns.

llvm-svn: 45824
2008-01-10 05:45:39 +00:00
Chris Lattner a4ce4f6987 rename isLoad -> isSimpleLoad due to evan's desire to have such a predicate.
llvm-svn: 45667
2008-01-06 23:38:27 +00:00
Chris Lattner f3ebc3f3d2 Remove attribution from file headers, per discussion on llvmdev.
llvm-svn: 45418
2007-12-29 20:36:04 +00:00
Evan Cheng 01c7c198ee Fix JIT encoding for CMPSD as well.
llvm-svn: 45268
2007-12-20 19:57:09 +00:00
Bill Wendling b3d85a5d4b Add "mayHaveSideEffects" and "neverHasSideEffects" flags to some instructions. I
based what flag to set on whether it was already marked as
"isRematerializable". If there was a further check to determine if it's "really"
rematerializable, then I marked it as "mayHaveSideEffects" and created a check
in the X86 back-end similar to the remat one.

llvm-svn: 45132
2007-12-17 23:07:56 +00:00
Chris Lattner dab6bd902e Fix the JIT encoding of cmp*ss, which aborts with this assertion currently:
X86CodeEmitter.cpp:378: failed assertion `0 && "Immediate size not set!"'

I *think* this is right, but Evan, please verify.  It also looks like
CMPSDrr and maybe others are missing this info.  Evan, plz investigate.

llvm-svn: 45074
2007-12-16 20:12:41 +00:00
Evan Cheng 23d2d4dc6c Make better use of instructions that clear high bits; fix various 2-wide shuffle bugs.
llvm-svn: 45058
2007-12-15 03:00:47 +00:00
Evan Cheng 6e68381e02 Implicit def instructions, e.g. X86::IMPLICIT_DEF_GR32, are always re-materializable and they should not be spilled.
llvm-svn: 44960
2007-12-12 23:12:09 +00:00
Evan Cheng c829e5cdf0 Remove a bogus optimization. It's not possible to do a move to low element to a <8 x i16> or <16 x i8> vector.
llvm-svn: 44669
2007-12-06 22:14:22 +00:00
Chris Lattner 5728bdd4db Fix a long standing deficiency in the X86 backend: we would
sometimes emit "zero" and "all one" vectors multiple times,
for example:

_test2:
	pcmpeqd	%mm0, %mm0
	movq	%mm0, _M1
	pcmpeqd	%mm0, %mm0
	movq	%mm0, _M2
	ret

instead of:

_test2:
	pcmpeqd	%mm0, %mm0
	movq	%mm0, _M1
	movq	%mm0, _M2
	ret

This patch fixes this by always arranging for zero/one vectors
to be defined as v4i32 or v2i32 (SSE/MMX) instead of letting them be
any random type.  This ensures they get trivially CSE'd on the dag.
This fix is also important for LegalizeDAGTypes, as it gets unhappy
when the x86 backend wants BUILD_VECTOR(i64 0) to be legal even when
'i64' isn't legal.

This patch makes the following changes:

1) X86TargetLowering::LowerBUILD_VECTOR now lowers 0/1 vectors into
   their canonical types.
2) The now-dead patterns are removed from the SSE/MMX .td files.
3) All the patterns in the .td file that referred to immAllOnesV or
   immAllZerosV in the wrong form now use *_bc to match them with a
   bitcast wrapped around them.
4) X86DAGToDAGISel::SelectScalarSSELoad is generalized to handle 
   bitcast'd zero vectors, which simplifies the code actually.
5) getShuffleVectorZeroOrUndef is updated to generate a shuffle that
   is legal, instead of generating one that is illegal and expecting
   a later legalize pass to clean it up.
6) isZeroShuffle is generalized to handle bitcast of zeros.
7) several other minor tweaks.

This patch is definite goodness, but has the potential to cause random
code quality regressions.  Please be on the lookout for these and let 
me know if they happen.

llvm-svn: 44310
2007-11-25 00:24:49 +00:00
Nate Begeman d4d45c268c Add support for vectors to int <-> float casts.
llvm-svn: 44204
2007-11-17 03:58:34 +00:00
Dale Johannesen d50c8bcef6 Add missing SSE builtins: CVTPD2PI, CVTPS2PI,
CVTTPD2PI, CVTTPS2PI, CVTPI2PD, CVTPI2PS.

llvm-svn: 43523
2007-10-30 22:15:38 +00:00
Arnold Schwaighofer 1f0da1fefb Corrected many typing errors. And removed 'nest' parameter handling
for fastcc from X86CallingConv.td.  This means that nested functions
are not supported for calling convention 'fastcc'.

llvm-svn: 42934
2007-10-12 21:30:57 +00:00
Dale Johannesen 62f65edc32 Add missing argument to PALIGNR
llvm-svn: 42874
2007-10-11 20:58:37 +00:00
Evan Cheng f4b5d491df Added DAG xforms. e.g.
(vextract (v4f32 s2v (f32 load $addr)), 0) -> (f32 load $addr) 
(vextract (v4i32 bc (v4f32 s2v (f32 load $addr))), 0) -> (i32 load $addr)
Remove x86 specific patterns.

llvm-svn: 42677
2007-10-06 02:46:29 +00:00
Evan Cheng a1b7e95039 Typo. X86comi doesn't read / write chain's.
llvm-svn: 42492
2007-10-01 18:12:48 +00:00
Evan Cheng 5fb5a1f389 Enabling new condition code modeling scheme.
llvm-svn: 42459
2007-09-29 00:00:36 +00:00
Evan Cheng e95f391ef1 Added support for new condition code modeling scheme (i.e. physical register dependency). These are a bunch of instructions that are duplicated so the x86 backend can support both the old and new schemes at the same time. They will be deleted after
all the kinks are worked out.

llvm-svn: 42285
2007-09-25 01:57:46 +00:00
Dale Johannesen e36c400255 Fix PR 1681. When X86 target uses +sse -sse2,
keep f32 in SSE registers and f64 in x87.  This
is effectively a new codegen mode.
Change addLegalFPImmediate to permit float and
double variants to do different things.
Adjust callers.

llvm-svn: 42246
2007-09-23 14:52:20 +00:00
Evan Cheng 483e1ce16e Add implicit def of EFLAGS on those instructions that may modify flags.
llvm-svn: 41962
2007-09-14 21:48:26 +00:00
Evan Cheng 3e18e504ae Remove (somewhat confusing) Imp<> helper, use let Defs = [], Uses = [] instead.
llvm-svn: 41863
2007-09-11 19:55:27 +00:00
Dan Gohman a95cbb0007 Avoid storing and reloading zeros and other constants from stack slots
by flagging the associated instructions as being trivially rematerializable.

llvm-svn: 41775
2007-09-07 21:32:51 +00:00
Evan Cheng c2081fe573 Mark load instructions with isLoad = 1.
llvm-svn: 41595
2007-08-30 05:49:43 +00:00
Bill Wendling cdbd82ee37 64-bit SSSE3 ops that use MMX registers don't require 16-byte alignment.
Make a 'memop' pattern just for them.

llvm-svn: 41017
2007-08-11 09:52:53 +00:00
Bill Wendling 7014615087 For kicks, I though it would be fun to use the correct opcode.
llvm-svn: 40985
2007-08-10 09:00:17 +00:00
Bill Wendling 2377206923 Adding SSSE3 intrinsics.
llvm-svn: 40982
2007-08-10 06:22:27 +00:00
Dan Gohman 8932bff7fe Fix the alignment requirements of several unpck and shuf instructions.
Generalize isPSHUFDMask and add a unary SHUFPD pattern so that SHUFPD's
memory operand alignment can be tested as well, with a fix to avoid
breaking MMX's use of isPSHUFDMask.

llvm-svn: 40756
2007-08-02 21:17:01 +00:00
Dan Gohman 4d436e2b7d Fix pastos in vector arithmetic intrinsics.
llvm-svn: 40754
2007-08-02 21:06:40 +00:00
Dan Gohman fa3eeeedc0 Mark the SSE and MMX load instructions that
X86InstrInfo::isReallyTriviallyReMaterializable knows how to handle
with the isReMaterializable flag so that it is given a chance to handle
them. Without hoisting constant-pool loads from loops this isn't very
visible, though it does keep CodeGen/X86/constant-pool-remat-0.ll from
making a copy of the constant pool on the stack.

llvm-svn: 40736
2007-08-02 14:27:55 +00:00
Evan Cheng da549ece5c Missing Requires.
llvm-svn: 40691
2007-08-01 21:42:24 +00:00
Dan Gohman 54ec4bfa5f Change the x86 assembly output to use tab characters to separate the
mnemonics from their operands instead of single spaces. This makes the
assembly output a little more consistent with various other compilers
(f.e. GCC), and slightly easier to read. Also, update the regression
tests accordingly.

llvm-svn: 40648
2007-07-31 20:11:57 +00:00
Evan Cheng 12c6be84ff Redo and generalize previously removed opt for pinsrw: (vextract (v4i32 bc (v4f32 s2v (f32 load ))), 0) -> (i32 load )
llvm-svn: 40628
2007-07-31 08:04:03 +00:00
Dan Gohman 4788552deb Re-apply 40504, but with a fix for the segfault it caused in oggenc:
Make the alignedload and alignedstore patterns always require 16-byte
alignment. This way when they are used in the "Fs" instructions, in which
a vector instruction is used for a scalar purpose, they can still require
the full vector alignment. And add a regression test for this.

llvm-svn: 40555
2007-07-27 17:16:43 +00:00
Evan Cheng 931de40afa Reverting 40504 for now. It's breaking oggenc.
llvm-svn: 40547
2007-07-27 01:37:47 +00:00
Dan Gohman cecd4b3793 Fix a whitespace difference between CMPSSrr and CMPSDrr.
llvm-svn: 40528
2007-07-26 15:11:50 +00:00
Dan Gohman 8455bd3fae Remove X86ISD::LOAD_PACK and X86ISD::LOAD_UA and associated code from the
x86 target, replacing them with the new alignment attributes on memory
references.

llvm-svn: 40504
2007-07-26 00:31:09 +00:00
Evan Cheng 8fefeffb37 Because we promote SSE logical ops and loads to v2i64, we often end up generate
code that cross integer / floating point domains (e.g. generate pxor / pand for
logical ops on floating point value, movdqa to load / store floating point SSE
values). Given that, it's better to use movaps instead of movdqa and movups
instead of movdqu. They have the same latency but the "aps" variants are one
byte shorter.
If the domain crossing problem is a real performance issue, then we will have to
fix it with dynamic programming based isel.

llvm-svn: 40076
2007-07-20 00:27:43 +00:00
Evan Cheng 7ca3555bfa Fix patterns so we isel the xorps, etc. for floating pt logical SSE ops. DAG combiner may fold away the (bit_convert (load)).
llvm-svn: 40070
2007-07-19 23:34:10 +00:00
Evan Cheng 94b5a80b93 Change instruction description to split OperandList into OutOperandList and
InOperandList. This gives one piece of important information: # of results
produced by an instruction.
An example of the change:
def ADD32rr  : I<0x01, MRMDestReg, (ops GR32:$dst, GR32:$src1, GR32:$src2),
                 "add{l} {$src2, $dst|$dst, $src2}",
                 [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;
=>
def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
                 "add{l} {$src2, $dst|$dst, $src2}",
                 [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

llvm-svn: 40033
2007-07-19 01:14:50 +00:00
Dan Gohman 776962a97a Implement initial memory alignment awareness for SSE instructions. Vector loads
and stores that have a specified alignment of less than 16 bytes now use
instructions that support misaligned memory references.

llvm-svn: 40015
2007-07-18 20:23:34 +00:00
Dan Gohman 57111e7a60 Define non-intrinsic instructions for vector min, max, sqrt, rsqrt, and rcp,
in addition to the intrinsic forms. Add spill-folding entries for these new
instructions, and for the scalar min and max instrinsic instructions which
were missing. And add some preliminary ISelLowering code for using the new
non-intrinsic vector sqrt instruction, and fneg and fabs.

llvm-svn: 38478
2007-07-10 00:05:58 +00:00
Dale Johannesen a2b3c175db Fix for PR 1505 (and 1489). Rewrite X87 register
model to include f32 variants.  Some factoring
improvments forthcoming.

llvm-svn: 37847
2007-07-03 00:53:03 +00:00
Dan Gohman e8c1e428f2 Revert the earlier change that removed the M_REMATERIALIZABLE machine
instruction flag, and use the flag along with a virtual member function
hook for targets to override if there are instructions that are only
trivially rematerializable with specific operands (i.e. constant pool
loads).

llvm-svn: 37728
2007-06-26 00:48:07 +00:00
Dan Gohman 2e84e3f7b7 Make minor adjustments to whitespace and comments to reduce differences
between SSE1 instructions and their respective SSE2 analogues.

llvm-svn: 37718
2007-06-25 15:44:19 +00:00
Dan Gohman 33209bd6b8 Fix loadv2i32 to be loadv4i32, though it isn't actually used anywhere yet.
llvm-svn: 37717
2007-06-25 15:19:03 +00:00
Dan Gohman 9e82064924 Replace M_REMATERIALIZIBLE and the newly-added isOtherReMaterializableLoad
with a general target hook to identify rematerializable instructions. Some
instructions are only rematerializable with specific operands, such as loads
from constant pools, while others are always rematerializable. This hook
allows both to be identified as being rematerializable with the same
mechanism.

llvm-svn: 37644
2007-06-19 01:48:05 +00:00
Evan Cheng 632c3f01ed Added missing patterns for UNPCKH* and PUNPCKH*.
llvm-svn: 37172
2007-05-17 18:44:37 +00:00
Bill Wendling b5ce7c5466 Non-algorithmic change. Moved definitions around into separate sections
for SSE1, SSE2, SSE3, and SSSE3.

llvm-svn: 36656
2007-05-02 23:11:52 +00:00
Dan Gohman 29845cd40d Fix the spelling of the prefetchnta instruction.
llvm-svn: 36256
2007-04-18 14:09:14 +00:00
Bill Wendling f099841573 Add support for our first SSSE3 instruction "pmulhrsw".
llvm-svn: 35869
2007-04-10 22:10:25 +00:00
Evan Cheng 61eee86487 Mark re-materializable instructions.
llvm-svn: 35230
2007-03-21 00:16:56 +00:00
Chris Lattner d647f92a14 add missing braces
llvm-svn: 34905
2007-03-04 06:13:52 +00:00
Evan Cheng b68a774fd9 How the heck did I forget patterns for llvm.x86.sse2.cmp.sd?
llvm-svn: 34434
2007-02-20 00:39:09 +00:00
Evan Cheng 82241c86e9 - FCOPYSIGN custom lowering bug. Clear the sign bit of operand 0 first before
or'ing in the sign bit of operand 1.
- Tweaking: rather than left shift the sign bit, fp_extend operand 1 first
  before taking its sign bit if its type is smaller than that of operand 0.

llvm-svn: 32932
2007-01-05 21:37:56 +00:00
Evan Cheng 4363e884c0 With SSE2, expand FCOPYSIGN to a series of SSE bitwise operations.
llvm-svn: 32900
2007-01-05 07:55:56 +00:00
Evan Cheng fccea9b2a9 - Rename MOVDSS2DIrr to MOVSS2DIrr for consistency sake.
- Add MOVDI2SSrm and MOVSS2DImr to fold load / store for i32 <-> f32 bit_convert
  patterns.

llvm-svn: 32582
2006-12-14 19:43:11 +00:00
Chris Lattner c20b7e878a If we have ScalarSSE, we can select bitconvert into single instructions.
This compiles bitcast.ll:test3/test4 into:

_test3:
        movd %xmm0, %eax
        ret
_test4:
        movd %edi, %xmm0
        ret

llvm-svn: 32230
2006-12-05 18:45:06 +00:00
Evan Cheng 572dc9cb4e Correct instructions for moving data between GR64 and SSE registers; also correct load i64 / store i64 from v2i64.
llvm-svn: 31795
2006-11-16 23:33:25 +00:00
Evan Cheng 49683ba236 Don't dag combine floating point select to max and min intrinsics. Those
take v4f32 / v2f64 operands and may end up causing larger spills / restores.
Added X86 specific nodes X86ISD::FMAX, X86ISD::FMIN instead.

This fixes PR996.

llvm-svn: 31645
2006-11-10 21:43:37 +00:00
Evan Cheng 922e191116 Fixed a bug which causes x86 be to incorrectly match
shuffle v, undef, <2, ?, 3, ?>
to movhlps
It should match to unpckhps instead.

Added proper matching code for
shuffle v, undef, <2, 3, 2, 3>

llvm-svn: 31519
2006-11-07 22:14:24 +00:00
Chris Lattner 9ac6442db6 remove dead/redundant vars
llvm-svn: 31435
2006-11-03 23:48:56 +00:00
Evan Cheng 94e5bc9e83 Fix ldmxcsr JIT encoding.
llvm-svn: 31343
2006-11-01 06:53:52 +00:00
Evan Cheng e056dd5928 Fixed a significant bug where unpcklpd is incorrectly used to extract element 1 from a v2f64 value.
llvm-svn: 31228
2006-10-27 21:08:32 +00:00
Evan Cheng 7e065ff7e4 X86ISD::PEXTRW 3rd operand type is always target pointer type.
llvm-svn: 31185
2006-10-25 21:35:05 +00:00
Evan Cheng 4090dc4703 ComplexPatterns sse_load_f32 and sse_load_f64 returns in / out chain operands.
llvm-svn: 30892
2006-10-11 21:06:01 +00:00
Evan Cheng 57ccb6d372 Don't go too crazy with these AddComplexity. Try matching shufps with load
folding first.

llvm-svn: 30848
2006-10-09 21:42:15 +00:00
Evan Cheng e71fe34d75 Reflects ISD::LOAD / ISD::LOADX / LoadSDNode changes.
llvm-svn: 30844
2006-10-09 20:57:25 +00:00
Chris Lattner 398195ebbe completely disable folding of loads into scalar sse instructions and provide
a framework for doing it right.  This fixes
CodeGen/X86/2006-10-07-ScalarSSEMiscompile.ll.

Once X86DAGToDAGISel::SelectScalarSSELoad is implemented right, this task
will be done.

llvm-svn: 30817
2006-10-07 21:55:32 +00:00
Chris Lattner 942009fee5 convert packed FP add/sub/mul/div to use a multiclass.
llvm-svn: 30815
2006-10-07 21:17:13 +00:00
Chris Lattner 4005f4e49c one multiclass now defines all 8 variants of binary-scalar-sse-fp operations.
llvm-svn: 30814
2006-10-07 20:55:57 +00:00
Chris Lattner 6eaee2c8e3 Switch ADD/MUL/DIV/SUB scalarsse fp ops to a multiclass
llvm-svn: 30813
2006-10-07 20:35:44 +00:00
Chris Lattner c8c6441821 Random acts of shrinkage
llvm-svn: 30812
2006-10-07 19:49:05 +00:00
Chris Lattner b5df7e554d Convert pand/por/pxor to use multiclass
llvm-svn: 30811
2006-10-07 19:37:30 +00:00
Chris Lattner 6138cba5f1 Convert some more instructions over to use a new multiclass.
Fix a bug where the asmstring for PSUBQrm was wrong.

llvm-svn: 30810
2006-10-07 19:34:33 +00:00
Chris Lattner 662ba43f08 Fix a bug where PADDQrm printed paddd instead of paddq.
llvm-svn: 30809
2006-10-07 19:15:46 +00:00
Chris Lattner 29c62a3c88 Add multiclass for SSE2 instructions that correspond to simple binops.
llvm-svn: 30808
2006-10-07 19:14:49 +00:00
Chris Lattner e0928d9d7b rename:
PDI_binop_rm -> PDI_binop_rm_int
  PDI_binop_rmi -> PDI_binop_rmi_int

to make it clear that these are for use with intrinsics.

llvm-svn: 30807
2006-10-07 19:02:31 +00:00
Chris Lattner 489b63089d Convert saturating PADD/PSUB's to use a multiclass
llvm-svn: 30806
2006-10-07 18:48:46 +00:00
Chris Lattner fa2ce8824d Convert PAVG*, PMADDWD, and PMUL* to use multiclasses.
llvm-svn: 30805
2006-10-07 18:39:00 +00:00
Chris Lattner cab92e4c0c Fix typo in packsswb instr definition, where the load had the wrong type.
This allows us to use the multiclass for other packs.

llvm-svn: 30804
2006-10-07 18:23:58 +00:00
Chris Lattner e746a9cd6a handle pmin/pmax with multiclasses
llvm-svn: 30800
2006-10-07 07:49:33 +00:00
Chris Lattner b14e6a0f8c simplify pack and shift intrinsics with multiclasses
llvm-svn: 30797
2006-10-07 07:06:17 +00:00
Chris Lattner 521fc4e33f Use a multiclass to simplify 'SSE2 Integer comparison'
llvm-svn: 30796
2006-10-07 06:47:08 +00:00
Chris Lattner c6138cec61 move class defns close to uses to make it easier to read
llvm-svn: 30795
2006-10-07 06:33:36 +00:00
Chris Lattner 87e692323c simplify horizontal op definitions
llvm-svn: 30794
2006-10-07 06:31:41 +00:00
Chris Lattner 3e9fc37458 remove more unneeded type info
llvm-svn: 30793
2006-10-07 06:27:03 +00:00
Chris Lattner 807be0a715 remove unneeded definitions and type info
llvm-svn: 30792
2006-10-07 06:19:41 +00:00
Chris Lattner 5b1358a8eb remove some unneeded type info
llvm-svn: 30791
2006-10-07 06:17:43 +00:00
Chris Lattner 3414c022af simplify patterns by merging in operand info
llvm-svn: 30790
2006-10-07 05:50:25 +00:00
Chris Lattner ca21ce5f08 Factor operands into packed unary classes
llvm-svn: 30789
2006-10-07 05:47:20 +00:00
Chris Lattner 0052c3ff5b remove dead/duplicate instructions
llvm-svn: 30788
2006-10-07 05:41:52 +00:00
Chris Lattner 904c6e9c92 Pull operand info up into parent class for scalar sse intrinsics.
llvm-svn: 30787
2006-10-07 05:26:13 +00:00
Chris Lattner e698c90ee9 convert the sole sd unary intrinsic to a multiclass for consistency
llvm-svn: 30786
2006-10-07 05:19:31 +00:00
Chris Lattner 2bb2f050f5 pull operand string into the multiclass
llvm-svn: 30785
2006-10-07 05:13:26 +00:00
Chris Lattner 069679c7b6 Remove RSQRTSS[rm] RCPSS[rm], which are dead.
Introduce SS_IntUnary, a multiclass to replace SS_Int[rm].

llvm-svn: 30784
2006-10-07 05:09:48 +00:00
Chris Lattner f13a7b376c eliminate redundancy
llvm-svn: 30783
2006-10-07 04:52:09 +00:00
Evan Cheng a36e6cf44f These don't have immediate operands.
llvm-svn: 30694
2006-10-03 06:55:11 +00:00
Evan Cheng 4259a0f654 X86ISD::CMP now produces a chain as well as a flag. Make that the chain
operand of a conditional branch to allow load folding into CMP / TEST
instructions.

llvm-svn: 30241
2006-09-11 02:19:56 +00:00
Evan Cheng 17c28b2e0e JIT encoding bug.
llvm-svn: 30112
2006-09-05 05:59:25 +00:00
Evan Cheng 66ed41cac1 Can't commute shufps. The high / low parts elements come from different vectors.
llvm-svn: 29275
2006-07-25 20:25:40 +00:00
Evan Cheng 5987cfb7b1 X86 target specific DAG combine: turn build_vector (load x), (load x+4),
(load x+8), (load x+12), <0, 1, 2, 3> to a single 128-bit load (aligned and
unaligned).

e.g.

__m128 test(float a, float b, float c, float d) {
  return _mm_set_ps(d, c, b, a);
}

_test:
        movups 4(%esp), %xmm0
        ret

llvm-svn: 29042
2006-07-07 08:33:52 +00:00
Evan Cheng 390922f979 Should just use xorps to clear XMM registers for all data types. pxor is also one byte longer.
llvm-svn: 28984
2006-06-29 18:04:54 +00:00
Evan Cheng fc8cdda070 Always use xorps to clear XMM registers.
llvm-svn: 28979
2006-06-29 00:34:23 +00:00
Chris Lattner dbec49d574 Remove some ugly now-redundant casts.
llvm-svn: 28864
2006-06-20 00:25:29 +00:00
Chris Lattner 55594634d7 Fix some mismatched type constraints
llvm-svn: 28862
2006-06-20 00:12:37 +00:00
Evan Cheng cd58e9d8b9 Minor clean up.
llvm-svn: 28860
2006-06-19 19:25:30 +00:00
Evan Cheng de7156f12c Type of vector extract / insert index operand should be iPTR.
llvm-svn: 28796
2006-06-15 08:14:54 +00:00
Evan Cheng 25e44e008d Rename instructions for consistency sake.
llvm-svn: 28594
2006-05-31 19:00:07 +00:00
Evan Cheng 8abf45e22d Select vector_shuffle v1, undef <2, 3, ?, ?> to MOVHLPS.
llvm-svn: 28582
2006-05-31 00:51:37 +00:00
Evan Cheng 57399704b3 MAXP{D|S} and MINP{D|S} are commutable.
llvm-svn: 28578
2006-05-30 23:47:30 +00:00
Evan Cheng c0f90bef47 Commute shufps / shufpd.
llvm-svn: 28577
2006-05-30 23:34:30 +00:00
Evan Cheng 66f849bd7b Allow shufps x, x, mask to be converted to pshufd x, mask to save a move.
llvm-svn: 28565
2006-05-30 20:26:50 +00:00
Evan Cheng 9fee442e63 X86 integer register classes naming changes. Make them consistent with FP, vector classes.
llvm-svn: 28324
2006-05-16 07:21:53 +00:00
Chris Lattner 44a73e9fa5 Teach the code generator to use cvtss2sd as extload f32 -> f64
llvm-svn: 28131
2006-05-05 21:35:18 +00:00
Evan Cheng 8b1cde2bbe Use movsd to shuffle in the lowest two elements of a v4f32 / v4i32 vector when
movlps cannot be used (e.g. when load from m64 has multiple uses).

llvm-svn: 28089
2006-05-03 20:32:03 +00:00
Evan Cheng 4cc3e0b05f Fix a typo.
llvm-svn: 27968
2006-04-25 17:48:41 +00:00
Evan Cheng fb46b2bf5d Explicitly specify result type for def : Pat<> patterns (if it produces a vector
result). Otherwise tblgen will pick the default (v16i8 for 128-bit vector).

llvm-svn: 27965
2006-04-25 00:50:01 +00:00
Evan Cheng 25b09295f8 Added X86 SSE2 intrinsics which can be represented as vector_shuffles. This is
a temporary workaround for the 2-wide vector_shuffle problem (i.e. its mask
would have type v2i32 which is not legal).

llvm-svn: 27964
2006-04-24 23:34:56 +00:00
Evan Cheng 63bd4d3730 Some missing movlps, movhps, movlpd, and movhpd patterns.
llvm-svn: 27960
2006-04-24 21:58:20 +00:00
Evan Cheng e8b5180044 Now generating perfect (I think) code for "vector set" with a single non-zero
scalar value.

e.g.
        _mm_set_epi32(0, a, 0, 0);
==>
	movd 4(%esp), %xmm0
	pshufd $69, %xmm0, %xmm0

        _mm_set_epi8(0, 0, 0, 0, 0, a, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
==>
	movzbw 4(%esp), %ax
	movzwl %ax, %eax
	pxor %xmm0, %xmm0
	pinsrw $5, %eax, %xmm0

llvm-svn: 27923
2006-04-21 01:05:10 +00:00
Evan Cheng 6d5297dac3 Prefer {p}unpack* and mov*dup over {p}shuf* as well.
llvm-svn: 27844
2006-04-19 21:15:24 +00:00
Evan Cheng b416a25174 - Renamed AddedCost to AddedComplexity.
- Added more movhlps and movlhps patterns.

llvm-svn: 27842
2006-04-19 20:37:34 +00:00
Evan Cheng cc7abc6c38 More mov{h|l}p{d|s} patterns.
llvm-svn: 27836
2006-04-19 18:20:17 +00:00
Evan Cheng aeb09ccdd3 - More mov{h|l}ps patterns.
- Increase cost (complexity) of patterns which match mov{h|l}ps ops. These
  are preferred over shufps in most cases.

llvm-svn: 27835
2006-04-19 18:11:52 +00:00
Evan Cheng 3823aa1d0f - PEXTRW cannot take a memory location as its first source operand.
- PINSRWrmi encoding bug.

llvm-svn: 27818
2006-04-18 21:59:43 +00:00
Evan Cheng a179ea631d Name change for clarity sake
llvm-svn: 27816
2006-04-18 21:55:35 +00:00
Evan Cheng d799d680f4 Name change for clarity sake
llvm-svn: 27814
2006-04-18 21:29:50 +00:00
Evan Cheng 0ee281f37c Left a pattern out
llvm-svn: 27813
2006-04-18 21:29:08 +00:00
Evan Cheng e2d25a1a50 Fixed an encoding bug: movd from XMM to R32.
llvm-svn: 27807
2006-04-18 18:19:00 +00:00
Evan Cheng 5421206c4b Use movss to insert_vector_elt(v, s, 0).
llvm-svn: 27782
2006-04-17 22:45:49 +00:00
Evan Cheng 22c06f054b Encoding bug
llvm-svn: 27773
2006-04-17 21:33:57 +00:00
Evan Cheng 5112b5c544 Errors in patterns preventing load folding
llvm-svn: 27762
2006-04-17 18:05:01 +00:00
Evan Cheng 20712deecb movduprm, movshduprm bugs
llvm-svn: 27734
2006-04-16 18:11:28 +00:00
Evan Cheng 3064f9aaa6 Encoding bugs
llvm-svn: 27733
2006-04-16 07:02:22 +00:00
Evan Cheng 8f1d801389 More encoding bugs
llvm-svn: 27722
2006-04-15 06:10:09 +00:00
Evan Cheng 91944e8699 pslldrm, psrawrm, etc. encoding bug
llvm-svn: 27721
2006-04-15 05:59:08 +00:00
Evan Cheng 1220b31a31 hsubp{s|d} encoding bug
llvm-svn: 27720
2006-04-15 05:52:42 +00:00
Evan Cheng 6222cf2a36 Silly bug
llvm-svn: 27719
2006-04-15 05:37:34 +00:00
Evan Cheng 00a5b3d9d3 Some clean up
llvm-svn: 27715
2006-04-14 23:32:40 +00:00
Evan Cheng 5d247f81c1 Last few SSE3 intrinsics.
llvm-svn: 27711
2006-04-14 21:59:03 +00:00
Evan Cheng 3bd605397b Misc. SSE2 intrinsics: clflush, lfench, mfence
llvm-svn: 27699
2006-04-14 07:43:12 +00:00
Evan Cheng eb0063a34f pcmpeq* and pcmpgt* intrinsics.
llvm-svn: 27685
2006-04-14 01:39:53 +00:00
Evan Cheng 16287444ff psll*, psrl*, and psra* intrinsics.
llvm-svn: 27684
2006-04-14 00:14:05 +00:00
Evan Cheng a84319719c Doh. PANDrm, etc. are not commutable.
llvm-svn: 27668
2006-04-13 18:11:28 +00:00
Evan Cheng ed3996743f psad, pmax, pmin intrinsics.
llvm-svn: 27647
2006-04-13 06:11:45 +00:00
Evan Cheng 58dad55959 Various SSE2 packed integer intrinsics: pmulhuw, pavgw, etc.
llvm-svn: 27645
2006-04-13 05:24:54 +00:00
Evan Cheng b3fe00bdc6 padds{b|w}, paddus{b|w}, psubs{b|w}, psubus{b|w} intrinsics.
llvm-svn: 27639
2006-04-13 00:43:35 +00:00
Evan Cheng 0aab735a1a Naming inconsistency.
llvm-svn: 27638
2006-04-13 00:00:23 +00:00