Commit Graph

163 Commits

Author SHA1 Message Date
Chris Lattner 724539c001 A few inline asm cleanups:
- Make targetlowering.h fit in 80 cols.
  - Make LowerAsmOperandForConstraint const.
  - Make lowerXConstraint -> LowerXConstraint
  - Make LowerXConstraint return a const char* instead of taking a string byref.

llvm-svn: 50312
2008-04-26 23:02:14 +00:00
Dan Gohman 3dd8ba6235 Remove X86_64SRet; it isn't used anymore.
llvm-svn: 49759
2008-04-16 00:24:30 +00:00
Dan Gohman 2505d86783 Fix const-correctness issues with the SrcValue handling in the
memory intrinsic expansion code.

llvm-svn: 49666
2008-04-14 17:55:48 +00:00
Arnold Schwaighofer 634fc9a33a This patch corrects the handling of byval arguments for tailcall
optimized x86-64 (and x86) calls so that they work (... at least for
my test cases).

Should fix the following problems:

Problem 1: When i introduced the optimized handling of arguments for
tail called functions (using a sequence of copyto/copyfrom virtual
registers instead of always lowering to top of the stack) i did not
handle byval arguments correctly e.g they did not work at all :).

Problem 2: On x86-64 after the arguments of the tail called function
are moved to their registers (which include ESI/RSI etc), tail call
optimization performs byval lowering which causes xSI,xDI, xCX
registers to be overwritten. This is handled in this patch by moving
the arguments to virtual registers first and after the byval lowering
the arguments are moved from those virtual registers back to
RSI/RDI/RCX.

llvm-svn: 49584
2008-04-12 18:11:06 +00:00
Dan Gohman 544ab2c50b Drop ISD::MEMSET, ISD::MEMMOVE, and ISD::MEMCPY, which are not Legal
on any current target and aren't optimized in DAGCombiner. Instead
of using intermediate nodes, expand the operations, choosing between
simple loads/stores, target-specific code, and library calls,
immediately.

Previously, the code to emit optimized code for these operations
was only used at initial SelectionDAG construction time; now it is
used at all times. This fixes some cases where rep;movs was being
used for small copies where simple loads/stores would be better.

This also cleans up code that checks for alignments less than 4;
let the targets make that decision instead of doing it in
target-independent code. This allows x86 to use rep;movs in
low-alignment cases.

Also, this fixes a bug that resulted in the use of rep;stos for
memsets of 0 with non-constant memory size when the alignment was
at least 4. It's better to use the library in this case, which
can be significantly faster when the size is large.

This also preserves more SourceValue information when memory
intrinsics are lowered into simple loads/stores.

llvm-svn: 49572
2008-04-12 04:36:06 +00:00
Dan Gohman 33b3300178 Make isVectorClearMaskLegal's operand list const.
llvm-svn: 49446
2008-04-09 20:09:42 +00:00
Chris Lattner 68b11e14bc remove Evan's "ugly hack" that sorta attempted to get
x86-64 return conventions correct, but was never enabled.
We can now do the "right thing" with multiple return values.

llvm-svn: 48635
2008-03-21 06:50:21 +00:00
Arnold Schwaighofer 7da2bceb3b Don't loose incoming argument registers. Fix documentation style.
llvm-svn: 48545
2008-03-19 16:39:45 +00:00
Chris Lattner 4b3a7fa823 Eliminate the FP_GET_ST0/FP_SET_ST0 target-specific dag nodes, just lower to
copyfromreg/copytoreg instead.

llvm-svn: 48174
2008-03-10 21:08:41 +00:00
Scott Michel a6729e8666 Give TargetLowering::getSetCCResultType() a parameter so that ISD::SETCC's
return ValueType can depend its operands' ValueType.

This is a cosmetic change, no functionality impacted.

llvm-svn: 48145
2008-03-10 15:42:14 +00:00
Chris Lattner 4c869594bc rename FP_SETRESULT -> FP_SET_ST0
llvm-svn: 48094
2008-03-09 07:08:44 +00:00
Chris Lattner d587e580a6 rename FpGETRESULT32 -> FpGET_ST0_32 etc. Add support for
isel'ing value preserving FP roundings from one fp stack reg to another
into a noop, instead of stack traffic.

llvm-svn: 48093
2008-03-09 07:05:32 +00:00
Evan Cheng 0a62cb44ce Add a target lowering hook to control whether it's worthwhile to compress fp constant.
For x86, if sse2 is available, it's not a good idea since cvtss2sd is slower than a movsd load and it prevents load folding. On x87, it's important to shrink fp constant since fldt is very expensive.

llvm-svn: 47931
2008-03-05 01:30:59 +00:00
Andrew Lenharth 357061a74d 64bit CAS on 32bit x86.
llvm-svn: 47929
2008-03-05 01:15:49 +00:00
Andrew Lenharth d032c33300 all but CAS working on x86
llvm-svn: 47798
2008-03-01 21:52:34 +00:00
Arnold Schwaighofer 3bfca3e942 Refactor according to Evan's and Anton's suggestions.
llvm-svn: 47635
2008-02-26 22:21:54 +00:00
Arnold Schwaighofer b01b99ec78 Change the lowering of arguments for tail call optimized
calls. Before arguments that could overwrite each other were
explicitly lowered to a stack slot, not giving the register allocator
a chance to optimize. Now a sequence of copyto/copyfrom virtual
registers ensures that arguments are loaded in (virtual) registers
before they are lowered to the stack slot (and might overwrite each
other). Also parameter stack slots are marked mutable for
(potentially) tail calling functions.

llvm-svn: 47593
2008-02-26 09:19:59 +00:00
Evan Cheng 6200c225e0 - When DAG combiner is folding a bit convert into a BUILD_VECTOR, it should check if it's essentially a SCALAR_TO_VECTOR. Avoid turning (v8i16) <10, u, u, u> to <10, 0, u, u, u, u, u, u>. Instead, simply convert it to a SCALAR_TO_VECTOR of the proper type.
- X86 now normalize SCALAR_TO_VECTOR to (BIT_CONVERT (v4i32 SCALAR_TO_VECTOR)). Get rid of X86ISD::S2VEC.

llvm-svn: 47290
2008-02-18 23:04:32 +00:00
Dan Gohman e1d9ee66ed Simplify some logic in ComputeMaskedBits. And change ComputeMaskedBits
to pass the mask APInt by value, not by reference. 

llvm-svn: 47096
2008-02-13 22:28:48 +00:00
Dan Gohman f990faf23b Convert SelectionDAG::ComputeMaskedBits to use APInt instead of uint64_t.
Add an overload that supports the uint64_t interface for use by clients
that haven't been updated yet.

llvm-svn: 47039
2008-02-13 00:35:47 +00:00
Nate Begeman 2d77e8e446 Enable SSE4 codegen and pattern matching.
Add some notes to the README.

llvm-svn: 46949
2008-02-11 04:19:36 +00:00
Dan Gohman 3a4be0fdef Rename MRegisterInfo to TargetRegisterInfo.
llvm-svn: 46930
2008-02-10 18:45:23 +00:00
Dan Gohman 9ba4d76816 Rename ISD::FLT_ROUNDS to ISD::FLT_ROUNDS_ to avoid conflicting
with the real FLT_ROUNDS (defined in <float.h>).

llvm-svn: 46587
2008-01-31 00:41:03 +00:00
Evan Cheng 29cfb67e28 Even though InsertAtEndOfBasicBlock is an ugly hack it still deserves a proper name. Rename it to EmitInstrWithCustomInserter since it does not necessarily insert
instruction at the end.

llvm-svn: 46562
2008-01-30 18:18:23 +00:00
Evan Cheng 084a1cdcdd Work in progress. This patch *fixes* x86-64 calls which are modelled as StructRet but really should be return in registers, e.g. _Complex long double, some 128-bit aggregates. This is a short term solution that is necessary only because llvm, for now, cannot model i128 nor call's with multiple results.
Status: This only works for direct calls, and only the caller side is done. Disabled for now.

llvm-svn: 46527
2008-01-29 19:34:22 +00:00
Dale Johannesen 2b3bc30420 Handle 'X' constraint in asm's better.
llvm-svn: 46485
2008-01-29 02:21:21 +00:00
Evan Cheng 35abd840a6 Let each target decide byval alignment. For X86, it's 4-byte unless the aggregare contains SSE vector(s). For x86-64, it's max of 8 or alignment of the type.
llvm-svn: 46286
2008-01-23 23:17:41 +00:00
Chris Lattner 7dc00e8021 make a method public
llvm-svn: 46159
2008-01-18 06:52:41 +00:00
Chris Lattner e8bb9f2190 make it more clear that this predicate only applies to scalar FP types.
llvm-svn: 46058
2008-01-16 06:24:21 +00:00
Chris Lattner 14e616ef0b introduce a isTypeInSSEReg predicate, which allows us to simplify
some code.  No functionality change.

llvm-svn: 46055
2008-01-16 06:19:45 +00:00
Chris Lattner 3c3fefde06 no need to expand ISD::TRAP to X86ISD::TRAP, just match ISD::TRAP.
llvm-svn: 46015
2008-01-15 21:58:22 +00:00
Anton Korobeynikov 6bbbc4cbfa For PR1839: add initial support for __builtin_trap. llvm-gcc part is missed
as well as PPC codegen

llvm-svn: 46001
2008-01-15 07:02:33 +00:00
Gordon Henriksen 9231958391 Refactoring the x86 and x86-64 calling convention implementations,
unifying the copied algorithms and saving over 500 LOC. There should
be no functionality change, but please test on your favorite x86
target.

llvm-svn: 45627
2008-01-05 16:56:59 +00:00
Chris Lattner f3ebc3f3d2 Remove attribution from file headers, per discussion on llvmdev.
llvm-svn: 45418
2007-12-29 20:36:04 +00:00
Evan Cheng e9fbc3f014 Implement ctlz and cttz with bsr and bsf.
llvm-svn: 45024
2007-12-14 02:13:44 +00:00
Chris Lattner f81d5886c6 Several changes:
1) Change the interface to TargetLowering::ExpandOperationResult to 
   take and return entire NODES that need a result expanded, not just
   the value.  This allows us to handle things like READCYCLECOUNTER,
   which returns two values.
2) Implement (extremely limited) support in LegalizeDAG::ExpandOp for MERGE_VALUES.
3) Reimplement custom lowering in LegalizeDAGTypes in terms of the new
   ExpandOperationResult.  This makes the result simpler and fully 
   general.
4) Implement (fully general) expand support for MERGE_VALUES in LegalizeDAGTypes.
5) Implement ExpandOperationResult support for ARM f64->i64 bitconvert and ARM
   i64 shifts, allowing them to work with LegalizeDAGTypes.
6) Implement ExpandOperationResult support for X86 READCYCLECOUNTER and FP_TO_SINT,
   allowing them to work with LegalizeDAGTypes.

LegalizeDAGTypes now passes several more X86 codegen tests when enabled and when
type legalization in LegalizeDAG is ifdef'd out.

llvm-svn: 44300
2007-11-24 07:07:01 +00:00
Anton Korobeynikov 91460e43f1 Implement codegen for flt_rounds on x86
llvm-svn: 44183
2007-11-16 01:31:51 +00:00
Evan Cheng 797d56ff17 Much improved pic jumptable codegen:
Then:
        call    "L1$pb"
"L1$pb":
        popl    %eax
		...
LBB1_1: # entry
        imull   $4, %ecx, %ecx
        leal    LJTI1_0-"L1$pb"(%eax), %edx
        addl    LJTI1_0-"L1$pb"(%ecx,%eax), %edx
        jmpl    *%edx

        .align  2
        .set L1_0_set_3,LBB1_3-LJTI1_0
        .set L1_0_set_2,LBB1_2-LJTI1_0
        .set L1_0_set_5,LBB1_5-LJTI1_0
        .set L1_0_set_4,LBB1_4-LJTI1_0
LJTI1_0:
        .long    L1_0_set_3
        .long    L1_0_set_2

Now:
        call    "L1$pb"
"L1$pb":
        popl    %eax
		...
LBB1_1: # entry
        addl    LJTI1_0-"L1$pb"(%eax,%ecx,4), %eax
        jmpl    *%eax

		.align  2
		.set L1_0_set_3,LBB1_3-"L1$pb"
		.set L1_0_set_2,LBB1_2-"L1$pb"
		.set L1_0_set_5,LBB1_5-"L1$pb"
		.set L1_0_set_4,LBB1_4-"L1$pb"
LJTI1_0:
        .long    L1_0_set_3
        .long    L1_0_set_2

llvm-svn: 43924
2007-11-09 01:32:10 +00:00
Rafael Espindola fa0df55bdd Move the LowerMEMCPY and LowerMEMCPYCall to a common place.
Thanks for the suggestions Bill :-)

llvm-svn: 43742
2007-11-05 23:12:20 +00:00
Evan Cheng e106e2f142 Enable more fold (sext (load x)) -> (sext (truncate (sextload x)))
transformation. Previously, it's restricted by ensuring the number of load uses
is one. Now the restriction is loosened up by allowing setcc uses to be
"extended" (e.g. setcc x, c, eq -> setcc sext(x), sext(c), eq).

llvm-svn: 43465
2007-10-29 19:58:20 +00:00
Evan Cheng 7f3d02471d Loosen up iv reuse to allow reuse of the same stride but a larger type when truncating from the larger type to smaller type is free.
e.g.
Turns this loop:
LBB1_1: # entry.bb_crit_edge
        xorl    %ecx, %ecx
        xorw    %dx, %dx
        movw    %dx, %si
LBB1_2: # bb
        movl    L_X$non_lazy_ptr, %edi
        movw    %si, (%edi)
        movl    L_Y$non_lazy_ptr, %edi
        movw    %dx, (%edi)
		addw    $4, %dx
		incw    %si
		incl    %ecx
		cmpl    %eax, %ecx
		jne     LBB1_2  # bb
	
into

LBB1_1: # entry.bb_crit_edge
        xorl    %ecx, %ecx
        xorw    %dx, %dx
LBB1_2: # bb
        movl    L_X$non_lazy_ptr, %esi
        movw    %cx, (%esi)
        movl    L_Y$non_lazy_ptr, %esi
        movw    %dx, (%esi)
        addw    $4, %dx
		incl    %ecx
        cmpl    %eax, %ecx
        jne     LBB1_2  # bb

llvm-svn: 43375
2007-10-26 01:56:11 +00:00
Arnold Schwaighofer 9ccea99165 Added tail call optimization to the x86 back end. It can be
enabled by passing -tailcallopt to llc.  The optimization is
performed if the following conditions are satisfied:
* caller/callee are fastcc
* elf/pic is disabled OR
  elf/pic enabled + callee is in module + callee has
  visibility protected or hidden

llvm-svn: 42870
2007-10-11 19:40:01 +00:00
Dan Gohman e8c8ef5234 LowerIntegerDivOrRem no longer exists.
llvm-svn: 42787
2007-10-09 15:45:13 +00:00
Dan Gohman a160361c85 Migrate X86 and ARM from using X86ISD::{,I}DIV and ARMISD::MULHILO{U,S} to
use ISD::{S,U}DIVREM and ISD::{S,U}MUL_HIO. Move the lowering code
associated with these operators into target-independent in LegalizeDAG.cpp
and TargetLowering.cpp.

llvm-svn: 42762
2007-10-08 18:33:35 +00:00
Evan Cheng 5fb5a1f389 Enabling new condition code modeling scheme.
llvm-svn: 42459
2007-09-29 00:00:36 +00:00
Rafael Espindola 6c04ac1db0 Refactor the memcpy lowering for the x86 target.
The only generated code difference is that now we call memcpy when
the size of the array is unknown. This matches GCC behavior and is
better since the run time value can be arbitrarily large.

llvm-svn: 42433
2007-09-28 12:53:01 +00:00
Dan Gohman 06919e8ef2 Fix a typo in a comment.
llvm-svn: 42313
2007-09-25 19:37:26 +00:00
Dan Gohman 31599685c7 When both x/y and x%y are needed (x and y both scalar integer), compute
both results with a single div or idiv instruction. This uses new X86ISD
nodes for DIV and IDIV which are introduced during the legalize phase
so that the SelectionDAG's CSE can automatically eliminate redundant
computations.

llvm-svn: 42308
2007-09-25 18:23:27 +00:00
Evan Cheng e95f391ef1 Added support for new condition code modeling scheme (i.e. physical register dependency). These are a bunch of instructions that are duplicated so the x86 backend can support both the old and new schemes at the same time. They will be deleted after
all the kinks are worked out.

llvm-svn: 42285
2007-09-25 01:57:46 +00:00
Dale Johannesen e36c400255 Fix PR 1681. When X86 target uses +sse -sse2,
keep f32 in SSE registers and f64 in x87.  This
is effectively a new codegen mode.
Change addLegalFPImmediate to permit float and
double variants to do different things.
Adjust callers.

llvm-svn: 42246
2007-09-23 14:52:20 +00:00