Commit Graph

913 Commits

Author SHA1 Message Date
Dan Gohman 56b6885104 When doing the very-late shift-and address-mode optimization,
create a new DAG node to represent the new shift to keep the
DAG consistent, even though it'll almost always be folded into
the address.

If a user of the resulting address has multiple uses, the
nodes may get revisited by a later MatchAddress call, in which
case DAG inconsistencies do matter.

This fixes PR2849.

llvm-svn: 57465
2008-10-13 20:52:04 +00:00
Evan Cheng 4c499c4fa6 Also update sub-register intervals after a trivial computation is rematt'ed for a copy instruction. PR2775.
llvm-svn: 57458
2008-10-13 18:35:52 +00:00
Evan Cheng 762f0f53ec Add a test case for _Complex passed as a FCA.
llvm-svn: 57456
2008-10-13 18:13:07 +00:00
Chris Lattner 2753955fc0 Change CALLSEQ_BEGIN and CALLSEQ_END to take TargetConstant's as
parameters instead of raw Constants.  This prevents the constants from
being selected by the isel pass, fixing PR2735.

llvm-svn: 57385
2008-10-11 22:08:30 +00:00
Dan Gohman 60ad173dfe Remove -disable-fast-isel. Use cl::boolOrDefault with -fast-isel
instead.

So now: -fast-isel or -fast-isel=true enable fast-isel, and
-fast-isel=false disables it. Fast-isel is also on by default
with -fast, and off by default otherwise.

llvm-svn: 57270
2008-10-07 23:00:56 +00:00
Dan Gohman b8118fd432 Add a testcase for i256 add. i256 isn't fully supported in
codegen right now, but add and subtract work.

llvm-svn: 57260
2008-10-07 20:39:12 +00:00
Anders Carlsson 1699ad9030 Certain patterns involving the "movss" instruction were marked as requiring SSE2, when in reality movss is an SSE1 instruction.
llvm-svn: 57246
2008-10-07 16:14:11 +00:00
Dale Johannesen 6c6729f3a8 Be more precise about which conversions of NaNs
are Inexact.  (These are not Inexact as defined
by IEEE754, but that seems like a reasonable way
to abstract what happens:  information is lost.)

llvm-svn: 57218
2008-10-06 22:59:10 +00:00
Evan Cheng 94d14f2d45 Fix PR2850 and PR2863. Only generate movddup for 128-bit SSE vector shuffles.
llvm-svn: 57210
2008-10-06 21:13:08 +00:00
Anton Korobeynikov b52ef06c8c Revert r56675 - it breaks unwinding runtime everywhere.
llvm-svn: 57048
2008-10-04 11:09:36 +00:00
Dan Gohman 78bb44fcd4 Fix a bug in the local allocator's liveness computation where it
was setting kill flags on tied uses in two-address instructions.
The kill flags were causing the allocator to think it could
allocate the use and its tied def in different registers.

llvm-svn: 57039
2008-10-04 00:31:14 +00:00
Dale Johannesen 867d549fce Handle some 64-bit atomics on x86-32, some of the time.
llvm-svn: 56963
2008-10-02 18:53:47 +00:00
Dan Gohman 88536398ff Fix a think-o in isSafeToMove. This fixes it from thinking that
volatile memory references are safe to move.

llvm-svn: 56948
2008-10-02 15:04:30 +00:00
Dan Gohman dfc507d2b5 Disable fast-isel for this test, as it doesn't emit the same
number of instructions.

llvm-svn: 56940
2008-10-01 23:48:35 +00:00
Devang Patel 1b76f2c40b Remove OptimizeForSize global. Use function attribute optsize.
llvm-svn: 56937
2008-10-01 23:18:38 +00:00
Dan Gohman 5c8c00af1f Split this test and move it into target-specific directories.
This fixes failures on configurations that don't have one or the
other targets enabled.

llvm-svn: 56926
2008-10-01 19:46:30 +00:00
Dan Gohman 7354227de0 nounwind-ify this test.
llvm-svn: 56918
2008-10-01 15:07:14 +00:00
Bill Wendling 920f6d588e Moved this option to the front-end.
llvm-svn: 56901
2008-10-01 01:02:18 +00:00
Dan Gohman 1df16dff64 Use explicit target-triples to unbreak this test on non-darwin systems.
llvm-svn: 56896
2008-10-01 00:25:38 +00:00
Bill Wendling 1782584f56 Just don't transform this memset into "bzero" if no-builtin is specified.
llvm-svn: 56888
2008-09-30 22:05:33 +00:00
Bill Wendling e818bc159f - Initialize "--no-builtin" to "false".
- Testcase for r56885.

llvm-svn: 56886
2008-09-30 21:40:30 +00:00
Evan Cheng 9156bd2f48 Re-apply 56835 along with header file changes.
llvm-svn: 56848
2008-09-30 15:44:16 +00:00
Duncan Sands 2b9adce1d0 Revert commit 56835 since it breaks the build.
"If a re-materializable instruction has a register
operand, the spiller will change the register operand's
spill weight to HUGE_VAL to avoid it being spilled.
However, if the operand is already in the queue ready
to be spilled, avoid re-materializing it".

llvm-svn: 56837
2008-09-30 10:00:30 +00:00
Evan Cheng 9469049f7d If a re-materializable instruction has a register operand, the spiller will change the register operand's spill weight to HUGE_VAL to avoid it being spilled. However, if the operand is already in the queue ready to be spilled, avoid re-materializing it.
llvm-svn: 56835
2008-09-30 06:36:58 +00:00
Evan Cheng 82237f2f42 Fix PR2835. Do not change the width of a volatile load.
llvm-svn: 56792
2008-09-29 17:26:18 +00:00
Evan Cheng 3774b2f292 Re-apply 56683 with fixes.
llvm-svn: 56748
2008-09-27 01:56:22 +00:00
Evan Cheng 7d6fa97567 Implement "punpckldq %xmm0, $xmm0" as "pshufd $0x50, %xmm0, %xmm" unless optimizing for code size.
llvm-svn: 56711
2008-09-26 23:41:32 +00:00
Bill Wendling c966a737c5 Temporarily reverting r56683. This is causing a failure during the build of llvm-gcc:
/Volumes/Gir/devel/llvm/clean/llvm-gcc.obj/./gcc/xgcc -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.obj/./gcc/ -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/bin/ -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/lib/ -isystem /Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/include -isystem /Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/sys-include -mmacosx-version-min=10.4 -O2  -O2 -g -O2  -DIN_GCC    -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  -isystem ./include  -fPIC -pipe -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED  -I. -I. -I../../llvm-gcc.src/gcc -I../../llvm-gcc.src/gcc/. -I../../llvm-gcc.src/gcc/../include -I./../intl -I../../llvm-gcc.src/gcc/../libcpp/include  -I../../llvm-gcc.src/gcc/../libdecnumber -I../libdecnumber -I/Volumes/Gir/devel/llvm/clean/llvm.obj/include -I/Volumes/Gir/devel/llvm/clean/llvm.src/include -fexceptions -fvisibility=hidden -DHIDE_EXPORTS -c ../../llvm-gcc.src/gcc/unwind-dw2-fde-darwin.c -o libgcc/./unwind-dw2-fde-darwin.o
Assertion failed: (TargetRegisterInfo::isVirtualRegister(regA) && TargetRegisterInfo::isVirtualRegister(regB) && "cannot update physical register live information"), function runOnMachineFunction, file /Volumes/Gir/devel/llvm/clean/llvm.src/lib/CodeGen/TwoAddressInstructionPass.cpp, line 311.
../../llvm-gcc.src/gcc/unwind-dw2.c:1527: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
{standard input}:3521:non-relocatable subtraction expression, "_dwarf_reg_size_table" minus "L20$pb"
{standard input}:3521:symbol: "_dwarf_reg_size_table" can't be undefined in a subtraction expression
{standard input}:3520:non-relocatable subtraction expression, "_dwarf_reg_size_table" minus "L20$pb"
...

llvm-svn: 56703
2008-09-26 22:10:44 +00:00
Evan Cheng d77cbe8947 Fix @llvm.frameaddress codegen. FP elimination optimization should be disabled when frame address is desired. Also add support for depth > 0.
llvm-svn: 56683
2008-09-26 19:48:35 +00:00
Evan Cheng 994dd0bbec Avoid spilling EBP / RBP twice in the prologue.
llvm-svn: 56675
2008-09-26 19:14:21 +00:00
Evan Cheng 9dbe45c000 Prefer movlhps over punpcklqdq, etc. in more cases.
llvm-svn: 56627
2008-09-25 23:35:16 +00:00
Evan Cheng 74c9ed91b0 With sse3 and when the source is a load or has multiple uses, favors movddup over shuffp*, pshufd, etc. Without sse3 or when the source is from a register, make use of movlhps
llvm-svn: 56620
2008-09-25 20:50:48 +00:00
Dale Johannesen c50ada2f56 Accept 'inreg' attribute on x86 functions as
meaning sse_regparm (i.e. float/double values go
in XMM0 instead of ST0).  Update documentation
to reflect reality.

llvm-svn: 56619
2008-09-25 20:47:45 +00:00
Evan Cheng f8ead16b50 Fix patterns for SSE4.1 move and sign extend instructions. Also add instructions which fold VZEXT_MOVL and VZEXT_LOAD.
llvm-svn: 56594
2008-09-24 23:27:55 +00:00
Dale Johannesen 86d421df23 Remove SelectionDag early allocation of registers
for earlyclobbers.  Teach Local RA about earlyclobber,
and add some tests for it.

llvm-svn: 56592
2008-09-24 23:13:09 +00:00
Evan Cheng e0add20c1b Properly handle 'm' inline asm constraints. If a GV is being selected for the addressing mode, it requires the same logic for PIC relative addressing, etc.
llvm-svn: 56526
2008-09-24 00:05:32 +00:00
Evan Cheng 9e9426cb82 Support x86 specific inline asm modifier 'J'.
llvm-svn: 56483
2008-09-22 23:57:37 +00:00
Arnold Schwaighofer 796a271c5f Change the calling convention used when tail call optimization is enabled from CC_X86_32_TailCall to CC_X86_32_FastCC.
llvm-svn: 56436
2008-09-22 14:50:07 +00:00
Evan Cheng c042000649 Fix PR2808. When regalloc runs out of register, it spill a physical register around the live interval being allocated. Do not continue to try to spill another register, just grab the physical register and move on.
llvm-svn: 56381
2008-09-20 01:28:05 +00:00
Evan Cheng 65502487b7 Clean up the test.
llvm-svn: 56380
2008-09-20 01:26:27 +00:00
Evan Cheng 4730522235 No need to print function stubs for Mac OS X 10.5 and up. Linker will handle it.
llvm-svn: 56378
2008-09-20 00:13:45 +00:00
Dan Gohman 9801ba451a Refactor X86SelectConstAddr, folding it into X86SelectAddress. This
results in better code for globals. Also, unbreak the local CSE for
GlobalValue stub loads.

llvm-svn: 56371
2008-09-19 22:16:54 +00:00
Evan Cheng 4c0197043c Re-materalized definition instructions may be dead. Whack them.
llvm-svn: 56352
2008-09-19 17:38:47 +00:00
Dale Johannesen f8610ebebc Add a bit to mark operands of asm's that conflict
with an earlyclobber operand elsewhere.  Propagate
this bit and the earlyclobber bit through SDISel.
Change linear-scan RA not to allocate regs in a way 
that conflicts with an earlyclobber.  See also comments.

llvm-svn: 56290
2008-09-17 21:13:11 +00:00
Dan Gohman 68e7735a38 Teach LSR to optimize away SMAX operations for tripcounts in common
cases.  See the comment above OptimizeSMax for the full story, and
the testcase for an example. This cancels out a pessimization
commonly attributed to indvars, and will allow us to lift some of
the artificial throttles in indvars, rather than add new ones.

llvm-svn: 56230
2008-09-15 21:22:06 +00:00
Arnold Schwaighofer 33ad850d93 Add indirect tail call (function pointer) examples.
llvm-svn: 56127
2008-09-11 22:24:28 +00:00
Arnold Schwaighofer dd45bc25ac When tailcallopt is enabled all fastcc calls must have an aligned argument stack size. Add a test case.
llvm-svn: 56119
2008-09-11 20:28:43 +00:00
Evan Cheng 5456a37280 Fix PR2748. Avoid coalescing physical register with virtual register which would create illegal extract_subreg. e.g.
vr1024 = extract_subreg vr1025, 1
...
vr1024 = mov8rr AH
If vr1024 is coalesced with AH, the extract_subreg is now illegal since AH does not have a super-reg whose sub-register 1 is AH.

llvm-svn: 56118
2008-09-11 20:07:10 +00:00
Evan Cheng 4c9fbbb511 Fix PR2783 - coalescer bug. Missing a TargetRegisterInfo::isVirtualRegister check.
llvm-svn: 56112
2008-09-11 18:40:32 +00:00
Evan Cheng b401449ceb Propagate subreg index when promoting a load to a copy.
llvm-svn: 56085
2008-09-11 01:02:12 +00:00
Evan Cheng 710c3cf36a Fix a fastcc + sret bug. If fastcc and sret, callee doesn't need to pop the hidden struct ptr; Re-enable fastcc.
llvm-svn: 56061
2008-09-10 18:25:29 +00:00
Evan Cheng 53b728c27c Fix PR2757. Ignore liveinterval register allocation preference if the preference register is not in the right register class. This can happen due to sub-register coalescing.
llvm-svn: 56006
2008-09-09 20:22:01 +00:00
Evan Cheng 1e97901388 Fix a constant lowering bug. Now we can do load and store instructions with funky getelementptr embedded in the address operand.
llvm-svn: 55975
2008-09-09 01:26:59 +00:00
Anton Korobeynikov ab4928a6e5 Reapply 55902: Add test for checking proper lowering of eh_return & unwind init intrinsics on 32bit x86 targets
llvm-svn: 55960
2008-09-08 21:14:36 +00:00
Anton Korobeynikov 35883b7eec Reapply 55903: Testcase for 64-bit lowering of eh_return & unwind_init
llvm-svn: 55959
2008-09-08 21:14:19 +00:00
Bill Wendling 6da384d980 Remove these testcases associated with changes between r 55898 and r 55909.
llvm-svn: 55931
2008-09-08 18:00:39 +00:00
Bill Wendling 99b83712f3 Reverting r55898 to r55909. One of these patches was causing an ICE during the full bootstrap on Darwin:
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/bin/
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/lib/
-isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/include
-isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/sys-include
-O2  -O2 -g -O2  -DIN_GCC    -W -Wall -Wwrite-strings
-Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition
-isystem ./include  -fPIC -pipe -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2
-D__GCC_FLOAT_NOT_NEEDED  -I. -I. -I../../llvm-gcc.src/gcc
-I../../llvm-gcc.src/gcc/. -I../../llvm-gcc.src/gcc/../include
-I./../intl -I../../llvm-gcc.src/gcc/../libcpp/include
-I../../llvm-gcc.src/gcc/../libdecnumber -I../libdecnumber
-I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.obj/include
-I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/include
-DSHARED -m64 -DL_negdi2 -c ../../llvm-gcc.src/gcc/libgcc2.c -o
libgcc/x86_64/_negdi2_s.o
Assertion failed: (TargetRegisterInfo::isVirtualRegister(regA) &&
TargetRegisterInfo::isVirtualRegister(regB) && "cannot update physical
register live information"), function runOnMachineFunction, file
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/lib/CodeGen/TwoAddressInstructionPass.cpp,
line 311.
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/bin/
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/lib/
-isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/include
-isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/sys-include
-O2  -O2 -g -O2  -DIN_GCC    -W -Wall -Wwrite-strings
-Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition
-isystem ./include  -fPIC -pipe -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2
-D__GCC_FLOAT_NOT_NEEDED  -I. -I. -I../../llvm-gcc.src/gcc
-I../../llvm-gcc.src/gcc/. -I../../llvm-gcc.src/gcc/../include
-I./../intl -I../../llvm-gcc.src/gcc/../libcpp/include
-I../../llvm-gcc.src/gcc/../libdecnumber -I../libdecnumber
-I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.obj/include
-I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/include
-DSHARED -m64 -DL_lshrdi3 -c ../../llvm-gcc.src/gcc/libgcc2.c -o
libgcc/x86_64/_lshrdi3_s.o
../../llvm-gcc.src/gcc/unwind-dw2.c:1527: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
{standard input}:unknown:Undefined local symbol LBB21_11
{standard input}:unknown:Undefined local symbol LBB21_12
{standard input}:unknown:Undefined local symbol LBB21_13
{standard input}:unknown:Undefined local symbol LBB21_8

llvm-svn: 55928
2008-09-08 17:59:12 +00:00
Evan Cheng d172048b59 Handle calls which produce i1 results: promote to i8 but and it with 1 to get the low bit.
llvm-svn: 55925
2008-09-08 17:15:42 +00:00
Anton Korobeynikov f5e48008a4 Testcase for 64-bit lowering of eh_return & unwind_init
llvm-svn: 55903
2008-09-08 14:23:16 +00:00
Anton Korobeynikov f0d1a0ccf0 Add test for checking proper lowering of eh_return & unwind init intrinsics on 32bit x86 targets
llvm-svn: 55902
2008-09-08 14:22:57 +00:00
Evan Cheng ca14c07175 Correctly handle physical register inputs. They are not explicit input operands in the resulting machine instrs.
llvm-svn: 55893
2008-09-08 08:39:33 +00:00
Evan Cheng b928669409 Handle x86 truncate to i8 with target hook for now.
llvm-svn: 55877
2008-09-07 08:47:42 +00:00
Eli Friedman a9c52c8219 Fix for PR2687: Add patterns to match sint_to_fp and fp_to_sint for <2 x
i32>.  This is a little messy, but it works.

We should really get rid of the intrinsics, though, since they map
perfectly well to standard LLVM instructions.

llvm-svn: 55864
2008-09-05 23:07:03 +00:00
Evan Cheng d4e01dce74 Fix test.
llvm-svn: 55849
2008-09-05 20:04:37 +00:00
Evan Cheng 4f0d21592a If SSE2 is available, x86 should pass first 3 f32/f64 arguments in XMM registers for fastcc calls.
llvm-svn: 55840
2008-09-05 17:24:07 +00:00
Evan Cheng 6c94b99c62 For whatever the reason, x86 CallingConv::Fast (i.e. fastcc) was not passing scalar arguments in registers. This patch defines a new fastcc CC which is slightly different from the FastCall CC. In addition to passing integer arguments in ECX and EDX, it also specify doubles are passed in 8-byte slots which are 8-byte aligned (instead of 4-byte aligned). This avoids a potential performance hazard where doubles span cacheline boundaries.
llvm-svn: 55807
2008-09-04 22:59:58 +00:00
Owen Anderson b8c7ba228f Fix the ordering of operands to the store (inverted relative to LLVM IR), and fix the testcase.
llvm-svn: 55777
2008-09-04 16:48:33 +00:00
Owen Anderson 4f948bd87a Add a first attempt at implementing stores for X86 fast isel using target hooks.
Dan or Evan, please review.

llvm-svn: 55764
2008-09-04 07:08:58 +00:00
Evan Cheng 8d8f47d50b Load from GV stub should be locally CSE'd.
llvm-svn: 55763
2008-09-04 06:18:33 +00:00
Evan Cheng 3152edf474 Remove code that pad number of bytes to pop for X86_FastCall CC. The code doesn't do the "aligning" for Cygwin, Mingw, and Windows. But aligning it on Darwin and Linux breaks gcc compatibility. That ruled out all the platforms we support!
llvm-svn: 55756
2008-09-04 01:04:15 +00:00
Evan Cheng a41ee2974b Add X86 target hook to implement load (even from GlobalAddress).
llvm-svn: 55693
2008-09-03 06:44:39 +00:00
Evan Cheng a3771d5bd9 Re-apply 55467 with fix. If copy is being replaced by remat'ed def, transfer the implicit defs onto the remat'ed instruction.
llvm-svn: 55564
2008-08-30 09:09:33 +00:00
Evan Cheng cfb7f3abdf Transform (x << (y&31)) -> (x << y). This takes advantage of the fact x86 shift instructions 2nd operand (shift count) is limited to 0 to 31 (or 63 in the x86-64 case).
llvm-svn: 55558
2008-08-30 02:03:58 +00:00
Evan Cheng 3fddc7e906 Swap fp comparison operands and change predicate to allow load folding (safely this time).
llvm-svn: 55553
2008-08-29 23:22:12 +00:00
Evan Cheng 0b82607fa1 xfail this.
llvm-svn: 55550
2008-08-29 22:59:13 +00:00
Evan Cheng 960b17a3c2 Swap fp comparison operands and change predicate to allow load folding.
llvm-svn: 55521
2008-08-28 23:48:31 +00:00
Dan Gohman f27e33baa7 Optimize DAGCombiner's worklist processing. Previously it started
its work by putting all nodes in the worklist, requiring a big
dynamic allocation. Now, DAGCombiner just iterates over the AllNodes
list and maintains a worklist for nodes that are newly created or
need to be revisited. This allows the worklist to stay small in most
cases, so it can be a SmallVector.

This has the side effect of making DAGCombine not miss a folding
opportunity in alloca-align-rounding.ll.

llvm-svn: 55498
2008-08-28 21:01:56 +00:00
Dan Gohman 04cf2e4540 Revert r55467; it causes regressions in UnitTests/Vector/divides,
Benchmarks/sim/sim, and others on x86-64.

llvm-svn: 55475
2008-08-28 17:22:54 +00:00
Evan Cheng 6975602024 If a copy isn't coalesced, but its src is defined by trivial computation. Re-materialize the src to replace the copy.
llvm-svn: 55467
2008-08-28 07:53:51 +00:00
Dale Johannesen 897b2380d8 This test crashes on non-x86 host; make SSE explicit.
Feel free to fix a better way!

llvm-svn: 55456
2008-08-28 01:51:09 +00:00
Dan Gohman 5ca269e684 Basic FastISel support for floating-point constants.
llvm-svn: 55401
2008-08-27 01:09:54 +00:00
Chris Lattner 09f8cef571 If an xmm register is referenced explicitly in an inline asm, make sure to
assign it to a version of the xmm register with the regclass that matches its
type.  This fixes PR2715, a bug handling some crazy xpcom case in mozilla.

llvm-svn: 55358
2008-08-26 06:19:02 +00:00
Evan Cheng f00f1e50b5 Try approach to moving call address load inside of callseq_start. Now it's done during the preprocess of x86 isel. callseq_start's chain is changed to load's chain node; while load's chain is the last of callseq_start or the loads or copytoreg nodes inserted to move arguments to the right spot.
llvm-svn: 55338
2008-08-25 21:27:18 +00:00
Owen Anderson 32635dbfb2 Add support for fast isel of (integer) immediate materialization pattens, and use them to support
bitcast of constants in fast isel.

llvm-svn: 55325
2008-08-25 20:20:32 +00:00
Evan Cheng e414681352 Fix asm printing of MOVSDto64mr and MOV64toSDrm.
llvm-svn: 55300
2008-08-25 04:11:42 +00:00
Bill Wendling 934b374bc8 Fix this test. Don't null out the file, just XFAIL it until patch can be fixed.
llvm-svn: 55296
2008-08-24 21:48:46 +00:00
Bill Wendling 5b836c5f77 Temporarily reverting r55292. It's causing a bootstraping failure:
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc ... src/libiberty/make-temp-file.c -o make-temp-file.o
Assertion failed: (Node2Index[SU->NodeNum] > Node2Index[I->Dep->NodeNum] && "Wrong topological sorting"), function InitDAGTopologicalSorting, file /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp, line 508.
../../../../llvm-gcc.src/libiberty/hashtab.c:955: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
make[4]: *** [hashtab.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [multi-do] Error 1
make[2]: *** [all] Error 2
make[1]: *** [all-target-libiberty] Error 2
make: *** [all] Error 2

llvm-svn: 55295
2008-08-24 21:45:30 +00:00
Evan Cheng 8fa17424f7 Move callseq_start above the call address load to allow load to be folded into the call node.
llvm-svn: 55292
2008-08-24 19:19:55 +00:00
Anton Korobeynikov 55e9d7c178 Testcase for 64bit maskmovq
llvm-svn: 55239
2008-08-23 15:53:47 +00:00
Dale Johannesen bb170bd08c Test all currently supported atomic builtins on x86-{32,64}.
These just test that they go through the BE.

llvm-svn: 55208
2008-08-22 22:39:21 +00:00
Dan Gohman 49e19e906f Factor out the predicate check code from DAGISelEmitter.cpp
and use it in FastISelEmitter.cpp, and make FastISel
subtarget aware. Among other things, this lets it work
properly on x86 targets that don't have SSE, where it
successfully selects x87 instructions.

llvm-svn: 55156
2008-08-22 00:20:26 +00:00
Bill Wendling 1a6e930ea4 Testcase for PR2585.
llvm-svn: 55151
2008-08-21 23:04:49 +00:00
Dan Gohman 46989c637d Add -mattr=sse2 so this test doesn't fail on non-x86 hosts.
llvm-svn: 55145
2008-08-21 22:34:25 +00:00
Dale Johannesen c360f3ba50 Make x86 and sse2 explicit for non-x86 hosts.
llvm-svn: 55141
2008-08-21 21:26:06 +00:00
Evan Cheng 9534ea03e8 Fix a number of byval / memcpy / memset related codegen issues.
1. x86-64 byval alignment should be max of 8 and alignment of type. Previously the code was not doing what the commit message was saying.
2. Do not use byte repeat move and store operations. These are slow.

llvm-svn: 55139
2008-08-21 21:00:15 +00:00
Dan Gohman cdf1a276e3 getelementptr doesn't work on x86-64 yet, because it
has MOV64ri32 and no plain MOV64ri.

llvm-svn: 55126
2008-08-21 17:28:42 +00:00
Dan Gohman efb7d2d03d MVT::getMVT uses iPTR for pointer types, while we need the actual
intptr_t type in this case. FastISel can now select simple
getelementptr instructions.

llvm-svn: 55125
2008-08-21 17:25:26 +00:00
Dan Gohman fe9056584b Basic fast-isel support for instructions with constant int operands.
llvm-svn: 55099
2008-08-21 01:41:07 +00:00
Dan Gohman eaef5f612a Add a -march line for this test, and run it on x86-64 too for fun.
llvm-svn: 55030
2008-08-20 00:56:07 +00:00
Dan Gohman b16a7783c5 Add FastISel support for floating-point operations.
llvm-svn: 55021
2008-08-20 00:23:20 +00:00
Dan Gohman a3e4d5a5e1 Add FastISel support for several more binary operators.
llvm-svn: 55020
2008-08-20 00:11:48 +00:00
Bill Wendling e79740851f Add support for the __sync_sub_and_fetch atomics and friends for X86. The code
was already present, but not hooked up to anything.

llvm-svn: 55018
2008-08-19 23:09:18 +00:00
Dan Gohman 065e24709e Fast-isel is now *minimally* functional. Add a testcase to
demonstrate the extent of its capabilities. Note that it
only attempts to operate on one of the blocks in this
testcase.

llvm-svn: 55016
2008-08-19 22:37:59 +00:00
Dale Johannesen 5afbf510aa Add support for 8 and 16 bit forms of __sync
builtins on X86.

Change "lock" instructions to be on a separate line.
This is needed to work around a bug in the Darwin
assembler.

llvm-svn: 54999
2008-08-19 18:47:28 +00:00
Evan Cheng ab35bfdf18 Fix a (u)comiss intrinsic lowering bug. It was using anyext which can return junk in higher bits. Patch by Nate Begeman.
llvm-svn: 54903
2008-08-17 19:22:34 +00:00
Dan Gohman 7e3c392248 Allow SelectionDAG to create EXTRACT_VECTOR_ELT nodes with
non-constant indices. Only a few of the peephole checks require
a constant index.

llvm-svn: 54764
2008-08-13 21:51:37 +00:00
Dan Gohman c82ad79c64 Improve the grep commands for this test to be tolerant of ABI
differences, and to be more specific.

llvm-svn: 54648
2008-08-11 20:10:41 +00:00
Dan Gohman 127bb03b8c Take the FrameOffset into account when computing the alignment
of stack objects. This fixes PR2656.

llvm-svn: 54646
2008-08-11 18:27:03 +00:00
Dan Gohman 4e2f3ace2c Add an EXTRACTPSmr pattern to match the pattern that
X86ISelLowering creates.

llvm-svn: 54544
2008-08-08 18:30:21 +00:00
Dan Gohman 527ca7e253 Re-enable elimination of unnecessary SUBREG_TO_REG instructions in
LowerSubregs, and fix an x86-64 isel bug that this exposed.

SUBREG_TO_REG for x86-64 implicit zero extension is only safe for
isel to generate when the source is known to always have zeros in
the high 32 bits. The EXTRACT_SUBREG instruction does not clear
the high 32 bits.

llvm-svn: 54444
2008-08-07 02:54:50 +00:00
Dan Gohman a8dbaeb1df Add an extra example that shouldn't get an and instruction.
llvm-svn: 54443
2008-08-07 02:23:06 +00:00
Dan Gohman 91c2c432c0 Re-introduce the 8-bit subreg zext-inreg patterns for x86-32,
this time using MOV32to32_ and MOV16to16_. Thanks to Evan for
suggesting this.

llvm-svn: 54418
2008-08-06 18:27:21 +00:00
Evan Cheng 7823a411d5 Fix PR2620: Fix X86cmppd selection code so it expects operands to be v2f64.
llvm-svn: 54376
2008-08-05 22:19:15 +00:00
Evan Cheng aa33b932bd Fix PR2596: out of bound reference.
llvm-svn: 54375
2008-08-05 21:51:46 +00:00
Owen Anderson b358c8f482 Update the remaining tests not to use -disable-correct-folding, and remove two
that couldn't be updated.

llvm-svn: 54359
2008-08-05 18:19:14 +00:00
Owen Anderson 8ccb8d35c4 One more -disable-correct-folding case removed.
llvm-svn: 54358
2008-08-05 18:08:56 +00:00
Owen Anderson ae0af127a6 Remove another -disable-correct-folding use.
llvm-svn: 54357
2008-08-05 18:05:58 +00:00
Owen Anderson 31a0dea5c0 Eliminate another use of -disable-correct-folding.
llvm-svn: 54356
2008-08-05 18:03:01 +00:00
Evan Cheng 0ca10c9572 Fix PR2568: Fix bug that cause redudant kill marker after its live interval has been extended due to coalescing.
llvm-svn: 54346
2008-08-05 07:10:38 +00:00
Owen Anderson 5018bd032a Update these tests to work by disabling the new correct CFG generation. This flag should ONLY be used to for tests like these.
llvm-svn: 54334
2008-08-04 23:55:29 +00:00
Dan Gohman 90c724cadc Fix SDISel lowering of PHI nodes to use ComputeValueVTs.
This allows it to work correctly on aggregate values.
This fixes PR2623.

llvm-svn: 54331
2008-08-04 23:42:46 +00:00
Dale Johannesen 962178e77e Make sse2 explicit, for non-x86 hosts.
llvm-svn: 54251
2008-07-31 20:16:33 +00:00
Dan Gohman 345d63ccf2 Improve dagcombining for sext-loads and sext-in-reg nodes.
llvm-svn: 54239
2008-07-31 00:50:31 +00:00
Dan Gohman f8cc0aa87d I missed this file in r54223. movzbl is now used instead
of movzbw here.

llvm-svn: 54224
2008-07-30 18:23:34 +00:00
Dan Gohman 86b06335aa Reapply r54147 with a constraint to only use the 8-bit
subreg form on x86-64, to avoid the problem with x86-32
having GPRs that don't have 8-bit subregs.

Also, change several 16-bit instructions to use 
equivalent 32-bit instructions. These have a smaller
encoding and avoid partial-register updates.

llvm-svn: 54223
2008-07-30 18:09:17 +00:00
Mon P Wang 2c839d4b1e Added support for overloading intrinsics (atomics) based on pointers
to different address spaces.  This alters the naming scheme for those
intrinsics, e.g., atomic.load.add.i32 => atomic.load.add.i32.p0i32

llvm-svn: 54195
2008-07-30 04:36:53 +00:00
Dan Gohman 43105328d3 Revert 54147.
llvm-svn: 54148
2008-07-29 01:02:18 +00:00
Dan Gohman 26ec56c75c Add x86 isel patterns to match what would be a ZERO_EXTEND_INREG operation,
which is represented in codegen as an 'and' operation. This matches them
with movz instructions, instead of leaving them to be matched by and
instructions with an immediate field.

llvm-svn: 54147
2008-07-28 22:18:25 +00:00
Dan Gohman 108c58aef4 Fix embedded CRLF characters.
llvm-svn: 54125
2008-07-27 18:37:58 +00:00
Nate Begeman 59063cb315 Fix test RUN line
llvm-svn: 54040
2008-07-25 19:08:59 +00:00
Nate Begeman 283b2da27a Disable mov{L, LP, HP, HLP, *DUP} shuffles for mmx
mmx needs its own fancy shuffle logic based on unpack; for now we get correct but awful code.

Also commit Mon Ping's VSETCC patch

llvm-svn: 54039
2008-07-25 19:05:58 +00:00
Dan Gohman 4e6a25ac45 This test needs -aggressive-remat enabled.
llvm-svn: 54015
2008-07-25 15:25:32 +00:00
Dan Gohman 09b0448dbc Enable rematerialization of constants using AliasAnalysis::pointsToConstantMemory,
and knowledge of PseudoSourceValues. This unfortunately isn't sufficient to allow
constants to be rematerialized in PIC mode -- the extra indirection is a
complication.

llvm-svn: 54000
2008-07-25 00:02:30 +00:00
Dan Gohman d9f9e2e260 Add target triples so these tests behave as expected on non-darwin hosts.
llvm-svn: 53991
2008-07-24 18:08:01 +00:00
Evan Cheng c353a5f3b3 New test case.
llvm-svn: 53971
2008-07-24 00:22:05 +00:00
Evan Cheng a2b4b4ad99 Fix PR2485: do all 4-element SSE shuffles in max. of 2 shuffle instructions.
Based on patch by Nicolas Capens.

llvm-svn: 53939
2008-07-23 00:22:17 +00:00
Duncan Sands 775e509525 LegalizeTypes support for VSETCC. Fixes PR2575.
llvm-svn: 53938
2008-07-22 23:54:03 +00:00
Evan Cheng b8ff223f26 Fix pr2566: incorrect assumption about bit_convert. It doesn't not have to output a vector value. Patch by Nicolas Capens!
llvm-svn: 53932
2008-07-22 20:42:56 +00:00
Evan Cheng 0384670141 Fix PR2574: implement v2f32 scalar_to_vector.
llvm-svn: 53927
2008-07-22 18:39:19 +00:00
Bill Wendling 75840e6435 Fix for first part of PR2562. Generate the "pinsrw" instruction for inserts
into v4i16 vectors.

llvm-svn: 53807
2008-07-20 02:32:23 +00:00
Anton Korobeynikov 74d3fa165c Testcase for PR2549
llvm-svn: 53785
2008-07-19 06:31:12 +00:00
Evan Cheng cefd6e62fa Subreg live interval valno may not have a corresponding def machineinstr since it's less precise.
llvm-svn: 53734
2008-07-17 19:48:53 +00:00
Evan Cheng c5add64f0a Add nounwind.
llvm-svn: 53733
2008-07-17 19:48:04 +00:00
Evan Cheng e0a352e8e7 Fix PR2536: a nasty spiller bug. If a two-address instruction uses a register but the use portion of its live range is not part of its liveinterval, it must be defined by an implicit_def. In that case, do not spill the use. e.g.
8   %reg1024<def> = IMPLICIT_DEF
12  %reg1024<def> = INSERT_SUBREG %reg1024<kill>, %reg1025, 2

The live range [12, 14) are not part of the r1024 live interval since it's defined by an implicit def. It will not conflicts with live interval of r1025. Now suppose both registers are spilled, you can easily see a situation where both registers are reloaded before the INSERT_SUBREG and both target registers that would overlap.

llvm-svn: 53503
2008-07-12 01:56:02 +00:00
Duncan Sands d9948110a6 Port a shift-by-1 optimization from LegalizeDAG: it
was presumably added after the rest of the code was
copied to LegalizeTypes.

llvm-svn: 53459
2008-07-11 16:54:57 +00:00
Bill Wendling 5774466a33 The frame address on an x86-64 box needs to be offset by -8, not -4.
llvm-svn: 53450
2008-07-11 07:18:52 +00:00
Evan Cheng 71b7398463 Fix for PR2472. Use movss to set lower 32-bits of a zero XMM vector.
llvm-svn: 53386
2008-07-10 01:08:23 +00:00
Anton Korobeynikov dad7a35acf Testcase for PR2024
llvm-svn: 53327
2008-07-09 14:09:41 +00:00
Evan Cheng 03001cb820 Fix two serious LSR bugs.
1. LSR runOnLoop is always returning false regardless if any transformation is made.
2. AddUsersIfInteresting can create new instructions that are added to DeadInsts. But there is a later early exit which prevents them from being freed.

llvm-svn: 53193
2008-07-07 19:51:32 +00:00
Dale Johannesen 3bfe3b0e76 Considering predecessors of exit blocks gets
us a little more tail merging.

llvm-svn: 52986
2008-07-01 21:50:49 +00:00
Chris Lattner 501c0b82d6 test doesn't need eh info
llvm-svn: 52811
2008-06-27 03:14:20 +00:00
Dale Johannesen 0e28b3b9ee Allow for rounding up of stack frame.
llvm-svn: 52751
2008-06-26 01:55:32 +00:00
Chris Lattner b1e66ce3bb when we know the signbit of an input to uint_to_fp is zero,
change it to sint_to_fp on targets where that is cheaper (and
visaversa of course).  This allows us to compile uint_to_fp to:

_test:
	movl	4(%esp), %eax
	shrl	$23, %eax
	cvtsi2ss	%eax, %xmm0
	movl	8(%esp), %eax
	movss	%xmm0, (%eax)
	ret

instead of:

	.align	3
LCPI1_0:					##  double
	.long	0	## double least significant word 4.5036e+15
	.long	1127219200	## double most significant word 4.5036e+15
	.text
	.align	4,0x90
	.globl	_test
_test:
	subl	$12, %esp
	movl	16(%esp), %eax
	shrl	$23, %eax
	movl	%eax, (%esp)
	movl	$1127219200, 4(%esp)
	movsd	(%esp), %xmm0
	subsd	LCPI1_0, %xmm0
	cvtsd2ss	%xmm0, %xmm0
	movl	20(%esp), %eax
	movss	%xmm0, (%eax)
	addl	$12, %esp
	ret

llvm-svn: 52747
2008-06-26 00:16:49 +00:00
Evan Cheng 3fc2372d3a - Fix a x86 vector isel bug: illegal transformation of a vector_shuffle into a
shift.
- Add a readme entry for a missing vector_shuffle optimization that results in
  awful codegen.

llvm-svn: 52740
2008-06-25 20:52:59 +00:00
Mon P Wang 6a490371c9 Added MemOperands to Atomic operations since Atomics touches memory.
Added abstract class MemSDNode for any Node that have an associated MemOperand
Changed atomic.lcs => atomic.cmp.swap, atomic.las => atomic.load.add, and
atomic.lss => atomic.load.sub

llvm-svn: 52706
2008-06-25 08:15:39 +00:00
Evan Cheng 73db52ebf8 Enable two-address remat by default.
llvm-svn: 52701
2008-06-25 01:16:38 +00:00
Dale Johannesen 6316bc705e v2f32 is now a valid (MMX) type which breaks this
test (doesn't work for any MMX vector types, it's
not me).  Rewritten to use v2i16 which is generic
and going to stay that way; I think that preserves
the point of the test.

llvm-svn: 52692
2008-06-24 22:03:36 +00:00
Evan Cheng 3f2ceac565 If it's determined safe, remat MOV32r0 (i.e. xor r, r) and others as it is instead of using the longer MOV32ri instruction.
llvm-svn: 52670
2008-06-24 07:10:51 +00:00
Bill Wendling 559067c55c Make test work on non-x86 machines (like my G4 PPC).
llvm-svn: 52619
2008-06-23 06:16:31 +00:00
Evan Cheng f593a65497 Undo spill weight tweak. Need to investigate the performance regressions.
llvm-svn: 52572
2008-06-21 06:45:54 +00:00
Eli Friedman 8d66e98c92 Fix a bug with <8 x i16> shuffle lowering on X86 where parts of the
shuffle could be skipped.  The check is invalid because the loop index i 
doesn't correspond to the element actually inserted. The correct check is
already done a few lines earlier, for whether the element is already in 
the right spot, so this shouldn't have any effect on the codegen for 
code that was already correct.

llvm-svn: 52486
2008-06-19 06:09:51 +00:00
Evan Cheng b139ee95b4 New test case.
llvm-svn: 52483
2008-06-19 01:50:24 +00:00
Evan Cheng d6d9cfa72a This also got better (55 - 51 instructions). But doing one more re-materialization.
llvm-svn: 52482
2008-06-19 01:50:13 +00:00
Evan Cheng 1d1907d778 This got better.
llvm-svn: 52481
2008-06-19 01:46:43 +00:00
Evan Cheng 1cde1f8d5e Do not issue identity copies.
llvm-svn: 52373
2008-06-16 22:52:53 +00:00
Evan Cheng 49bad4c9e1 - Add "Commutative" property to intrinsics. This allows tblgen to generate the commuted variants for dagisel matching code.
- Mark lots of X86 intrinsics as "Commutative" to allow load folding.

llvm-svn: 52353
2008-06-16 20:29:38 +00:00
Evan Cheng fb79059b85 Teach the spiller to commute instructions in order to fold a reload. This hits 410 times on 444.namd and 122 times on 252.eon.
llvm-svn: 52266
2008-06-13 23:58:02 +00:00
Duncan Sands 8651e9c584 Disable some DAG combiner optimizations that may be
wrong for volatile loads and stores.  In fact this
is almost all of them!  There are three types of
problems: (1) it is wrong to change the width of
a volatile memory access.  These may be used to
do memory mapped i/o, in which case a load can have
an effect even if the result is not used.  Consider
loading an i32 but only using the lower 8 bits.  It
is wrong to change this into a load of an i8, because
you are no longer tickling the other three bytes.  It
is also unwise to make a load/store wider.  For
example, changing an i16 load into an i32 load is
wrong no matter how aligned things are, since the
fact of loading an additional 2 bytes can have
i/o side-effects.  (2) it is wrong to change the
number of volatile load/stores: they may be counted
by the hardware.  (3) it is wrong to change a volatile
load/store that requires one memory access into one
that requires several.  For example on x86-32, you
can store a double in one processor operation, but to
store an i64 requires two (two i32 stores).  In a
multi-threaded program you may want to bitcast an i64
to a double and store as a double because that will
occur atomically, and be indivisible to other threads.
So it would be wrong to convert the store-of-double
into a store of an i64, because this will become two
i32 stores - no longer atomic.  My policy here is
to say that the number of processor operations for
an illegal operation is undefined.  So it is alright
to change a store of an i64 (requires at least two
stores; but could be validly lowered to memcpy for
example) into a store of double (one processor op).
In short, if the new store is legal and has the same
size then I say that the transform is ok.  It would
also be possible to say that transforms are always
ok if before they were illegal, whether after they
are illegal or not, but that's more awkward to do
and I doubt it buys us anything much.
However this exposed an interesting thing - on x86-32
a store of i64 is considered legal!  That is because
operations are marked legal by default, regardless of
whether the type is legal or not.  In some ways this
is clever: before type legalization this means that
operations on illegal types are considered legal;
after type legalization there are no illegal types
so now operations are only legal if they really are.
But I consider this to be too cunning for mere mortals.
Better to do things explicitly by testing AfterLegalize.
So I have changed things so that operations with illegal
types are considered illegal - indeed they can never
map to a machine operation.  However this means that
the DAG combiner is more conservative because before
it was "accidentally" performing transforms where the
type was illegal because the operation was nonetheless
marked legal.  So in a few such places I added a check
on AfterLegalize, which I suppose was actually just
forgotten before.  This causes the DAG combiner to do
slightly more than it used to, which resulted in the X86
backend blowing up because it got a slightly surprising
node it wasn't expecting, so I tweaked it.

llvm-svn: 52254
2008-06-13 19:07:40 +00:00
Evan Cheng 2d788ce3fb Fix some tests.
llvm-svn: 52245
2008-06-12 21:23:38 +00:00
Dale Johannesen 01b7cae58d Fix parameter spelling: sse not sse1
llvm-svn: 52185
2008-06-10 17:57:58 +00:00
Matthijs Kooijman 07f4eecd2a Fix some more quoting issues in RUN lines, this time regarding unintended
variable expansions involving the $ character.

This fixes 4 tests that were not running properly before.

llvm-svn: 52183
2008-06-10 16:10:32 +00:00
Matthijs Kooijman 400c49c781 Remove double pipes in RUN commandlines.
This fixes 5 testcases that were not being run properly before.

llvm-svn: 52180
2008-06-10 15:11:36 +00:00
Dan Gohman 1b095b443c Convert several tests to use temporary files instead of redundantly
executing the test commands.

llvm-svn: 52163
2008-06-10 00:36:41 +00:00
Rafael Espindola 29479df2ac add support for PIC on linux x86-64
llvm-svn: 52139
2008-06-09 09:52:31 +00:00
Evan Cheng 976b1eee81 Fix a memcpy lowering bug. Even though the memcpy alignment is smaller than the desired alignment, the frame destination alignment may still be larger than the desired alignment. Don't change its alignment to something smaller.
llvm-svn: 51970
2008-06-04 23:37:54 +00:00
Dan Gohman 92d62b43c2 Fix the position of MemOperands in nodes that use variadic_ops
in DAGISelEmitter output. This bug was recently uncovered by the
addition of patterns for CALL32m and CALL64m, which are nodes
that now have both MemOperands and variadic_ops.

This bug was especially visible with PIC in various configurations,
because the new patterns are matching the indirect call code used
in many PIC configurations.

llvm-svn: 51877
2008-06-02 17:40:38 +00:00
Dan Gohman 96af4ddb62 Add patterns for CALL32m and CALL64m. They aren't matched in most
cases due to an isel deficiency already noted in
lib/Target/X86/README.txt, but they can be matched in this fold-call.ll
testcase, for example.

This is interesting mainly because it exposes a tricky tblgen bug;
tblgen was incorrectly computing the starting index for variable_ops
in the case of a complex pattern.

llvm-svn: 51706
2008-05-29 21:50:34 +00:00
Dan Gohman 714663ab94 Expand small memmovs using inline code. Set the X86 threshold for expanding
memmove to a more plausible value, now that it's actually being used.

llvm-svn: 51696
2008-05-29 19:42:22 +00:00
Evan Cheng 5e28227dbd Implement vector shift up / down and insert zero with ps{rl}lq / ps{rl}ldq.
llvm-svn: 51667
2008-05-29 08:22:04 +00:00
Evan Cheng 6892c5507f Add nounwind.
llvm-svn: 51665
2008-05-29 07:09:24 +00:00
Evan Cheng 68079268f5 Fix PR2289: vr defined by multiple implicit_def as result of coalescing.
llvm-svn: 51648
2008-05-28 17:40:10 +00:00
Evan Cheng 427412e7c8 Teach local register allocator to deal with landing pad MBB's.
llvm-svn: 51647
2008-05-28 17:22:32 +00:00
Dan Gohman 221e9d0d22 Specify a target so that this tests tests what it's intended to test.
llvm-svn: 51600
2008-05-27 17:55:57 +00:00
Dan Gohman 923a375053 Make this test independent of the target-triple; the stack alignment
is specifically what this test depends on.

llvm-svn: 51599
2008-05-27 17:44:23 +00:00
Nick Lewycky 213e114a2c The Linux ABI emits an extra "movl %esp, %ebp" in function prologue and
sometimes a "mov %ebp, %esp" in the epilogue.

Force these tests that rely on counting 'mov' to use i686-apple-darwin8.8.0
where they were written.

llvm-svn: 51568
2008-05-26 20:18:56 +00:00
Evan Cheng 948627aadd New loadl_pd and loadh_pd tests.
llvm-svn: 51525
2008-05-24 00:10:02 +00:00
Evan Cheng 04d24edcbb Use movlps / movhps to modify low / high half of 16-byet memory location.
llvm-svn: 51501
2008-05-23 21:23:16 +00:00
Dan Gohman 3388d022ac Use PMULDQ for v2i64 multiplies when SSE4.1 is available. And add
load-folding table entries for PMULDQ and PMULLD.

llvm-svn: 51489
2008-05-23 17:49:40 +00:00
Evan Cheng f3be7a7ea7 Bug: rcpps can only folds a load if the address is 16-byte aligned. Fixed many 'ps' load folding patterns in X86InstrSSE.td which are missing the proper alignment checks.
Also fixed some 80 col. violations.

llvm-svn: 51462
2008-05-23 00:37:07 +00:00
Evan Cheng a1100782d5 Add a couple of test cases.
llvm-svn: 51441
2008-05-22 21:19:19 +00:00
Evan Cheng 53963b775e Add missing patterns.
llvm-svn: 51435
2008-05-22 18:56:56 +00:00
Chris Lattner a87f1a568c testcase for PR2267
llvm-svn: 51408
2008-05-22 04:45:22 +00:00
Evan Cheng a5d27ae586 Fix PR2343. An *interesting* coalescer bug.
BB1:                                                                                                                                                  
  vr1025 = copy vr1024                                                                                                                                
  ..                                                                                                                                                  
BB2:                                                                                                                                                  
  vr1024 = op                                                                                                                                         
         = op vr1025                                                                                                                                     
  <loop eventually branch back to BB1>

Even though vr1025 is copied from vr1024, it's not safe to coalesced them since live range of vr1025 intersects the def of vr1024. This happens when vr1025 is assigned the value of the previous iteration of vr1024 in the loop.

llvm-svn: 51394
2008-05-21 22:34:12 +00:00
Gabor Greif 1e427c3264 sabre brings to my attention that the 'tr' suffix is also obsolete
llvm-svn: 51349
2008-05-20 21:00:03 +00:00
Gabor Greif f45ff35bfe Rename the last test with .llx extension to .ll, resolve duplicate test by renaming to isnan2. Now that no test has llx ending there is no need to search for them from dg.exp too.
llvm-svn: 51328
2008-05-20 19:52:04 +00:00
Dan Gohman cd2e772d08 Run vortex-bug as x86-64, which is what the original bug was triggered on.
llvm-svn: 51289
2008-05-20 00:54:39 +00:00
Dale Johannesen 0bf92b14f1 Use common where we mean common, not weak.
llvm-svn: 51173
2008-05-16 00:52:30 +00:00
Dan Gohman 0a0fa7cf78 Fix a bug in LoopStrengthReduce that caused it to emit IR with
use-before-def. The problem comes up in code with multiple PHIs where
one PHI is being rewritten in terms of the other, but the other needs
to be casted first. LLVM rules requre the cast instruction to be
inserted after any PHI instructions, but when instructions were
inserted to replace the second PHI value with a function of the first,
they were ended up going before the cast instruction. Avoid this
problem by remembering the location of the cast instruction, when one
is needed, and inserting the expansion of the new value after it.

This fixes a bug that surfaced in 255.vortex on x86-64 when
instcombine was removed from the middle of the loop optimization
passes. 

llvm-svn: 51169
2008-05-15 23:26:57 +00:00
Dan Gohman 3ab94df276 When bit-twiddling CondCode values for integer comparisons produces
SETOEQ, is it does with (SETEQ & SETULE), map it to SETEQ.

llvm-svn: 51112
2008-05-14 18:17:09 +00:00
Evan Cheng 1120279ae6 Instead of a vector load, shuffle and then extract an element. Load the element from address with an offset.
pshufd $1, (%rdi), %xmm0
        movd %xmm0, %eax
=>
        movl 4(%rdi), %eax

llvm-svn: 51026
2008-05-13 08:35:03 +00:00
Evan Cheng 3f40c69083 On x86, it's safe to treat i32 load anyext as a normal i32 load. Ditto for i8 anyext load to i16.
llvm-svn: 51019
2008-05-13 00:54:02 +00:00
Evan Cheng b980f6fb3d Xform bitconvert(build_pair(load a, load b)) to a single load if the load locations are at the right offset from each other.
llvm-svn: 51008
2008-05-12 23:04:07 +00:00
Dale Johannesen e6942c31ea New test for tail merging
llvm-svn: 51007
2008-05-12 22:59:44 +00:00
Evan Cheng 71b9afb053 When transforming a vector_shuffle to a load, the base address must not be an undef.
llvm-svn: 50940
2008-05-10 06:46:49 +00:00
Evan Cheng 9c4d685165 Add nounwind.
llvm-svn: 50931
2008-05-10 02:22:25 +00:00
Evan Cheng bec201fa06 If all sources of a PHI node are defined by an implicit_def, just emit an implicit_def instead of a copy.
llvm-svn: 50927
2008-05-10 00:17:50 +00:00
Evan Cheng 867af2678f Add a pattern to do move the low element of a v4f32 and zero extend the rest.
llvm-svn: 50922
2008-05-09 23:37:55 +00:00
Evan Cheng 961339bbdb Handle a few more cases of folding load i64 into xmm and zero top bits.
Note, some of the code will be moved into target independent part of DAG combiner in a subsequent patch.

llvm-svn: 50918
2008-05-09 21:53:03 +00:00
Evan Cheng 0352e63e39 Simplify test.
llvm-svn: 50911
2008-05-09 19:56:32 +00:00
Evan Cheng 0360ecbec1 Use movq to move low half of XMM register and zero-extend the rest.
llvm-svn: 50874
2008-05-08 22:35:02 +00:00
Evan Cheng 78af38c392 Handle vector move / load which zero the destination register top bits (i.e. movd, movq, movss (addr), movsd (addr)) with X86 specific dag combine.
llvm-svn: 50838
2008-05-08 00:57:18 +00:00
Evan Cheng 0d6311d46c Add nounwind.
llvm-svn: 50837
2008-05-07 22:59:08 +00:00
Evan Cheng 7ca4a67ca1 Yet another nasty spiller bug.
%ecx = op
store %cl<kill>, (addr)
(addr) = op %al

It's not safe to unfold the last operand and eliminate store even though %cl is marked kill. It's a sub-register use which means one of its super-register(s) may be used below.

llvm-svn: 50794
2008-05-07 00:49:28 +00:00
Anton Korobeynikov f5d2c3b45a Use target triple in tests, not 'realign-stack=0' option. Per request.
llvm-svn: 50778
2008-05-06 23:09:29 +00:00
Evan Cheng ef3faa1b1c Fix PR2287. Darwin passes mmx values in register in 64-mode, not Linux.
llvm-svn: 50716
2008-05-06 07:23:50 +00:00
Mon P Wang 3e58393c3d Added addition atomic instrinsics and, or, xor, min, and max.
llvm-svn: 50663
2008-05-05 19:05:59 +00:00
Chris Lattner 9c0c60d080 no need for eh info
llvm-svn: 50658
2008-05-05 18:24:33 +00:00
Dan Gohman bcde172222 Add AsmPrinter support for emitting a directive to declare that
the code being generated does not require an executable stack.

Also, add target-specific code to make use of this on Linux
on x86. 

llvm-svn: 50634
2008-05-05 00:28:39 +00:00
Evan Cheng d9481366e3 Select vector shift with non-immediate i32 shift amount operand by first moving the operand into the right register.
llvm-svn: 50619
2008-05-04 09:15:50 +00:00
Evan Cheng cdf22f2953 Add separate intrinsics for MMX / SSE shifts with i32 integer operands. This allow us to simplify the horribly complicated matching code.
llvm-svn: 50601
2008-05-03 00:52:09 +00:00
Chris Lattner 34931afff7 specify an arch for non-x86 hosts.
llvm-svn: 50576
2008-05-02 15:11:58 +00:00
Chris Lattner d4b2a67cf3 don't randomly miscompile seto/setuo just because we are in
ffastmath mode.  This fixes rdar://5902801, a miscompilation
of gcc.dg/builtins-8.c.

Bill, please pull this into Tak.

llvm-svn: 50523
2008-05-01 07:26:11 +00:00
Arnold Schwaighofer 68988d13f6 Really commit the test checking the argument lowering behaviour on x86-64 :).
llvm-svn: 50478
2008-04-30 09:19:47 +00:00
Chris Lattner 5c88f7b1ad make the vector conversion magic handle multiple results.
We now compile test2/test3 to:

_test2:
	## InlineAsm Start
	set %xmm0, %xmm1
	## InlineAsm End
	addps	%xmm1, %xmm0
	ret
_test3:
	## InlineAsm Start
	set %xmm0, %xmm1
	## InlineAsm End
	paddd	%xmm1, %xmm0
	ret

as expected.

llvm-svn: 50389
2008-04-29 04:48:56 +00:00
Chris Lattner f9a49c4322 add support for multiple return values in inline asm. This is a step
towards PR2094.  It now compiles the attached .ll file to:

_sad16_sse2:
	movslq	%ecx, %rax
	## InlineAsm Start
	%ecx %rdx %rax %rax %r8d %rdx %rsi
	## InlineAsm End
	## InlineAsm Start
	set %eax
	## InlineAsm End
	ret

which is pretty decent for a 3 output, 4 input asm.

llvm-svn: 50386
2008-04-29 04:29:54 +00:00
Evan Cheng 11b98b6612 Another extract_subreg coalescing bug.
e.g.
vr1024<2> extract_subreg vr1025, 2
If vr1024 do not have the same register class as vr1025, it's not safe to coalesce this away. For example, vr1024 might be a GPR32 while vr1025 might be a GPR64.

llvm-svn: 50385
2008-04-29 01:41:44 +00:00
Evan Cheng 73c3b4741a Add -march=x86.
llvm-svn: 50380
2008-04-28 23:31:41 +00:00
Evan Cheng 315e3cb9a3 Test case.
llvm-svn: 50377
2008-04-28 22:14:34 +00:00
Chris Lattner 2237973438 Implement a signficant optimization for inline asm:
When choosing between constraints with multiple options,
like "ir", test to see if we can use the 'i' constraint and
go with that if possible.  This produces more optimal ASM in
all cases (sparing a register and an instruction to load it),
and fixes inline asm like this:

void test () {
  asm volatile (" %c0 %1 " : : "imr" (42), "imr"(14));
}

Previously we would dump "42" into a memory location (which
is ok for the 'm' constraint) which would cause a problem
because the 'c' modifier is not valid on memory operands.

Isn't it great how inline asm turns 'missed optimization'
into 'compile failed'??

Incidentally, this was the todo in 
PowerPC/2007-04-24-InlineAsm-I-Modifier.ll

Please do NOT pull this into Tak.

llvm-svn: 50315
2008-04-27 00:37:18 +00:00
Nate Begeman 98f0898e42 Feedback from chris
llvm-svn: 50305
2008-04-25 21:47:35 +00:00
Nate Begeman f10b493fc0 Add a testcase for the recent "handle variable vector insert elt in mem" patch
llvm-svn: 50303
2008-04-25 21:26:59 +00:00
Evan Cheng 402572a149 Update tests.
llvm-svn: 50293
2008-04-25 20:13:47 +00:00
Evan Cheng ccde6dd016 Special handling for MMX values being passed in either GPR64 or lower 64-bits of XMM registers.
llvm-svn: 50289
2008-04-25 19:11:04 +00:00
Evan Cheng df38b35a1e MMX argument passing fixes:
On Darwin / Linux x86-32, v8i8, v4i16, v2i32 values are passed in MM[0-2].                                                                                                                                      
On Darwin / Linux x86-32, v1i64 values are passed in memory.                                                                                                                                                    
On Darwin x86-64, v8i8, v4i16, v2i32 values are passed in XMM[0-7].                                                                                                                                     
On Darwin x86-64, v1i64 values are passed in 64-bit GPRs.

llvm-svn: 50257
2008-04-25 07:56:45 +00:00
Chris Lattner 741c7a3b49 Loosen up an assertion to allow intrinsics. I really have no
idea what this code (findNonImmUse) does, so I'm only guessing 
that this is the right thing.  It would be really really nice
if this had comments and perhaps switched to SmallPtrSet
(hint hint) :)

This fixes rdar://5886601, a crash on gcc.target/i386/sse4_1-pblendw.c

llvm-svn: 50252
2008-04-25 05:13:01 +00:00
Evan Cheng 9165e165dc Fix bug in x86 memcpy / memset lowering. If there are trailing bytes not handled by rep instructions, a new memcpy / memset is introduced for them. However, since source / destination addresses are already adjusted, their offsets should be zero.
llvm-svn: 50239
2008-04-25 00:26:43 +00:00
Anton Korobeynikov dd4ef2e30c Disable stack realignment for these tests
llvm-svn: 50172
2008-04-23 18:25:44 +00:00
Anton Korobeynikov c3ada5c9c4 Fix test becase ABI stack alignment dropped to 'normal' value
llvm-svn: 50171
2008-04-23 18:25:16 +00:00
Anton Korobeynikov 955a8a9101 Fix test, instruction count is valid only if stack is not realigned
llvm-svn: 50170
2008-04-23 18:24:48 +00:00
Dan Gohman f166d2d0d6 Implement an x86-64 ABI detail of passing structs by hidden first
argument. The x86-64 ABI requires the incoming value of %rdi to
be copied to %rax on exit from a function that is returning a
large C struct.

Also, add a README-X86-64 entry detailing the missed optimization
opportunity and proposing an alternative approach.

llvm-svn: 50075
2008-04-21 23:59:07 +00:00
Chris Lattner 470ab00c76 A better fix for my previous patch, MOVZQI2PQIrr just requires SSE2.
llvm-svn: 49986
2008-04-20 05:52:46 +00:00
Chris Lattner a124f1e219 Not all x86-64 machines have sse3 apparently.
llvm-svn: 49985
2008-04-20 05:47:56 +00:00
Chris Lattner 50fb77f829 rename *.llx -> *.ll
llvm-svn: 49970
2008-04-19 22:29:10 +00:00
Evan Cheng 7e4a55bc58 Be more careful with insert_subreg and extract_subreg where either source or destination operand has already been coalesced with another register that's defined by a insert_subreg or extract_subreg.
llvm-svn: 49843
2008-04-17 07:58:04 +00:00
Evan Cheng c8c3a899c0 Fix a sub-register indice propagation bug.
llvm-svn: 49832
2008-04-17 00:06:42 +00:00
Evan Cheng 147cb764b5 Don't forget about sub-register indices when rematting instructions.
llvm-svn: 49830
2008-04-16 23:44:44 +00:00
Evan Cheng 7b989d853e Really test what's intended.
llvm-svn: 49802
2008-04-16 18:21:55 +00:00
Evan Cheng e45b8f89c5 Rewrite LiveVariable liveness computation. The new implementation is much simplified. It eliminated the nasty recursive routines and removed the partial def / use bookkeeping. There is also potential for performance improvement by replacing the conservative handling of partial physical register definitions. The code is currently disabled until live interval analysis is taught of the name scheme.
This patch also fixed a couple of nasty corner cases.

llvm-svn: 49784
2008-04-16 09:46:40 +00:00
Dan Gohman d43d3beeb0 Add support for the form of the SSE41 extractps instruction that
puts its result in a 32-bit GPR.

llvm-svn: 49762
2008-04-16 02:32:24 +00:00
Dan Gohman 8c99ccaf96 Recreate the size SDNode instead of reusing the old one in the x86
memcpy lowering code; this ensures that the size node has the desired
result type. This fixes a regression from r49572 with @llvm.memcpy.i64
on x86-32.

llvm-svn: 49761
2008-04-16 01:32:32 +00:00