Commit Graph

715 Commits

Author SHA1 Message Date
Gabor Greif d36e3e8850 improve heuristics to find the 'and' corresponding to 'tst' to also catch opportunities on thumb2
added some doxygen on the way

llvm-svn: 115033
2010-09-29 10:12:08 +00:00
Owen Anderson a3181e2d79 Add a subtarget hook for reporting the misprediction penalty. Use this to provide more precise
cost modeling for if-conversion.  Now if only we had a way to estimate the misprediction probability.

Adjsut CodeGen/ARM/ifcvt10.ll.  The pipeline on Cortex-A8 is long enough that it is still profitable
to predicate an ldm, but the shorter pipeline on Cortex-A9 makes it unprofitable.

llvm-svn: 114995
2010-09-28 21:57:50 +00:00
Anton Korobeynikov 81bdc93bbb User proper libcall names & condcodes while compiling for ARM EABI.
Patch by Evzen Muller!

llvm-svn: 114991
2010-09-28 21:39:26 +00:00
Owen Anderson 88af7d00fc Part one of switching to using a more sane heuristic for determining if-conversion profitability.
Rather than having arbitrary cutoffs, actually try to cost model the conversion.

For now, the constants are tuned to more or less match our existing behavior, but these will be
changed to reflect realistic values as this work proceeds.

llvm-svn: 114973
2010-09-28 18:32:13 +00:00
Bob Wilson 3dc97324c1 Add a command line option "-arm-strict-align" to disallow unaligned memory
accesses for ARM targets that would otherwise allow it.  Radar 8465431.

llvm-svn: 114941
2010-09-28 04:09:35 +00:00
Jakob Stoklund Olesen 415a7a6fec Revert "Disable codegen prepare critical edge splitting. Machine instruction passes now"
This reverts revision 114633. It was breaking llvm-gcc-i386-linux-selfhost.

It seems there is a downstream bug that is exposed by
-cgp-critical-edge-splitting=0. When that bug is fixed, this patch can go back
in.

Note that the changes to tailcallfp2.ll are not reverted. They were good are
required.

llvm-svn: 114859
2010-09-27 18:43:48 +00:00
Jakob Stoklund Olesen 4f3443e74d Explicitly disable CGP critical edge splitting for this test so it won't break
by reenabling it temporarily.

llvm-svn: 114858
2010-09-27 18:43:43 +00:00
Jakob Stoklund Olesen f2a279b902 Don't depend on basic block numbering.
llvm-svn: 114857
2010-09-27 18:43:40 +00:00
Evan Cheng dbcc4b4d4d Enable code placement optimization pass for ARM.
llvm-svn: 114746
2010-09-24 19:07:23 +00:00
Bob Wilson 7fbbe9a43a Set alignment operand for NEON VST instructions.
llvm-svn: 114709
2010-09-23 23:42:37 +00:00
Bob Wilson 9eeb890172 Set alignment operand for NEON VLD instructions.
llvm-svn: 114696
2010-09-23 21:43:54 +00:00
Evan Cheng 794aaa79e2 Disable codegen prepare critical edge splitting. Machine instruction passes now
break critical edges on demand.

llvm-svn: 114633
2010-09-23 06:55:34 +00:00
Evan Cheng d757c88bba OptimizeCompareInstr should avoid iterating pass the beginning of the MBB when the 'and' instruction is after the comparison.
llvm-svn: 114506
2010-09-21 23:49:07 +00:00
Jim Grosbach 94dfd6fc4f Simplify ARM callee-saved register handling by removing the distinction
between the high and low registers for prologue/epilogue code. This was
a Darwin-only thing that wasn't providing a realistic benefit anymore.
Combining the save areas simplifies the compiler code and results in better
ARM/Thumb2 codegen.

For example, previously we would generate code like:
        push    {r4, r5, r6, r7, lr}
        add     r7, sp, #12
        stmdb   sp!, {r8, r10, r11}
With this change, we combine the register saves and generate:
        push    {r4, r5, r6, r7, r8, r10, r11, lr}
        add     r7, sp, #12

rdar://8445635

llvm-svn: 114340
2010-09-20 19:32:20 +00:00
Bob Wilson cb6db98897 Add target-specific DAG combiner for BUILD_VECTOR and VMOVRRD. An i64
value should be in GPRs when it's going to be used as a scalar, and we use
VMOVRRD to make that happen, but if the value is converted back to a vector
we need to fold to a simple bit_convert.  Radar 8407927.

llvm-svn: 114233
2010-09-17 22:59:05 +00:00
Jim Grosbach 7a6c37d3e7 Teach the (non-MC) instruction printer to use the cannonical names for push/pop,
and shift instructions on ARM. Update the tests to match.

llvm-svn: 114230
2010-09-17 22:36:38 +00:00
Jim Grosbach 6d800f88da Update tests to handle MC-inst instruction printing of shift operations. The
legacy asm printer uses instructions of the form, "mov r0, r0, lsl #3", while
the MC-instruction printer uses the form "lsl r0, r0, #3". The latter mnemonic
is correct and preferred according the ARM documentation (A8.6.98). The former
are pseudo-instructions for the latter.

llvm-svn: 114221
2010-09-17 21:58:46 +00:00
Jim Grosbach 4a5e54021a FileCheck-ize
llvm-svn: 114218
2010-09-17 21:46:16 +00:00
Jim Grosbach 20da4e360b Move thumb2 tests to the thumb2 directory
llvm-svn: 114206
2010-09-17 20:34:09 +00:00
Jim Grosbach 9b0cd20f72 tweak test to check instructions rather than relying on the comment string
llvm-svn: 114204
2010-09-17 20:27:26 +00:00
Jim Grosbach f3ceecec7e tweak test to check instructions rather than relying on the comment string
llvm-svn: 114200
2010-09-17 20:21:03 +00:00
Jim Grosbach c18a460adc tweak test to check instructions rather than relying on the comment string
llvm-svn: 114199
2010-09-17 20:17:41 +00:00
Bob Wilson 660d7ecf32 Reapply Gabor's 113839, 113840, and 113876 with a fix for a problem
encountered while building llvm-gcc for arm.  This is probably the same issue
that the ppc buildbot hit. llvm::prior works on a MachineBasicBlock::iterator,
not a plain MachineInstr.

llvm-svn: 113983
2010-09-15 17:12:08 +00:00
Gabor Greif 9ae4b271f2 the darwin9-powerpc buildbot keeps consistently crashing,
backing out following to get it back to green,
so I can investigate in peace:

svn merge -c -113840  llvm/test/CodeGen/ARM/arm-and-tst-peephole.ll
svn merge -c -113876 -c -113839 llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp

llvm-svn: 113980
2010-09-15 16:53:07 +00:00
Gabor Greif 00e34f4b32 forgot the testcase change for r113839
llvm-svn: 113840
2010-09-14 09:30:17 +00:00
Gabor Greif 5dbe800203 test for and-tst peephole optimization
documents the status-quo with its opportunities

llvm-svn: 113838
2010-09-14 08:50:43 +00:00
Owen Anderson c237a849e3 Re-apply r113679, which was reverted in r113720, which added a paid of new instcombine transforms
to expose greater opportunities for store narrowing in codegen.  This patch fixes a potential
infinite loop in instcombine caused by one of the introduced transforms being overly aggressive.

llvm-svn: 113763
2010-09-13 17:59:27 +00:00
Eric Christopher 26abd3e0c2 Revert 113679, it was causing an infinite loop in a testcase that I've sent
on to Owen.

llvm-svn: 113720
2010-09-12 06:09:23 +00:00
Evan Cheng 1d6aa46cd7 Fix test so it passes on non-Darwin hosts.
llvm-svn: 113577
2010-09-10 06:20:01 +00:00
Bob Wilson 8617234658 Fix merging base-updates for VLDM/VSTM: Before I switched these instructions
to use AddrMode4, there was a count of the registers stored in one of the
operands.  I changed that to just count the operands but forgot to adjust for
the size of D registers.  This was noticed by Evan as a performance problem
but it is a potential correctness bug as well, since it is possible that this
could merge a base update with a non-matching immediate.

llvm-svn: 113576
2010-09-10 05:15:04 +00:00
Evan Cheng bf4070756f Teach if-converter to be more careful with predicating instructions that would
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. If-converter should take treat them as non-micro-coded
simple instructions.

llvm-svn: 113570
2010-09-10 01:29:16 +00:00
Eric Christopher ca2ec95154 Remove ssp from this test.
llvm-svn: 113392
2010-09-08 19:32:34 +00:00
Bob Wilson f65c9ef720 Replace NEON vabdl, vaba, and vabal intrinsics with combinations of the
vabd intrinsic and add and/or zext operations.  In the case of vaba, this
also avoids the need for a DAG combine pattern to combine vabd with add.
Update tests.  Auto-upgrade the old intrinsics.

llvm-svn: 112941
2010-09-03 01:35:08 +00:00
Sandeep Patel 0ca17f7e8a Fix an unnecessary XFAIL
llvm-svn: 112853
2010-09-02 20:19:24 +00:00
Jim Grosbach 66c681a644 Now that register allocation properly considers reserved regs, simplify the
ARM register class allocation order functions to take advantage of that.

llvm-svn: 112841
2010-09-02 18:14:29 +00:00
Bob Wilson 75a6408f88 Convert VLD1 and VLD2 instructions to use pseudo-instructions until
after regalloc.

llvm-svn: 112825
2010-09-02 16:00:54 +00:00
Bob Wilson 38ab35a911 Remove NEON vmull, vmlal, and vmlsl intrinsics, replacing them with multiply,
add, and subtract operations with zero-extended or sign-extended vectors.
Update tests.  Add auto-upgrade support for the old intrinsics.

llvm-svn: 112773
2010-09-01 23:50:19 +00:00
Chris Lattner 39eccb4754 temporarily revert r112664, it is causing a decoding conflict, and
the testcases should be merged.

llvm-svn: 112711
2010-09-01 16:00:50 +00:00
Bill Wendling 6789f8b6ae We have a chance for an optimization. Consider this code:
int x(int t) {
  if (t & 256)
    return -26;
  return 0;
}

We generate this:

     tst.w   r0, #256
     mvn     r0, #25
     it      eq
     moveq   r0, #0

while gcc generates this:

     ands    r0, r0, #256
     it      ne
     mvnne   r0, #25
     bx      lr

Scandalous really!

During ISel time, we can look for this particular pattern. One where we have a
"MOVCC" that uses the flag off of a CMPZ that itself is comparing an AND
instruction to 0. Something like this (greatly simplified):

  %r0 = ISD::AND ...
  ARMISD::CMPZ %r0, 0         @ sets [CPSR]
  %r0 = ARMISD::MOVCC 0, -26  @ reads [CPSR]

All we have to do is convert the "ISD::AND" into an "ARM::ANDS" that sets [CPSR]
when it's zero. The zero value will all ready be in the %r0 register and we only
need to change it if the AND wasn't zero. Easy!

llvm-svn: 112664
2010-08-31 22:41:22 +00:00
Anton Korobeynikov 3a1d87a7ba Fix borken test
llvm-svn: 112555
2010-08-30 23:41:49 +00:00
Bob Wilson 4cd8a126c3 Remove NEON vmovn intrinsic, replacing it with vector truncate operations.
Auto-upgrade the old intrinsic and update tests.

llvm-svn: 112507
2010-08-30 20:02:30 +00:00
Duncan Sands 68c30907cc Correct bogus module triple specifications.
llvm-svn: 112469
2010-08-30 10:48:29 +00:00
Bob Wilson d0c054886c Remove NEON vaddl, vaddw, vsubl, and vsubw intrinsics. Instead, use llvm
IR add/sub operations with one or both operands sign- or zero-extended.
Auto-upgrade the old intrinsics.

llvm-svn: 112416
2010-08-29 05:57:34 +00:00
Bob Wilson 13ce07fa92 Change ARM VFP VLDM/VSTM instructions to use addressing mode #4, just like
all the other LDM/STM instructions.  This fixes asm printer crashes when
compiling with -O0.  I've changed one of the NEON tests (vst3.ll) to run
with -O0 to check this in the future.

Prior to this change VLDM/VSTM used addressing mode #5, but not really.
The offset field was used to hold a count of the number of registers being
loaded or stored, and the AM5 opcode field was expanded to specify the IA
or DB mode, instead of the standard ADD/SUB specifier.  Much of the backend
was not aware of these special cases.  The crashes occured when rewriting
a frameindex caused the AM5 offset field to be changed so that it did not
have a valid submode.  I don't know exactly what changed to expose this now.
Maybe we've never done much with -O0 and NEON.  Regardless, there's no longer
any reason to keep a count of the VLDM/VSTM registers, so we can use
addressing mode #4 and clean things up in a lot of places.

llvm-svn: 112322
2010-08-27 23:18:17 +00:00
Bob Wilson edf722add3 Add alignment arguments to all the NEON load/store intrinsics.
Update all the tests using those intrinsics and add support for
auto-upgrading bitcode files with the old versions of the intrinsics.

llvm-svn: 112271
2010-08-27 17:13:24 +00:00
Bob Wilson 4629f423f8 Revert svn 107892 (with changes to work with trunk). It caused a crash if
a VLD result was not used (Radar 8355607).  It should also fix pr7988, but
I haven't verified that yet.

llvm-svn: 112118
2010-08-26 00:13:36 +00:00
Eric Christopher 6b1533a1a9 Add another basic test cribbed from the x86 fast-isel tests.
llvm-svn: 112036
2010-08-25 07:57:29 +00:00
Eric Christopher 37d547aee6 Run this on thumb and arm.
llvm-svn: 112035
2010-08-25 07:53:15 +00:00
Eric Christopher e58c03698e Make this testcase actually executed with fast-isel on arm.
llvm-svn: 112033
2010-08-25 07:47:00 +00:00
Bob Wilson be745d8c00 Replace some NEON vmovl intrinsic that I missed earlier.
llvm-svn: 111696
2010-08-20 23:22:43 +00:00
Bob Wilson 9a511c07e4 Replace the arm.neon.vmovls and vmovlu intrinsics with vector sign-extend and
zero-extend operations.

llvm-svn: 111614
2010-08-20 04:54:02 +00:00
Dan Gohman 82656fb0e1 When sending stats output to stdout for grepping, don't emit normal
output to standard output also.

llvm-svn: 111435
2010-08-18 22:22:44 +00:00
Bob Wilson fb7eaff759 Expand ZERO_EXTEND operations for NEON vector types.
Testcase from Nick Lewycky.

llvm-svn: 111341
2010-08-18 01:45:52 +00:00
Bob Wilson 942b10f511 Change ARM PKHTB and PKHBT instructions to use a shift_imm operand to avoid
printing "lsl #0".  This fixes the remaining parts of pr7792.  Make
corresponding changes for encoding/decoding these instructions.

llvm-svn: 111251
2010-08-17 17:23:19 +00:00
Bob Wilson 411dfad981 Allow more cases of undef shuffle indices and add tests for them.
llvm-svn: 111226
2010-08-17 05:54:34 +00:00
Evan Cheng f259efde47 PHI elimination should not break back edge. It can cause some significant code placement issues. rdar://8263994
good:
LBB0_2:
  mov     r2, r0
  . . .
  mov     r1, r2
  bne     LBB0_2

bad:
LBB0_2:
  mov     r2, r0
  . . .
@ BB#3:
  mov     r1, r2
  b       LBB0_2

llvm-svn: 111221
2010-08-17 01:20:36 +00:00
Bob Wilson eee4824f74 Add a testcase for svn 111208.
llvm-svn: 111212
2010-08-16 23:44:29 +00:00
Bob Wilson 804f6159f1 Generalize a pattern for PKHTB: an SRL of 16-31 bits will guarantee
that the high halfword is zero.  The shift need not be exactly 16 bits.

llvm-svn: 111196
2010-08-16 22:26:55 +00:00
Bob Wilson 8f553757c4 Convert a test to use FileCheck.
llvm-svn: 111153
2010-08-16 17:05:27 +00:00
Bob Wilson 3c9ed76ba5 Temporarily disable tail calls on ARM to work around some linker problems.
llvm-svn: 111050
2010-08-13 22:43:33 +00:00
Bill Wendling 6a98131468 Consider this code snippet:
float t1(int argc) {
  return (argc == 1123) ? 1.234f : 2.38213f;
}

We would generate truly awful code on ARM (those with a weak stomach should look
away):

_t1:
  movw   r1, #1123
  movs   r2, #1
  movs   r3, #0
  cmp    r0, r1
  mov.w  r0, #0
  it     eq
  moveq  r0, r2
  movs   r1, #4
  cmp    r0, #0
  it     ne
  movne  r3, r1
  adr    r0, #LCPI1_0
  ldr    r0, [r0, r3]
  bx     lr

The problem was that legalization was creating a cascade of SELECT_CC nodes, for
for the comparison of "argc == 1123" which was fed into a SELECT node for the ?:
statement which was itself converted to a SELECT_CC node. This is because the
ARM back-end doesn't have custom lowering for SELECT nodes, so it used the
default "Expand".

I added a fairly simple "LowerSELECT" to the ARM back-end. It takes care of this
testcase, but can obviously be expanded to include more cases.

Now we generate this, which looks optimal to me:

_t1:
  movw   r1, #1123
  movs   r2, #0
  cmp    r0, r1
  adr    r0, #LCPI0_0
  it     eq
  moveq  r2, #4
  ldr    r0, [r0, r2]
  bx     lr
  .align  2
LCPI0_0:
  .long   1075344593  @ float 2.382130e+00
  .long   1067316150  @ float 1.234000e+00

llvm-svn: 110799
2010-08-11 08:43:16 +00:00
Evan Cheng 5190f09291 Report error if codegen tries to instantiate a ARM target when the cpu does support it. e.g. cortex-m* processors.
llvm-svn: 110798
2010-08-11 07:17:46 +00:00
Bill Wendling 79937dfc5b Update test to match output of optimize compares for ARM.
llvm-svn: 110765
2010-08-11 01:05:02 +00:00
Bill Wendling 871d4e1170 The optimize comparisons pass removes the "cmp" instruction this is checking for.
llvm-svn: 110739
2010-08-10 22:16:05 +00:00
Rafael Espindola 027d5bcf89 Fix eabi calling convention when a 64 bit value shadows r3.
Without this what was happening was:

* R3 is not marked as "used"
* ARM backend thinks it has to save it to the stack because of vaarg
* Offset computation correctly ignores it
* Offsets are wrong

llvm-svn: 110446
2010-08-06 15:35:32 +00:00
Bill Wendling 26feb849a4 Testcase for r110248.
llvm-svn: 110249
2010-08-04 21:56:30 +00:00
Bob Wilson 79daf7e0ae Combine NEON VABD (absolute difference) intrinsics with ADDs to make VABA
(absolute difference with accumulate) intrinsics.  Radar 8228576.

llvm-svn: 110170
2010-08-04 00:12:08 +00:00
Anton Korobeynikov 6bcea068db Currently EH lowering code expects typeinfo to be global only.
This assumption is not satisfied due to global mergeing.
Workaround the issue by temporary disablinge mergeing of const globals.
Also, ignore LLVM "special" globals. This fixes PR7716

llvm-svn: 109423
2010-07-26 18:45:39 +00:00
Evan Cheng df907f4594 - Allow target to specify when is register pressure "too high". In most cases,
it's too late to start backing off aggressive latency scheduling when most
  of the registers are in use so the threshold should be a bit tighter.
- Correctly handle live out's and extract_subreg etc.
- Enable register pressure aware scheduling by default for hybrid scheduler.
  For ARM, this is almost always a win on # of instructions. It's runtime
  neutral for most of the tests. But for some kernels with high register
  pressure it can be a huge win. e.g. 464.h264ref reduced number of spills by
  54 and sped up by 20%.

llvm-svn: 109279
2010-07-23 22:39:59 +00:00
Evan Cheng 285903853f More register pressure aware scheduling work.
llvm-svn: 109064
2010-07-21 23:53:58 +00:00
Eric Christopher 84bdfd80df Baby steps towards ARM fast-isel.
llvm-svn: 109047
2010-07-21 22:26:11 +00:00
Rafael Espindola 4277e14dc4 Fix calling convention on ARM if vfp2+ is enabled.
llvm-svn: 109009
2010-07-21 11:38:30 +00:00
Jim Grosbach b97e2bbe32 Add combiner patterns to more effectively utilize the BFI (bitfield insert)
instruction for non-constant operands. This includes the case referenced
in the README.txt regarding a bitfield copy.

llvm-svn: 108608
2010-07-17 03:30:54 +00:00
Jim Grosbach 11013eda5a Add basic support to code-gen the ARM/Thumb2 bit-field insert (BFI) instruction
and a combine pattern to use it for setting a bit-field to a constant
value. More to come for non-constant stores.

llvm-svn: 108570
2010-07-16 23:05:05 +00:00
Evan Cheng 55f0c6b9fc Split -enable-finite-only-fp-math to two options:
-enable-no-nans-fp-math and -enable-no-infs-fp-math. All of the current codegen fp math optimizations only care whether the fp arithmetics arguments and results can never be NaN.

llvm-svn: 108465
2010-07-15 22:07:12 +00:00
Jim Grosbach a90af1ba38 Improve 64-subtraction of immediates when parts of the immediate can fit
in the literal field of an instruction. E.g.,
long long foo(long long a) {
  return a - 734439407618LL;
}

rdar://7038284

llvm-svn: 108339
2010-07-14 17:45:16 +00:00
Bob Wilson bad47f62f6 Add support for NEON VMVN immediate instructions.
llvm-svn: 108324
2010-07-14 06:31:50 +00:00
Bob Wilson 103a0dcfe1 Add an ARM-specific DAG combining to avoid redundant VDUPLANE nodes.
Radar 7373643.

llvm-svn: 108303
2010-07-14 01:22:12 +00:00
Bob Wilson a3f1901531 Use a target-specific VMOVIMM DAG node instead of BUILD_VECTOR to represent
NEON VMOV-immediate instructions.  This simplifies some things.

llvm-svn: 108275
2010-07-13 21:16:48 +00:00
Evan Cheng 0cc4ad983d Extend the r107852 optimization which turns some fp compare to code sequence using only i32 operations. It now optimize some f64 compares when fp compare is exceptionally slow (e.g. cortex-a8). It also catches comparison against 0.0.
llvm-svn: 108258
2010-07-13 19:27:42 +00:00
Rafael Espindola a76eccf815 Fix va_arg for doubles. With this patch VAARG nodes always contain the
correct alignment information, which simplifies ExpandRes_VAARG a bit.

The patch introduces a new alignment information to TargetLoweringInfo. This is
needed since the two natural candidates cannot be used:

* The 's' in target data: If this is set to the minimal alignment of any
  argument, getCallFrameTypeAlignment would return 4 for doubles on ARM for
  example.
* The getTransientStackAlignment method. It is possible for an architecture to
  have argument less aligned than what we maintain the stack pointer.

llvm-svn: 108072
2010-07-11 04:01:49 +00:00
Jim Grosbach 2a5725b1a3 In the presence of variable sized objects, allocate an emergency spill slot.
rdar://8131327

llvm-svn: 108008
2010-07-09 20:27:06 +00:00
Jakob Stoklund Olesen a57965827f Fix test to be less sensitive of regalloc accidents
llvm-svn: 107951
2010-07-09 01:32:11 +00:00
Bob Wilson 88a4e6dc0e Print "dregpair" NEON operands with a space between them, for readability and
consistency with other instructions that have lists of register operands.

llvm-svn: 107944
2010-07-09 00:47:20 +00:00
Bob Wilson 21eed476e8 Reenable DAG combining for vector shuffles. It looks like it was temporarily
disabled and then never turned back on again.  Adjust some tests, one because
this change avoids an unnecessary instruction, and the other to make it
continue testing what it was intended to test.

llvm-svn: 107941
2010-07-09 00:38:12 +00:00
Evan Cheng 0f54854a1d Check for FiniteOnlyFPMath as well.
llvm-svn: 107904
2010-07-08 20:12:24 +00:00
Evan Cheng be1f7a931e r107852 is only safe with -enable-unsafe-fp-math to account for +0.0 == -0.0.
llvm-svn: 107856
2010-07-08 06:01:49 +00:00
Evan Cheng 25f9364cbd Optimize some vfp comparisons to integer ones. This patch implements the simplest case when the following conditions are met:
1. The arguments are f32.
2. The arguments are loads and they have no uses other than the comparison.
3. The comparison code is EQ or NE.

e.g.
        vldr.32 s0, [r1]
        vldr.32 s1, [r0]
        vcmpe.f32       s1, s0
        vmrs    apsr_nzcv, fpscr
	beq     LBB0_2
=>
        ldr     r1, [r1]
        ldr     r0, [r0]
        cmp     r0, r1
        beq     LBB0_2

More complicated cases will be implemented in subsequent patches.

llvm-svn: 107852
2010-07-08 02:08:50 +00:00
Dale Johannesen e2289285ae Changes to ARM tail calls, mostly cosmetic.
Add explicit testcases for tail calls within the same module.
Duplicate some code to humor those who think .w doesn't apply on ARM.
Leave this disabled on Thumb1, and add some comments explaining why it's hard
and won't gain much.

llvm-svn: 107851
2010-07-08 01:18:23 +00:00
Rafael Espindola 7c510aa7bc Don't create neon moves in CopyRegToReg. NEONMoveFixPass will do the conversion
if profitable.

llvm-svn: 107673
2010-07-06 16:24:34 +00:00
Bob Wilson 771d04b969 Fix incorrect asm-printing of some NEON immediates. Fix weak testcase so
that it checks the immediate values, not just the instructions opcodes.
Radar 8110263.

llvm-svn: 107487
2010-07-02 17:23:44 +00:00
Bill Wendling 03bcd6ecc8 Implement the "linker_private_weak" linkage type. This will be used for
Objective-C metadata types which should be marked as "weak", but which the
linker will remove upon final linkage. However, this linkage isn't specific to
Objective-C.

For example, the "objc_msgSend_fixup_alloc" symbol is defined like this:

      .globl l_objc_msgSend_fixup_alloc
      .weak_definition l_objc_msgSend_fixup_alloc
      .section __DATA, __objc_msgrefs, coalesced
      .align 3
l_objc_msgSend_fixup_alloc:
       .quad   _objc_msgSend_fixup
       .quad   L_OBJC_METH_VAR_NAME_1

This is different from the "linker_private" linkage type, because it can't have
the metadata defined with ".weak_definition".

Currently only supported on Darwin platforms.

llvm-svn: 107433
2010-07-01 21:55:59 +00:00
Jakob Stoklund Olesen dadea5b178 Fix the handling of partial redefines in the fast register allocator.
A partial redefine needs to be treated like a tied operand, and the register
must be reloaded while processing use operands.

This fixes a bug where partially redefined registers were processed as normal
defs with a reload added. The reload could clobber another use operand if it was
a kill that allowed register reuse.

llvm-svn: 107193
2010-06-29 19:15:30 +00:00
Bob Wilson d91d5bfc95 Fix a register scavenger crash when dealing with undefined subregs.
The LowerSubregs pass needs to preserve implicit def operands attached to
EXTRACT_SUBREG instructions when it replaces those instructions with copies.

llvm-svn: 107189
2010-06-29 18:42:49 +00:00
Rafael Espindola 38a7d7cbc3 Add a VT argument to getMinimalPhysRegClass and replace the copy related uses
of getPhysicalRegisterRegClass with it.

If we want to make a copy (or estimate its cost), it is better to use the
smallest class as more efficient operations might be possible.

llvm-svn: 107140
2010-06-29 14:02:34 +00:00
Bob Wilson 269a89fd3a Unlike other targets, ARM now uses BUILD_VECTORs post-legalization so they
can't be changed arbitrarily by the DAGCombiner without checking if it is
running after legalization.

llvm-svn: 107097
2010-06-28 23:40:25 +00:00
Rafael Espindola 2041abd958 When splitting a VAARG, remember its alignment.
This produces terrible but correct code.

llvm-svn: 106952
2010-06-26 18:22:20 +00:00
Daniel Dunbar acbdf53db4 Thumb2ITBlockPass: Fix a possible dereference of an invalid iterator. This was
introduced in r106343, but only showed up recently (with a particular compiler &
linker combination) because of the particular check, and because we have no
builtin checking for dereferencing the end of an array, which is truly
unfortunate.

llvm-svn: 106908
2010-06-25 23:14:54 +00:00
Evan Cheng 02b184de5b Change if-conversion block size limit checks to add some flexibility.
llvm-svn: 106901
2010-06-25 22:42:03 +00:00
Dan Gohman 9a2f0473b2 Teach EmitLiveInCopies to omit copies for unused virtual registers,
and to clean up unused incoming physregs from the live-in list.

llvm-svn: 106805
2010-06-24 22:23:02 +00:00