Commit Graph

1073 Commits

Author SHA1 Message Date
Tanya Lattner 1d11720ae4 Handle perfect shuffle case that generates a vrev for vectors of floats.
Add test case.

llvm-svn: 131582
2011-05-18 21:44:54 +00:00
Tanya Lattner 48b182c3a4 In r131488 I misunderstood how VREV works. It splits the vector in half and splits each half. Therefore, the real problem was that we were using a VREV64 for a 4xi16, when we should have been using a VREV32.
Updated test case and reverted change to the PerfectShuffle Table.

llvm-svn: 131529
2011-05-18 06:42:21 +00:00
Tanya Lattner c7e291b354 vrev is incorrectly defined in the perfect shuffle table. The ordering is backwards (should be 0x3210 versus 0x1032) which exposed a bug when doing a shuffle on a 4xi16. I've attached a test case.
llvm-svn: 131488
2011-05-17 20:48:40 +00:00
Jakob Stoklund Olesen 4edf17d91f Teach LiveInterval::isZeroLength about null SlotIndexes.
When instructions are deleted, they leave tombstone SlotIndex entries.
The isZeroLength method should ignore these null indexes.

This causes RABasic to sometimes spill a callee-saved register in the
abi-isel.ll test, so don't run that test with -regalloc=basic.  Prioritizing
register allocation according to spill weight can cause more registers to be
used.

llvm-svn: 131436
2011-05-16 23:50:05 +00:00
Galina Kistanova 9e56e51fab Correction. Use explicit target triple in the test.
llvm-svn: 131252
2011-05-12 21:55:34 +00:00
Nadav Rotem 8a7beb80f0 Fixes a bug in the DAGCombiner. LoadSDNodes have two values (data, chain).
If there is a store after the load node, then there is a chain, which means
that there is another user. Thus, asking hasOneUser would fail. Instead we
ask hasNUsesOfValue on the 'data' value.

llvm-svn: 131183
2011-05-11 14:40:50 +00:00
Rafael Espindola 19c1a56287 Produce a __debug_frame section on darwin ARM when appropriate.
llvm-svn: 131151
2011-05-10 21:04:45 +00:00
Dan Gohman dd550305e6 Give this test an explicit register allocator, so that it can work even if
the default register allocator is changed.

llvm-svn: 130883
2011-05-04 23:14:02 +00:00
Bill Wendling 2a40131f6b SjLj EH could produce a machine basic block that legitimately has more than one
landing pad as its successor.

SjLj exception handling jumps to the correct landing pad via a switch statement
that's generated right before code-gen. Loosen the constraint in the machine
instruction verifier to allow for this. Note, this isn't the most rigorous check
since we cannot determine where that switch statement came from. But it's
marginally better than turning this check off when SjLj exceptions are used.
<rdar://problem/9187612>

llvm-svn: 130881
2011-05-04 22:54:05 +00:00
Galina Kistanova e53ae508ec This test fails on ARM. The test shouldn't explicitly specify alignment (and alignment 4 is wrong) and requires hard-float.
llvm-svn: 130875
2011-05-04 21:57:44 +00:00
Devang Patel 39ecf816c5 Do not emit location expression size twice.
llvm-svn: 130854
2011-05-04 19:00:57 +00:00
Jakob Stoklund Olesen 51b35f7bb1 Fix a bunch of ARM tests to be register allocation independent.
llvm-svn: 130800
2011-05-03 22:31:21 +00:00
Evan Cheng 93b5cdc5ab Make the test less likely to fail with minor changes.
llvm-svn: 130778
2011-05-03 19:09:32 +00:00
Bob Wilson c5242b0e78 Remove test for iOS divmod function, since that is disabled for now.
llvm-svn: 130769
2011-05-03 17:54:49 +00:00
Bruno Cardoso Lopes 168c9005b5 Add a few ARM coprocessor intrinsics. Testcases included
llvm-svn: 130763
2011-05-03 17:29:22 +00:00
Dan Gohman 6136e94897 Add an unfolded offset field to LSR's Formula record. This is used to
model constants which can be added to base registers via add-immediate
instructions which don't require an additional register to materialize
the immediate.

llvm-svn: 130743
2011-05-03 00:46:49 +00:00
Jakob Stoklund Olesen edfabc9aad Weekly fix of register allocation dependent unit tests.
llvm-svn: 130567
2011-04-30 01:37:52 +00:00
Eli Friedman 4105ed1523 Make FastEmit_ri_ try a bit harder to succeed for supported operations; FastEmit_i can fail for non-Thumb2 ARM. Makes ARMSimplifyAddress work correctly, and reduces the number of fast-isel bailouts on non-Thumb ARM.
llvm-svn: 130560
2011-04-29 23:34:52 +00:00
Eli Friedman 328bad02fa Switch to ImmLeaf (which can be used by FastISel) for a few more common ARM/Thumb2 patterns.
llvm-svn: 130552
2011-04-29 22:48:03 +00:00
Eli Friedman dd937843d3 Fix run-line, again. :(
llvm-svn: 130540
2011-04-29 21:33:03 +00:00
Eli Friedman 86caced370 Re-committing r130454, which does not in fact break anything.
Fix a rather obscure crash caused by ARM fast-isel generating code which redefines a register.
rdar://problem/9338332 .

llvm-svn: 130539
2011-04-29 21:22:56 +00:00
Eric Christopher 8d46b47787 Add trunc->branch support, this won't help with clang's i8->i1 truncations
for bools, but is a start.

llvm-svn: 130534
2011-04-29 20:02:39 +00:00
Eli Friedman 517728b1ae Revert r130454; apparently this doesn't actually work.
llvm-svn: 130462
2011-04-28 23:55:14 +00:00
Eli Friedman 37b9ede969 Fix runline.
llvm-svn: 130455
2011-04-28 23:12:24 +00:00
Eli Friedman e4ecd42926 Fix a rather obscure crash caused by ARM fast-isel generating code which redefines a register.
rdar://problem/9338332 .

llvm-svn: 130454
2011-04-28 23:03:25 +00:00
Devang Patel 3e021533cd Teach dwarf writer to handle complex address expression for .debug_loc entries.
This fixes clang generated blocks' variables' debug info.
Radar 9279956.

llvm-svn: 130373
2011-04-28 02:22:40 +00:00
Evan Cheng 9808d31b9e If converter was being too cute. It look for root BBs (which don't have
successors) and use inverse depth first search to traverse the BBs. However
that doesn't work when the CFG has infinite loops. Simply do a linear
traversal of all BBs work just fine.

rdar://9344645

llvm-svn: 130324
2011-04-27 19:32:43 +00:00
Jakob Stoklund Olesen 71d3b895ba Also add <imp-def> operands for defined and dead super-registers when rewriting.
We cannot rely on the <imp-def> operands added by LiveIntervals in all cases as
demonstrated by the test case.

llvm-svn: 130313
2011-04-27 17:42:31 +00:00
Evan Cheng 1355bbdd11 Be careful about scheduling nodes above previous calls. It increase usages of
more callee-saved registers and introduce copies. Only allows it if scheduling
a node above calls would end up lessen register pressure.

Call operands also has added ABI restrictions for register allocation, so be
extra careful with hoisting them above calls.

rdar://9329627

llvm-svn: 130245
2011-04-26 21:31:35 +00:00
Evan Cheng dbb86b8108 This test should be in MC. It breaks with changes to scheduling / register allocation so it's being removed.
llvm-svn: 130243
2011-04-26 21:09:04 +00:00
Chris Lattner 189ca1498f don't emit the symbol name twice for local bss and common
symbols.  For example, don't emit:
        .comm   _i,4,2                  ## @i
                                        ## @i

instead emit:
        .comm   _i,4,2                  ## @i

llvm-svn: 130192
2011-04-26 06:14:13 +00:00
Eric Christopher 238a21f2d5 Make this test disable fast isel as it's not needed.
llvm-svn: 130165
2011-04-25 22:39:46 +00:00
Benjamin Kramer ba446cc12a Make tests more useful.
lit needs a linter ...

llvm-svn: 130126
2011-04-25 10:12:01 +00:00
Andrew Trick 0ed5778a1e Thumb2 and ARM add/subtract with carry fixes.
Fixes Thumb2 ADCS and SBCS lowering: <rdar://problem/9275821>.
t2ADCS/t2SBCS are now pseudo instructions, consistent with ARM, so the
assembly printer correctly prints the 's' suffix.

Fixes Thumb2 adde -> SBC matching to check for live/dead carry flags.

Fixes the internal ARM machine opcode mnemonic for ADCS/SBCS.
Fixes ARM SBC lowering to check for live carry (potential bug).

llvm-svn: 130048
2011-04-23 03:55:32 +00:00
Devang Patel 94ad6ac13c Fix DWARF description of Q registers.
llvm-svn: 129952
2011-04-21 23:22:35 +00:00
Devang Patel 3712c14be9 Fix DWARF description of S registers.
llvm-svn: 129947
2011-04-21 22:48:26 +00:00
Devang Patel be22131c28 Test case for r129922
llvm-svn: 129934
2011-04-21 20:16:43 +00:00
Evan Cheng 5f1ba4cd2d Remove -use-divmod-libcall. Let targets opt in when they are available.
llvm-svn: 129884
2011-04-20 22:20:12 +00:00
Eric Christopher bcaedb5ce0 Rewrite the expander for umulo/smulo to remember to sign extend the input
manually and pass all (now) 4 arguments to the mul libcall. Add a new
ExpandLibCall for just this (copied gratuitously from type legalization).

Fixes rdar://9292577

llvm-svn: 129842
2011-04-20 01:19:45 +00:00
Daniel Dunbar 4a7783b0c2 CodeGen: Eliminate a use of getDarwinMajorNumber().
- There is a minor semantic change here (evidenced by the test change) for
   Darwin triples that have no version component. I debated changing the default
   behavior of isOSVersionLT, but decided it made more sense for triples to be
   explicit.

llvm-svn: 129802
2011-04-19 20:32:39 +00:00
Bob Wilson 0858c3aaed This patch combines several changes from Evan Cheng for rdar://8659675.
Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
   additional pipeline stall. So it's frequently better to single codegen
   vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
   vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.

Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
   fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.

Enable these fp vmlx codegen changes for Cortex-A9.

llvm-svn: 129775
2011-04-19 18:11:57 +00:00
Bob Wilson d04a83f8f2 Add -mcpu=cortex-a9-mp. It's cortex-a9 with MP extension. rdar://8648637.
llvm-svn: 129774
2011-04-19 18:11:52 +00:00
Bob Wilson a2881ee8a4 Avoid some 's' 16-bit instruction which partially update CPSR
(and add false dependency) when it isn't dependent on last CPSR defining
instruction. rdar://8928208

llvm-svn: 129773
2011-04-19 18:11:49 +00:00
Bob Wilson df612ba006 Avoid write-after-write issue hazards for Cortex-A9.
Add a avoidWriteAfterWrite() target hook to identify register classes that
suffer from write-after-write hazards. For those register classes, try to avoid
writing the same register in two consecutive instructions.

This is currently disabled by default.  We should not spill to avoid hazards!
The command line flag -avoid-waw-hazard can be used to enable waw avoidance.

llvm-svn: 129772
2011-04-19 18:11:45 +00:00
Jakob Stoklund Olesen fb1249548f Tighten test case a bit.
Ideally, we would match an S-register to its containing D-register, but that
requires arithmetic (divide by 2).

llvm-svn: 129756
2011-04-19 06:14:45 +00:00
Jakob Stoklund Olesen bf78618db6 Make tests register allocation independent again.
llvm-svn: 129739
2011-04-19 00:14:43 +00:00
Evan Cheng 4079133796 Do not lose mem_operands while lowering VLD / VST intrinsics.
llvm-svn: 129738
2011-04-19 00:04:03 +00:00
Eric Christopher c37aa0b26a Fix a bug where we were counting the alias sets as completely used
registers for fast allocation a different way. This has us updating
used registers only when we're using that exact register.

Fixes rdar://9207598

llvm-svn: 129711
2011-04-18 19:26:25 +00:00
Evan Cheng b14ce09fca Fix divmod libcall lowering. Convert to {S|U}DIVREM first and then expand the node to a libcall. rdar://9280991
llvm-svn: 129633
2011-04-16 03:08:26 +00:00
Cameron Zwarich 9c65e4d69c Add ORR and EOR to the CMP peephole optimizer. It's hard to get isel to generate
a case involving EOR, so I only added a test for ORR.

llvm-svn: 129610
2011-04-15 21:24:38 +00:00
Cameron Zwarich 0829b3065a The AND instruction leaves the V flag unmodified, so it falls victim to the same
problem as all of the other instructions we fold with CMPs.

llvm-svn: 129602
2011-04-15 20:45:00 +00:00
Cameron Zwarich 93eae1571c Add missing register forms of instructions to the ARM CMP-folding code. This
fixes <rdar://problem/9287901>.

llvm-svn: 129599
2011-04-15 20:28:28 +00:00
Evan Cheng 12bb05b75b Fix another fcopysign lowering bug. If src is f64 and destination is f32, don't
forget to right shift the source by 32 first. rdar://9287902

llvm-svn: 129556
2011-04-15 01:31:00 +00:00
Cameron Zwarich 415b5e8341 Fix a typo in an ARM-specific DAG combine. This fixes <rdar://problem/9278274>.
llvm-svn: 129468
2011-04-13 21:01:19 +00:00
Cameron Zwarich 70be27e913 Fix an obvious problem with an alignment computation. AsmPrinter actually does
the max itself, so it is not easy to write a test case for this, but I added a
test case that would fail if the code in AsmPrinter were removed.

llvm-svn: 129432
2011-04-13 09:02:43 +00:00
Cameron Zwarich cdf59f7016 If a global variable has a specified alignment that is less than the preferred
alignment for its type, use the minimum of the specified alignment and the ABI
alignment. This fixes <rdar://problem/9275290>.

llvm-svn: 129428
2011-04-13 06:03:16 +00:00
Andrew Trick b53a00d2cb Recommit r129383. PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency.
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handle node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero cycle stage or it is a TokenFactor chain.

Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the ndoe latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.

llvm-svn: 129421
2011-04-13 00:38:32 +00:00
Eric Christopher 28f4c729f7 Temporarily revert r129408 to see if it brings the bots back.
llvm-svn: 129417
2011-04-13 00:20:59 +00:00
Eric Christopher d829f43c06 Fix a bug where we were counting the alias sets as completely used
registers for fast allocation.

Fixes rdar://9207598

llvm-svn: 129408
2011-04-12 23:23:14 +00:00
Andrew Trick 1b60ad6644 Revert 129383. It causes some targets to hit a scheduler assert.
llvm-svn: 129385
2011-04-12 20:14:07 +00:00
Andrew Trick c5dd24a542 PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency.
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make these heuristic adjustments to node latency work,
I also needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.

llvm-svn: 129383
2011-04-12 19:54:36 +00:00
Cameron Zwarich fbcd69b96a Split a store of a VMOVDRR into two integer stores to avoid mixing NEON and ARM
stores of arguments in the same cache line. This fixes the second half of
<rdar://problem/8674845>.

llvm-svn: 129345
2011-04-12 02:24:17 +00:00
Evan Cheng ef42bea704 Look pass copies when determining whether hoisting would end up inserting more copies. rdar://9266679
llvm-svn: 129297
2011-04-11 21:09:18 +00:00
Chris Lattner ea6afab4b0 remove a bunch of CHECK lines that aren't checking what
they thought they were, because alternation was expanding
wrong in {{}}'s.

llvm-svn: 129194
2011-04-09 06:31:06 +00:00
Chris Lattner 1c42a4d159 don't test for codegen of 'store undef'
llvm-svn: 129184
2011-04-09 02:31:26 +00:00
Evan Cheng 74d92c1924 Change -arm-trap-func= into a non-arm specific option. Now Intrinsic::trap is lowered into a call to the specified trap function at sdisel time.
llvm-svn: 129152
2011-04-08 21:37:21 +00:00
Evan Cheng 9a3f2772f0 Add option to emit @llvm.trap as a function call instead of a trap instruction. rdar://9249183.
llvm-svn: 129107
2011-04-07 20:31:12 +00:00
Andrew Trick 2ad0b37318 Added a check in the preRA scheduler for potential interference on a
induction variable. The preRA scheduler is unaware of induction vars,
so we look for potential "virtual register cycles" instead.

Fixes <rdar://problem/8946719> Bad scheduling prevents coalescing

llvm-svn: 129100
2011-04-07 19:54:57 +00:00
Tanya Lattner 266792a55a Prevent ARM DAG Combiner from doing an AND or OR combine on an illegal vector type (vectors of size 3). Also included test cases.
llvm-svn: 129074
2011-04-07 15:24:20 +00:00
Evan Cheng a7c7b54dde Change -arm-divmod-libcall to a target neutral option.
llvm-svn: 129045
2011-04-07 00:58:44 +00:00
Owen Anderson bdff1c997a Teach the ARM peephole optimizer that RSB, RSC, ADC, and SBC can be used for folded comparisons, just like ADD and SUB.
llvm-svn: 129038
2011-04-06 23:35:59 +00:00
Jakob Stoklund Olesen 1ec41e2bd9 These tests no longer require linear scan because reserved register coalescing is now universal.
llvm-svn: 128936
2011-04-05 21:40:41 +00:00
Johnny Chen 293875ef55 Fix test-llvm failures.
llvm-svn: 128906
2011-04-05 18:41:40 +00:00
Eric Christopher f392a69ff7 Fix up testcase for previous commit.
llvm-svn: 128870
2011-04-05 00:56:01 +00:00
Cameron Zwarich 6fe5c29430 Do some peephole optimizations to remove pointless VMOVs from Neon to integer
registers that arise from argument shuffling with the soft float ABI. These
instructions are particularly slow on Cortex A8. This fixes one half of
<rdar://problem/8674845>.

llvm-svn: 128759
2011-04-02 02:40:43 +00:00
Jim Grosbach 360c369967 LDRD/STRD instructions should print both Rt and Rt2 in the asm string.
llvm-svn: 128736
2011-04-01 20:26:57 +00:00
Evan Cheng a6a992a662 Add test case.
llvm-svn: 128707
2011-04-01 06:27:25 +00:00
Evan Cheng 0f86d6de50 FileCheck'ify test.
llvm-svn: 128706
2011-04-01 03:36:33 +00:00
Jakob Stoklund Olesen 0888bcf542 Fix ARM tests to be register allocator independent.
llvm-svn: 128680
2011-03-31 22:14:03 +00:00
Evan Cheng 38bf5adcea Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier
accumulator forwarding:
vadd d3, d0, d1
vmul d3, d3, d2
=>
vmul d3, d0, d2
vmla d3, d1, d2

llvm-svn: 128665
2011-03-31 19:38:48 +00:00
Jakob Stoklund Olesen ae044c06bf Pick a conservative register class when creating a small live range for remat.
The rematerialized instruction may require a more constrained register class
than the register being spilled. In the test case, the spilled register has been
inflated to the DPR register class, but we are rematerializing a load of the
ssub_0 sub-register which only exists for DPR_VFP2 registers.

The register class is reinflated after spilling, so the conservative choice is
only temporary.

llvm-svn: 128610
2011-03-31 03:54:44 +00:00
Cameron Zwarich 53dd03d537 Add a ARM-specific SD node for VBSL so that forms with a constant first operand
can be recognized. This fixes <rdar://problem/9183078>.

llvm-svn: 128584
2011-03-30 23:01:21 +00:00
Evan Cheng 18381b4257 Add intrinsics @llvm.arm.neon.vmulls and @llvm.arm.neon.vmullu.* back. Frontends
was lowering them to sext / uxt + mul instructions. Unfortunately the
optimization passes may hoist the extensions out of the loop and separate them.
When that happens, the long multiplication instructions can be broken into
several scalar instructions, causing significant performance issue.

Note the vmla and vmls intrinsics are not added back. Frontend will codegen them
as intrinsics vmull* + add / sub. Also note the isel optimizations for catching
mul + sext / zext are not changed either.

First part of rdar://8832507, rdar://9203134

llvm-svn: 128502
2011-03-29 23:06:19 +00:00
Cameron Zwarich 143f9aea2b Add Neon SINT_TO_FP and UINT_TO_FP lowering from v4i16 to v4f32. Fixes
<rdar://problem/8875309> and <rdar://problem/9057191>.

llvm-svn: 128492
2011-03-29 21:41:55 +00:00
Evan Cheng e2086e740f Optimizing (zext A + zext B) * C, to (VMULL A, C) + (VMULL B, C) during
isel lowering to fold the zero-extend's and take advantage of no-stall
back to back vmul + vmla:
 vmull q0, d4, d6
 vmlal q0, d5, d6
is faster than
 vaddl q0, d4, d5
 vmovl q1, d6                                                                                                                                                                             
 vmul  q0, q0, q1

This allows us to vmull + vmlal for:
    f = vmull_u8(   vget_high_u8(s), c);
    f = vmlal_u8(f, vget_low_u8(s),  c);

rdar://9197392

llvm-svn: 128444
2011-03-29 01:56:09 +00:00
Devang Patel abc77347a7 Enable GlobalMerge on darwin.
llvm-svn: 128183
2011-03-23 23:34:19 +00:00
Evan Cheng 425489d397 Cmp peephole optimization isn't always safe for signed arithmetics.
int tries = INT_MAX;    
while (tries > 0) {
      tries--;
}

The check should be:
        subs    r4, #1
        cmp     r4, #0
        bgt     LBB0_1

The subs can set the overflow V bit when r4 is INT_MAX+1 (which loop
canonicalization apparently does in this case). cmp #0 would have cleared
it while not changing the N and Z bits. Since BGT is dependent on the V
bit, i.e. (N == V) && !Z, it is not safe to eliminate the cmp #0.

rdar://9172742

llvm-svn: 128179
2011-03-23 22:52:04 +00:00
Rafael Espindola 1557fd6d39 Write the section table and the section data in the same order that
gun as does. This makes it a lot easier to compare the output of both
as the addresses are now a lot closer.

llvm-svn: 127972
2011-03-20 18:44:20 +00:00
Evan Cheng dc1d626a3d Match a few more obvious patterns to revsh. rdar://9147637.
llvm-svn: 127913
2011-03-18 21:52:42 +00:00
Daniel Dunbar fd95b016fb Revert r127757, "Patch to a fix dwarf relocation problem on ARM. One-line fix
plus the test where it used to break.", which broke Clang self-host of a
Debug+Asserts compiler, on OS X.

llvm-svn: 127763
2011-03-16 22:16:39 +00:00
Renato Golin a3aeafeb35 Patch to a fix dwarf relocation problem on ARM. One-line fix plus the test where it used to break.
llvm-svn: 127757
2011-03-16 21:05:52 +00:00
Bill Wendling ebecb33307 Some minor cleanups based on feedback.
llvm-svn: 127694
2011-03-15 20:47:26 +00:00
Evan Cheng 42401d6af2 Do not form thumb2 ldrd / strd if the offset is by multiple of 4. rdar://9133587
llvm-svn: 127683
2011-03-15 18:41:52 +00:00
Evan Cheng e4b8ac9fef Add a peephole optimization to optimize pairs of bitcasts. e.g.
v2 = bitcast v1
...
v3 = bitcast v2
...
   = v3
=>
v2 = bitcast v1
...
   = v1
if v1 and v3 are of in the same register class.

bitcast between i32 and fp (and others) are often not nops since they
are in different register classes. These bitcast instructions are often
left because they are in different basic blocks and cannot be
eliminated by dag combine.

rdar://9104514

llvm-svn: 127668
2011-03-15 05:13:13 +00:00
Bill Wendling 928de16793 Testcase for r127630.
llvm-svn: 127648
2011-03-15 01:49:08 +00:00
Jim Grosbach 3af6fe66b9 Clean up ARM tail calls a bit. They're pseudo-instructions for normal branches.
Also more cleanly separate the ARM vs. Thumb functionality. Previously, the
encoding would be incorrect for some Thumb instructions (the indirect calls).

llvm-svn: 127637
2011-03-15 00:30:40 +00:00
Bill Wendling e1fd78f2bc Generate a VTBL instruction instead of a series of loads and stores when we
can. As Nate pointed out, VTBL isn't super performant, but it *has* to be better
than this:

_shuf:
@ BB#0:       @ %entry
  push        {r4, r7, lr}
  add         r7, sp, #4
  sub         sp, #12
  mov         r4, sp
  bic         r4, r4, #7
  mov         sp, r4
  mov         r2, sp
  vmov        d16, r0, r1
  orr         r0, r2, #6
  orr         r3, r2, #7
  vst1.8      {d16[0]}, [r3]
  vst1.8      {d16[5]}, [r0]
  subs        r4, r7, #4
  orr         r0, r2, #5
  vst1.8      {d16[4]}, [r0]
  orr         r0, r2, #4
  vst1.8      {d16[4]}, [r0]
  orr         r0, r2, #3
  vst1.8      {d16[0]}, [r0]
  orr         r0, r2, #2
  vst1.8      {d16[2]}, [r0]
  orr         r0, r2, #1
  vst1.8      {d16[1]}, [r0]
  vst1.8      {d16[3]}, [r2]
  vldr.64     d16, [sp]
  vmov        r0, r1, d16
  mov         sp, r4
  pop         {r4, r7, pc}

The "illegal" testcase in vext.ll is no longer illegal.
<rdar://problem/9078775>

llvm-svn: 127630
2011-03-14 23:02:38 +00:00
Eric Christopher d3cc9fdd8e Fix this test up a bit.
llvm-svn: 127621
2011-03-14 21:05:21 +00:00
Evan Cheng d2f3b01797 Minor optimization. sign-ext/anyext of undef is still undef.
llvm-svn: 127598
2011-03-14 18:15:55 +00:00
Eric Christopher c313d94068 Saving files before committing is overrated.
Add a RUN line to this test.

llvm-svn: 127520
2011-03-12 01:36:23 +00:00
Eric Christopher 174d872702 Sometimes isPredicable lies to us and tells us we don't need the operands.
Go ahead and add them on when we might want to use them and let
later passes remove them.

Fixes rdar://9118569

llvm-svn: 127518
2011-03-12 01:09:29 +00:00
Jim Grosbach 6d371ce37e Properly pseudo-ize the ARM LDMIA_RET instruction. This has the nice side-
effect that we get proper instruction printing using the "pop" mnemonic for it.

llvm-svn: 127502
2011-03-11 22:51:41 +00:00
Cameron Zwarich 338d362200 Roll r127459 back in:
Optimize trivial branches in CodeGenPrepare, which often get created from the
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.

This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.

llvm-svn: 127498
2011-03-11 21:52:04 +00:00
Daniel Dunbar 94ccb27b43 Revert r127459, "Optimize trivial branches in CodeGenPrepare, which often get
created from the", it broke some GCC test suite tests.

llvm-svn: 127477
2011-03-11 19:30:30 +00:00
Cameron Zwarich cc27b3acc4 Optimize trivial branches in CodeGenPrepare, which often get created from the
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.

This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.

llvm-svn: 127459
2011-03-11 04:54:27 +00:00
Evan Cheng adb9c03e41 Avoid replacing the value of a directly stored load with the stored value if the load is indexed. rdar://9117613.
llvm-svn: 127440
2011-03-11 00:48:56 +00:00
Jim Grosbach 62a7b473af Properly pseudo-ize MOVCCr and MOVCCs.
llvm-svn: 127434
2011-03-10 23:56:09 +00:00
Bob Wilson 45acbd03db Fix a compiler crash where a Glue value had multiple uses. Radar 9049552.
llvm-svn: 127198
2011-03-08 01:17:20 +00:00
Joerg Sonnenberger 62f759791a Be nice to Xcore and the XMOS assembler and avoid quoting section names
that contain only letters, digits and the characters "_" and ".".

llvm-svn: 127028
2011-03-04 20:03:14 +00:00
Devang Patel 906df92d5c XFAIL for all. These tests are darwin specific anyway.
llvm-svn: 127022
2011-03-04 19:38:10 +00:00
Devang Patel a0d73fd65e Disable ARMGlobalMerge on darwin. The debugger is not yet able to extract individual variable's info from merged global.
llvm-svn: 127019
2011-03-04 19:11:05 +00:00
Joerg Sonnenberger 852ab890b5 Bug#9033: For the ELF assembler output, always quote the section name.
llvm-svn: 126963
2011-03-03 22:31:08 +00:00
Cameron Zwarich 5dd2aa2615 Eliminate the unused CodeGenPrepare option to split critical edges.
llvm-svn: 126825
2011-03-02 03:31:46 +00:00
Bill Wendling 3b1459b810 Narrow right shifts need to encode their immediates differently from a normal
shift.

   16-bit: imm6<5:3> = '001', 8 - <imm> is encded in imm6<2:0>
   32-bit: imm6<5:4> = '01',16 - <imm> is encded in imm6<3:0>
   64-bit: imm6<5> = '1', 32 - <imm> is encded in imm6<4:0>

llvm-svn: 126723
2011-03-01 01:00:59 +00:00
Jakob Stoklund Olesen 87d7408f29 Fix typo introduced by r126661: "Fix a typo which ..."
llvm-svn: 126666
2011-02-28 19:18:59 +00:00
Evan Cheng 6e3d443646 Fix a typo which cause dag combine crash. rdar://9059537.
llvm-svn: 126661
2011-02-28 18:45:27 +00:00
Bob Wilson e3ecd5fb9b Add patterns to use post-increment addressing for Neon VST1-lane instructions.
llvm-svn: 126477
2011-02-25 06:42:42 +00:00
Devang Patel b52040da17 Move arch specific tests in arch specific directories.
llvm-svn: 126401
2011-02-24 19:06:27 +00:00
Devang Patel 37e056e455 Check only relevant strings in output to increase stability of the tests.
llvm-svn: 126338
2011-02-23 22:35:57 +00:00
Evan Cheng d6b641e5bc More fcopysign correctness and performance fix.
The previous codegen for the slow path (when values are in VFP / NEON
registers) was incorrect if the source is NaN.

The new codegen uses NEON vbsl instruction to copy the sign bit. e.g.
        vmov.i32        d1, #0x80000000
        vbsl    d1, d2, d0
If NEON is not available, it uses integer instructions to copy the sign bit.
rdar://9034702

llvm-svn: 126295
2011-02-23 02:24:55 +00:00
Evan Cheng 2ce663031f available_externally (hidden or not) GVs are always accessed via stubs. rdar://9027648.
llvm-svn: 126191
2011-02-22 06:58:34 +00:00
Bob Wilson 60f50bc9f1 PR9139: Specify ARM/Darwin triple for vector-DAGCombine.ll test.
The i64_buildvector test in this file relies on the alignment of i64 and
f64 types being the same, which is true for Darwin but not AAPCS.

llvm-svn: 125525
2011-02-14 22:12:50 +00:00
Nate Begeman fa62d50481 Implement sdiv & udiv for <4 x i16> and <8 x i8> NEON vector types.
This avoids moving each element to the integer register file and calling __divsi3 etc. on it.

llvm-svn: 125402
2011-02-11 20:53:29 +00:00
Evan Cheng 2da1c95993 Fix buggy fcopysign lowering.
This
define float @foo(float %x, float %y) nounwind readnone {
entry:
  %0 = tail call float @copysignf(float %x, float %y) nounwind readnone
  ret float %0
}

Was compiled to:
    vmov     s0, r1
    bic      r0, r0, #-2147483648
    vmov     s1, r0
    vcmpe.f32    s0, #0
    vmrs         apsr_nzcv, fpscr
    it           lt
    vneglt.f32   s1, s1
    vmov         r0, s1
    bx           lr

This fails to copy the sign of -0.0f because it's lost during the float to int
conversion. Also, it's sub-optimal when the inputs are in GPR registers.

Now it uses integer and + or operations when it's profitable. And it's correct!
    lsrs    r1, r1, #31
    bfi     r0, r1, #31, #1
    bx      lr
rdar://8984306

llvm-svn: 125357
2011-02-11 02:28:55 +00:00
Andrew Trick 7d90bf5551 PostRA antidependence breaker unit test for PR8986.
llvm-svn: 125091
2011-02-08 17:42:05 +00:00
Andrew Trick c5daa45c8d PostRA antidependence breaker unit test for rdar://8959122.
llvm-svn: 125090
2011-02-08 17:41:12 +00:00
Evan Cheng e1a4ac9b5b Fix an obvious typo which caused an isel assertion. rdar://8964854.
llvm-svn: 125023
2011-02-07 18:50:47 +00:00
Bob Wilson 06fce87c4a Add codegen support for using post-increment NEON load/store instructions.
The vld1-lane, vld1-dup and vst1-lane instructions do not yet support using
post-increment versions, but all the rest of the NEON load/store instructions
should be handled now.

llvm-svn: 125014
2011-02-07 17:43:21 +00:00
Jason W Kim 85b0af177f Rework some .ARM.attribute work for improved gcc compatibility.
Unified EmitTextAttribute for both Asm and Obj emission (.cpu only)
Added necessary cortex-A8 related attrs for codegen compat tests.

llvm-svn: 124995
2011-02-07 00:49:53 +00:00
Evan Cheng d42641c6b5 Given a pair of floating point load and store, if there are no other uses of
the load, then it may be legal to transform the load and store to integer
load and store of the same width.

This is done if the target specified the transformation as profitable. e.g.
On arm, this can transform:
vldr.32 s0, []
vstr.32 s0, []

to

ldr r12, []
str r12, []

rdar://8944252

llvm-svn: 124708
2011-02-02 01:06:55 +00:00
Eric Christopher ebd8db7d00 Add a testcase for my last checkin.
llvm-svn: 124358
2011-01-27 06:01:17 +00:00
Evan Cheng d6093ff4cb Don't merge restore with tail call instruction.
llvm-svn: 124167
2011-01-25 01:28:33 +00:00
Evan Cheng 2f2435d026 Last round of fixes for movw + movt global address codegen.
1. Fixed ARM pc adjustment.
2. Fixed dynamic-no-pic codegen
3. CSE of pc-relative load of global addresses.

It's now enabled by default for Darwin.

llvm-svn: 123991
2011-01-21 18:55:51 +00:00
Andrew Trick bd428ec50f Enable support for precise scheduling of the instruction selection
DAG. Disable using "-disable-sched-cycles".

For ARM, this enables a framework for modeling the cpu pipeline and
counting stalls. It also activates several heuristics to drive
scheduling based on the model. Scheduling is inherently imprecise at
this stage, and until spilling is improved it may defeat attempts to
schedule. However, this framework provides greater control over
tuning codegen.

Although the flag is not target-specific, it should have very little
affect on the default scheduler used by x86. The only two changes that
affect x86 are:
- scheduling a high-latency operation bumps the current cycle so independent
  operations can have their latency covered. i.e. two independent 4
  cycle operations can produce results in 4 cycles, not 8 cycles.
- Two operations with equal register pressure impact and no
  latency-based stalls on their uses will be prioritized by depth before height
  (height is irrelevant if no stalls occur in the schedule below this point).

llvm-svn: 123971
2011-01-21 06:19:05 +00:00
Andrew Trick 47ff14b091 Convert -enable-sched-cycles and -enable-sched-hazard to -disable
flags. They are still not enable in this revision.

Added TargetInstrInfo::isZeroCost() to fix a fundamental problem with
the scheduler's model of operand latency in the selection DAG.

Generalized unit tests to work with sched-cycles.

llvm-svn: 123969
2011-01-21 05:51:33 +00:00
Evan Cheng 028ccbfcbf Don't be overly aggressive with CSE of "ldr constantpool". If it's a pc-relative
value, the "add pc" must be CSE'ed at the same time. We could follow the same
approach as T2 by adding pseudo instructions that combine the ldr + "add pc".
But the better approach is to use movw + movt (which I will enable soon), so
I'll leave this as a TODO.

llvm-svn: 123949
2011-01-20 23:55:07 +00:00
Evan Cheng f2e914be15 Add test.
llvm-svn: 123906
2011-01-20 08:38:21 +00:00
Evan Cheng b8b0ad80a8 Sorry, several patches in one.
TargetInstrInfo:
Change produceSameValue() to take MachineRegisterInfo as an optional argument.
When in SSA form, targets can use it to make more aggressive equality analysis.

Machine LICM:
1. Eliminate isLoadFromConstantMemory, use MI.isInvariantLoad instead.
2. Fix a bug which prevent CSE of instructions which are not re-materializable.
3. Use improved form of produceSameValue.

ARM:
1. Teach ARM produceSameValue to look pass some PIC labels.
2. Look for operands from different loads of different constant pool entries
   which have same values.
3. Re-implement PIC GA materialization using movw + movt. Combine the pair with
   a "add pc" or "ldr [pc]" to form pseudo instructions. This makes it possible
   to re-materialize the instruction, allow machine LICM to hoist the set of
   instructions out of the loop and make it possible to CSE them. It's a bit
   hacky, but it significantly improve code quality.
4. Some minor bug fixes as well.

With the fixes, using movw + movt to materialize GAs significantly outperform the
load from constantpool method. 186.crafty and 255.vortex improved > 20%, 254.gap
and 176.gcc ~10%.

llvm-svn: 123905
2011-01-20 08:34:58 +00:00
Eric Christopher bb14f65672 If we can, lower the multiply part of a umulo/smulo call to a libcall
with an invalid type then split the result and perform the overflow check
normally.

Fixes the 32-bit parts of rdar://8622122 and rdar://8774702.

llvm-svn: 123864
2011-01-20 00:29:24 +00:00
Devang Patel 2d9e532a3a Fix debug info for merged global.
llvm-svn: 123862
2011-01-20 00:02:16 +00:00
Evan Cheng dfce83c8f5 Materialize GA addresses with movw + movt pairs for Darwin in PIC mode. e.g.
movw    r0, :lower16:(L_foo$non_lazy_ptr-(LPC0_0+4))
        movt    r0, :upper16:(L_foo$non_lazy_ptr-(LPC0_0+4))
LPC0_0:
        add     r0, pc, r0

It's not yet enabled by default as some tests are failing. I suspect bugs in
down stream tools.

llvm-svn: 123619
2011-01-17 08:03:18 +00:00
Eric Christopher 3904343c68 Even if we don't have 7 bytes of stack space we may need to save and
restore the stack pointer from the frame pointer on thumbv6.

Fixes rdar://8819685

llvm-svn: 123196
2011-01-11 00:16:04 +00:00
Evan Cheng 078b0b095e Recognize inline asm 'rev /bin/bash, ' as a bswap intrinsic call.
llvm-svn: 123048
2011-01-08 01:24:27 +00:00
Bob Wilson 6f2b8966ca Lower some BUILD_VECTORS using VEXT+shuffle.
Patch by Tim Northover.

llvm-svn: 123035
2011-01-07 21:37:30 +00:00
Bob Wilson 99da75c17d Add testcases for PR8411 (vget_low and vget_high implemented as shuffles).
llvm-svn: 122997
2011-01-07 06:44:14 +00:00
Bob Wilson 8265d56638 Add ARM patterns to match EXTRACT_SUBVECTOR nodes.
Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.

The test changes are needed to keep those spill-q tests from testing aligned
spills and restores.  If the only aligned stack objects are spill slots, we
no longer realign the stack frame.  Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.

llvm-svn: 122995
2011-01-07 04:59:04 +00:00
Bob Wilson 914df82a2e PR8921: LDM/POP do not support interworking prior to v5t.
llvm-svn: 122970
2011-01-06 19:24:41 +00:00
Anton Korobeynikov 879be84aa5 Update the test
llvm-svn: 122666
2011-01-01 20:57:26 +00:00
Bob Wilson 36be00ceb3 Radar 8803471: Fix expansion of ARM BCCi64 pseudo instructions.
If the basic block containing the BCCi64 (or BCCZi64) instruction ends with
an unconditional branch, that branch needs to be deleted before appending
the expansion of the BCCi64 to the end of the block.

llvm-svn: 122521
2010-12-23 22:45:49 +00:00
Bob Wilson 1a20c2aedd Add ARM-specific DAG combining to cast i64 vector element load/stores to f64.
Type legalization splits up i64 values into pairs of i32 values, which leads
to poor quality code when inserting or extracting i64 vector elements.
If the vector element is loaded or stored, it can be treated as an f64 value
and loaded or stored directly from a VPR register.  Use the pre-legalization
DAG combiner to cast those vector elements to f64 types so that the type
legalizer won't mess them up.  Radar 8755338.

llvm-svn: 122319
2010-12-21 06:43:19 +00:00
Chris Lattner 2b43f2df2b move this test into the ARM test so that it is only run when the arm backend
is enabled.

llvm-svn: 122163
2010-12-19 02:58:14 +00:00
Bob Wilson 00871c71e9 Fix result type of Neon floating-point comparisons against zero.
The result vector elements are always integers.  Radar 8782191.

llvm-svn: 122112
2010-12-18 00:04:33 +00:00
Bill Wendling 3fff1fd49b During local stack slot allocation, the materializeFrameBaseRegister function
may be called. If the entry block is empty, the insertion point iterator will be
the "end()" value. Calling ->getParent() on it (among others) causes problems.

Modify materializeFrameBaseRegister to take the machine basic block and insert
the frame base register at the beginning of that block. (It's very similar to
what the code does all ready. The only difference is that it will always insert
at the beginning of the entry block instead of after a previous materialization
of the frame base register. I doubt that that matters here.)

<rdar://problem/8782198>

llvm-svn: 122104
2010-12-17 23:09:14 +00:00
Bob Wilson 5408144add Fix a DAGCombiner crash when folding binary vector operations with constant
BUILD_VECTOR operands where the element type is not legal.  I had previously
changed this code to insert TRUNCATE operations, but that was just wrong.

llvm-svn: 122102
2010-12-17 23:06:49 +00:00
Bob Wilson 494d7f0367 Combine several vector-related DAGCombiner tests.
llvm-svn: 122101
2010-12-17 23:06:46 +00:00
Bob Wilson bfc6904fc6 Fix crash compiling a QQQQ REG_SEQUENCE for a Neon vld3_lane operation.
Radar 8776599

llvm-svn: 122018
2010-12-17 01:21:12 +00:00
Jason W Kim 4c1386add9 1. ARM/MC/ELF: A few more ELF relocs for .o
2. Fixed EmitLocalCommonSymbol for ELF (Yes, they exist. :)
   Test added.

llvm-svn: 121951
2010-12-16 03:12:17 +00:00
Eric Christopher 347f4c32e8 Don't handle -arm-long-calls in fast isel for now.
llvm-svn: 121919
2010-12-15 23:47:29 +00:00
Bob Wilson fa27a8621c Add Neon VCVT instructions for f32 <-> f16 conversions.
Clang is now providing intrinsics for these and so we need to support them
in the backend.  Radar 8068427.

llvm-svn: 121902
2010-12-15 22:14:12 +00:00
Evan Cheng c177813755 bfi A, (and B, C1), C2) -> bfi A, B, C2 iff C1 & C2 == C1. rdar://8458663
llvm-svn: 121746
2010-12-14 03:22:07 +00:00
Jason W Kim 1296055841 fix fixme case typo :-)
llvm-svn: 121743
2010-12-14 01:42:38 +00:00
Jason W Kim 0e909c5f9c First cut of ARM/MC/ELF PIC relocations.
Test has fixme, to move to .s -> .o test when AsmParser works better.

llvm-svn: 121732
2010-12-13 23:16:07 +00:00
Evan Cheng 3434575704 (or (and (shl A, #shamt), mask), B) => ARMbfi B, A, ~mask where lsb(mask) == #shamt. rdar://8752056
llvm-svn: 121606
2010-12-11 04:11:38 +00:00
Bob Wilson 9375d27460 Add float patterns for Neon vld1-lane/dup and vst1-lane operations.
llvm-svn: 121583
2010-12-10 22:13:32 +00:00
Bob Wilson d29b38c893 Fix some invalid alignments for Neon vld-dup and vld/st-lane instructions.
Alignments smaller than the total size of the memory being loaded or stored,
unless the alignment is 8 bytes, are not allowed.  Add tests for this, too.

llvm-svn: 121506
2010-12-10 19:37:42 +00:00
Jim Grosbach 5fccad84a3 ARM stm/ldm instructions require more than one register in the register list.
Otherwise, a plain str/ldr should be used instead. Make sure we account for
that in prologue/epilogue code generation.
rdar://8745460

llvm-svn: 121391
2010-12-09 18:31:13 +00:00
Jason W Kim c79c5f6e8c ARM/MC/ELF TPsoft is now a proper pseudo inst.
Added test to check bl __aeabi_read_tp gets emitted properly for ELF/ASM
as well as ELF/OBJ (including fixup)

Also added support for ELF::R_ARM_TLS_IE32

llvm-svn: 121312
2010-12-08 23:14:44 +00:00
Evan Cheng 775ead3293 Fix a bad prologue / epilogue codegen bug where the compiler would emit illegal
vpush instructions to save / restore VFP / NEON registers like this:
vpush {d8,d10,d11}
vpop {d8,d10,d11}

vpush and vpop do not allow gaps in the register list.
rdar://8728956

llvm-svn: 121197
2010-12-07 23:08:38 +00:00
Devang Patel c24048a718 If dbg_declare() or dbg_value() is not lowered by isel then emit DEBUG message instead of creating DBG_VALUE for undefined value in reg0.
llvm-svn: 121059
2010-12-06 22:39:26 +00:00
Evan Cheng 62c7b5bf76 Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
   additional pipeline stall. So it's frequently better to single codegen
   vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
   vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.

Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
   fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.

Work in progress, only A+B are enabled.

llvm-svn: 120960
2010-12-05 22:04:16 +00:00
Jason W Kim 29805961d8 ARM/MC/ELF relocation "hello world" for movw/movt.
Lifted adjustFixupValue() from Darwin for sharing w ELF.
Test added
TODO:
  refactor ELFObjectWriter::RecordRelocation more.
  Possibly share more code with Darwin?
  Lots more relocations...

llvm-svn: 120534
2010-12-01 02:40:06 +00:00
Evan Cheng d4b0873c06 Enable sibling call optimization of libcalls which are expanded during
legalization time. Since at legalization time there is no mapping from
SDNode back to the corresponding LLVM instruction and the return
SDNode is target specific, this requires a target hook to check for
eligibility. Only x86 and ARM support this form of sibcall optimization
right now.
rdar://8707777

llvm-svn: 120501
2010-11-30 23:55:39 +00:00
Bob Wilson 431ac4ef50 Add support for NEON VLD3-dup instructions.
The encoding for alignment in VLD4-dup instructions is still a work in progress.

llvm-svn: 120356
2010-11-30 00:00:35 +00:00
Evan Cheng 9a133f623c Mark Darwin call instructions as using "r7" to prevent the frame-register
assignment instructions from being moved below / above calls.
rdar://8690640

llvm-svn: 120339
2010-11-29 22:43:27 +00:00
Benjamin Kramer a22f0ce1a3 Add missing colon.
llvm-svn: 120336
2010-11-29 22:39:38 +00:00
Bob Wilson 77ab165afe Add support for NEON VLD3-dup instructions.
llvm-svn: 120312
2010-11-29 19:35:29 +00:00
Bob Wilson 2d790df105 Add support for NEON VLD2-dup instructions.
llvm-svn: 120236
2010-11-28 06:51:26 +00:00
Bob Wilson c92eea0175 Add NEON VLD1-dup instructions (load 1 element to all lanes).
llvm-svn: 120194
2010-11-27 06:35:16 +00:00
Bob Wilson d7d2cf7842 Recognize sign/zero-extended constant BUILD_VECTORs for VMULL operations.
We need to check if the individual vector elements are sign/zero-extended
values.  For now this only handles constants values.  Radar 8687140.

llvm-svn: 120034
2010-11-23 19:38:38 +00:00
Evan Cheng eb56dca4fd Fix epilogue codegen to avoid leaving the stack pointer in an invalid
state. Previously Thumb2 would restore sp from fp like this:
mov sp, r7
sub, sp, #4
If an interrupt is taken after the 'mov' but before the 'sub', callee-saved
registers might be clobbered by the interrupt handler. Instead, try
restoring directly from sp:
add sp, #4
Or, if necessary (with VLA, etc.) use a scratch register to compute sp and
then restore it:
sub.w r4, r7, #8
mov sp, r7
rdar://8465407

llvm-svn: 119977
2010-11-22 18:12:04 +00:00
Tanya Lattner cd68095650 Fix bug in DAGCombiner for ARM that was trying to do a ShiftCombine on illegal types (vector should be split first).
Added test case.

llvm-svn: 119749
2010-11-18 22:06:46 +00:00
Eric Christopher b006fc9c07 Rewrite stack callee saved spills and restores to use push/pop instructions.
Remove movePastCSLoadStoreOps and associated code for simple pointer
increments. Update routines that depended upon other opcodes for save/restore.

Adjust all testcases accordingly.

llvm-svn: 119725
2010-11-18 19:40:05 +00:00
Dale Johannesen 0659c8f157 These tests are looking for library function names that
appear to differ on Linux.  Try to make them pass on Linux.
Would be good for a Linux person to review this.

llvm-svn: 119572
2010-11-17 21:57:32 +00:00
Bob Wilson 881b45ccdf Change ARMGlobalMerge to keep BSS globals in separate pools.
This completes the fixes for Radar 8673120.

llvm-svn: 119566
2010-11-17 21:25:39 +00:00
Bob Wilson 4c8ab19c22 Fix ARMGlobalMerge pass to check if globals are entirely within range.
It is generally not sufficient to check if the starting offset is in range
of the maximum offset that can be efficiently used for the target.

llvm-svn: 119565
2010-11-17 21:25:36 +00:00
Bob Wilson 59182fb4b5 Change the symbol for merged globals from "merged" to "_MergedGlobals".
This makes it more clear that the symbol is an internal, compiler-generated
name and gives a little more description about its contents.

llvm-svn: 119564
2010-11-17 21:25:33 +00:00
Bob Wilson f796d4b469 Fix the ARMGlobalMerge pass to look at variable sizes instead of pointer sizes.
It was mistakenly looking at the pointer type when checking for the size of
global variables.  This is a partial fix for Radar 8673120.

llvm-svn: 119563
2010-11-17 21:25:27 +00:00
Evan Cheng 7f8ab6ee8b Remove ARM isel hacks that fold large immediates into a pair of add, sub, and,
and xor. The 32-bit move immediates can be hoisted out of loops by machine
LICM but the isel hacks were preventing them.

Instead, let peephole optimization pass recognize registers that are defined by
immediates and the ARM target hook will fold the immediates in.

Other changes include 1) do not fold and / xor into cmp to isel TST / TEQ
instructions if there are multiple uses. This happens when the 'and' is live
out, machine sink would have sinked the computation and that ends up pessimizing
code. The peephole pass would recognize situations where the 'and' can be
toggled to define CPSR and eliminate the comparison anyway.

2) Move peephole pass to after machine LICM, sink, and CSE to avoid blocking
important optimizations.

rdar://8663787, rdar://8241368

llvm-svn: 119548
2010-11-17 20:13:28 +00:00
Jakob Stoklund Olesen e2b8858611 Fix PR8612 in the standard spiller, take two.
The live range of a register defined by an early clobber starts at the use slot,
not the def slot.

Except when it is an early clobber tied to a use operand. Then it starts at the
def slot like a standard def.

llvm-svn: 119305
2010-11-16 00:40:59 +00:00
Jakob Stoklund Olesen 97154f67d9 Revert "Fix PR8612 in the standard spiller as well."
This reverts r119183 which borke the buildbots.

llvm-svn: 119270
2010-11-15 21:51:51 +00:00
Eric Christopher 964943780b Recommit this change and remove the failing part of the test - it didn't
pass in the first place and was masked by earlier failures not warning
and aborting the block.

llvm-svn: 119184
2010-11-15 21:11:06 +00:00
Jakob Stoklund Olesen 97825bcbfd Fix PR8612 in the standard spiller as well.
The live range of a register defined by an early clobber starts at the use slot,
not the def slot.

llvm-svn: 119183
2010-11-15 20:55:53 +00:00
Jakob Stoklund Olesen ddf25c341c When spilling a register defined by an early clobber, make sure that the new
live ranges for the spill register are also defined at the use slot instead of
the normal def slot.

This fixes PR8612 for the inline spiller. A use was being allocated to the same
register as a spilled early clobber def.

This problem exists in all the spillers. A fix for the standard spiller is
forthcoming.

llvm-svn: 119182
2010-11-15 20:55:49 +00:00
Evan Cheng 2bcb8daa44 Add conditional move of large immediate.
llvm-svn: 118968
2010-11-13 02:25:14 +00:00
Evan Cheng 8ce967e393 Fix an obvious typo which inverted an immediate.
llvm-svn: 118951
2010-11-13 00:27:47 +00:00
Eric Christopher a08ccc8cb9 This should be still failing, but is. Disable it with the
forget-me-stick for now.

llvm-svn: 118950
2010-11-13 00:25:06 +00:00
Evan Cheng 0fc8084a64 Add conditional mvn instructions.
llvm-svn: 118935
2010-11-12 22:42:47 +00:00
Evan Cheng 2d59ee34f1 Add some missing isel predicates on def : pat patterns to avoid generating VFP vmla / vmls (they cause stalls). Disabling them in isel is properly not a right solution, I'll look into a proper solution next.
llvm-svn: 118922
2010-11-12 20:32:20 +00:00
Owen Anderson c7baee31ad Add support for ARM's specialized vector-compare-against-zero instructions.
llvm-svn: 118453
2010-11-08 23:21:22 +00:00
Dale Johannesen 0ef474730f Revert 118422 in search of bot verdancy.
llvm-svn: 118429
2010-11-08 19:17:22 +00:00