Commit Graph

254 Commits

Author SHA1 Message Date
Chad Rosier be7625161e VMOVQQQQs pseudo instructions are only created by ARMBaseInstrInfo::copyPhysReg.
Therefore, rather then generate a pseudo instruction, which is later expanded,
generate the necessary instructions in place.

llvm-svn: 138163
2011-08-20 00:17:25 +00:00
Owen Anderson 732f82c463 Rewrite some ARM InstrInfo functions to be most accepting of arbitrary register subclasses. Hopefully this fixes some buildbots.
llvm-svn: 137223
2011-08-10 17:21:20 +00:00
Jakob Stoklund Olesen 6a14dc01ff Promote VMOVS to VMOVD when possible.
On Cortex-A8, we use the NEON v2f32 instructions for f32 arithmetic. For
better latency, we also send D-register copies down the NEON pipeline by
translating them to vorr instructions.

This patch promotes even S-register copies to D-register copies when
possible so they can also go down the NEON pipeline.  Example:

        vldr.32 s0, LCPI0_0
    loop:
        vorr    d1, d0, d0
    loop2:
        ...
        vadd.f32        d1, d1, d16

The vorr instruction looked like this after regalloc:

    %S2<def> = COPY %S0, %D1<imp-def>

Copies involving odd S-registers, and copies that don't define the full
D-register are left alone.

llvm-svn: 137182
2011-08-09 23:41:44 +00:00
Jakob Stoklund Olesen c04a66b48e Implement isLoadFromStackSlotPostFE and isStoreToStackSlotPostFE for ARM.
They improve the verbose assembly.

llvm-svn: 137069
2011-08-08 21:45:32 +00:00
Owen Anderson b595ed0085 Split up the ARM so_reg ComplexPattern into so_reg_reg and so_reg_imm, allowing us to distinguish the encodings that use shifted registers from those that use shifted immediates. This is necessary to allow the fixed-length decoder to distinguish things like BICS vs LDRH.
llvm-svn: 135693
2011-07-21 18:54:16 +00:00
Evan Cheng a20cde31e7 Sink ARMMCExpr and ARMAddressingModes into MC layer. First step to separate ARM MC code from target.
llvm-svn: 135636
2011-07-20 23:34:39 +00:00
Owen Anderson 454e1c7abb Remove VMOVDneon and VMOVQ, which are just aliases for VORR. This continues to simplify the path towards an auto-generated disassembler.
llvm-svn: 135290
2011-07-15 18:46:47 +00:00
Evan Cheng bc153d49b7 Next round of MC refactoring. This patch factor MC table instantiations, MC
registeration and creation code into XXXMCDesc libraries.

llvm-svn: 135184
2011-07-14 20:59:42 +00:00
Owen Anderson 651b230ca0 Add a target-indepedent entry to MCInstrDesc to describe the encoded size of an opcode. Switch ARM over to using that rather than its own special MCInstrDesc bits.
llvm-svn: 135106
2011-07-13 23:22:26 +00:00
Jakub Staszak 9b07c0ab6b Use BranchProbability instead of floating points in IfConverter.
llvm-svn: 134858
2011-07-10 02:58:07 +00:00
Evan Cheng 703a0fbf39 Hide the call to InitMCInstrInfo into tblgen generated ctor.
llvm-svn: 134244
2011-07-01 17:57:27 +00:00
Jim Grosbach d86f34d631 Refactor away tSpill and tRestore pseudos in ARM backend.
The tSpill and tRestore instructions are just copies of the tSTRspi and
tLDRspi instructions, respectively. Just use those directly instead.

llvm-svn: 134092
2011-06-29 20:26:39 +00:00
Evan Cheng 194c3dc01f Move CallFrameSetupOpcode and CallFrameDestroyOpcode to TargetInstrInfo.
llvm-svn: 134030
2011-06-28 21:14:33 +00:00
Evan Cheng 1e210d08d8 Merge XXXGenRegisterNames.inc into XXXGenRegisterInfo.inc
llvm-svn: 134024
2011-06-28 20:07:07 +00:00
Evan Cheng 6cc775f905 - Rename TargetInstrDesc, TargetOperandInfo to MCInstrDesc and MCOperandInfo and
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Chang
TargetInstrInfo so it's based off MCInstrInfo.

llvm-svn: 134021
2011-06-28 19:10:37 +00:00
Chris Lattner 1d0c25756e use the MachineInstrBuilder operator-> to simplify some code.
There are probably more instances of this floating around.

llvm-svn: 130474
2011-04-29 05:24:29 +00:00
Evan Cheng 7d6cd4902e Change A9 scheduling itineraries VLD* / VST* entries default to "aligned". That
is, it assumes addresses are 64-bit aligned (which should be the more common
case). If the alignment is found not to be aligned, then getOperandLatency()
would adjust the operand latency computation by one to compensate for it.
rdar://9294833

llvm-svn: 129742
2011-04-19 01:21:49 +00:00
Cameron Zwarich 9c65e4d69c Add ORR and EOR to the CMP peephole optimizer. It's hard to get isel to generate
a case involving EOR, so I only added a test for ORR.

llvm-svn: 129610
2011-04-15 21:24:38 +00:00
Cameron Zwarich 0829b3065a The AND instruction leaves the V flag unmodified, so it falls victim to the same
problem as all of the other instructions we fold with CMPs.

llvm-svn: 129602
2011-04-15 20:45:00 +00:00
Cameron Zwarich 93eae1571c Add missing register forms of instructions to the ARM CMP-folding code. This
fixes <rdar://problem/9287901>.

llvm-svn: 129599
2011-04-15 20:28:28 +00:00
Chris Lattner 0ab5e2cded Fix a ton of comment typos found by codespell. Patch by
Luis Felipe Strano Moraes!

llvm-svn: 129558
2011-04-15 05:18:47 +00:00
Cameron Zwarich 8001850ee8 Fix a typo.
llvm-svn: 129429
2011-04-13 06:39:16 +00:00
Owen Anderson bdff1c997a Teach the ARM peephole optimizer that RSB, RSC, ADC, and SBC can be used for folded comparisons, just like ADD and SUB.
llvm-svn: 129038
2011-04-06 23:35:59 +00:00
Owen Anderson d6c5a741b5 Get rid of the non-writeback versions VLDMDB and VSTMDB, which don't actually exist.
llvm-svn: 128461
2011-03-29 16:45:53 +00:00
Evan Cheng f098bf1199 Nasty bug in ARMBaseInstrInfo::produceSameValue(). The MachineConstantPoolEntry
entries being compared may not be ARMConstantPoolValue. Without checking
whether they are ARMConstantPoolValue first, and if the stars and moons
are aligned properly, the equality test may return true (when the first few
words of two Constants' values happen to be identical) and very bad things can
happen.

rdar://9125354

llvm-svn: 128203
2011-03-24 06:20:03 +00:00
Evan Cheng 425489d397 Cmp peephole optimization isn't always safe for signed arithmetics.
int tries = INT_MAX;    
while (tries > 0) {
      tries--;
}

The check should be:
        subs    r4, #1
        cmp     r4, #0
        bgt     LBB0_1

The subs can set the overflow V bit when r4 is INT_MAX+1 (which loop
canonicalization apparently does in this case). cmp #0 would have cleared
it while not changing the N and Z bits. Since BGT is dependent on the V
bit, i.e. (N == V) && !Z, it is not safe to eliminate the cmp #0.

rdar://9172742

llvm-svn: 128179
2011-03-23 22:52:04 +00:00
Anton Korobeynikov e7410dd0d5 Preliminary support for ARM frame save directives emission via MI flags.
This is just very first approximation how the stuff should be done
(e.g. ARM-only for now). More to follow.

llvm-svn: 127101
2011-03-05 18:43:32 +00:00
Evan Cheng 2f2435d026 Last round of fixes for movw + movt global address codegen.
1. Fixed ARM pc adjustment.
2. Fixed dynamic-no-pic codegen
3. CSE of pc-relative load of global addresses.

It's now enabled by default for Darwin.

llvm-svn: 123991
2011-01-21 18:55:51 +00:00
Andrew Trick 47ff14b091 Convert -enable-sched-cycles and -enable-sched-hazard to -disable
flags. They are still not enable in this revision.

Added TargetInstrInfo::isZeroCost() to fix a fundamental problem with
the scheduler's model of operand latency in the selection DAG.

Generalized unit tests to work with sched-cycles.

llvm-svn: 123969
2011-01-21 05:51:33 +00:00
Evan Cheng 028ccbfcbf Don't be overly aggressive with CSE of "ldr constantpool". If it's a pc-relative
value, the "add pc" must be CSE'ed at the same time. We could follow the same
approach as T2 by adding pseudo instructions that combine the ldr + "add pc".
But the better approach is to use movw + movt (which I will enable soon), so
I'll leave this as a TODO.

llvm-svn: 123949
2011-01-20 23:55:07 +00:00
Evan Cheng b8b0ad80a8 Sorry, several patches in one.
TargetInstrInfo:
Change produceSameValue() to take MachineRegisterInfo as an optional argument.
When in SSA form, targets can use it to make more aggressive equality analysis.

Machine LICM:
1. Eliminate isLoadFromConstantMemory, use MI.isInvariantLoad instead.
2. Fix a bug which prevent CSE of instructions which are not re-materializable.
3. Use improved form of produceSameValue.

ARM:
1. Teach ARM produceSameValue to look pass some PIC labels.
2. Look for operands from different loads of different constant pool entries
   which have same values.
3. Re-implement PIC GA materialization using movw + movt. Combine the pair with
   a "add pc" or "ldr [pc]" to form pseudo instructions. This makes it possible
   to re-materialize the instruction, allow machine LICM to hoist the set of
   instructions out of the loop and make it possible to CSE them. It's a bit
   hacky, but it significantly improve code quality.
4. Some minor bug fixes as well.

With the fixes, using movw + movt to materialize GAs significantly outperform the
load from constantpool method. 186.crafty and 255.vortex improved > 20%, 254.gap
and 176.gcc ~10%.

llvm-svn: 123905
2011-01-20 08:34:58 +00:00
Evan Cheng dfce83c8f5 Materialize GA addresses with movw + movt pairs for Darwin in PIC mode. e.g.
movw    r0, :lower16:(L_foo$non_lazy_ptr-(LPC0_0+4))
        movt    r0, :upper16:(L_foo$non_lazy_ptr-(LPC0_0+4))
LPC0_0:
        add     r0, pc, r0

It's not yet enabled by default as some tests are failing. I suspect bugs in
down stream tools.

llvm-svn: 123619
2011-01-17 08:03:18 +00:00
Jakob Stoklund Olesen 2fb5b31578 Simplify a bunch of isVirtualRegister() and isPhysicalRegister() logic.
These functions not longer assert when passed 0, but simply return false instead.

No functional change intended.

llvm-svn: 123155
2011-01-10 02:58:51 +00:00
Evan Cheng 078b0b095e Recognize inline asm 'rev /bin/bash, ' as a bswap intrinsic call.
llvm-svn: 123048
2011-01-08 01:24:27 +00:00
Andrew Trick 10ffc2b6c2 Various bits of framework needed for precise machine-level selection
DAG scheduling during isel. Most new functionality is currently
guarded by -enable-sched-cycles and -enable-sched-hazard.

Added InstrItineraryData::IssueWidth field, currently derived from
ARM itineraries, but could be initialized differently on other targets.

Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is
active, and if so how many cycles of state it holds.

Added SchedulingPriorityQueue::HasReadyFilter to allowing gating entry
into the scheduler's available queue.

ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to
get information about it's SUnits, provides RecedeCycle for bottom-up
scheduling, correctly computes scoreboard depth, tracks IssueCount, and
considers potential stall cycles when checking for hazards.

ScheduleDAGRRList now models machine cycles and hazards (under
flags). It tracks MinAvailableCycle, drives the hazard recognizer and
priority queue's ready filter, manages a new PendingQueue, properly
accounts for stall cycles, etc.

llvm-svn: 122541
2010-12-24 05:03:26 +00:00
Andrew Trick c416ba612b whitespace
llvm-svn: 122539
2010-12-24 04:28:06 +00:00
Bob Wilson 651eaa02b8 Remove the rest of the *_sfp Neon instruction patterns.
Use the same COPY_TO_REGCLASS approach as for the 2-register *_sfp instructions.
This change made a big difference in the code generated for the
CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing
a fine job, but some instructions that were previously moved outside the loop
are not moved now.  It's using fewer VFP registers now, which is generally
a good thing, so I think the estimates for register pressure changed and that
affected the LICM behavior.  Since that isn't obviously wrong, I've just
changed the test file.  This completes the work for Radar 8711675.

llvm-svn: 121730
2010-12-13 23:02:37 +00:00
Jim Grosbach 327cf8ee5f Refactor the ARM CMPz* patterns to just use the normal CMP instructions when
possible. They were duplicates for everything exception the source pattern
before.

llvm-svn: 121179
2010-12-07 20:41:06 +00:00
Evan Cheng 62c7b5bf76 Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
   additional pipeline stall. So it's frequently better to single codegen
   vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
   vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.

Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
   fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.

Work in progress, only A+B are enabled.

llvm-svn: 120960
2010-12-05 22:04:16 +00:00
Jim Grosbach 81af4f9eb1 Rename t2 TBB and TBH instructions to reference that they encode the jump table
data. Next up, pseudo-izing them.

llvm-svn: 120320
2010-11-29 21:28:32 +00:00
Anton Korobeynikov d08fbd19f5 Move callee-saved regs spills / reloads to TFI
llvm-svn: 120228
2010-11-27 23:05:03 +00:00
Eric Christopher b006fc9c07 Rewrite stack callee saved spills and restores to use push/pop instructions.
Remove movePastCSLoadStoreOps and associated code for simple pointer
increments. Update routines that depended upon other opcodes for save/restore.

Adjust all testcases accordingly.

llvm-svn: 119725
2010-11-18 19:40:05 +00:00
Evan Cheng 2d4e42fba6 Silence compiler warnings.
llvm-svn: 119610
2010-11-18 01:43:23 +00:00
Evan Cheng 7f8ab6ee8b Remove ARM isel hacks that fold large immediates into a pair of add, sub, and,
and xor. The 32-bit move immediates can be hoisted out of loops by machine
LICM but the isel hacks were preventing them.

Instead, let peephole optimization pass recognize registers that are defined by
immediates and the ARM target hook will fold the immediates in.

Other changes include 1) do not fold and / xor into cmp to isel TST / TEQ
instructions if there are multiple uses. This happens when the 'and' is live
out, machine sink would have sinked the computation and that ends up pessimizing
code. The peephole pass would recognize situations where the 'and' can be
toggled to define CPSR and eliminate the comparison anyway.

2) Move peephole pass to after machine LICM, sink, and CSE to avoid blocking
important optimizations.

rdar://8663787, rdar://8241368

llvm-svn: 119548
2010-11-17 20:13:28 +00:00
Evan Cheng 655364797e Simplify code that toggle optional operand to ARM::CPSR.
llvm-svn: 119484
2010-11-17 08:06:50 +00:00
Bill Wendling a68e3a5397 Encode the multi-load/store instructions with their respective modes ('ia',
'db', 'ib', 'da') instead of having that mode as a separate field in the
instruction. It's more convenient for the asm parser and much more readable for
humans.
<rdar://problem/8654088>

llvm-svn: 119310
2010-11-16 01:16:36 +00:00
Evan Cheng 2ce016c7f8 Code clean up. The peephole pass should be the one updating the instruction
iterator, not TII->OptimizeCompareInstr.

llvm-svn: 119186
2010-11-15 21:20:45 +00:00
Eric Christopher b90f7004cf Revert this temporarily.
llvm-svn: 118827
2010-11-11 19:47:02 +00:00
Eric Christopher e6283f950d Change the prologue and epilogue to use push/pop for the low ARM registers.
llvm-svn: 118823
2010-11-11 19:26:03 +00:00
Evan Cheng debf9c502a Two sets of changes. Sorry they are intermingled.
1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to
   "optimize for latency". Call instructions don't have the right latency and
   this is more likely to use introduce spills.
2. Fix if-converter cost function. For ARM, it should use instruction latencies,
   not # of micro-ops since multi-latency instructions is completely executed
   even when the predicate is false. Also, some instruction will be "slower"
   when they are predicated due to the register def becoming implicit input.
   rdar://8598427

llvm-svn: 118135
2010-11-03 00:45:17 +00:00