Commit Graph

419 Commits

Author SHA1 Message Date
Jakob Stoklund Olesen 0965585cb1 Experimental support for aligned NEON spills.
ARM targets with NEON units have access to aligned vector loads and
stores that are potentially faster than unaligned operations.

Add support for spilling the callee-saved NEON registers to an aligned
stack area using 16-byte aligned NEON loads and store.

This feature is off by default, controlled by an -align-neon-spills
command line option.

llvm-svn: 147211
2011-12-23 00:36:18 +00:00
Jakob Stoklund Olesen b95c102c2f Heed spill slot alignment on ARM.
Use the spill slot alignment as well as the local variable alignment to
determine when the stack needs to be realigned. This works now that the
ARM target can always realign the stack by using a base pointer.

Still respect the ARMBaseRegisterInfo::canRealignStack() function
vetoing a realigned stack.  Don't use aligned spill code in that case.

llvm-svn: 146997
2011-12-20 22:15:04 +00:00
Evan Cheng 68132d8093 ARM target code clean up. Check for iOS, not Darwin where it makes sense.
llvm-svn: 146981
2011-12-20 18:26:50 +00:00
Evan Cheng 903231bc58 Fix a CPSR liveness tracking bug introduced when I converted IT block to bundle.
llvm-svn: 146805
2011-12-17 01:25:34 +00:00
Jakob Stoklund Olesen 9790187b6c Fix off-by-one error in bucket sort.
The bad sorting caused a misaligned basic block when building 176.vpr in
ARM mode.

<rdar://problem/10594653>

llvm-svn: 146767
2011-12-16 23:00:05 +00:00
Evan Cheng 7fae11b231 - Add MachineInstrBundle.h and MachineInstrBundle.cpp. This includes a function
to finalize MI bundles (i.e. add BUNDLE instruction and computing register def
  and use lists of the BUNDLE instruction) and a pass to unpack bundles.
- Teach more of MachineBasic and MachineInstr methods to be bundle aware.
- Switch Thumb2 IT block to MI bundles and delete the hazard recognizer hack to
  prevent IT blocks from being broken apart.

llvm-svn: 146542
2011-12-14 02:11:42 +00:00
Chandler Carruth 6b0e34c445 Manually upgrade the test suite to specify the flag to cttz and ctlz.
I followed three heuristics for deciding whether to set 'true' or
'false':

- Everything target independent got 'true' as that is the expected
  common output of the GCC builtins.
- If the target arch only has one way of implementing this operation,
  set the flag in the way that exercises the most of codegen. For most
  architectures this is also the likely path from a GCC builtin, with
  'true' being set. It will (eventually) require lowering away that
  difference, and then lowering to the architecture's operation.
- Otherwise, set the flag differently dependending on which target
  operation should be tested.

Let me know if anyone has any issue with this pattern or would like
specific tests of another form. This should allow the x86 codegen to
just iteratively improve as I teach the backend how to differentiate
between the two forms, and everything else should remain exactly the
same.

llvm-svn: 146370
2011-12-12 11:59:10 +00:00
Owen Anderson 0b9b9da6c8 Teach SelectionDAG to match more calls to libm functions onto existing SDNodes. Mark these nodes as illegal by default, unless the target declares otherwise.
llvm-svn: 146171
2011-12-08 19:32:14 +00:00
Chris Lattner 6a144a2227 Upgrade syntax of tests using volatile instructions to use 'load volatile' instead of 'volatile load', which is archaic.
llvm-svn: 145171
2011-11-27 06:54:59 +00:00
Evan Cheng 7ca4b6eb5c Add vmov.f32 to materialize f32 immediate splats which cannot be handled by
integer variants. rdar://10437054

llvm-svn: 144608
2011-11-15 02:12:34 +00:00
Jim Grosbach 3e2c6f380c ARM VLDR/VSTR instructions don't need a size suffix.
Canonicallize on the non-suffixed form, but continue to accept assembly that
has any correctly sized type suffix.

llvm-svn: 144583
2011-11-14 23:03:21 +00:00
Jakob Stoklund Olesen 8ec1a92afd Linear scan is going away.
llvm-svn: 144472
2011-11-12 22:39:34 +00:00
Jakob Stoklund Olesen 4deff7bc1d Switch a few tests off linearscan.
llvm-svn: 144460
2011-11-12 19:53:52 +00:00
Jim Grosbach 4e0dbee62b ARM Darwin default relocation model is PIC.
This matches clang, so default options in llc and friends are now closer to
clang's defaults.

llvm-svn: 140863
2011-09-30 17:41:35 +00:00
Eli Friedman f6fbfd3f83 Last batch of test conversions to new atomic instructions.
llvm-svn: 140585
2011-09-27 00:17:29 +00:00
Eli Friedman ab7b99ab9c Convert more tests to new atomic instructions.
llvm-svn: 140567
2011-09-26 21:36:10 +00:00
Andrew Trick 1191773a62 Generalize this test's CHECK statements to handle different indvars modes.
llvm-svn: 139577
2011-09-13 02:46:27 +00:00
Evan Cheng e891654a58 Change ARM / Thumb2 addc / adde and subc / sube modeling to use physical
register dependency (rather than glue them together). This is general
goodness as it gives scheduler more freedom. However it is motivated by
a nasty bug in isel.

When a i64 sub is expanded to subc + sube.
  libcall #1
     \
      \        subc 
       \       /  \
        \     /    \
         \   /    libcall #2
          sube

If the libcalls are not serialized (i.e. both have chains which are dag
entry), legalizer can serialize them in arbitrary orders. If it's
unlucky, it can force libcall #2 before libcall #1 in the above case.

  subc
   |
  libcall #2
   |
  libcall #1
   |
  sube

However since subc and sube are "glued" together, this ends up being a
cycle when the scheduler combine subc and sube as a single scheduling
unit.

The right solution is to fix LegalizeType too chains the libcalls together.
However, LegalizeType is not processing nodes in order so that's harder than
it should be. For now, the move to physical register dependency will do.

rdar://10019576

llvm-svn: 138791
2011-08-30 01:34:54 +00:00
Jim Grosbach 066e9ec1e4 Update tests.
llvm-svn: 138116
2011-08-19 22:19:48 +00:00
Jim Grosbach 90103ccc05 Thumb assembly parsing and encoding for LDM instruction.
Fix base register type and canonicallize to the "ldm" spelling rather than
"ldmia." Add diagnostics for incorrect writeback token and out-of-range
registers.

llvm-svn: 137986
2011-08-18 21:50:53 +00:00
Eli Friedman a27da98921 Fix up the patterns for SXTB, SXTH, UXTB, and UXTH so that they are correctly active without HasT2ExtractPack. PR10611.
llvm-svn: 137061
2011-08-08 19:49:37 +00:00
Jakub Staszak 15e5b742ad Use MachineBranchProbabilityInfo in If-Conversion instead of its own heuristics.
llvm-svn: 136826
2011-08-03 22:34:43 +00:00
Evan Cheng 2129f59637 Introduce MCCodeGenInfo, which keeps information that can affect codegen
(including compilation, assembly). Move relocation model Reloc::Model from
TargetMachine to MCCodeGenInfo so it's accessible even without TargetMachine.

llvm-svn: 135468
2011-07-19 06:37:02 +00:00
Evan Cheng f863e3fb73 Improve codegen for select's:
if (x != 0) x = 1
if (x == 1) x = 1

Previous codegen looks like this:
        mov     r1, r0
        cmp     r1, #1
        mov     r0, #0
        moveq   r0, #1

The naive lowering select between two different values. It should recognize the
test is equality test so it's more a conditional move rather than a select:
        cmp     r0, #1
        movne   r0, #0

rdar://9758317

llvm-svn: 135017
2011-07-13 00:42:17 +00:00
Jim Grosbach ade1fb17d4 Improve test cases from r134746.
Use memory barriers to force if-conversion off for these tests instead of
the internal llc command line option ifcvt-limit.

llvm-svn: 134986
2011-07-12 16:06:01 +00:00
Jim Grosbach 7471937ad7 Make tBX_RET and tBX_RET_vararg predicable.
The normal tBX instruction is predicable, so there's no reason the
pseudos for using it as a return shouldn't be. Gives us some nice code-gen
improvements as can be seen by the test changes. In particular, several
tests now have to disable if-conversion because it works too well and defeats
the test.

llvm-svn: 134746
2011-07-08 21:50:04 +00:00
Jakob Stoklund Olesen bbe2a5cfff Fix more register allocation sensitive tests.
llvm-svn: 134667
2011-07-08 00:24:06 +00:00
Evan Cheng 8b2bda09a5 Change some ARM subtarget features to be single bit yes/no in order to sink them down to MC layer. Also fix tests.
llvm-svn: 134590
2011-07-07 03:55:05 +00:00
Chandler Carruth a6e593b4eb FileCheck-ize another test. Reduces the llc invocations from 8 to 1, and
makes one of the tests actually mean something (as the string 'add' will
always appear in the output of this file).

llvm-svn: 134358
2011-07-02 21:34:52 +00:00
Jim Grosbach cf1464d943 ARMv7M vs. ARMv7E-M support.
The DSP instructions in the Thumb2 instruction set are an optional extension
in the Cortex-M* archtitecture. When present, the implementation is considered
an "ARMv7E-M implementation," and when not, an "ARMv7-M implementation."

Add a subtarget feature hook for the v7e-m instructions and hook it up. The
cortex-m3 cpu is an example of a v7m implementation, while the cortex-m4 is
a v7e-m implementation.

rdar://9572992

llvm-svn: 134261
2011-07-01 21:12:19 +00:00
Jim Grosbach b98ab91e39 Thumb1 register to register MOV instruction is predicable.
Fix a FIXME and allow predication (in Thumb2) for the T1 register to
register MOV instructions. This allows some better codegen with
if-conversion (as seen in the test updates), plus it lays the groundwork
for pseudo-izing the tMOVCC instructions.

llvm-svn: 134197
2011-06-30 22:10:46 +00:00
Jim Grosbach 353da73186 Pseudo-ize the t2LDMIA_RET instruction.
It's just a t2LDMIA_UPD instruction with extra codegen properties, so it
doesn't need the encoding information. As a side-benefit, we now correctly
recognize for instruction printing as a 'pop' instruction.

llvm-svn: 134173
2011-06-30 18:25:42 +00:00
Benjamin Kramer d2a84f6a63 Don't depend on the optimization reverted in r134067.
llvm-svn: 134068
2011-06-29 14:07:18 +00:00
Chris Lattner 8936d2bfbc Remove support for parsing the "type i32" syntax for defining a numbered
top level type without a specified number.  This syntax isn't documented
and blocks forward progress.

llvm-svn: 133371
2011-06-19 00:03:46 +00:00
Chris Lattner 80ed9dc9e5 rip out a ton of intrinsic modernization logic from AutoUpgrade.cpp, which is
for pre-2.9 bitcode files.  We keep x86 unaligned loads, movnt, crc32, and the
target indep prefetch change.

As usual, updating the testsuite is a PITA.

llvm-svn: 133337
2011-06-18 06:05:24 +00:00
Jakob Stoklund Olesen 831ae0105a Switch ARM to using AltOrders instead of MethodBodies.
This slightly changes the GPR allocation order on Darwin where R9 is not
a callee-saved register:

Before: %R0 %R1 %R2 %R3 %R12 %R9 %LR %R4 %R5 %R6 %R8 %R10 %R11
After:  %R0 %R1 %R2 %R3 %R9 %R12 %LR %R4 %R5 %R6 %R8 %R10 %R11
llvm-svn: 133326
2011-06-18 01:14:46 +00:00
Chris Lattner b90ed2233c manually upgrade a bunch of tests to modern syntax, and remove some that
are either unreduced or only test old syntax.

llvm-svn: 133228
2011-06-17 03:14:27 +00:00
Rafael Espindola 844485af13 Implement Jakob's suggestion on how to detect fall thought without calling
AnalyzeBranch.

llvm-svn: 132981
2011-06-14 06:08:32 +00:00
Rafael Espindola defd4b0875 AnalyzeBranch doesn't change which successors a bb has, just the order
we try to branch to them.

Before we were creating successor lists with duplicated entries. Fixing that
found a bug in isBlockOnlyReachableByFallthrough that would causes it to
return the wrong answer for

-----------
...
jne foo
jmp bar

foo:
----------

llvm-svn: 132882
2011-06-12 03:20:32 +00:00
Cameron Zwarich 2e252de512 Fix an issue where the two-address conversion pass incorrectly rewrites untied
operands to an early clobber register. This fixes <rdar://problem/9566076>.

llvm-svn: 132738
2011-06-07 23:54:00 +00:00
Jakob Stoklund Olesen b8bf3c0f8b Switch AllocationOrder to using RegisterClassInfo instead of a BitVector
of reserved registers.

Use RegisterClassInfo in RABasic as well. This slightly changes som
allocation orders because RegisterClassInfo puts CSR aliases last.

llvm-svn: 132581
2011-06-03 20:34:53 +00:00
Stuart Hastings aa02c0847d Since I can't reproduce the failures from 131261, re-trying with a
simplified version.  <rdar://problem/9298790>

llvm-svn: 131274
2011-05-13 00:51:54 +00:00
Stuart Hastings 8d57d8ea64 Revert 131266 and 131261 due to buildbot complaints.
rdar://problem/9298790

llvm-svn: 131269
2011-05-13 00:15:17 +00:00
Stuart Hastings ef4940254f Tweak 131261 (thumb2-cbnz.ll) to generate the intended cbnz.
rdar://problem/9298790

llvm-svn: 131266
2011-05-13 00:10:03 +00:00
Stuart Hastings 89f1b47e3a Non-fast-isel followup to 129634; correctly handle branches controlled
by non-CMP expressions.  The executable test case (129821) would test
this as well, if we had an "-O0 -disable-arm-fast-isel" LLVM-GCC
tester.  Alas, the ARM assembly would be very difficult to check with
FileCheck.

The thumb2-cbnz.ll test is affected; it generates larger code (tst.w
vs. cmp #0), but I believe the new version is correct.
rdar://problem/9298790

llvm-svn: 131261
2011-05-12 23:36:41 +00:00
Eli Friedman 5401962643 Re-revert r130877; it's apparently causing a regression on 197.parser,
possibly related to cbnz formation.

llvm-svn: 130977
2011-05-06 05:23:07 +00:00
Eli Friedman 0fe4608af2 Re-commit r130862 with a minor change to avoid an iterator running off the edge in some cases.
Original message:

Teach MachineCSE how to do simple cross-block CSE involving physregs.  This allows, for example, eliminating duplicate cmpl's on x86. Part of rdar://problem/8259436 .

llvm-svn: 130877
2011-05-04 22:10:36 +00:00
Eli Friedman 3bd79ba856 Back out r130862; it appears to be breaking bootstrap.
llvm-svn: 130867
2011-05-04 20:48:42 +00:00
Eli Friedman a16fc2fec0 Teach MachineCSE how to do simple cross-block CSE involving physregs. This allows, for example, eliminating duplicate cmpl's on x86. Part of rdar://problem/8259436 .
llvm-svn: 130862
2011-05-04 19:54:24 +00:00
Jakob Stoklund Olesen 28a93a49bb Fix more register and coalescing dependencies.
llvm-svn: 130859
2011-05-04 19:02:11 +00:00
Jakob Stoklund Olesen d7fd7bfc31 Explicitly request physreg coalesing for a bunch of Thumb2 unit tests.
These tests all follow the same pattern:

	mov	r2, r0
	movs	r0, #0
	$CMP	r2, r1
	it	eq
	moveq	r0, #1
	bx	lr

The first 'mov' can be eliminated by rematerializing 'movs r0, #0' below the
test instruction:

	$CMP	r0, r1
	mov.w	r0, #0
	it	eq
	moveq	r0, #1
	bx	lr

So far, only physreg coalescing can do that. The register allocators won't yet
split live ranges just to eliminate copies. They can learn, but this particular
problem is not likely to show up in real code. It only appears because r0 is
used for both the function argument and return value.

llvm-svn: 130858
2011-05-04 19:02:07 +00:00
Jakob Stoklund Olesen edfabc9aad Weekly fix of register allocation dependent unit tests.
llvm-svn: 130567
2011-04-30 01:37:52 +00:00
Andrew Trick e794e17524 Teach Thumb2 isel to fold and->rotr ==> ROR.
Generalization of Nate Begeman's patch!

llvm-svn: 130502
2011-04-29 14:18:15 +00:00
Andrew Trick 65266ed4d7 Combine thumb2-ror tests.
llvm-svn: 130498
2011-04-29 14:02:41 +00:00
Evan Cheng 1355bbdd11 Be careful about scheduling nodes above previous calls. It increase usages of
more callee-saved registers and introduce copies. Only allows it if scheduling
a node above calls would end up lessen register pressure.

Call operands also has added ABI restrictions for register allocation, so be
extra careful with hoisting them above calls.

rdar://9329627

llvm-svn: 130245
2011-04-26 21:31:35 +00:00
Benjamin Kramer ba446cc12a Make tests more useful.
lit needs a linter ...

llvm-svn: 130126
2011-04-25 10:12:01 +00:00
Andrew Trick 76dca78cb4 Accidental function name mangling.
llvm-svn: 130050
2011-04-23 04:08:15 +00:00
Andrew Trick 0ed5778a1e Thumb2 and ARM add/subtract with carry fixes.
Fixes Thumb2 ADCS and SBCS lowering: <rdar://problem/9275821>.
t2ADCS/t2SBCS are now pseudo instructions, consistent with ARM, so the
assembly printer correctly prints the 's' suffix.

Fixes Thumb2 adde -> SBC matching to check for live/dead carry flags.

Fixes the internal ARM machine opcode mnemonic for ADCS/SBCS.
Fixes ARM SBC lowering to check for live carry (potential bug).

llvm-svn: 130048
2011-04-23 03:55:32 +00:00
Andrew Trick 1a1f8d4640 whitespace
llvm-svn: 130046
2011-04-23 03:24:11 +00:00
Evan Cheng c0d2004e3c In Thumb2 mode, lower frame indix references to:
add <rd>, sp, #<imm8>
ldr <rd>, [sp, #<imm8>]
When the offset from sp is multiple of 4 and in range of 0-1020.
This saves code size by utilizing 16-bit instructions.

rdar://9321541

llvm-svn: 129971
2011-04-22 01:42:52 +00:00
Andrew Trick b53a00d2cb Recommit r129383. PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency.
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handle node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero cycle stage or it is a TokenFactor chain.

Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the ndoe latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.

llvm-svn: 129421
2011-04-13 00:38:32 +00:00
Chris Lattner 418b1037b0 fix two completely broken tests, which were matching due to PR9629.
llvm-svn: 129195
2011-04-09 06:34:38 +00:00
Jakob Stoklund Olesen 100f53fd25 Fix Thumb and Thumb2 tests to be register allocator independent.
llvm-svn: 128690
2011-03-31 23:31:50 +00:00
Eric Christopher d553096688 Fix the bfi handling for or (and a mask) (and b mask). We need the two
masks to match inversely for the code as is to work. For the example given
we actually want:

bfi r0, r2, #1, #1

not #0, however, given the way the pattern is written it's not possible
at the moment.

Fixes rdar://9177502

llvm-svn: 128320
2011-03-26 01:21:03 +00:00
Cameron Zwarich 338d362200 Roll r127459 back in:
Optimize trivial branches in CodeGenPrepare, which often get created from the
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.

This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.

llvm-svn: 127498
2011-03-11 21:52:04 +00:00
Daniel Dunbar 94ccb27b43 Revert r127459, "Optimize trivial branches in CodeGenPrepare, which often get
created from the", it broke some GCC test suite tests.

llvm-svn: 127477
2011-03-11 19:30:30 +00:00
Cameron Zwarich cc27b3acc4 Optimize trivial branches in CodeGenPrepare, which often get created from the
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.

This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.

llvm-svn: 127459
2011-03-11 04:54:27 +00:00
Bob Wilson 43dff0f4b4 Move a test that ended up in the wrong place.
llvm-svn: 124933
2011-02-05 04:15:50 +00:00
Evan Cheng 2f2435d026 Last round of fixes for movw + movt global address codegen.
1. Fixed ARM pc adjustment.
2. Fixed dynamic-no-pic codegen
3. CSE of pc-relative load of global addresses.

It's now enabled by default for Darwin.

llvm-svn: 123991
2011-01-21 18:55:51 +00:00
Andrew Trick bd428ec50f Enable support for precise scheduling of the instruction selection
DAG. Disable using "-disable-sched-cycles".

For ARM, this enables a framework for modeling the cpu pipeline and
counting stalls. It also activates several heuristics to drive
scheduling based on the model. Scheduling is inherently imprecise at
this stage, and until spilling is improved it may defeat attempts to
schedule. However, this framework provides greater control over
tuning codegen.

Although the flag is not target-specific, it should have very little
affect on the default scheduler used by x86. The only two changes that
affect x86 are:
- scheduling a high-latency operation bumps the current cycle so independent
  operations can have their latency covered. i.e. two independent 4
  cycle operations can produce results in 4 cycles, not 8 cycles.
- Two operations with equal register pressure impact and no
  latency-based stalls on their uses will be prioritized by depth before height
  (height is irrelevant if no stalls occur in the schedule below this point).

llvm-svn: 123971
2011-01-21 06:19:05 +00:00
Bob Wilson 8265d56638 Add ARM patterns to match EXTRACT_SUBVECTOR nodes.
Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.

The test changes are needed to keep those spill-q tests from testing aligned
spills and restores.  If the only aligned stack objects are spill slots, we
no longer realign the stack frame.  Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.

llvm-svn: 122995
2011-01-07 04:59:04 +00:00
Bob Wilson 651eaa02b8 Remove the rest of the *_sfp Neon instruction patterns.
Use the same COPY_TO_REGCLASS approach as for the 2-register *_sfp instructions.
This change made a big difference in the code generated for the
CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing
a fine job, but some instructions that were previously moved outside the loop
are not moved now.  It's using fewer VFP registers now, which is generally
a good thing, so I think the estimates for register pressure changed and that
affected the LICM behavior.  Since that isn't obviously wrong, I've just
changed the test file.  This completes the work for Radar 8711675.

llvm-svn: 121730
2010-12-13 23:02:37 +00:00
Evan Cheng 3434575704 (or (and (shl A, #shamt), mask), B) => ARMbfi B, A, ~mask where lsb(mask) == #shamt. rdar://8752056
llvm-svn: 121606
2010-12-11 04:11:38 +00:00
Jim Grosbach 5fccad84a3 ARM stm/ldm instructions require more than one register in the register list.
Otherwise, a plain str/ldr should be used instead. Make sure we account for
that in prologue/epilogue code generation.
rdar://8745460

llvm-svn: 121391
2010-12-09 18:31:13 +00:00
Bob Wilson ed854baad5 The Thumb tADDrSPi instruction is not valid when the destination is SP.
Check for that and try narrowing it to tADDspi instead.  Radar 8724703.

llvm-svn: 120892
2010-12-04 04:40:19 +00:00
Jim Grosbach ca7eaaafda When using the 'push' mnemonic for Thumb2 stmdb, be explicit when it's the
32-bit wide version by adding the .w suffix.

llvm-svn: 120838
2010-12-03 20:33:01 +00:00
Owen Anderson 943fb60b1f Add correct encodings for STRD and LDRD, including fixup support. Additionally, update these to unified syntax.
llvm-svn: 120589
2010-12-01 19:18:46 +00:00
Evan Cheng eb56dca4fd Fix epilogue codegen to avoid leaving the stack pointer in an invalid
state. Previously Thumb2 would restore sp from fp like this:
mov sp, r7
sub, sp, #4
If an interrupt is taken after the 'mov' but before the 'sub', callee-saved
registers might be clobbered by the interrupt handler. Instead, try
restoring directly from sp:
add sp, #4
Or, if necessary (with VLA, etc.) use a scratch register to compute sp and
then restore it:
sub.w r4, r7, #8
mov sp, r7
rdar://8465407

llvm-svn: 119977
2010-11-22 18:12:04 +00:00
Eric Christopher b006fc9c07 Rewrite stack callee saved spills and restores to use push/pop instructions.
Remove movePastCSLoadStoreOps and associated code for simple pointer
increments. Update routines that depended upon other opcodes for save/restore.

Adjust all testcases accordingly.

llvm-svn: 119725
2010-11-18 19:40:05 +00:00
Dale Johannesen 0659c8f157 These tests are looking for library function names that
appear to differ on Linux.  Try to make them pass on Linux.
Would be good for a Linux person to review this.

llvm-svn: 119572
2010-11-17 21:57:32 +00:00
Evan Cheng 7f8ab6ee8b Remove ARM isel hacks that fold large immediates into a pair of add, sub, and,
and xor. The 32-bit move immediates can be hoisted out of loops by machine
LICM but the isel hacks were preventing them.

Instead, let peephole optimization pass recognize registers that are defined by
immediates and the ARM target hook will fold the immediates in.

Other changes include 1) do not fold and / xor into cmp to isel TST / TEQ
instructions if there are multiple uses. This happens when the 'and' is live
out, machine sink would have sinked the computation and that ends up pessimizing
code. The peephole pass would recognize situations where the 'and' can be
toggled to define CPSR and eliminate the comparison anyway.

2) Move peephole pass to after machine LICM, sink, and CSE to avoid blocking
important optimizations.

rdar://8663787, rdar://8241368

llvm-svn: 119548
2010-11-17 20:13:28 +00:00
Evan Cheng debf9c502a Two sets of changes. Sorry they are intermingled.
1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to
   "optimize for latency". Call instructions don't have the right latency and
   this is more likely to use introduce spills.
2. Fix if-converter cost function. For ARM, it should use instruction latencies,
   not # of micro-ops since multi-latency instructions is completely executed
   even when the predicate is false. Also, some instruction will be "slower"
   when they are predicated due to the register def becoming implicit input.
   rdar://8598427

llvm-svn: 118135
2010-11-03 00:45:17 +00:00
Jim Grosbach 0b7fda23cc Revert r114340 (improvements in Darwin function prologue/epilogue), as it broke
assumptions about stack layout. Specifically, LR must be saved next to FP.

llvm-svn: 118026
2010-11-02 17:35:25 +00:00
Bob Wilson 7ed597149b Overhaul memory barriers in the ARM backend. Radar 8601999.
There were a number of issues to fix up here:
* The "device" argument of the llvm.memory.barrier intrinsic should be
used to distinguish the "Full System" domain from the "Inner Shareable"
domain.  It has nothing to do with using DMB vs. DSB instructions.
* The compiler should never need to emit DSB instructions.  Remove the
ARMISD::SYNCBARRIER node and also remove the instruction patterns for DSB.
* Merge the separate DMB/DSB instructions for options only used for the
disassembler with the default DMB/DSB instructions.  Add the default
"full system" option ARM_MB::SY to the ARM_MB::MemBOpt enum.
* Add a separate ARMISD::MEMBARRIER_MCR node for subtargets that implement
a data memory barrier using the MCR instruction.
* Fix up encodings for these instructions (except MCR).
I also updated the tests and added a few new ones to check for DMB options
that were not currently being exercised.

llvm-svn: 117756
2010-10-30 00:54:37 +00:00
Evan Cheng 6c1414f9c2 Avoiding overly aggressive latency scheduling. If the two nodes share an
operand and one of them has a single use that is a live out copy, favor the
one that is live out. Otherwise it will be difficult to eliminate the copy
if the instruction is a loop induction variable update. e.g.

BB:
sub r1, r3, #1
str r0, [r2, r3]
mov r3, r1
cmp
bne BB

=>

BB:
str r0, [r2, r3]
sub r3, r3, #1
cmp
bne BB

This fixed the recent 256.bzip2 regression.

llvm-svn: 117675
2010-10-29 18:09:28 +00:00
Evan Cheng 87066f0677 More accurate estimate / tracking of register pressure.
- Initial register pressure in the loop should be all the live defs into the
  loop. Not just those from loop preheader which is often empty.
- When an instruction is hoisted, update register pressure from loop preheader
  to the original BB.
- Treat only use of a virtual register as kill since the code is still SSA.

llvm-svn: 116956
2010-10-20 22:03:58 +00:00
Dale Johannesen ff37675c72 Fix crash introduced in 116852. 8573915.
llvm-svn: 116955
2010-10-20 22:03:37 +00:00
Dale Johannesen 710a2d9d46 Enable using vdup for vector constants which are splat of
integers by default, and remove the controlling flag, now
that LICM will hoist such vdup's.  8003375.

llvm-svn: 116852
2010-10-19 20:00:17 +00:00
Evan Cheng 63c7608c34 Re-enable register pressure aware machine licm with fixes. Hoist() may have
erased the instruction during LICM so UpdateRegPressureAfter() should not
reference it afterwards.

llvm-svn: 116845
2010-10-19 18:58:51 +00:00
Daniel Dunbar 418204e523 Revert r116781 "- Add a hook for target to determine whether an instruction def
is", which breaks some nightly tests.

llvm-svn: 116816
2010-10-19 17:14:24 +00:00
Evan Cheng 8249dfe6ce - Add a hook for target to determine whether an instruction def is
"long latency" enough to hoist even if it may increase spilling. Reloading
  a value from spill slot is often cheaper than performing an expensive
  computation in the loop. For X86, that means machine LICM will hoist
  SQRT, DIV, etc. ARM will be somewhat aggressive with VFP and NEON
  instructions.
- Enable register pressure aware machine LICM by default.

llvm-svn: 116781
2010-10-19 00:55:07 +00:00
Bob Wilson 056b694de1 Change register allocation order for ARM VFP and NEON registers to put the
callee-saved registers at the end of the lists.  Also prefer to avoid using
the low registers that are in register subclasses required by certain
instructions, so that those registers will more likely be available when needed.
This change makes a huge improvement in spilling in some cases.  Thanks to
Jakob for helping me realize the problem.

Most of this patch is fixing the testsuite.  There are quite a few places
where we're checking for specific registers.  I changed those to wildcards
in places where that doesn't weaken the tests.  The spill-q.ll and
thumb2-spill-q.ll tests stopped spilling with this change, so I added a bunch
of live values to force spills on those tests.

llvm-svn: 116055
2010-10-08 06:15:13 +00:00
Owen Anderson 61158f98ab Enable target-specific mul-lowering on ARM, even at -Os. Remove a test that this makes
irrelevant, but add a new test for the new, improved functionality.

llvm-svn: 114494
2010-09-21 22:51:46 +00:00
Jim Grosbach 94dfd6fc4f Simplify ARM callee-saved register handling by removing the distinction
between the high and low registers for prologue/epilogue code. This was
a Darwin-only thing that wasn't providing a realistic benefit anymore.
Combining the save areas simplifies the compiler code and results in better
ARM/Thumb2 codegen.

For example, previously we would generate code like:
        push    {r4, r5, r6, r7, lr}
        add     r7, sp, #12
        stmdb   sp!, {r8, r10, r11}
With this change, we combine the register saves and generate:
        push    {r4, r5, r6, r7, r8, r10, r11, lr}
        add     r7, sp, #12

rdar://8445635

llvm-svn: 114340
2010-09-20 19:32:20 +00:00
Jim Grosbach 7a6c37d3e7 Teach the (non-MC) instruction printer to use the cannonical names for push/pop,
and shift instructions on ARM. Update the tests to match.

llvm-svn: 114230
2010-09-17 22:36:38 +00:00
Jim Grosbach 20da4e360b Move thumb2 tests to the thumb2 directory
llvm-svn: 114206
2010-09-17 20:34:09 +00:00
Evan Cheng bf4070756f Teach if-converter to be more careful with predicating instructions that would
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. If-converter should take treat them as non-micro-coded
simple instructions.

llvm-svn: 113570
2010-09-10 01:29:16 +00:00
Bob Wilson 4adbaf1843 Fix NEON VLD pseudo instruction itineraries that were incorrectly copied from
the VST pseudos.  The VLD/VST scheduling still needs work (see pr6722), but
at least we shouldn't confuse the loads with the stores.

llvm-svn: 113473
2010-09-09 05:40:26 +00:00
Jim Grosbach 03f4be86ba Re-apply r112883:
"For ARM stack frames that utilize variable sized objects and have either
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs."

r112986 fixed a latent bug exposed by the above.

llvm-svn: 112989
2010-09-03 18:37:12 +00:00
Daniel Dunbar 2ac3386ef3 Revert "For ARM stack frames that utilize variable sized objects and have either", it is breaking oggenc with Clang for ARMv6.
This reverts commit 8d6e29cfda270be483abf638850311670829ee65.

llvm-svn: 112962
2010-09-03 15:26:42 +00:00