Commit Graph

958 Commits

Author SHA1 Message Date
Evan Cheng 21b0348199 Disable the Thumb no-return call optimization:
mov lr, pc
b.w _foo

The "mov" instruction doesn't set bit zero to one, it's putting incorrect
value in lr. It messes up backtraces.

rdar://12663632

llvm-svn: 167657
2012-11-10 02:09:05 +00:00
Chad Rosier 66bb178eef Revert r167620; this can be implemented using an existing CL option.
llvm-svn: 167622
2012-11-09 18:25:27 +00:00
Chad Rosier 332fc75b2c Add support for -mstrict-align compiler option for ARM targets.
rdar://12340498

llvm-svn: 167620
2012-11-09 17:29:38 +00:00
Chad Rosier 1ec8e404fc Mark the Int_eh_sjlj_dispatchsetup pseudo instruction as clobbering all
registers.  Previously, the register we being marked as implicitly defined, but
not killed.  In some cases this would cause the register scavenger to spill a
dead register.

Also, use an empty register mask to simplify the logic and to reduce the memory
footprint.
rdar://12592448

llvm-svn: 167499
2012-11-06 23:05:24 +00:00
Quentin Colombet 8e1fe84c3c Vext Lowering was missing opportunities
llvm-svn: 167318
2012-11-02 21:32:17 +00:00
Quentin Colombet 5799e9f66c Change ForceSizeOpt attribute into MinSize attribute
llvm-svn: 167020
2012-10-30 16:32:52 +00:00
Quentin Colombet 3ee56a3bf5 [code size][ARM] Emit regular call instructions instead of the move, branch sequence
llvm-svn: 166854
2012-10-27 01:10:17 +00:00
Stepan Dyatkovskiy dab8043048 ARM:
Removed extra stack frame object for fixed byval arguments,
VarArgsStyleRegisters invocation was reworked due to some improper usage in
past. PR14099 also demonstrates it.

llvm-svn: 166273
2012-10-19 08:23:06 +00:00
Stepan Dyatkovskiy e59a920b0c Issue:
Stack is formed improperly for long structures passed as byval arguments for
EABI mode.

If we took AAPCS reference, we can found the next statements:

A: "If the argument requires double-word alignment (8-byte), the NCRN (Next
Core Register Number) is rounded up to the next even register number." (5.5
Parameter Passing, Stage C, C.3).

B: "The alignment of an aggregate shall be the alignment of its most-aligned
component." (4.3 Composite Types, 4.3.1 Aggregates).

So if we have structure with doubles (9 double fields) and 3 Core unused
registers (r1, r2, r3): caller should use r2 and r3 registers only.
Currently r1,r2,r3 set is used, but it is invalid.

Callee VA routine should also use r2 and r3 regs only. All is ok here. This
behaviour is guessed by rounding up SP address with ADD+BFC operations.

Fix:
Main fix is in ARMTargetLowering::HandleByVal. If we detected AAPCS mode and
8 byte alignment, we waste odd registers then.

P.S.:
I also improved LDRB_POST_IMM regression test. Since ldrb instruction will
not generated by current regression test after this patch. 

llvm-svn: 166018
2012-10-16 07:16:47 +00:00
Silviu Baranga b14097000b Fixed PR13938: the ARM backend was crashing because it couldn't select a VDUPLANE node with the vector input size different from the output size. This was bacause the BUILD_VECTOR lowering code didn't check that the size of the input vector was correct for using VDUPLANE.
llvm-svn: 165929
2012-10-15 09:41:32 +00:00
Manman Ren 7e48b252e7 ARM: tail-call inside a function where part of a byval argument is on caller's
local frame causes problem.

For example:
void f(StructToPass s) {
  g(&s, sizeof(s));
}
will cause problem with tail-call since part of s is passed via registers and
saved in f's local frame. When g tries to access s, part of s may be corrupted
since f's local frame is popped out before the tail-call.

The current fix is to disable tail-call if getVarArgsRegSaveSize is not 0 for
the caller. This is a conservative approach, if we can prove the address of
s or part of s is not taken and passed to g, it should be okay to perform
tail-call.

rdar://12442472

llvm-svn: 165853
2012-10-12 23:39:43 +00:00
Jim Grosbach 30af442a84 ARM: Mark VSELECT as 'expand'.
The backend already pattern matches to form VBSL when it can. We may want to
teach it to use the vbsl intrinsics at some point to prevent machine licm from
mucking with this, but using the Expand is completely correct.

http://llvm.org/bugs/show_bug.cgi?id=13831
http://llvm.org/bugs/show_bug.cgi?id=13961

Patch by Peter Couperus <peter.couperus@st.com>.

llvm-svn: 165845
2012-10-12 22:59:21 +00:00
Stepan Dyatkovskiy 283baa0027 Fix for LDRB instruction:
SDNode for LDRB_POST_IMM is invalid: number of registers added to SDNode fewer
that described in .td.

7 ops is needed, but SDNode with only 6 is created.

In more details:
In ARMInstrInfo.td, in multiclass AI2_ldridx, in definition _POST_IMM, offset
operand is defined as am2offset_imm. am2offset_imm is complex parameter type,
and actually it consists from dummy register and imm itself. As I understood
trick with dummy reg was made for AsmParser. In ARMISelLowering.cpp, this dummy
register was not added to SDNode, and it cause crash in Peephole Optimizer pass.

The problem fixed by setting up additional dummy reg when emitting
LDRB_POST_IMM instruction.

llvm-svn: 165617
2012-10-10 11:43:40 +00:00
Stepan Dyatkovskiy f13dbb8e24 Issue description:
SchedulerDAGInstrs::buildSchedGraph ignores dependencies between FixedStack
objects and byval parameters. So loading byval parameters from stack may be
inserted *before* it will be stored, since these operations are treated as
independent.

Fix:
Currently ARMTargetLowering::LowerFormalArguments saves byval registers with
FixedStack MachinePointerInfo. To fix the problem we need to store byval
registers with MachinePointerInfo referenced to first the "byval" parameter.

Also commit adds two new fields to the InputArg structure: Function's argument
index and InputArg's part offset in bytes relative to the start position of
Function's argument. E.g.: If function's argument is 128 bit width and it was
splitted onto 32 bit regs, then we got 4 InputArg structs with same arg index,
but different offset values. 

llvm-svn: 165616
2012-10-10 11:37:36 +00:00
Bill Wendling c9b22d735a Create enums for the different attributes.
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.

llvm-svn: 165488
2012-10-09 07:45:08 +00:00
Micah Villmow cdfe20b97f Move TargetData to DataLayout.
llvm-svn: 165402
2012-10-08 16:38:25 +00:00
Bob Wilson e8a549cd92 Add LLVM support for Swift.
llvm-svn: 164899
2012-09-29 21:43:49 +00:00
Sylvestre Ledru 91ce36c986 Revert 'Fix a typo 'iff' => 'if''. iff is an abreviation of if and only if. See: http://en.wikipedia.org/wiki/If_and_only_if Commit 164767
llvm-svn: 164768
2012-09-27 10:14:43 +00:00
Sylvestre Ledru 721cffd53a Fix a typo 'iff' => 'if'
llvm-svn: 164767
2012-09-27 09:59:43 +00:00
Bill Wendling 863bab689a Remove the `hasFnAttr' method from Function.
The hasFnAttr method has been replaced by querying the Attributes explicitly. No
intended functionality change.

llvm-svn: 164725
2012-09-26 21:48:26 +00:00
James Molloy 9e98ef1c59 Fix ordering of operands on lowering of atomicrmw min/max nodes on ARM.
llvm-svn: 164685
2012-09-26 09:48:32 +00:00
Evan Cheng 90ae8f8442 Use vld1 / vst2 for unaligned v2f64 load / store. e.g. Use vld1.16 for 2-byte
aligned address. Based on patch by David Peixotto.

Also use vld1.64 / vst1.64 with 128-bit alignment to take advantage of alignment
hints. rdar://12090772, rdar://12238782

llvm-svn: 164089
2012-09-18 01:42:45 +00:00
Silviu Baranga b47bb94f93 This patch introduces A15 as a target in LLVM.
llvm-svn: 163803
2012-09-13 15:05:10 +00:00
Craig Topper 3e41a5bb31 Set operation action for FFLOOR to Expand for all vector types for X86. Set FFLOOR of v4f32 to Expand for ARM. v2f64 was already correct.
llvm-svn: 163458
2012-09-08 04:58:43 +00:00
Jakob Stoklund Olesen e45e22b20f Custom DAGCombine for and/or/xor are for all ARMs.
The 'select' transformations apply to all ARM architectures and don't
require hasV6T2Ops.

llvm-svn: 163396
2012-09-07 17:34:15 +00:00
James Molloy 9d30dc2432 Fix self-host; ensure signedness is consistent.
llvm-svn: 163306
2012-09-06 10:32:08 +00:00
James Molloy 49bdbce8e1 Improve codegen for BUILD_VECTORs on ARM.
If we have a BUILD_VECTOR that is mostly a constant splat, it is often better to splat that constant then insertelement the non-constant lanes instead of insertelementing every lane from an undef base.

llvm-svn: 163304
2012-09-06 09:55:02 +00:00
Arnold Schwaighofer f00fb1c581 Patch to implement UMLAL/SMLAL instructions for the ARM architecture
This patch corrects the definition of umlal/smlal instructions and adds support
for matching them to the ARM dag combiner.

Bug 12213

Patch by Yin Ma!

llvm-svn: 163136
2012-09-04 14:37:49 +00:00
Jakob Stoklund Olesen d3bda3c5b9 Fix a couple of typos in EmitAtomic.
Thumb2 instructions are mostly constrained to rGPR, not tGPR which is
for Thumb1.

rdar://problem/12203728

llvm-svn: 162968
2012-08-31 02:08:34 +00:00
Jakob Stoklund Olesen 710093e360 Use a SmallPtrSet to dedup successors in EmitSjLjDispatchBlock.
The test case ARM/2011-05-04-MultipleLandingPadSuccs.ll was creating
duplicate successor list entries.

llvm-svn: 162222
2012-08-20 20:52:03 +00:00
Jakob Stoklund Olesen e1014e7b98 Remove the CAND/COR/CXOR custom ISD nodes and their select code.
These nodes are no longer needed because the peephole pass can fold
CMOV+AND into ANDCC etc.

llvm-svn: 162179
2012-08-18 21:49:50 +00:00
Jakob Stoklund Olesen dded061f85 Also combine zext/sext into selects for ARM.
This turns common i1 patterns into predicated instructions:

  (add (zext cc), x) -> (select cc (add x, 1), x)
  (add (sext cc), x) -> (select cc (add x, -1), x)

For a function like:

  unsigned f(unsigned s, int x) {
    return s + (x>0);
  }

We now produce:

  cmp r1, #0
  it  gt
  addgt.w r0, r0, #1

Instead of:

  movs  r2, #0
  cmp r1, #0
  it  gt
  movgt r2, #1
  add r0, r2

llvm-svn: 162177
2012-08-18 21:25:22 +00:00
Jakob Stoklund Olesen aab43dbfbb Also pass logical ops to combineSelectAndUse.
Add these transformations to the existing add/sub ones:

  (and (select cc, -1, c), x) -> (select cc, x, (and, x, c))
  (or  (select cc, 0, c), x)  -> (select cc, x, (or, x, c))
  (xor (select cc, 0, c), x)  -> (select cc, x, (xor, x, c))

The selects can then be transformed to a single predicated instruction
by peephole.

This transformation will make it possible to eliminate the ISD::CAND,
COR, and CXOR custom DAG nodes.

llvm-svn: 162176
2012-08-18 21:25:16 +00:00
Jakob Stoklund Olesen c1dee482c8 Add comment, clean up code. No functional change.
llvm-svn: 162107
2012-08-17 16:59:09 +00:00
Jakob Stoklund Olesen c19bf0282d Handle ARM MOVCC optimization in PeepholeOptimizer.
Use the target independent select analysis hooks.

llvm-svn: 162060
2012-08-16 23:14:20 +00:00
Jakob Stoklund Olesen 6cb96120f1 Fold predicable instructions into MOVCC / t2MOVCC.
The ARM select instructions are just predicated moves. If the select is
the only use of an operand, the instruction defining the operand can be
predicated instead, saving one instruction and decreasing register
pressure.

This implementation can turn AND/ORR/EOR instructions into their
corresponding ANDCC/ORRCC/EORCC variants. Ideally, we should be able to
predicate any instruction, but we don't yet support predicated
instructions in SSA form.

llvm-svn: 161994
2012-08-15 22:16:39 +00:00
Evan Cheng eec6bc6270 Use vld1/vst1 to load/store f64 if alignment is < 4 and the target allows unaligned access. rdar://12091029
llvm-svn: 161962
2012-08-15 17:44:53 +00:00
Nadav Rotem 3a94c545cf Do not optimize (or (and X,Y), Z) into BFI and other sequences if the AND ISDNode has more than one user.
rdar://11876519

llvm-svn: 161775
2012-08-13 18:52:44 +00:00
Arnold Schwaighofer b73da9453c Revert 161581: Patch to implement UMLAL/SMLAL instructions for the ARM
architecture

It broke MultiSource/Applications/JM/ldecod/ldecod on armv7 thumb O0 g and armv7
thumb O3.

llvm-svn: 161736
2012-08-12 05:11:56 +00:00
Craig Topper 4fa625fda7 Change addTypeForNeon to use MVT instead of EVT so all the calls to getSimpleVT can be removed.
llvm-svn: 161735
2012-08-12 03:16:37 +00:00
Arnold Schwaighofer 81b2eec1ab Patch to implement UMLAL/SMLAL instructions for the ARM architecture
This patch corrects the definition of umlal/smlal instructions and adds support
for matching them to the ARM dag combiner.

Bug 12213

Patch by Yin Ma!

llvm-svn: 161581
2012-08-09 15:25:52 +00:00
Bob Wilson 3e6fa462f3 Fall back to selection DAG isel for calls to builtin functions.
Fast isel doesn't currently have support for translating builtin function
calls to target instructions.  For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization.  Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel.  <rdar://problem/12008746>

llvm-svn: 161232
2012-08-03 04:06:28 +00:00
Eric Christopher b3322364e4 Add support for the ARM GHC calling convention, this patch was in 3.0,
but somehow managed to be dropped later.

Patch by Karel Gardas.

llvm-svn: 161226
2012-08-03 00:05:53 +00:00
Jim Grosbach 6df755cc4e ARM: Don't assume an SDNode is a constant.
Before accessing a node as a ConstandSDNode, make sure it actually is one.
No testcase of non-trivial size.

rdar://11948669

llvm-svn: 160735
2012-07-25 17:02:47 +00:00
Andrew Trick a22cdb713b Fix ARMTargetLowering::isLegalAddImmediate to consider thumb encodings.
Based on Evan's suggestion without a commitable test.

llvm-svn: 160441
2012-07-18 18:34:27 +00:00
Andrew Trick bc325168c3 whitespace
llvm-svn: 160440
2012-07-18 18:34:24 +00:00
Manman Ren 6e1fd46fdf ARM: use NOEN loads and stores if possible when handling struct byval.
This change is to be enabled in clang.

rdar://9877866

llvm-svn: 158684
2012-06-18 22:23:48 +00:00
Manman Ren e0763c7472 ARM: optimization for sub+abs.
This patch will optimize abs(x-y)
FROM
sub, movs, rsbmi
TO
subs, rsbmi

For abs, we will use cmp instead of movs. This is necessary because we already
have an existing peephole pass which optimizes away cmp following sub.

rdar: 11633193
llvm-svn: 158551
2012-06-15 21:32:12 +00:00
Bill Wendling 4b79647a6e Re-enable the CMN instruction.
We turned off the CMN instruction because it had semantics which we weren't
getting correct. If we are comparing with an immediate, then it's okay to use
the CMN instruction.
<rdar://problem/7569620>

llvm-svn: 158302
2012-06-11 08:07:26 +00:00
Manman Ren e873552091 ARM: properly handle alignment for struct byval.
Factor out the expansion code into a function.
This change is to be enabled in clang.

rdar://9877866

llvm-svn: 157830
2012-06-01 19:33:18 +00:00
Manman Ren 9f9111651e ARM: support struct byval in llvm
We handle struct byval by inserting a pseudo op, which will be expanded to a
loop at ExpandISelPseudos.
A separate patch for clang will be submitted to enable struct byval.

rdar://9877866

llvm-svn: 157793
2012-06-01 02:44:42 +00:00
Justin Holewinski aa58397b3c Change interface for TargetLowering::LowerCallTo and TargetLowering::LowerCall
to pass around a struct instead of a large set of individual values.  This
cleans up the interface and allows more information to be added to the struct
for future targets without requiring changes to each and every target.

NV_CONTRIB

llvm-svn: 157479
2012-05-25 16:35:28 +00:00
Jakob Stoklund Olesen 691ae3388f Use the right register class for LDRrs.
llvm-svn: 157152
2012-05-20 06:38:47 +00:00
Benjamin Kramer e31f31e5c0 Add a new target hook "predictableSelectIsExpensive".
This will be used to determine whether it's profitable to turn a select into a
branch when the branch is likely to be predicted.

Currently enabled for everything but Atom on X86 and Cortex-A9 devices on ARM.

I'm not entirely happy with the name of this flag, suggestions welcome ;)

llvm-svn: 156233
2012-05-05 12:49:14 +00:00
Matt Beaumont-Gay e82ab6baa7 Pacify GCC's -Wreturn-type
llvm-svn: 156189
2012-05-04 18:34:27 +00:00
Hans Wennborg aea412008e Make ARM and Mips use TargetMachine::getTLSModel()
This moves the logic for selecting a TLS model to a single place,
instead of the previous three (ARM, Mips, and X86 which already
uses this function).

llvm-svn: 156162
2012-05-04 09:40:39 +00:00
Bob Wilson 9245c93656 Don't introduce illegal types when creating vmull operations. <rdar://11324364>
ARM BUILD_VECTORs created after type legalization cannot use i8 or i16
operands, since those types are not legal.  Instead use i32 operands, which
will be implicitly truncated by the BUILD_VECTOR to match the element type.

llvm-svn: 155824
2012-04-30 16:53:34 +00:00
Craig Topper c7242e054d Convert more uses of XXXRegisterClass to &XXXRegClass. No functional change since they are equivalent.
llvm-svn: 155188
2012-04-20 07:30:17 +00:00
Evan Cheng d0007f3c83 Handle llvm.fma.* intrinsics. rdar://10914096
llvm-svn: 154439
2012-04-10 21:40:28 +00:00
Evan Cheng f8bad08001 Fix a long standing tail call optimization bug. When a libcall is emitted
legalizer always use the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.

PR12419
rdar://9770785
rdar://11195178

llvm-svn: 154370
2012-04-10 01:51:00 +00:00
Chad Rosier e0e38f61a5 When performing a truncating store, it's possible to rearrange the data
in-register, such that we can use a single vector store rather then a 
series of scalar stores.

For func_4_8 the generated code

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vmov.u16	r0, d16[3]
	strb	r0, [r2, #3]
	vmov.u16	r0, d16[2]
	strb	r0, [r2, #2]
	vmov.u16	r0, d16[1]
	strb	r0, [r2, #1]
	vmov.u16	r0, d16[0]
	strb	r0, [r2]
	bx	lr

becomes

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vuzp.8	d16, d17
	vst1.32	{d16[0]}, [r2, :32]
	bx	lr

I'm not fond of how this combine pessimizes 2012-03-13-DAGCombineBug.ll,
but I couldn't think of a way to judiciously apply this combine.

This

	ldrh	r0, [r0, #4]
	strh	r0, [r1]

becomes

	vldr	d16, [r0]
	vmov.u16	r0, d16[2]
	vmov.32	d16[0], r0
	vuzp.16	d16, d17
	vst1.32	{d16[0]}, [r1, :32]

PR11158
rdar://10703339

llvm-svn: 154340
2012-04-09 20:32:02 +00:00
Chad Rosier 99cbde9e82 Update comments and remove unnecessary isVolatile() check.
llvm-svn: 154336
2012-04-09 19:38:15 +00:00
Jim Grosbach 0c509fa6bf Tidy up. 80 columns.
llvm-svn: 154226
2012-04-06 23:43:50 +00:00
Chandler Carruth 8a102c21e3 There is no portable std::abs overload for int64_t, use the llvm::abs64
which exists for this purpose.

llvm-svn: 154199
2012-04-06 20:10:52 +00:00
Jakob Stoklund Olesen 967b86a0a2 Allow negative immediates in ARM and Thumb2 compares.
ARM and Thumb2 mode can use cmn instructions to compare against negative
immediates. Thumb1 mode can't.

llvm-svn: 154183
2012-04-06 17:45:04 +00:00
Rafael Espindola ba0a6cabb8 Always compute all the bits in ComputeMaskedBits.
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.

llvm-svn: 154011
2012-04-04 12:51:34 +00:00
Evan Cheng a40d40602c ARM target should allow codegenprep to duplicate ret instructions to enable tailcall opt. rdar://11140249
llvm-svn: 153717
2012-03-30 01:24:39 +00:00
Lang Hames 591cdaf2ee Try using vmov.i32 to materialize FP32 constants that can't be materialized by
vmov.f32.

llvm-svn: 153696
2012-03-29 21:56:11 +00:00
Craig Topper f6e7e12f75 Remove unnecessary llvm:: qualifications
llvm-svn: 153500
2012-03-27 07:21:54 +00:00
Craig Topper 5fa0caafc0 Prune includes and replace uses of ARMRegisterInfo.h with ARMBaeRegisterInfo.h
llvm-svn: 153422
2012-03-26 00:45:15 +00:00
Craig Topper 07720d8dcd Replace uses of ARMBaseInstrInfo and ARMTargetMachine with the Base versions.
llvm-svn: 153421
2012-03-25 23:49:58 +00:00
Anton Korobeynikov 3edd854d64 Perform mul combine when multiplying wiht negative constants.
Patch by Weiming Zhao!
This fixes PR12212

llvm-svn: 153049
2012-03-19 19:19:50 +00:00
Craig Topper 188ed9d56e Reorder includes to match coding standards. Fix an issue or two exposed by that.
llvm-svn: 152978
2012-03-17 07:33:42 +00:00
Lang Hames c35ee8b54a Use vmov.f32 to materialize f32 consts on ARM. This relaxes constraints on
register allocation by allowing all 32 D-registers to be used. Patch by Cameron
Zwarich.

llvm-svn: 152824
2012-03-15 18:49:02 +00:00
Craig Topper bef78fc2ee Convert more static tables of registers used by calling convention to uint16_t to reduce space.
llvm-svn: 152538
2012-03-11 07:57:25 +00:00
Craig Topper 420525ce3b Use uint16_t to store registers in callee saved register tables to reduce size of static data.
llvm-svn: 151996
2012-03-04 03:33:22 +00:00
Evan Cheng d12af5dc69 Neuter the optimization I implemented with r107852 and r108258 which turn some
floating point equality comparisons into integer ones with -ffast-math. The
issue is the optimization causes +0.0 != -0.0.

Now the optimization is only done when one side is known to be 0.0. The other
side's sign bit is masked off for the comparison.

rdar://10964603

llvm-svn: 151861
2012-03-01 23:27:13 +00:00
Evan Cheng 65f9d19c4f Re-commit r151623 with fix. Only issue special no-return calls if it's a direct call.
llvm-svn: 151645
2012-02-28 18:51:51 +00:00
Daniel Dunbar ee7b899343 Revert r151623 "Some ARM implementaions, e.g. A-series, does return stack prediction. ...", it is breaking the Clang build during the Compiler-RT part.
llvm-svn: 151630
2012-02-28 15:36:07 +00:00
Evan Cheng 87c7b09d8d Some ARM implementaions, e.g. A-series, does return stack prediction. That is,
the processor keeps a return addresses stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.

Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt RAS and causes 100% return misprediction so LLVM should use a
unconditional branch instead. i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.

rdar://8979299

llvm-svn: 151623
2012-02-28 06:42:03 +00:00
Jakob Stoklund Olesen fa7a53746c Switch ARM target to register masks.
I'll let the buildbots determine the compile time improvements from this
change, but 464.h264ref has 5% faster codegen at -O2.

This patch does cause some assembly changes.  Branch folding can make
different decisions about calls with dead return values.
CriticalAntiDepBreaker may choose different registers because its
liveness tracking is affected.  MachineCopyPropagation may sometimes
leave a dead copy behind.

llvm-svn: 151331
2012-02-24 01:19:29 +00:00
Dan Gohman d4a77c4682 When emitting a cmp with 0 for a lowered select, mask out the high
bits of the value carying the boolean condition, as their contents
are undefined. This fixes rdar://10887484.

llvm-svn: 151310
2012-02-24 00:09:36 +00:00
Evan Cheng f258a15bdf Canonicalize (srl (bswap x), 16) to (rotr (bswap x), 16) if the high 16 bits
of x are zero. This optimizes rev + lsr 16 to rev16.

rdar://10750814

llvm-svn: 151230
2012-02-23 02:58:19 +00:00
Evan Cheng e87681cf34 Optimize a couple of common patterns involving conditional moves where the false
value is zero. Instead of a cmov + op, issue an conditional op instead. e.g.
    cmp   r9, r4
    mov   r4, #0
    moveq r4, #1 
    orr   lr, lr, r4

should be:
    cmp   r9, r4
    orreq lr, lr, #1

That is, optimize (or x, (cmov 0, y, cond)) to (or.cond x, y). Similarly extend
this to xor as well as (and x, (cmov -1, y, cond)) => (and.cond x, y).

It's possible to extend this to ADD and SUB but I don't think they are common.

rdar://8659097

llvm-svn: 151224
2012-02-23 01:19:06 +00:00
Craig Topper 760b134ffa Make all pointers to TargetRegisterClass const since they are all pointers to static data that should not be modified.
llvm-svn: 151134
2012-02-22 05:59:10 +00:00
Evan Cheng 0460ae8d80 Proper support for a bastardized darwin-eabi hybird ABI.
llvm-svn: 151083
2012-02-21 20:46:00 +00:00
James Molloy 547d4c0662 Improve generated code for extending loads and some trunc stores on ARM.
Teach TargetSelectionDAG about lengthening loads for vector types and set v4i8 as legal. Allow FP_TO_UINT for v4i16 from v4i32.

llvm-svn: 150956
2012-02-20 09:24:05 +00:00
Bill Wendling 05d6f2ff1e Don't reserve the R0 and R1 registers here. We don't use these registers, and
marking them as "live-in" into a BB ruins some invariants that the back-end
tries to maintain.

llvm-svn: 150437
2012-02-13 23:47:16 +00:00
Jason W Kim c7f4841769 Make valgrind happy.
llvm-svn: 150251
2012-02-10 16:07:59 +00:00
Craig Topper e55c556a24 Convert assert(0) to llvm_unreachable
llvm-svn: 149961
2012-02-07 02:50:20 +00:00
Anton Korobeynikov d0c46550fe Cleanups for EABI standard functions
llvm-svn: 149195
2012-01-29 09:11:50 +00:00
Anton Korobeynikov 1b42e64280 Use base AAPCS for varargs functions even for AAPCS-VFP CC
llvm-svn: 149194
2012-01-29 09:06:09 +00:00
David Blaikie 46a9f016c5 More dead code removal (using -Wunreachable-code)
llvm-svn: 148578
2012-01-20 21:51:11 +00:00
David Blaikie 5d8e42755c Refactor variables unused under non-assert builds (& remove two entirely unused variables).
llvm-svn: 148230
2012-01-16 05:17:39 +00:00
Benjamin Kramer 339ced4e34 Return an ArrayRef from ShuffleVectorSDNode::getMask and push it through CodeGen.
llvm-svn: 148218
2012-01-15 13:16:05 +00:00
Jakob Stoklund Olesen 083dbdca7f Match SelectionDAG logic for enabling movt.
Darwin doesn't do static, and ELF targets only support static.

llvm-svn: 147740
2012-01-07 20:49:15 +00:00
Benjamin Kramer 6898db6269 Remove VectorExtras. This unused helper was written for a type of API that is discouraged now.
llvm-svn: 147738
2012-01-07 19:42:13 +00:00
Bob Wilson 1a74de9504 Add variants of the dispatchsetup pseudo for Thumb and !VFP. <rdar://10620138>
My change r146949 added register clobbers to the eh_sjlj_dispatchsetup pseudo
instruction, but on Thumb1 some of those registers cannot be used.  This
caused massive failures on the testsuite when compiling for Thumb1.  While
fixing that, I noticed that the eh_sjlj_setjmp instruction has a "nofp"
variant, and I realized that dispatchsetup needs the same thing, so I have
added that as well.

llvm-svn: 147204
2011-12-22 23:39:48 +00:00
Eli Friedman c9bf1b1bff Make check a bit more strict so we don't call ARM_AM::getFP32Imm with a value that isn't a 32-bit value. (This is just to be safe; I don't think this actually causes any issues in practice.)
llvm-svn: 146700
2011-12-15 22:56:53 +00:00
Chandler Carruth 637cc6a8aa Initial CodeGen support for CTTZ/CTLZ where a zero input produces an
undefined result. This adds new ISD nodes for the new semantics,
selecting them when the LLVM intrinsic indicates that the undef behavior
is desired. The new nodes expand trivially to the old nodes, so targets
don't actually need to do anything to support these new nodes besides
indicating that they should be expanded. I've done this for all the
operand types that I could figure out for all the targets. Owners of
various targets, please review and let me know if any of these are
incorrect.

Note that the expand behavior is *conservatively correct*, and exactly
matches LLVM's current behavior with these operations. Ideally this
patch will not change behavior in any way. For example the regtest suite
finds the exact same instruction sequences coming out of the code
generator. That's why there are no new tests here -- all of this is
being exercised by the existing test suite.

Thanks to Duncan Sands for reviewing the various bits of this patch and
helping me get the wrinkles ironed out with expanding for each target.
Also thanks to Chris for clarifying through all the discussions that
this is indeed the approach he was looking for. That said, there are
likely still rough spots. Further review much appreciated.

llvm-svn: 146466
2011-12-13 01:56:10 +00:00
Stepan Dyatkovskiy 4683740967 Fixed bug 9905: Failure in code selection for llvm intrinsics sqrt/exp (fix for FSQRT, FSIN, FCOS, FPOWI, FPOW, FLOG, FLOG2, FLOG10, FEXP, FEXP2). Third attempt: simplified checks in test for armv7-apple-darwin11.
llvm-svn: 146341
2011-12-11 14:35:48 +00:00
Chad Rosier 6641294e3b Revert r146322 to appease buildbots. Original commit message:
Fixed bug 9905: Failure in code selection for llvm intrinsics sqrt/exp (fix for
FSQRT, FSIN, FCOS, FPOWI, FPOW, FLOG, FLOG2, FLOG10, FEXP, FEXP2). Second
attempt.

llvm-svn: 146328
2011-12-10 19:55:03 +00:00
Stepan Dyatkovskiy df0b779e9f Fixed bug 9905: Failure in code selection for llvm intrinsics sqrt/exp (fix for FSQRT, FSIN, FCOS, FPOWI, FPOW, FLOG, FLOG2, FLOG10, FEXP, FEXP2). Second attempt.
llvm-svn: 146322
2011-12-10 08:42:24 +00:00
Eli Friedman 4e36a934dc Splats can contain undef's; make sure to handle them correctly. PR11526.
llvm-svn: 146299
2011-12-09 23:54:42 +00:00
Daniel Dunbar c09e4593b2 Revert r146143, "Fix bug 9905: Failure in code selection for llvm intrinsics
sqrt/exp (fix for FSQRT, FSIN, FCOS, FPOWI, FPOW, FLOG, FLOG2, FLOG10, FEXP,
FEXP2).", it is failing tests.

llvm-svn: 146157
2011-12-08 17:32:18 +00:00
Stepan Dyatkovskiy a4bcf27dae Fix bug 9905: Failure in code selection for llvm intrinsics sqrt/exp (fix for FSQRT, FSIN, FCOS, FPOWI, FPOW, FLOG, FLOG2, FLOG10, FEXP, FEXP2).
llvm-svn: 146143
2011-12-08 07:55:03 +00:00
Evan Cheng 7f8e563a69 Add bundle aware API for querying instruction properties and switch the code
generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.

For properties like mayLoad / mayStore, look into the bundle and if any of the
bundled instructions has the property it would return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad, isCompare, conservatively return false for
bundles.

llvm-svn: 146026
2011-12-07 07:15:52 +00:00
Nick Lewycky 50f02cb21b Move global variables in TargetMachine into new TargetOptions class. As an API
change, now you need a TargetOptions object to create a TargetMachine. Clang
patch to follow.

One small functionality change in PTX. PTX had commented out the machine
verifier parts in their copy of printAndVerify. That now calls the version in
LLVMTargetMachine. Users of PTX who need verification disabled should rely on
not passing the command-line flag to enable it.

llvm-svn: 145714
2011-12-02 22:16:29 +00:00
Benjamin Kramer 7ba71be392 Move code into anonymous namespaces.
llvm-svn: 145154
2011-11-26 23:01:57 +00:00
Bob Wilson f6d1728d8f Fix ARM SjLj-EH dispatch setup code. <rdar://problem/10444602>
The EmitBasePointerRecalculation function has 2 problems, one minor and one
fatal.  The minor problem is that it inserts the code at the setjmp
instead of in the dispatch block.  The fatal problem is that at the point
where this code runs, we don't know whether there will be a base pointer,
so the entire function is a no-op.  The base pointer recalculation needs to
be handled as it was before, by inserting a pseudo instruction that gets
expanded late.

Most of the support for the old approach is still here, but it no longer
has any connection to the eh_sjlj_dispatchsetup intrinsic.  Clean up the
parts related to the intrinsic and just generate the pseudo instruction
directly.

llvm-svn: 144781
2011-11-16 07:11:57 +00:00
Jay Foad 0745e645e0 Remove some unnecessary includes of PseudoSourceValue.h.
llvm-svn: 144631
2011-11-15 07:24:32 +00:00
Evan Cheng 7ca4b6eb5c Add vmov.f32 to materialize f32 immediate splats which cannot be handled by
integer variants. rdar://10437054

llvm-svn: 144608
2011-11-15 02:12:34 +00:00
Eli Friedman c4a001478c Make sure to expand SIGN_EXTEND_INREG for NEON vectors. PR11319, round 3.
llvm-svn: 144361
2011-11-11 03:16:38 +00:00
Eli Friedman 2d4055b683 Make sure we correctly unroll conversions between v2f64 and v2i32 on ARM.
llvm-svn: 144241
2011-11-09 23:36:02 +00:00
Lang Hames b85fcd07df Lower mem-ops to unaligned i32/i16 load/stores on ARM where supported.
Add support for trimming constants to GetDemandedBits. This fixes some funky
constant generation that occurs when stores are expanded for targets that don't
support unaligned stores natively.

llvm-svn: 144102
2011-11-08 18:56:23 +00:00
Pete Cooper 82cd9e81fc Added invariant field to the DAG.getLoad method and changed all calls.
When this field is true it means that the load is from constant (runt-time or compile-time) and so can be hoisted from loops or moved around other memory accesses

llvm-svn: 144100
2011-11-08 18:42:53 +00:00
Eli Friedman 6f84fed675 Make sure to mark vector extload's as expand on ARM. Fixes PR11319.
llvm-svn: 144057
2011-11-08 01:43:53 +00:00
Dan Gohman 198b7ffc11 Reapply r143206, with fixes. Disallow physical register lifetimes
across calls, and only check for nested dependences on the special
call-sequence-resource register.

llvm-svn: 143660
2011-11-03 21:49:52 +00:00
Lang Hames 1f4603d498 Fixed parameter name.
llvm-svn: 143594
2011-11-02 23:37:04 +00:00
Lang Hames 9929c423a1 Try to lower memset/memcpy/memmove to vector instructions on ARM where the alignment permits.
llvm-svn: 143582
2011-11-02 22:52:45 +00:00
Dan Gohman 9b9c970148 Revert r143206, as there are still some failing tests.
llvm-svn: 143262
2011-10-29 00:41:52 +00:00
Dan Gohman 73057ad24f Reapply r143177 and r143179 (reverting r143188), with scheduler
fixes: Use a separate register, instead of SP, as the
calling-convention resource, to avoid spurious conflicts with
actual uses of SP. Also, fix unscheduling of calling sequences,
which can be triggered by pseudo-two-address dependencies.

llvm-svn: 143206
2011-10-28 17:55:38 +00:00
Duncan Sands 225a7037d6 Speculatively disable Dan's commits 143177 and 143179 to see if
it fixes the dragonegg self-host (it looks like gcc is miscompiled).
Original commit messages:
Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.

Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.

Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.

Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.

This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.

Delete #if 0 code accidentally left in.

llvm-svn: 143188
2011-10-28 09:55:57 +00:00
Dan Gohman 4db3f7dd83 Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.

Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.

Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.

Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.

This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.

llvm-svn: 143177
2011-10-28 01:29:32 +00:00
Lang Hames c47e283430 Make sure short memsets on ARM lower to stores, even when optimizing for size.
llvm-svn: 143055
2011-10-26 20:56:52 +00:00
James Molloy dd9137aa56 Revert r142530 at least temporarily while a discussion is had on llvm-commits regarding exactly how much optsize should optimize for size over performance.
llvm-svn: 143023
2011-10-26 08:53:19 +00:00
Bill Wendling 1414bc5a14 Use a worklist to prevent the iterator from becoming invalidated because of the 'removeSuccessor' call. Noticed in a Release+Asserts+Check buildbot.
llvm-svn: 143018
2011-10-26 07:16:18 +00:00
Evan Cheng 043c9d3f7a Revert part of r142530. The patch potentially hurts performance especially
on Darwin platforms where -Os means optimize for size without hurting
performance.

llvm-svn: 143002
2011-10-26 01:17:44 +00:00
Eli Friedman a5e244c08d Don't crash on variable insertelement on ARM. PR10258.
llvm-svn: 142871
2011-10-24 23:08:52 +00:00
Dan Gohman 4ed1afa51d Change this overloaded use of Sched::Latency to be an overloaded
use of Sched::ILP instead, as Sched::Latency is going away.

llvm-svn: 142813
2011-10-24 17:55:11 +00:00
Bill Wendling 94e6643fce The different flavors of ARM have different valid subsets of registers. Check
that the set of callee-saved registers is correct for the specific platform.
<rdar://problem/10313708> & ctor_dtor_count & ctor_dtor_count-2

llvm-svn: 142706
2011-10-22 00:29:28 +00:00
Bill Wendling cf7bdf4438 Add missing operand. <rdar://problem/10313323>
llvm-svn: 142615
2011-10-20 20:37:11 +00:00
James Molloy 2d768fd379 Use literal pool loads instead of MOVW/MOVT for materializing global addresses when optimizing for size.
On spec/gcc, this caused a codesize improvement of ~1.9% for ARM mode and ~4.9% for Thumb(2) mode. This is
codesize including literal pools.

The pools themselves doubled in size for ARM mode and quintupled for Thumb mode, leaving suggestion that there
is still perhaps redundancy in LLVM's use of constant pools that could be decreased by sharing entries.

Fixes PR11087.

llvm-svn: 142530
2011-10-19 14:11:07 +00:00
Bill Wendling 2977a15ab1 Make sure we emit the 'movw' and 'movt' only if it's supported. Otherwise, use a constant pool.
llvm-svn: 142485
2011-10-19 09:24:02 +00:00
Bill Wendling 7c1634556d Remove some dead code.
llvm-svn: 142484
2011-10-19 09:04:11 +00:00
Bill Wendling 94f60018e0 Emit the MOVT instruction only if the # LPads is > 64K.
llvm-svn: 142460
2011-10-18 23:19:55 +00:00
Bill Wendling 64e6bfc16c For Thumb mode, we need to use a constant pool if the value is too large to be
used with the CMP instruction.

llvm-svn: 142458
2011-10-18 23:11:05 +00:00
Bill Wendling 4969dcdef9 Use the integer compare when the value is small enough. Use the "move into a
register and then compare against that" method when it's too large. We have to
move the value into the register in the "movw, movt" pair of instructions.

llvm-svn: 142440
2011-10-18 22:52:20 +00:00
Bill Wendling 85833f71c6 Use the integer compare when the value is small enough. Use the "move into a
register and then compare against that" method when it's too large. We have to
move the value into the register in the "movw, movt" pair of instructions.

llvm-svn: 142437
2011-10-18 22:49:07 +00:00
Bill Wendling 973c817cde The value we're comparing against may be too large for the ARM CMP
instruction. Move the value into a register and then use that for the CMP.
<rdar://problem/10305266>

llvm-svn: 142431
2011-10-18 22:11:18 +00:00
Bill Wendling b2a703d352 The immediate may be too large for the CMP instruction. Move it into a register
and use that in the CMP.
<rdar://problem/10305266>

llvm-svn: 142429
2011-10-18 21:55:58 +00:00
Andrew Trick 88b2450adc Use ARM/t2PseudoInst class from ARM/Thumb2 special adds/subs patterns.
Clean up the patterns, fix comments, and avoid confusing both tools
and coders. Note that the special adds/subs SelectionDAG nodes no
longer have the dummy cc_out operand.

llvm-svn: 142397
2011-10-18 19:18:52 +00:00
Bob Wilson 93b0f7b319 Use isIntN and isUIntN to check for valid signed/unsigned numbers.
llvm-svn: 142395
2011-10-18 18:46:49 +00:00
Andrew Trick 3f07c429b5 whitespace
llvm-svn: 142394
2011-10-18 18:40:53 +00:00
Bill Wendling 617075fcf6 A landing pad could have more than one predecessor. In that case, we want that
predecessor to remove the jump to it as well. Delay clearing the 'landing pad'
flag until after the jumps have been removed. (There is an implicit assumption
in several modules that an MBB which jumps to a landing pad has only two
successors.)
<rdar://problem/10304224>

llvm-svn: 142390
2011-10-18 18:30:49 +00:00
Bob Wilson 9258b76d8d Fix incorrect check for sign-extended constant BUILD_VECTOR.
<rdar://problem/10298332>

llvm-svn: 142371
2011-10-18 17:34:51 +00:00
Duncan Sands d278d35b13 Fix a bunch of unused variable warnings when doing a release
build with gcc-4.6.

llvm-svn: 142350
2011-10-18 12:44:00 +00:00
Bill Wendling 510fbcd440 Don't renumber the blocks here. This could cause problems later on if another
pass renumbers the blocks again.

llvm-svn: 142258
2011-10-17 21:32:56 +00:00
Bill Wendling f7f223f69e Add a call to EmitSjLjDispatchBlock.
Once the intrinsics are marked as having a custom inserter, it will call this
method to emit the dispatch table into the machine function.

llvm-svn: 142245
2011-10-17 20:37:20 +00:00
Bill Wendling 26d2780d07 Add comment explaining that the order of processing doesn't matter here.
llvm-svn: 142176
2011-10-17 05:25:09 +00:00