Commit Graph

17612 Commits

Devang Patel 46bda61a81 As per ARM docs, register Dx is described as DW_OP_regx(256+x) in DWARF.
llvm-svn: 129922
2011-04-21 17:51:06 +00:00
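
For reference, the arithmetic this commit encodes, as a minimal standalone sketch in C++ (the helper name is hypothetical, not LLVM's actual API):

    #include <cassert>

    // DWARF register number for ARM register Dx, per the ARM docs cited
    // above: D0..D31 are described as DW_OP_regx(256+x).
    unsigned dwarfRegNumForD(unsigned x) {
        assert(x < 32 && "ARM defines D0-D31");
        return 256 + x;
    }
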
Justin Holewinski d74d88a861 PTX: Expand usable register space
llvm-svn: 129913
2011-04-21 16:08:02 +00:00
Che-Liang Chiou 14c48e5d66 ptx: fix parameter ordering
This patch depends on the prior fix r129908, which switches from
std::binary_search to std::find on the unordered array.

Patch by Dan Bailey

llvm-svn: 129909
2011-04-21 10:56:58 +00:00
Che-Liang Chiou cdc51569ee ptx: PTXMachineFunctionInfo no longer sorts registers, so it should not use std::binary_search
llvm-svn: 129908
2011-04-21 10:16:20 +00:00
Evan Cheng 5f1ba4cd2d Remove -use-divmod-libcall. Let targets opt in when they are available.
llvm-svn: 129884
2011-04-20 22:20:12 +00:00
Eli Friedman c93d399eed Revert r129846; it's breaking a buildbot. See
http://google1.osuosl.org:8011/builders/llvm-x86_64-linux-checks/builds/825/steps/test.llvm.stage2/logs/st.ll

llvm-svn: 129869
2011-04-20 19:00:08 +00:00
Jakob Stoklund Olesen 0e34c1dfac Prefer cheap registers for busy live ranges.
On the x86-64 and thumb2 targets, some registers are more expensive to encode
than others in the same register class.

Add a CostPerUse field to the TableGen register description, and make it
available from TRI->getCostPerUse. This represents the cost of a REX prefix or a
32-bit instruction encoding required by choosing a high register.

Teach the greedy register allocator to prefer cheap registers for busy live
ranges (as indicated by spill weight).

llvm-svn: 129864
2011-04-20 18:19:48 +00:00
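
A rough sketch of the policy, with types and names invented for illustration (not the actual greedy-allocator code): when a live range is busy, pick the candidate with the lowest cost per use.

    #include <algorithm>
    #include <vector>

    // Hypothetical stand-ins for illustration only.
    struct RegCandidate {
        unsigned Reg;         // physical register number
        unsigned CostPerUse;  // e.g. 1 if choosing it forces a REX prefix
    };

    // For busy live ranges (high spill weight), prefer the cheapest-to-encode
    // candidate; for quiet ranges any candidate will do. Assumes Cands is
    // non-empty.
    unsigned pickReg(const std::vector<RegCandidate> &Cands,
                     float SpillWeight, float BusyThreshold) {
        if (SpillWeight > BusyThreshold)
            return std::min_element(Cands.begin(), Cands.end(),
                                    [](const RegCandidate &A,
                                       const RegCandidate &B) {
                                        return A.CostPerUse < B.CostPerUse;
                                    })->Reg;
        return Cands.front().Reg;
    }
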
Stuart Hastings 7850af6ea0 Excise unintended hunk in r129858. <rdar://problem/7662569>
llvm-svn: 129862
2011-04-20 18:09:26 +00:00
Stuart Hastings 45fe3c38c5 ARM byval support. Will be enabled by another patch to the FE. <rdar://problem/7662569>
llvm-svn: 129858
2011-04-20 16:47:52 +00:00
Justin Holewinski 7d8895e767 PTX: Add intrinsics to list of built-in intrinsics, which allows them to be
used by Clang. To help Clang integration, the PTX target has been split into
two targets: ptx32 and ptx64, depending on the desired pointer size.

- Add GCCBuiltin class to all intrinsics
- Split PTX target into ptx32 and ptx64

llvm-svn: 129851
2011-04-20 15:37:17 +00:00
Che-Liang Chiou 6586f84685 ptx: add integer div and rem instructions
Patched by Dan Bailey

llvm-svn: 129848
2011-04-20 09:28:55 +00:00
Che-Liang Chiou 5a952b3c67 ptx: add floating-point comparison to setp
Patched by Dan Bailey

llvm-svn: 129847
2011-04-20 09:28:20 +00:00
Che-Liang Chiou 49160f9a71 ptx: fix parameter ordering
Patched by Dan Bailey

llvm-svn: 129846
2011-04-20 09:27:19 +00:00
Nick Lewycky 4dae63e35b This should always be signed chars, so use int8_t. This fixes a miscompile
when llvm is built with unsigned chars, where an immediate such as 0xff would
be zero-extended to 64 bits, turning "cmp $0xff,%eax" into
"cmp $0xffffffffffffffff,%eax".

llvm-svn: 129845
2011-04-20 03:19:42 +00:00
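
The mechanism behind this class of bug, as a standalone example: widening a char-typed 0xff yields a different 64-bit value depending on whether plain char is signed, while int8_t behaves the same everywhere.

    #include <cstdint>
    #include <cstdio>

    int main() {
        char c = (char)0xff;      // -1 if plain char is signed, 255 if unsigned
        int8_t s = (int8_t)0xff;  // always -1, regardless of compiler flags
        // Signed chars widen to 0xffffffffffffffff; unsigned chars widen to
        // 0xff. Using int8_t makes the result deterministic.
        std::printf("char:   %#llx\n", (unsigned long long)(int64_t)c);
        std::printf("int8_t: %#llx\n", (unsigned long long)(int64_t)s);
    }
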
Rafael Espindola e473aaf540 Remove unused arguments.
llvm-svn: 129844
2011-04-20 03:08:09 +00:00
Daniel Dunbar cd01ed5bd6 ADT/Triple: Rename isOSX... methods to isMacOSX for consistency with the OS
triple component.

llvm-svn: 129838
2011-04-20 00:14:25 +00:00
Johnny Chen dc62e59776 Fix typo in the comment.
llvm-svn: 129837
2011-04-19 23:58:52 +00:00
Daniel Dunbar 2b9b0e3748 ADT/Triple: Move a variety of clients to using isOSDarwin() and isOSWindows()
predicates.

llvm-svn: 129816
2011-04-19 21:14:45 +00:00
Daniel Dunbar 100455a3c8 Target/X86: Eliminate uses of getDarwinVers().
llvm-svn: 129813
2011-04-19 21:04:12 +00:00
Daniel Dunbar 44b530369d Target/X86: Add getTargetTriple() accessor.
llvm-svn: 129812
2011-04-19 21:01:47 +00:00
Daniel Dunbar e3de896b5e Target/PPC: Kill off DarwinVers, which is now dead.
llvm-svn: 129811
2011-04-19 20:59:24 +00:00
Daniel Dunbar f954a0f028 Target/PPC: Eliminate a use of getDarwinVers().
llvm-svn: 129810
2011-04-19 20:57:03 +00:00
Daniel Dunbar a37aab2515 Target/PPC: Add a TargetTriple field.
llvm-svn: 129809
2011-04-19 20:54:28 +00:00
Daniel Dunbar 9483bb6bf3 Target: Eliminate a use of getDarwinMajorNumber().
llvm-svn: 129803
2011-04-19 20:44:08 +00:00
Eric Christopher c721b0db6d Remove some duplicate op action entries and reorganize.
llvm-svn: 129781
2011-04-19 18:49:19 +00:00
Bob Wilson 0858c3aaed This patch combines several changes from Evan Cheng for rdar://8659675.
Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first few cycles (4? on Cortex-A8)
   can cause an additional pipeline stall. So it's frequently better to simply
   codegen vmul + vadd.
2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction
   to stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back
   RAW vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.

Up to now, isel has simply avoided codegen'ing fp vmla / vmls. This works well
enough, but it isn't the optimal solution. This patch attempts to make it
possible to use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute an fmul and an fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
   fp instructions (except for the #3 exceptional case).
D. Add an ARM hazard recognizer to model the vmla / vmls hazards (see the
   sketch below).
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.

Enable these fp vmlx codegen changes for Cortex-A9.

llvm-svn: 129775
2011-04-19 18:11:57 +00:00
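
A schematic model of the hazard in item D, with the enum and function invented for illustration (the real hazard recognizer works on MachineInstrs and itineraries and is considerably more involved):

    // Simplified model: an fp multiply-accumulate followed too closely by
    // another fp instruction triggers the ~4-cycle stall described above.
    enum class FpOp { VMUL, VADD, VSUB, VMLA, VMLS, Other };

    struct SchedState {
        FpOp LastOp = FpOp::Other;
        int CyclesSinceLast = 100;  // large value = no recent fp op
    };

    // Returns true if issuing Op now would hit the post-vmla stall, so the
    // scheduler should keep the two instructions apart.
    bool wouldStall(const SchedState &S, FpOp Op) {
        bool LastWasMla = S.LastOp == FpOp::VMLA || S.LastOp == FpOp::VMLS;
        bool OpIsFp = Op != FpOp::Other;
        return LastWasMla && OpIsFp && S.CyclesSinceLast < 4;
    }
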
Bob Wilson d04a83f8f2 Add -mcpu=cortex-a9-mp. It's cortex-a9 with the MP extension. rdar://8648637.
llvm-svn: 129774
2011-04-19 18:11:52 +00:00
Bob Wilson a2881ee8a4 Avoid some 16-bit 's' instructions, which partially update CPSR
(and add a false dependency) when they aren't dependent on the last
CPSR-defining instruction. rdar://8928208

llvm-svn: 129773
2011-04-19 18:11:49 +00:00
Bob Wilson df612ba006 Avoid write-after-write issue hazards for Cortex-A9.
Add an avoidWriteAfterWrite() target hook to identify register classes that
suffer from write-after-write hazards. For those register classes, try to avoid
writing the same register in two consecutive instructions.

This is currently disabled by default.  We should not spill to avoid hazards!
The command line flag -avoid-waw-hazard can be used to enable waw avoidance.

llvm-svn: 129772
2011-04-19 18:11:45 +00:00
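
The target-hook pattern this adds, sketched with stand-in types (the exact signature and class IDs are assumptions for illustration):

    // Stand-ins; the real hook lives on LLVM's TargetRegisterInfo.
    struct TargetRegisterClassStub { unsigned ID; };

    struct RegisterInfoBase {
        virtual ~RegisterInfoBase() = default;
        // Default: no register class needs write-after-write avoidance.
        virtual bool avoidWriteAfterWrite(const TargetRegisterClassStub *) const {
            return false;
        }
    };

    struct ARMRegisterInfoSketch : RegisterInfoBase {
        // Hypothetical: mark an fp register class as WAW-hazard prone on A9.
        bool avoidWriteAfterWrite(const TargetRegisterClassStub *RC) const override {
            const unsigned DPRClassID = 1;  // placeholder ID
            return RC->ID == DPRClassID;
        }
    };
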
Bob Wilson 3e5944d96b Some single-precision VFP instructions can execute in either the VFP or NEON
pipeline, at least on Cortex-A9.

llvm-svn: 129771
2011-04-19 18:11:38 +00:00
Bob Wilson f33715e554 Improvements for the Cortex-A9 scheduling itineraries.
llvm-svn: 129770
2011-04-19 18:11:36 +00:00
Eli Friedman ee92a6b332 Add support for FastISel'ing varargs calls.
llvm-svn: 129765
2011-04-19 17:22:22 +00:00
Chris Lattner 91328b317b Implement support for x86 fastisel of small fixed-size memcpys, which are
generated en masse for C++ PODs. On my C++ test file, this cuts the fast isel
rejects by 10x and shrinks the generated .s file by 5%.

llvm-svn: 129755
2011-04-19 05:52:03 +00:00
Chris Lattner 34a08c2344 tidy up
llvm-svn: 129753
2011-04-19 05:15:59 +00:00
Chris Lattner 5f4b783426 Implement support for fast isel of calls with i1 arguments, even though they
are illegal, when they are a truncate from something else. This eliminates
fully half of all the fastisel rejections on a test C++ file I'm working with,
which should be a substantial improvement for -O0 compiles of C++ code.

This fixed rdar://9297003 - fast isel bails out on all functions taking bools

llvm-svn: 129752
2011-04-19 05:09:50 +00:00
Chris Lattner d7f7c93914 Handle i1/i8/i16 constant integer arguments to calls by prepromoting them.
Before, we would bail out on i1 arguments altogether; now we just bail on
non-constant ones. Also, we used to emit extraneous code. e.g. test12 was:

	movb	$0, %al
	movzbl	%al, %edi
	callq	_test12

and test13 was:
	movb	$0, %al
	xorl	%edi, %edi
	movb	%al, 7(%rsp)
	callq	_test13f

Now we get:

	movl	$0, %edi
	callq	_test12
and:
	movl	$0, %edi
	callq	_test13f

llvm-svn: 129751
2011-04-19 04:42:38 +00:00
Chris Lattner c59290a34c be layout aware, to produce:
	testb	$1, %al
	je	LBB0_2
## BB#1:                                ## %if.then
	movb	$0, %al

instead of:

	testb	$1, %al
	jne	LBB0_1
	jmp	LBB0_2
LBB0_1:                                 ## %if.then
	movb	$0, %al

how 'bout that.

llvm-svn: 129749
2011-04-19 04:26:32 +00:00
Chris Lattner 2c8a4c3b1b fix rdar://9297006 - fast isel bails out on trunc to i1 -> bools cry,
a common cause of fast isel rejects on C++ code.

llvm-svn: 129748
2011-04-19 04:22:17 +00:00
Evan Cheng 7d6cd4902e Change the A9 scheduling itineraries' VLD* / VST* entries to default to
"aligned". That is, they assume addresses are 64-bit aligned (which should be
the more common case). If the access turns out to be unaligned,
getOperandLatency() adjusts the operand latency computation by one to
compensate.
rdar://9294833

llvm-svn: 129742
2011-04-19 01:21:49 +00:00
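
In effect, the adjustment is (a one-function sketch; the real getOperandLatency() works on machine instructions and itinerary data):

    // The itinerary latency now models the 64-bit-aligned case; an
    // unaligned access pays one extra cycle.
    unsigned operandLatency(unsigned ItineraryLatency, bool Is64BitAligned) {
        return Is64BitAligned ? ItineraryLatency : ItineraryLatency + 1;
    }
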
Evan Cheng 4079133796 Do not lose mem_operands while lowering VLD / VST intrinsics.
llvm-svn: 129738
2011-04-19 00:04:03 +00:00
Jim Grosbach ddac5dd269 Trim a few unneeded includes.
llvm-svn: 129723
2011-04-18 21:35:54 +00:00
Eric Christopher 2e3fbaab39 Invert the meaning of printAliasInstr's return value. It now returns
true on success and false on failure. Update callers.

llvm-svn: 129722
2011-04-18 21:28:11 +00:00
Sean Callanan 5d73033e0f Small fix to the ARM AsmParser to ensure that a
superclass variable is instantiated properly.

llvm-svn: 129713
2011-04-18 20:20:44 +00:00
Chris Lattner 80254a53cc Add a new bit that ImmLeaf's can opt into, which allows them to duck out of
the generated FastISel.  X86 doesn't need to generate code to match ADD16ri8 
since ADD16ri will do just fine.  This is a small codesize win in the generated
instruction selector.

llvm-svn: 129692
2011-04-18 06:36:55 +00:00
Chris Lattner c479e0631f switch the rest of the x86 immediate patterns over to ImmLeaf,
simplifying them and exposing more information to tblgen. It would be nice
if other target authors adopted this as well, particularly ARM, since it has
fastisel.

llvm-svn: 129676
2011-04-17 22:12:55 +00:00
Chris Lattner 2ff8c1a25f now that predicates have a decent abstraction layer on them, introduce a new
kind of predicate: one that is specific to imm nodes. The predicate function
specified here just checks an int64_t directly instead of messing around with
SDNodes. The virtue of this is that fastisel and other things can reason about
these predicates.

llvm-svn: 129675
2011-04-17 22:05:17 +00:00
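
The contrast described here, as plain C++ functions for illustration (in tblgen the predicate body is the code fragment attached to an ImmLeaf):

    #include <cstdint>

    // An imm-specific predicate is a pure function of the immediate value,
    // with no SDNode in sight.
    bool isSExt8(int64_t Imm) { return Imm >= -128 && Imm <= 127; }

    // So fastisel can evaluate it directly on a constant it already has:
    bool fitsImm8Form(int64_t C) { return isSExt8(C); }
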
Chris Lattner 514e292b72 Rework our internal representation of node predicates to expose more
structure and fix some FIXMEs. We now have a TreePredicateFn class
that handles all of the decoding of these things.  This is an internal
cleanup that has no impact on the code generated by tblgen.

llvm-svn: 129670
2011-04-17 21:38:24 +00:00
Chris Lattner b53ccb8e36 1. merge fast-isel-shift-imm.ll into fast-isel-x86-64.ll
2. implement rdar://9289501 - fast isel should fold trivial multiplies to
   shifts (see the sketch below)
3. teach tblgen to handle shift immediates that are a different size than the
   shifted operands, eliminating some code from the X86 fast isel backend.
4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function
   instead of FastEmit_ri to simplify code.

llvm-svn: 129666
2011-04-17 20:23:29 +00:00
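
The fold in item 2, as a standalone sketch: a multiply by a power-of-two constant becomes a left shift by log2 of the constant.

    #include <cstdint>

    // True if C is a nonzero power of two.
    bool isPowerOf2(uint64_t C) { return C != 0 && (C & (C - 1)) == 0; }

    // Shift amount such that (X << N) == X * C when C is a power of two.
    unsigned log2u(uint64_t C) {
        unsigned N = 0;
        while (C >>= 1) ++N;
        return N;
    }

    // The trivial-multiply fold: e.g. X * 8 becomes X << 3.
    uint64_t mulByConst(uint64_t X, uint64_t C) {
        return isPowerOf2(C) ? X << log2u(C) : X * C;
    }
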
Chris Lattner eb729d48ff fix an x86 fast isel issue where we'd completely give up on folding an address
when we have a global variable base and an index. Instead, just give up on
folding the global variable.

Before, we'd generate:

_test:                                  ## @test
## BB#0:
	movq	_rtx_length@GOTPCREL(%rip), %rax
	leaq	(%rax), %rax
	addq	%rdi, %rax
	movzbl	(%rax), %eax
	ret

now we generate:

_test:                                  ## @test
## BB#0:
	movq	_rtx_length@GOTPCREL(%rip), %rax
	movzbl	(%rax,%rdi), %eax
	ret

The difference is even more significant when there is a scale
involved.

This fixes rdar://9289558 - total fail with addr mode formation at -O0/x86-64

llvm-svn: 129664
2011-04-17 17:47:38 +00:00
Chris Lattner 4832660b4d fix an oversight which caused us to compile the testcase (and other
less trivial things) with a dummy lea. Before, we generated:

_test:                                  ## @test
	movq	_G@GOTPCREL(%rip), %rax
	leaq	(%rax), %rax
	ret

now we produce:

_test:                                  ## @test
	movq	_G@GOTPCREL(%rip), %rax
	ret

This is part of rdar://9289558

llvm-svn: 129662
2011-04-17 17:12:08 +00:00