Commit Graph

220 Commits

Author SHA1 Message Date
Bill Schmidt 732eb91f05 This is just a clean-up patch that simplifies the initial-exec TLS logic by
avoiding use of machine operand flags.  No change in observable behavior, so
no new test cases.

llvm-svn: 170141
2012-12-13 18:45:54 +00:00
Bill Schmidt 24b8dd6eb7 This patch implements local-dynamic TLS model support for the 64-bit
PowerPC target.  This is the last of the four models, so we now have 
full TLS support.

This is mostly a straightforward extension of the general dynamic model.
I had to use an additional Chain operand to tie ADDIS_DTPREL_HA to the
register copy following ADDI_TLSLD_L; otherwise everything above the
ADDIS_DTPREL_HA appeared dead and was removed.

As before, there are new test cases to test the assembly generation, and
the relocations output during integrated assembly.  The expected code
gen sequence can be read in test/CodeGen/PowerPC/tls-ld.ll.

There are a couple of things I think can be done more efficiently in the
overall TLS code, so there will likely be a clean-up patch forthcoming;
but for now I want to be sure the functionality is in place.

Bill

llvm-svn: 170003
2012-12-12 19:29:35 +00:00
Bill Schmidt c56f1d34bc This patch implements the general dynamic TLS model for 64-bit PowerPC.
Given a thread-local symbol x with global-dynamic access, the generated
code to obtain x's address is:

     Instruction                            Relocation            Symbol
  addis ra,r2,x@got@tlsgd@ha           R_PPC64_GOT_TLSGD16_HA       x
  addi  r3,ra,x@got@tlsgd@l            R_PPC64_GOT_TLSGD16_L        x
  bl __tls_get_addr(x@tlsgd)           R_PPC64_TLSGD                x
                                       R_PPC64_REL24           __tls_get_addr
  nop
  <use address in r3>

The implementation borrows from the medium code model work for introducing
special forms of ADDIS and ADDI into the DAG representation.  This is made
slightly more complicated by having to introduce a call to the external
function __tls_get_addr.  Using the full call machinery is overkill and,
more importantly, makes it difficult to add a special relocation.  So I've
introduced another opcode GET_TLS_ADDR to represent the function call, and
surrounded it with register copies to set up the parameter and return value.

Most of the code is pretty straightforward.  I ran into one peculiarity
when I introduced a new PPC opcode BL8_NOP_ELF_TLSGD, which is just like
BL8_NOP_ELF except that it takes another parameter to represent the symbol
("x" above) that requires a relocation on the call.  Something in the 
TblGen machinery causes BL8_NOP_ELF and BL8_NOP_ELF_TLSGD to be treated
identically during the emit phase, so this second operand was never
visited to generate relocations.  This is the reason for the slightly
messy workaround in PPCMCCodeEmitter.cpp:getDirectBrEncoding().

Two new tests are included to demonstrate correct external assembly and
correct generation of relocations using the integrated assembler.

Comments welcome!

Thanks,
Bill

llvm-svn: 169910
2012-12-11 20:30:11 +00:00
Bill Schmidt ca4a0c9dbd This patch introduces initial-exec model support for thread-local storage
on 64-bit PowerPC ELF.

The patch includes code to handle external assembly and MC output with the
integrated assembler.  It intentionally does not support the "old" JIT.

For the initial-exec TLS model, the ABI requires the following to calculate
the address of external thread-local variable x:

 Code sequence            Relocation                  Symbol
  ld 9,x@got@tprel(2)      R_PPC64_GOT_TPREL16_DS      x
  add 9,9,x@tls            R_PPC64_TLS                 x

The register 9 is arbitrary here.  The linker will replace x@got@tprel
with the offset relative to the thread pointer to the generated GOT
entry for symbol x.  It will replace x@tls with the thread-pointer
register (13).

The two test cases verify correct assembly output and relocation output
as just described.

PowerPC-specific selection node variants are added for the two
instructions above:  LD_GOT_TPREL and ADD_TLS.  These are inserted
when an initial-exec global variable is encountered by
PPCTargetLowering::LowerGlobalTLSAddress(), and later lowered to
machine instructions LDgotTPREL and ADD8TLS.  LDgotTPREL is a pseudo
that uses the same LDrs support added for medium code model's LDtocL,
with a different relocation type.

The rest of the processing is straightforward.

llvm-svn: 169281
2012-12-04 16:18:08 +00:00
Bill Schmidt 34627e3434 This patch implements medium code model support for 64-bit PowerPC.
The default for 64-bit PowerPC is small code model, in which TOC entries
must be addressable using a 16-bit offset from the TOC pointer.  Additionally,
only TOC entries are addressed via the TOC pointer.

With medium code model, TOC entries and data sections can all be addressed
via the TOC pointer using a 32-bit offset.  Cooperation with the linker
allows 16-bit offsets to be used when these are sufficient, reducing the
number of extra instructions that need to be executed.  Medium code model
also does not generate explicit TOC entries in ".section toc" for variables
that are wholly internal to the compilation unit.

Consider a load of an external 4-byte integer.  With small code model, the
compiler generates:

	ld 3, .LC1@toc(2)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc ei[TC],ei

With medium model, it instead generates:

	addis 3, 2, .LC1@toc@ha
	ld 3, .LC1@toc@l(3)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc ei[TC],ei

Here .LC1@toc@ha is a relocation requesting the upper 16 bits of the
32-bit offset of ei's TOC entry from the TOC base pointer.  Similarly,
.LC1@toc@l is a relocation requesting the lower 16 bits.  Note that if
the linker determines that ei's TOC entry is within a 16-bit offset of
the TOC base pointer, it will replace the "addis" with a "nop", and
replace the "ld" with the identical "ld" instruction from the small
code model example.

Consider next a load of a function-scope static integer.  For small code
model, the compiler generates:

	ld 3, .LC1@toc(2)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc test_fn_static.si[TC],test_fn_static.si
	.type	test_fn_static.si,@object
	.local	test_fn_static.si
	.comm	test_fn_static.si,4,4

For medium code model, the compiler generates:

	addis 3, 2, test_fn_static.si@toc@ha
	addi 3, 3, test_fn_static.si@toc@l
	lwz 4, 0(3)

	.type	test_fn_static.si,@object
	.local	test_fn_static.si
	.comm	test_fn_static.si,4,4

Again, the linker may replace the "addis" with a "nop", calculating only
a 16-bit offset when this is sufficient.

Note that it would be more efficient for the compiler to generate:

	addis 3, 2, test_fn_static.si@toc@ha
        lwz 4, test_fn_static.si@toc@l(3)

The current patch does not perform this optimization yet.  This will be
addressed as a peephole optimization in a later patch.

For the moment, the default code model for 64-bit PowerPC will remain the
small code model.  We plan to eventually change the default to medium code
model, which matches current upstream GCC behavior.  Note that the different
code models are ABI-compatible, so code compiled with different models will
be linked and execute correctly.

I've tested the regression suite and the application/benchmark test suite in
two ways:  Once with the patch as submitted here, and once with additional
logic to force medium code model as the default.  The tests all compile
cleanly, with one exception.  The mandel-2 application test fails due to an
unrelated ABI compatibility with passing complex numbers.  It just so happens
that small code model was incredibly lucky, in that temporary values in 
floating-point registers held the expected values needed by the external
library routine that was called incorrectly.  My current thought is to correct
the ABI problems with _Complex before making medium code model the default,
to avoid introducing this "regression."

Here are a few comments on how the patch works, since the selection code
can be difficult to follow:

The existing logic for small code model defines three pseudo-instructions:
LDtoc for most uses, LDtocJTI for jump table addresses, and LDtocCPT for
constant pool addresses.  These are expanded by SelectCodeCommon().  The
pseudo-instruction approach doesn't work for medium code model, because
we need to generate two instructions when we match the same pattern.
Instead, new logic in PPCDAGToDAGISel::Select() intercepts the TOC_ENTRY
node for medium code model, and generates an ADDIStocHA followed by either
a LDtocL or an ADDItocL.  These new node types correspond naturally to
the sequences described above.

The addis/ld sequence is generated for the following cases:
 * Jump table addresses
 * Function addresses
 * External global variables
 * Tentative definitions of global variables (common linkage)

The addis/addi sequence is generated for the following cases:
 * Constant pool entries
 * File-scope static global variables
 * Function-scope static variables

Expanding to the two-instruction sequences at select time exposes the
instructions to subsequent optimization, particularly scheduling.

The rest of the processing occurs at assembly time, in
PPCAsmPrinter::EmitInstruction.  Each of the instructions is converted to
a "real" PowerPC instruction.  When a TOC entry needs to be created, this
is done here in the same manner as for the existing LDtoc, LDtocJTI, and
LDtocCPT pseudo-instructions (I factored out a new routine to handle this).

I had originally thought that if a TOC entry was needed for LDtocL or
ADDItocL, it would already have been generated for the previous ADDIStocHA.
However, at higher optimization levels, the ADDIStocHA may appear in a 
different block, which may be assembled textually following the block
containing the LDtocL or ADDItocL.  So it is necessary to include the
possibility of creating a new TOC entry for those two instructions.

Note that for LDtocL, we generate a new form of LD called LDrs.  This
allows specifying the @toc@l relocation for the offset field of the LD
instruction (i.e., the offset is replaced by a SymbolLo relocation).
When the peephole optimization described above is added, we will need
to do similar things for all immediate-form load and store operations.

The seven "mcm-n.ll" test cases are kept separate because otherwise the
intermingling of various TOC entries and so forth makes the tests fragile
and hard to understand.

The above assumes use of an external assembler.  For use of the
integrated assembler, new relocations are added and used by
PPCELFObjectWriter.  Testing is done with "mcm-obj.ll", which tests for
proper generation of the various relocations for the same sequences
tested with the external assembler.

llvm-svn: 168708
2012-11-27 17:35:46 +00:00
Ulrich Weigand 0f79500af5 Fix wrong PowerPC instruction opcodes for:
- lwaux
 - lhzux
 - stbu

llvm-svn: 167863
2012-11-13 19:21:31 +00:00
Ulrich Weigand 0117718580 Fix instruction encoding for "bd(n)z" on PowerPC,
by using a new instruction format BForm_1.

llvm-svn: 167861
2012-11-13 19:15:52 +00:00
Ulrich Weigand 84ee76acfe Fix instruction encoding for "isel" on PowerPC,
using a new instruction format AForm_4.

llvm-svn: 167860
2012-11-13 19:14:19 +00:00
Adhemerval Zanella 0f9cff1ab8 PowerPC: Fix for rldcl/rldicl/rldicr MC emission
This patch fixes the rldcl/rldicl/rldicr instruction emission. The issue is
the MDForm_1 instruction defines the PowerISA MB field from 'rldicl'
with the name MBE, but RLDCL/RLDICL/RLDICR definition uses as 'MB'.

It end up by generatint the 'rldicl' enconding at 
'lib/Target/PowerPC/PPCGenMCCodeEmitter.inc' to use the fourth argument as the
third. The patch changes it by adjusting to use the fourth argument as
intended.

Fixes PR14180.

llvm-svn: 166770
2012-10-26 12:09:58 +00:00
Adhemerval Zanella 1be10dc732 This patch fixes the MC object emission of 'nop' for external function calls
and also fixes the R_PPC64_TOC16 and R_PPC64_TOC16_DS relocation offset.
The 'nop' is needed so a restore TOC instruction (ld r2,40(r1)) can be placed
by the linker to correct restore the TOC of previous function.

Current code has two issues: it defines in PPCInstr64Bit.td file a LDinto_toc
and LDtoc_restore as a DSForm_1 with DS_RA=0 where it should be
DS=2 (the 8 bytes displacement of the TOC saving). It also wrongly emits a
MC intruction using an uint32_t value while the PPC::BL8_NOP_ELF
and PPC::BLA8_NOP_ELF are both uint64_t (because of the following 'nop').

This patch corrects the remaining ExecutionEngine using MCJIT:

ExecutionEngine/2002-12-16-ArgTest.ll
ExecutionEngine/2003-05-07-ArgumentTest.ll
ExecutionEngine/2005-12-02-TailCallBug.ll
ExecutionEngine/hello.ll
ExecutionEngine/hello2.ll
ExecutionEngine/test-call.ll

llvm-svn: 166682
2012-10-25 14:29:13 +00:00
Will Schmidt 4a67f2e2a7 - add tokens to PPCInstrInfo.td and PPCInstr64Bit.td to resolve
"Instruction 'foo' has no tokens" errors during llvm-tblgen
-gen-asm-matcher attempts.  At this time, the added
tokens are "#comment" style rather than the actual mnemonic.  This will
be revisited once the rest of the base asmparser bits get straightened
out for ppc64-elf-linux.

llvm-svn: 165237
2012-10-04 18:14:28 +00:00
Hal Finkel efe4a44106 Move the PPC TOC defs into the PPC64 InstrInfo file.
Since TOC is just defined for PPC64, move its definition to PPC64 td file.

Patch by Adhemerval Zanella.

llvm-svn: 163234
2012-09-05 19:22:27 +00:00
Hal Finkel 679c73cb33 Split several PPC instruction classes.
Slight reorganisation of PPC instruction classes for scheduling. No
functionality change for existing subtargets.
 - Clearly separate load/store-with-update instructions from regular loads and stores.
 - Split IntRotateD -> IntRotateD and IntRotateDI
 - Split out fsub and fadd from FPGeneral -> FPAddSub
 - Update existing itineraries

Patch by Tobias von Koch.

llvm-svn: 162729
2012-08-28 02:49:14 +00:00
Hal Finkel 686f2ee226 Allow remat of LI on PPC.
Allow load-immediates to be rematerialised in the register coalescer for
PPC. This makes test/CodeGen/PowerPC/big-endian-formal-args.ll fail,
because it relies on a register move getting emitted. The immediate load is
equivalent, so change this test case.

Patch by Tobias von Koch.

llvm-svn: 162727
2012-08-28 02:10:33 +00:00
Roman Divacky ace4707ea6 Lower constant pools and jump tables via TOC on PPC64/SVR4.
In collaboration with Adhemerval Zanella.

llvm-svn: 162562
2012-08-24 16:26:02 +00:00
Hal Finkel 895a5f5d12 Add a comment about mftb vs. mfspr on PPC.
Thanks to Alex Rosenberg for the suggestion.

llvm-svn: 161428
2012-08-07 17:04:20 +00:00
Hal Finkel 33e529d56b MFTB on PPC64 should really be encoded using MFSPR.
The MFTB instruction itself is being phased out, and its functionality
is provided by MFSPR. According to the ISA docs, using MFSPR works on all known
chips except for the 601 (which did not have a timebase register anyway)
and the POWER3.

Thanks to Adhemerval Zanella for pointing this out!

llvm-svn: 161346
2012-08-06 21:21:44 +00:00
Hal Finkel 70381a7b18 Add readcyclecounter lowering on PPC64.
On PPC64, this can be done with a simple TableGen pattern.
To enable this, I've added the (otherwise missing) readcyclecounter
SDNode definition to TargetSelectionDAG.td.

llvm-svn: 161302
2012-08-04 14:10:46 +00:00
Jakob Stoklund Olesen ed6c0408fa Remove variable_ops from call instructions in most targets.
Call instructions are no longer required to be variadic, and
variable_ops should only be used for instructions that encode a variable
number of arguments, like the ARM stm/ldm instructions.

llvm-svn: 160189
2012-07-13 20:44:29 +00:00
Hal Finkel 460e94d842 Add support for the PPC isel instruction.
The isel (integer select) instruction is supported on the 440 and A2
embedded cores and on the POWER7.

llvm-svn: 159045
2012-06-22 23:10:08 +00:00
Hal Finkel ca542beffe Add support for generating reg+reg (indexed) pre-inc loads on PPC.
llvm-svn: 158823
2012-06-20 15:43:03 +00:00
Hal Finkel 1cc27e44a4 Add support for generating reg+reg preinc stores on PPC.
PPC will now generate STWUX and friends.

llvm-svn: 158698
2012-06-19 02:34:32 +00:00
Hal Finkel 8c33dde666 Split out the PPC instruction class IntSimple from IntGeneral.
On the POWER7, adds and logical operations can also be handled
in the load/store pipelines. We'll call these IntSimple.

llvm-svn: 158366
2012-06-12 19:01:24 +00:00
Hal Finkel 2edfbddcf0 Improve ext/trunc patterns on PPC64.
The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that
would leave self-moves in the final assembly. Replacing those patterns with ones
based on the SUBREG builtins yields better-looking code.

Thanks to Jakob and Owen for their suggestions in this matter.

llvm-svn: 158283
2012-06-09 22:10:19 +00:00
Hal Finkel 96c2d4d945 Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form CTR-based loop branching code.
This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, invalid preheader DebugLoc is not used.

llvm-svn: 158204
2012-06-08 15:38:21 +00:00
Roman Divacky e3f15c98d1 Implement local-exec TLS on PowerPC.
llvm-svn: 157935
2012-06-04 17:36:38 +00:00
Hal Finkel 601f555eee Add a missing PPC 64-bit stwu pattern.
This seems to fix the remaining compile-time failures on PPC64 when
compiling with -enable-ppc-preinc.

llvm-svn: 157159
2012-05-20 17:11:24 +00:00
Hal Finkel 59607e63cb Split the LdStGeneral PPC itin. class into LdStLoad and LdStStore.
Loads and stores can have different pipeline behavior, especially on
embedded chips. This change allows those differences to be expressed.
Except for the 440 scheduler, there are no functionality changes.
On the 440, the latency adjustment is only by one cycle, and so this
probably does not affect much. Nevertheless, it will make a larger
difference in the future and this removes a FIXME from the 440 itin.

llvm-svn: 153821
2012-04-01 04:44:16 +00:00
Hal Finkel 51861b4855 Fix dynamic linking on PPC64.
Dynamic linking on PPC64 has had problems since we had to move the top-down
hazard-detection logic post-ra. For dynamic linking to work there needs to be
a nop placed after every call. It turns out that it is really hard to guarantee
that nothing will be placed in between the call (bl) and the nop during post-ra
scheduling. Previous attempts at fixing this by placing logic inside the
hazard detector only partially worked.

This is now fixed in a different way: call+nop codegen-only instructions. As far
as CodeGen is concerned the pair is now a single instruction and cannot be split.
This solution works much better than previous attempts.

The scoreboard hazard detector is also renamed to be more generic, there is currently
no cpu-specific logic in it.

llvm-svn: 153816
2012-03-31 14:45:15 +00:00
Roman Divacky ef21be2cda Convert PowerPC to register mask operands.
llvm-svn: 152122
2012-03-06 16:41:49 +00:00
Hal Finkel a3e6ed2161 X11/X2 loads around indirect calls on ppc64 should not be deleted.
llvm-svn: 151374
2012-02-24 17:54:01 +00:00
Jia Liu b22310fda6 Emacs-tag and some comment fix for all ARM, CellSPU, Hexagon, MBlaze, MSP430, PPC, PTX, Sparc, X86, XCore.
llvm-svn: 150878
2012-02-18 12:03:15 +00:00
Hal Finkel ac9df3d411 make CR spill and restore 64-bit clean (no functional change), and fix some other problems found with -verify-machineinstrs
llvm-svn: 146024
2011-12-07 06:34:06 +00:00
Roman Divacky a4a59aebd9 Fix wrong usages of CTR/MCTR where CTR8/MCTR8 was meant.
- Check for MTCTR8 in addition to MTCTR when looking up a hazard.

- When lowering an indirect call use CTR8 when targeting 64bit.

- Introduce BCTR8 that uses CTR8 and use it on 64bit when expanding ISD::BRIND.

The last change fixes PR8487. With those changes, we are able to compile a
running "ls" and "sh" on FreeBSD/PowerPC64.

llvm-svn: 132552
2011-06-03 15:47:49 +00:00
Cameron Zwarich dadd73390f Fix PR8828 by removing the explicit def in MovePCToLR as well as the pointless
piclabel operand. The operand in the tablegen definition doesn't actually turn
into an MI operand, so it just confuses anything checking the TargetInstrDesc
for the number of operands. It suffices to just have an implicit def of LR.

llvm-svn: 131626
2011-05-19 02:56:28 +00:00
Jakob Stoklund Olesen 86e1a65ce5 PowerPC atomic pseudos clobber CR0, they don't read it.
llvm-svn: 128829
2011-04-04 17:07:09 +00:00
Chris Lattner efacb9ee42 split out an encoder for memri operands, allowing a relocation to be plopped
into the immediate field.  This allows us to encode stuff like this:

        lbz r3, lo16(__ZL4init)(r4)     ; globalopt.cpp:5
                                        ; encoding: [0x88,0x64,A,A]
                                        ;   fixup A - offset: 0, value: lo16(__ZL4init), kind: fixup_ppc_lo16

        stw r3, lo16(__ZL1s)(r5)        ; globalopt.cpp:6
                                        ; encoding: [0x90,0x65,A,A]
                                        ;   fixup A - offset: 0, value: lo16(__ZL1s), kind: fixup_ppc_lo16

With this, we should have a completely function MCCodeEmitter for PPC, wewt.

llvm-svn: 119134
2010-11-15 08:22:03 +00:00
Chris Lattner 8f4444d003 add support for encoding the lo14 forms used for a few PPC64 addressing
modes.  For example, we now get:

	ld r3, lo16(_G)(r3)             ; encoding: [0xe8,0x63,A,0bAAAAAA00]
                                        ;   fixup A - offset: 0, value: lo16(_G), kind: fixup_ppc_lo14

llvm-svn: 119133
2010-11-15 08:02:41 +00:00
Chris Lattner 6566112e9c implement the start of support for lo16 and ha16, allowing us to get stuff like:
lis r4, ha16(__ZL4init)         ; encoding: [0x3c,0x80,A,A]
                                        ;   fixup A - offset: 0, value: ha16(__ZL4init), kind: fixup_ppc_ha16

llvm-svn: 119127
2010-11-15 06:33:39 +00:00
Chris Lattner aa4d03d1f5 remove asmstrings (which can never be printed) from pseudo
instructions, allowing is to eliminate some dead operand 
printing methods from the instprinter.

llvm-svn: 119113
2010-11-15 03:48:58 +00:00
Chris Lattner 7077efe894 move the pic base symbol stuff up to MachineFunction
since it is trivial and will be shared between ppc and x86.
This substantially simplifies the X86 backend also.

llvm-svn: 119089
2010-11-14 22:48:15 +00:00
Chris Lattner 94f0c14cb0 reimplement ppc asmprinter "toc" handling to use a VariantKind
on the operand, required for .o file writing and fixing 
the PowerPC/mult-alt-generic-powerpc64.ll failure with the new
instprinter.

llvm-svn: 119087
2010-11-14 22:22:59 +00:00
Chris Lattner f159afc951 remove a bogus pattern, which had the same pattern as STDU
but codegen'd differently.  This really wanted to use some
sort of subreg to get the low 4 bytes of the G8RC register
or something.  However, it's invalid and nothing is testing
it, so I'm just zapping the bogosity.

llvm-svn: 97345
2010-02-27 21:15:32 +00:00
Chris Lattner 986ab3fb1d Eliminate some uses of immAllOnes, just use -1, it does
the same thing and is more efficient for the matcher.

llvm-svn: 96712
2010-02-21 03:12:16 +00:00
Tilmann Scheller 79fef9349c Add support for calls through function pointers in the 64-bit PowerPC SVR4 ABI.
Patch contributed by Ken Werner of IBM!

llvm-svn: 91680
2009-12-18 13:00:15 +00:00
Bob Wilson f84f7105f7 Add PowerPC codegen for indirect branches.
llvm-svn: 86050
2009-11-04 21:31:18 +00:00
Dan Gohman 453d64c9f5 Rename usesCustomDAGSchedInserter to usesCustomInserter, and update a
bunch of associated comments, because it doesn't have anything to do
with DAGs or scheduling. This is another step in decoupling MachineInstr
emitting from scheduling.

llvm-svn: 85517
2009-10-29 18:10:34 +00:00
Dale Johannesen 5e9a5c3664 Model the carry bit on ppc32. Without this we could
move a SUBFC (etc.) below the SUBFE (etc.) that consumed
the carry bit.  Add missing ADDIC8, noticed along the way.

llvm-svn: 82266
2009-09-18 20:15:22 +00:00
Tilmann Scheller d1aaa3243a Add support for the PowerPC 64-bit SVR4 ABI.
The Link Register is volatile when using the 32-bit SVR4 ABI.
Make it possible to use the 64-bit SVR4 ABI.
Add non-volatile registers for the 64-bit SVR4 ABI.
Make sure r2 is a reserved register when using the 64-bit SVR4 ABI.
Update PPCFrameInfo for the 64-bit SVR4 ABI.
Add FIXME for 64-bit Darwin PPC.
Insert NOP instruction after direct function calls.
Emit official procedure descriptors.
Create TOC entries for GlobalAddress references.
Spill 64-bit non-volatile registers to the correct slots.
Only custom lower VAARG when using the 32-bit SVR4 ABI.
Use simple VASTART lowering for the 64-bit SVR4 ABI.

llvm-svn: 79091
2009-08-15 11:54:46 +00:00
Tilmann Scheller 773f14c008 Refactor ABI code in the PowerPC backend.
Make CalculateParameterAndLinkageAreaSize() Darwin-specific.
Remove SVR4 specific code from LowerCALL_Darwin() and LowerFORMAL_ARGUMENTS_Darwin().
Rename MachoABI to DarwinABI for consistency.
Rename ELF ABI to SVR4 ABI for consistency.
Factor out common call return lowering between the Darwin and SVR4 ABI.
Factor out common call lowering between the Darwin and SVR4 ABI.

llvm-svn: 74766
2009-07-03 06:47:08 +00:00
Dan Gohman 69cc2cbbff Rename isSimpleLoad to canFoldAsLoad, to better reflect its meaning.
llvm-svn: 60487
2008-12-03 18:15:48 +00:00
Dan Gohman ae3ba45eb2 Add a sanity-check to tablegen to catch the case where isSimpleLoad
is set but mayLoad is not set. Fix all the problems this turned up.

Change code to not use isSimpleLoad instead of mayLoad unless it
really wants isSimpleLoad.

llvm-svn: 60459
2008-12-03 02:30:17 +00:00
Dale Johannesen 98aa9d3e49 Add a RM pseudoreg for the rounding mode, which
allows ppcf128->int conversion to work with
DeadInstructionElimination.  This is now turned
off but RM is harmless.  It does not do a complete
job of modeling the rounding mode.

Revert marking MFCR as using all 7 CR subregisters;
while correct, this caused the problem in PR 2964,
plus the local RA crash noted in the comments.
This was needed to make DeadInstructionElimination,
but as we are not running that, it is backed out
for now.  Eventually it should go back in and the
other problems fixed where they're broken.

llvm-svn: 58391
2008-10-29 18:26:45 +00:00
Dale Johannesen e395d78657 Mark defs and uses of CTR and LR correctly.
Prevents DeadMachineInstructionElim from thinking
things like MTCTR are dead (fixes massive
testsuite breakage at -O0).

llvm-svn: 58043
2008-10-23 20:41:28 +00:00
Dan Gohman effb894453 Rename ConstantSDNode::getValue to getZExtValue, for consistency
with ConstantInt. This led to fixing a bug in TargetLowering.cpp
using getValue instead of getAPIntValue.

llvm-svn: 56159
2008-09-12 16:56:44 +00:00
Dale Johannesen d4eb0521e4 Implement 32 & 64 bit versions of PPC atomic
binary primitives.

llvm-svn: 55343
2008-08-25 22:34:37 +00:00
Dale Johannesen 765065c982 Remove PPC-specific lowering for atomics; the
generic stuff works fine.

Mark rewritten cmp-and-swap as not using CR1.

llvm-svn: 55336
2008-08-25 21:09:52 +00:00
Dale Johannesen dec51704ed Rewrite ppc code generated for __sync_{bool|val}_compare_and_swap
so that lwarx and stwcx are always executed the same number of times.
This is important for performance, I'm told.

llvm-svn: 55163
2008-08-22 03:49:10 +00:00
Evan Cheng 32e376f354 Implement llvm.atomic.cmp.swap.i32 on PPC. Patch by Gary Benson!
llvm-svn: 53505
2008-07-12 02:23:19 +00:00
Arnold Schwaighofer be0de34ede Tail call optimization improvements:
Move platform independent code (lowering of possibly overwritten
arguments, check for tail call optimization eligibility) from
target X86ISelectionLowering.cpp to TargetLowering.h and
SelectionDAGISel.cpp.

Initial PowerPC tail call implementation:

Support ppc32 implemented and tested (passes my tests and
test-suite llvm-test).  
Support ppc64 implemented and half tested (passes my tests).
On ppc tail call optimization is performed if 
  caller and callee are fastcc
  call is a tail call (in tail call position, call followed by ret)
  no variable argument lists or byval arguments
  option -tailcallopt is enabled
Supported:
 * non pic tail calls on linux/darwin
 * module-local tail calls on linux(PIC/GOT)/darwin(PIC)
 * inter-module tail calls on darwin(PIC)
If constraints are not met a normal call will be emitted.

A test checking the argument lowering behaviour on x86-64 was added.

llvm-svn: 50477
2008-04-30 09:16:33 +00:00
Evan Cheng 5102bd9359 64-bit atomic operations.
llvm-svn: 49949
2008-04-19 02:30:38 +00:00
Evan Cheng 0e7b00d79f Replace all target specific implicit def instructions with a target independent one: TargetInstrInfo::IMPLICIT_DEF.
llvm-svn: 48380
2008-03-15 00:03:38 +00:00
Chris Lattner 20b5a2b037 Add support for ppc64 shifts with 7-bit (oversized) shift amount (e.g. PPCshl).
llvm-svn: 48027
2008-03-07 20:18:24 +00:00
Chris Lattner a4ce4f6987 rename isLoad -> isSimpleLoad due to evan's desire to have such a predicate.
llvm-svn: 45667
2008-01-06 23:38:27 +00:00
Chris Lattner 10324d0175 rename isStore -> mayStore to more accurately reflect what it captures.
llvm-svn: 45656
2008-01-06 08:36:04 +00:00
Chris Lattner a348f55ec6 Change the 'isStore' inferrer to look for 'SDNPMayStore'
instead of "ISD::STORE".  This allows us to mark target-specific dag
nodes as storing (such as ppc byteswap stores).  This allows us to remove
more explicit isStore flags from the .td files.

Finally, add a warning for when a .td file contains an explicit 
isStore and tblgen is able to infer it.

llvm-svn: 45654
2008-01-06 06:44:58 +00:00
Chris Lattner e20f380fbf remove some isStore flags that are now inferred automatically.
llvm-svn: 45652
2008-01-06 05:53:26 +00:00
Chris Lattner f3ebc3f3d2 Remove attribution from file headers, per discussion on llvmdev.
llvm-svn: 45418
2007-12-29 20:36:04 +00:00
Evan Cheng ec271b104c Temporary solution: added a different set of BCTRL_Macho / BCTRL_ELF with right callee-saved defs set for ppc64.
llvm-svn: 43248
2007-10-23 06:42:42 +00:00
Evan Cheng 3e18e504ae Remove (somewhat confusing) Imp<> helper, use let Defs = [], Uses = [] instead.
llvm-svn: 41863
2007-09-11 19:55:27 +00:00
Evan Cheng 4dbd9f254a Fix for PR1613: added 64-bit rotate left PPC instructions and patterns.
llvm-svn: 41711
2007-09-04 20:20:29 +00:00
Evan Cheng 58c3c30921 Some out operands were incorrectly specified as input operands.
llvm-svn: 40697
2007-08-01 23:07:38 +00:00
Evan Cheng ac1591be42 No more noResults.
llvm-svn: 40132
2007-07-21 00:34:19 +00:00
Evan Cheng 9081ab8127 Oops. These stores actually produce results.
llvm-svn: 40074
2007-07-20 00:20:46 +00:00
Evan Cheng 94b5a80b93 Change instruction description to split OperandList into OutOperandList and
InOperandList. This gives one piece of important information: # of results
produced by an instruction.
An example of the change:
def ADD32rr  : I<0x01, MRMDestReg, (ops GR32:$dst, GR32:$src1, GR32:$src2),
                 "add{l} {$src2, $dst|$dst, $src2}",
                 [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;
=>
def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
                 "add{l} {$src2, $dst|$dst, $src2}",
                 [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

llvm-svn: 40033
2007-07-19 01:14:50 +00:00
Chris Lattner 3e549e9d5f add support for 128-bit add/sub on ppc64
llvm-svn: 37158
2007-05-17 06:52:46 +00:00
Nicolas Geoffray b3e99a18ee The PPC64 ELF ABI is "intended to use the same structure layout and calling convention rules
as the 64-bit PowerOpen ABI" (Reference http://www.linux-foundation.org/spec/ELF/ppc64/).
Change all ELF tests to ELF32.

llvm-svn: 35624
2007-04-03 12:35:28 +00:00
Nicolas Geoffray fbfc451ba9 The ELF ABI specifies F1-F8 registers as argument registers for double, not
F1-F10. This affects only ELF, not MachO.

llvm-svn: 35622
2007-04-03 10:27:07 +00:00
Chris Lattner 8810241ebc Fix CodeGen/PowerPC/2007-03-24-cntlzd.ll
llvm-svn: 35329
2007-03-25 04:44:03 +00:00
Nicolas Geoffray 89d81878d2 Differentiate between the MachO and the ELF ABI the CALL instruction.
llvm-svn: 34667
2007-02-27 13:01:19 +00:00
Chris Lattner 84ab9a556c one important bugfix: PPC32 didn't have both elf and macho support for
external symbols and global addresses.  Add the missing ones.

one important workaround: PPCISD::CALL is matched by both PPCcall_ELF
and PPCcall_Macho, disable the _ELF patterns for now.

llvm-svn: 34601
2007-02-25 19:20:53 +00:00
Chris Lattner 43df5b335c implement support for the linux/ppc function call ABI. Patch by
Nicolas Geoffray!

llvm-svn: 34574
2007-02-25 05:34:32 +00:00
Jim Laskey a0850e98ee Patterns no longer needed due to fix in the DAG combiner.
llvm-svn: 32612
2006-12-15 21:39:31 +00:00
Jim Laskey 73d307d12d Not all test cases are created equal. This fix is needed.
llvm-svn: 32605
2006-12-15 18:51:01 +00:00
Jim Laskey 38b1d53afe Not needed. Misinterpreted error message from other bug (Missing load/store
relocations.)

llvm-svn: 32604
2006-12-15 18:45:32 +00:00
Jim Laskey 36d826dca2 Provide 64-bit support for i64 sextload<i8>.
llvm-svn: 32600
2006-12-15 14:34:11 +00:00
Jim Laskey 095e6f3044 Reduce number of instructions to load 64-bit constants.
llvm-svn: 32481
2006-12-12 13:23:43 +00:00
Chris Lattner 43c0eb839c implement sextinreg i8->i64 and i16->i64
llvm-svn: 32293
2006-12-06 21:46:13 +00:00
Jim Laskey 48850c10c0 This is a general clean up of the PowerPC ABI. Address several problems and
bugs including making sure that the TOS links back to the previous frame,
that the maximum call frame size is not included twice when using frame
pointers, no longer growing the frame on calls, double storing of SP and
a cleaner/faster dynamic alloca.

llvm-svn: 31792
2006-11-16 22:43:37 +00:00
Chris Lattner 30055b9208 fix a regression that I introduced. stdu should scale the offset by 4
before printing it.

llvm-svn: 31791
2006-11-16 21:45:30 +00:00
Chris Lattner e742d9a4b7 add ppc64 r+i stores with update.
llvm-svn: 31776
2006-11-16 00:57:19 +00:00
Chris Lattner 5771156be0 Stop using isTwoAddress, switching to operand constraints instead.
Tell the codegen emitter that specific operands are not to be encoded, fixing
JIT regressions w.r.t. pre-inc loads and stores (e.g. lwzu, which we generate
even when general preinc loads are not enabled).

llvm-svn: 31770
2006-11-15 23:24:18 +00:00
Chris Lattner 474b5b7c95 fix ldu/stu jit encoding. Swith 64-bit preinc load instrs to use memri
addrmodes.

llvm-svn: 31757
2006-11-15 19:55:13 +00:00
Chris Lattner 0e117c7e9d Fix the PPC regressions last night
llvm-svn: 31752
2006-11-15 17:40:51 +00:00
Chris Lattner 44dbdbe5cf Rework PPC64 calls. Now we have a LR8/CTR8 register which the PPC64 calls
clobber.  This allows LR8 to be save/restored correctly as a 64-bit quantity,
instead of handling it as a 32-bit quantity.  This unbreaks ppc64 codegen when
the code is actually located above the 4G boundary.

llvm-svn: 31734
2006-11-14 18:44:47 +00:00
Chris Lattner 0d550cc56c implement proper PPC64 prolog/epilog codegen.
llvm-svn: 31684
2006-11-11 19:05:28 +00:00
Chris Lattner 2ff632c54b Mark operands as symbol lo instead of imm32 so that they print lo(x) around
globals.

llvm-svn: 31672
2006-11-11 04:51:36 +00:00
Chris Lattner c9fa36d706 implement preinc support for r+i loads on ppc64
llvm-svn: 31654
2006-11-10 23:58:45 +00:00
Evan Cheng ab51cf2e78 Merge ISD::TRUNCSTORE to ISD::STORE. Switch to using StoreSDNode.
llvm-svn: 30945
2006-10-13 21:14:26 +00:00
Evan Cheng e71fe34d75 Reflects ISD::LOAD / ISD::LOADX / LoadSDNode changes.
llvm-svn: 30844
2006-10-09 20:57:25 +00:00
Chris Lattner 78370606d0 Shift amounts are always 32-bits, even in 64-bit mode. This fixes
CodeGen/PowerPC/2006-09-28-shift_64.ll

llvm-svn: 30652
2006-09-28 20:48:45 +00:00
Chris Lattner b00b6c2e86 Make the implicit def instructions look like other instrs.
llvm-svn: 29174
2006-07-18 16:33:26 +00:00
Chris Lattner 96aecb5d76 Add missing PPC64 extload/truncstores
llvm-svn: 29140
2006-07-14 04:42:02 +00:00
Chris Lattner ca9c488528 Don't match 64-bit bitfield inserts into rlwimi's. todo add rldimi. :)
llvm-svn: 28944
2006-06-27 21:08:52 +00:00
Chris Lattner a2af3f47ea Add a pattern for i64 sra. Print 8-byte units with a space between the .quad
and the data

llvm-svn: 28934
2006-06-27 20:07:26 +00:00
Chris Lattner 3b5873456e Add 64-bit MTCTR so that indirect calls work.
llvm-svn: 28931
2006-06-27 18:36:44 +00:00
Chris Lattner e27d51e0d8 Fix an incorrect store pattern. This fixes em3d.
llvm-svn: 28930
2006-06-27 18:22:50 +00:00
Chris Lattner d48ce27532 Implement 64-bit undef, sub, shl/shr, srem/urem
llvm-svn: 28929
2006-06-27 18:18:41 +00:00
Chris Lattner f7fd88356a Add zextload from i32 -> i64, with this, perimeter works.
llvm-svn: 28926
2006-06-27 17:30:08 +00:00
Chris Lattner 7ecbd301b1 Rearrange compares, add ADDI8, add sext from 32-to-64 bit register
llvm-svn: 28920
2006-06-26 23:53:10 +00:00
Chris Lattner 52a956da52 Rename OR4 -> OR. Move some PPC64-specific stuff to the 64-bit file
llvm-svn: 28889
2006-06-20 23:18:58 +00:00
Chris Lattner 9d65f3507e add some logical ops
llvm-svn: 28887
2006-06-20 23:11:59 +00:00
Chris Lattner d881f8257b Add some more immediate patterns. This allows us to compile:
void test6() {
  Y = 0xABCD0123BCDE4567;
}

into:

_test6:
        lis r2, -21555
        lis r3, ha16(_Y)
        ori r2, r2, 291
        rldicr r2, r2, 32, 31
        oris r2, r2, 48350
        ori r2, r2, 17767
        std r2, lo16(_Y)(r3)
        blr

llvm-svn: 28885
2006-06-20 23:03:01 +00:00
Chris Lattner 9834ad2fc6 Instead of li/xoris use li/oris. Note that this doesn't work if bit 15 is
set, so disable the pattern in that case.

llvm-svn: 28884
2006-06-20 22:38:59 +00:00
Chris Lattner 7e742e46ac Add some 64-bit logical ops.
Split imm16Shifted into a sext/zext form for 64-bit support.
Add some patterns for immediate formation.  For example, we now compile this:

static unsigned long long Y;
void test3() {
  Y = 0xF0F00F00;
}

into:

_test3:
        li r2, 3840
        lis r3, ha16(_Y)
        xoris r2, r2, 61680
        std r2, lo16(_Y)(r3)
        blr

GCC produces:

_test3:
        li r0,0
        lis r2,ha16(_Y)
        ori r0,r0,61680
        sldi r0,r0,16
        ori r0,r0,3840
        std r0,lo16(_Y)(r2)
        blr

llvm-svn: 28883
2006-06-20 22:34:10 +00:00
Chris Lattner 2d4e8f7e86 Add some patterns for globals, so we can now compile this:
static unsigned long long X, Y;
void test1() {
  X = Y;
}

into:

_test1:
        lis r2, ha16(_Y)
        lis r3, ha16(_X)
        ld r2, lo16(_Y)(r2)
        std r2, lo16(_X)(r3)
        blr

llvm-svn: 28879
2006-06-20 21:23:06 +00:00
Chris Lattner 94d18df658 Add some patterns for ppc64
llvm-svn: 28866
2006-06-20 00:38:36 +00:00
Chris Lattner 638ee4ee15 Upgrade some load/store instructions to use the proper addressing mode stuff.
llvm-svn: 28841
2006-06-16 21:29:41 +00:00
Chris Lattner a5190ae7a9 fix some assumptions that pointers can only be 32-bits. With this, we can
now compile:

static unsigned long X;
void test1() {
  X = 0;
}

into:

_test1:
        lis r2, ha16(_X)
        li r3, 0
        stw r3, lo16(_X)(r2)
        blr

Totally amazing :)

llvm-svn: 28839
2006-06-16 21:01:35 +00:00
Chris Lattner b429983988 Split 64-bit instructions out into a separate .td file
llvm-svn: 28838
2006-06-16 20:22:01 +00:00