Commit Graph

3425 Commits

Author SHA1 Message Date
Krzysztof Parzyszek 2680b53d90 Add registration for PPC-specific passes to allow the IR to be dumped
via -print-after-all.

llvm-svn: 175058
2013-02-13 17:40:07 +00:00
Bill Schmidt 62fe7a5b17 Refine fix to bug 15041.
Thanks to help from Nadav and Hal, I have a more reasonable (and even
correct!) approach.  This specifically penalizes the insertelement
and extractelement operations for the performance hit that will occur
on PowerPC processors.

llvm-svn: 174725
2013-02-08 18:19:17 +00:00
Bill Schmidt b3cece13cf Constrain PowerPC autovectorization to fix bug 15041.
Certain vector operations don't vectorize well with the current
PowerPC implementation.  Element insert/extract performs poorly
without VSX support because Altivec requires going through memory.
SREM, UREM, and VSELECT all produce bad scalar code.

There's a lot of work to do for the cost model before
autovectorization will be tuned well, and this is not an attempt to
address the larger problem.

llvm-svn: 174660
2013-02-07 20:33:57 +00:00
Bill Schmidt ef17c14254 PPC calling convention cleanup.
Most of PPCCallingConv.td is used only by the 32-bit SVR4 ABI.  Rename
things to clarify this.  Also delete some code that's been commented out
for a long time.

llvm-svn: 174526
2013-02-06 17:33:58 +00:00
Jakob Stoklund Olesen 8660a8c0fc Move MRI liveouts to PowerPC return instructions.
llvm-svn: 174409
2013-02-05 18:12:00 +00:00
Jakob Stoklund Olesen bf034dbd32 Avoid using MRI::liveout_iterator for computing VRSAVEs.
The liveout lists are about to be removed from MRI, this is the only
place they were used after register allocation.

Get the live out V registers directly from the return instructions
instead.

llvm-svn: 174399
2013-02-05 17:40:36 +00:00
Benjamin Kramer c35d526489 Disable a couple more vector splat optimizations on PPC.
I didn't see those because the test case used "not grep". FileCheck the test and
XFAIL it, preserving the old optimization, so this can be fixed eventually.

llvm-svn: 174330
2013-02-04 15:52:32 +00:00
Benjamin Kramer 548ffa274a SelectionDAG: Teach FoldConstantArithmetic how to deal with vectors.
This required disabling a PowerPC optimization that did the following:
input:
x = BUILD_VECTOR <i32 16, i32 16, i32 16, i32 16>
lowered to:
tmp = BUILD_VECTOR <i32 8, i32 8, i32 8, i32 8>
x = ADD tmp, tmp

The add now gets folded immediately and we're back at the BUILD_VECTOR we
started from. I don't see a way to fix this currently so I left it disabled
for now.

Fix some trivially foldable X86 tests too.

llvm-svn: 174325
2013-02-04 15:19:18 +00:00
NAKAMURA Takumi 80159432de PPCDarwinAsmPrinter::EmitStartOfAsmFile(): Add checking range in CPUDirectives[].
llvm-svn: 174298
2013-02-04 00:47:38 +00:00
NAKAMURA Takumi 3d591ae0b9 PPCDarwinAsmPrinter::EmitStartOfAsmFile(): Add possible elements in CPUDirectives[].
llvm-svn: 174297
2013-02-04 00:47:33 +00:00
Bill Schmidt cc99a2f61d Add notes about future PowerPC features
llvm-svn: 174232
2013-02-01 23:10:09 +00:00
Bill Schmidt 52742c25ae LLVM enablement for some older PowerPC CPUs
llvm-svn: 174230
2013-02-01 22:59:51 +00:00
Chad Rosier df782d2225 [PEI] Pass the frame index operand number to the eliminateFrameIndex function.
Each target implementation was needlessly recomputing the index.
Part of rdar://13076458

llvm-svn: 174083
2013-01-31 20:02:54 +00:00
Hal Finkel e1df90958d PPC QPX requires a 32-byte aligned stack
On systems which support the QPX vector instructions, the stack must be
32-byte aligned.

llvm-svn: 173993
2013-01-30 23:43:27 +00:00
Hal Finkel b3fc509b23 Initialize hasQPX in PPCSubtarget
This should have gone in with r173973.

llvm-svn: 173984
2013-01-30 22:43:44 +00:00
Hal Finkel efb305e54c Add definitions for the PPC a2q core marked as having QPX available
This is the first commit of a large series which will add support for the
QPX vector instruction set to the PowerPC backend. This instruction set is
used on the IBM Blue Gene/Q supercomputers.

llvm-svn: 173973
2013-01-30 21:17:42 +00:00
Evan Cheng 0e88c7d897 Teach SDISel to combine fsin / fcos into a fsincos node if the following
conditions are met:
1. They share the same operand and are in the same BB.
2. Both outputs are used.
3. The target has a native instruction that maps to ISD::FSINCOS node or
   the target provides a sincos library call.

Implemented the generic optimization in sdisel and enabled it for
Mac OSX. Also added an additional optimization for x86_64 Mac OSX by
using an alternative entry point __sincos_stret which returns the two
results in xmm0 / xmm1.

rdar://13087969
PR13204

llvm-svn: 173755
2013-01-29 02:32:37 +00:00
Hal Finkel 7f9e8d3eaa Add isBGQ method to PPCSubtarget
This function will be used in future commits.

llvm-svn: 173729
2013-01-29 00:22:47 +00:00
Dmitri Gribenko c451bdf9ff Remove unused variables, silences -Wunused-variable
llvm-svn: 173526
2013-01-25 23:17:21 +00:00
Hal Finkel 4e5ca9e578 Initial implementation of PPCTargetTransformInfo
This provides a place to add customized operation cost information and
control some other target-specific IR-level transformations.

The only non-trivial logic in this checkin assigns a higher cost to
unaligned loads and stores (covered by the included test case).

llvm-svn: 173520
2013-01-25 23:05:59 +00:00
Hal Finkel 53f4ba6ce3 More cleanup of PPC register definitions.
Uses the new !add TableGen operator to do more cleanup of the
PPC register definitions.

llvm-svn: 173446
2013-01-25 14:49:10 +00:00
Hal Finkel 41176f43c4 Start cleanup of PPC register definitions using foreach loops.
No functionality change intended.

This captures the first two cases GPR32/64. For the others, we need
an addition operator (if we have one, I've not yet found it).

Based on a suggestion made by Tom Stellard in the AArch64 review!

llvm-svn: 173366
2013-01-24 20:43:18 +00:00
Eli Bendersky f759526983 Fix powerpc test failure - forgot to initialize stack slot size for PPCLinuxMCAsmInfo
llvm-svn: 173275
2013-01-23 17:12:15 +00:00
Eli Bendersky 32aab2216d Clean up assignment of CalleeSaveStackSlotSize: get rid of the default and explicitly set this in every target that needs to change it from the default.
llvm-svn: 173270
2013-01-23 16:22:04 +00:00
Chandler Carruth 1fe21fc0b5 Sort all of the includes. Several files got checked in with mis-sorted
includes.

llvm-svn: 172891
2013-01-19 08:03:47 +00:00
Bill Schmidt dee1ef8f53 This patch fixes PR13626 by providing i128 support in the return
calling convention.  128-bit integers are now properly returned
in GPR3 and GPR4 on PowerPC.

llvm-svn: 172745
2013-01-17 19:34:57 +00:00
Bill Schmidt 6b2940b01e This patch fixes the PPC calling convention to handle returns of
_Complex float and _Complex long double, by simply increasing the
number of floating point registers available for return values.

The test case verifies that the correct registers are loaded.

llvm-svn: 172733
2013-01-17 17:45:19 +00:00
Adhemerval Zanella 1ae2248e14 PowerPC: EH adjustments
This patch adjust the r171506 to make all DWARF enconding pc-relative
for PPC64. It also adds the R_PPC64_REL32 relocation handling in MCJIT
(since the eh_frame will not generate PIC-relative relocation) and also
adds the emission of stubs created by the TTypeEncoding.

llvm-svn: 171979
2013-01-09 17:08:15 +00:00
Eric Christopher e3ab3d0e2c These functions have default arguments of 0 for the last arg. Use
them.

llvm-svn: 171933
2013-01-09 01:57:54 +00:00
Eli Bendersky 4d9ada036c Renamed MCInstFragment to MCRelaxableFragment and added some comments.
No change in functionality.

llvm-svn: 171822
2013-01-08 00:22:56 +00:00
Bill Schmidt 9b1e3e25dc This patch addresses bug 14678 by fixing two problems in medium code model
code generation.  Variables addressed through a GlobalAlias were not being
handled, and variables with available_externally linkage were treated
incorrectly.  The patch contains two new tests to verify the correct code
generation for these cases.

llvm-svn: 171778
2013-01-07 19:29:18 +00:00
Chandler Carruth 664e354de7 Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.

The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least on
implementation.

The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but tho
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.

The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.

The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.

The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.

The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
a helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.

The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.

Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits, this one is clearly big enough.

Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.

Commits to update DragonEgg and Clang will be made presently.

llvm-svn: 171681
2013-01-07 01:37:14 +00:00
Adhemerval Zanella 9b0b781395 PowerPC: Fix eh_frame relocation for PIC
This patch fixes the PPC eh_frame definitions for the personality and 
frame unwinding for PIC objects. It makes PIC build correctly creates
relative relocations in the '.rela.eh_frame' segments and thus avoiding
a text relocation that generates a DT_TEXTREL segments in link phase.

llvm-svn: 171506
2013-01-04 19:08:13 +00:00
Chandler Carruth 9fb823bbd4 Move all of the header files which are involved in modelling the LLVM IR
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.

There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.

The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.

I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).

I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.

llvm-svn: 171366
2013-01-02 11:36:10 +00:00
Bill Wendling 698e84fc4f Remove the Function::getFnAttributes method in favor of using the AttributeSet
directly.

This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.

llvm-svn: 171253
2012-12-30 10:32:01 +00:00
Hal Finkel 1b5ff08d43 Expand PPC64 atomic load and store
Use of store or load with the atomic specifier on 64-bit types would
cause instruction-selection failures. As with the 32-bit case, these
can use the default expansion in terms of cmp-and-swap.

llvm-svn: 171072
2012-12-25 17:22:53 +00:00
Rafael Espindola fb8ac2df09 Undefine PPC harder.
This was causing a build failure while trying to build on ppc ubuntu 12.10 with
cmake.

llvm-svn: 170668
2012-12-20 05:13:09 +00:00
Benjamin Kramer c5071466d4 PowerPC: Expand VSELECT nodes.
There's probably a better expansion for those nodes than the default for
altivec, but this is better than crashing. VSELECTs occur in loop vectorizer
output.

llvm-svn: 170551
2012-12-19 15:49:14 +00:00
Bill Wendling 3d7b0b8ac7 Rename the 'Attributes' class to 'Attribute'. It's going to represent a single attribute in the future.
llvm-svn: 170502
2012-12-19 07:18:57 +00:00
Bill Schmidt a4f898448c This patch removes some nondeterminism from direct object file output
for TLS dynamic models on 64-bit PowerPC ELF.  The default sort routine
for relocations only sorts on the r_offset field; but with TLS, there
can be two relocations with the same r_offset.  For PowerPC, this patch
sorts secondarily on descending r_type, which matches the behavior
expected by the linker.

llvm-svn: 170237
2012-12-14 20:28:38 +00:00
Bill Schmidt 9f0b4ec0f5 This patch improves the 64-bit PowerPC InitialExec TLS support by providing
for a wider range of GOT entries that can hold thread-relative offsets.
This matches the behavior of GCC, which was not documented in the PPC64 TLS
ABI.  The ABI will be updated with the new code sequence.

Former sequence:

  ld 9,x@got@tprel(2)
  add 9,9,x@tls

New sequence:

  addis 9,2,x@got@tprel@ha
  ld 9,x@got@tprel@l(9)
  add 9,9,x@tls

Note that a linker optimization exists to transform the new sequence into
the shorter sequence when appropriate, by replacing the addis with a nop
and modifying the base register and relocation type of the ld.

llvm-svn: 170209
2012-12-14 17:02:38 +00:00
Bill Schmidt 9ed4dbcb75 This is another cleanup patch for 64-bit PowerPC TLS processing. I had
some hackery in place that hid my poor use of TblGen, which I've now sorted
out and cleaned up.  No change in observable behavior, so no new test cases.

llvm-svn: 170149
2012-12-13 20:57:10 +00:00
Bill Schmidt 732eb91f05 This is just a clean-up patch that simplifies the initial-exec TLS logic by
avoiding use of machine operand flags.  No change in observable behavior, so
no new test cases.

llvm-svn: 170141
2012-12-13 18:45:54 +00:00
Bill Schmidt 24b8dd6eb7 This patch implements local-dynamic TLS model support for the 64-bit
PowerPC target.  This is the last of the four models, so we now have 
full TLS support.

This is mostly a straightforward extension of the general dynamic model.
I had to use an additional Chain operand to tie ADDIS_DTPREL_HA to the
register copy following ADDI_TLSLD_L; otherwise everything above the
ADDIS_DTPREL_HA appeared dead and was removed.

As before, there are new test cases to test the assembly generation, and
the relocations output during integrated assembly.  The expected code
gen sequence can be read in test/CodeGen/PowerPC/tls-ld.ll.

There are a couple of things I think can be done more efficiently in the
overall TLS code, so there will likely be a clean-up patch forthcoming;
but for now I want to be sure the functionality is in place.

Bill

llvm-svn: 170003
2012-12-12 19:29:35 +00:00
Evan Cheng 962711ee71 Sorry about the churn. One more change to getOptimalMemOpType() hook. Did I
mention the inline memcpy / memset expansion code is a mess?

This patch split the ZeroOrLdSrc argument into two: IsMemset and ZeroMemset.
The first indicates whether it is expanding a memset or a memcpy / memmove.
The later is whether the memset is a memset of zero. It's totally possible
(likely even) that targets may want to do different things for memcpy and
memset of zero.

llvm-svn: 169959
2012-12-12 02:34:41 +00:00
Evan Cheng c3d1aca657 - Rename isLegalMemOpType to isSafeMemOpType. "Legal" is a very overloade term.
Also added more comments to explain why it is generally ok to return true.
- Rename getOptimalMemOpType argument IsZeroVal to ZeroOrLdSrc. It's meant to
be true for loaded source (memcpy) or zero constants (memset). The poor name
choice is probably some kind of legacy issue.

llvm-svn: 169954
2012-12-12 01:32:07 +00:00
Bill Schmidt c56f1d34bc This patch implements the general dynamic TLS model for 64-bit PowerPC.
Given a thread-local symbol x with global-dynamic access, the generated
code to obtain x's address is:

     Instruction                            Relocation            Symbol
  addis ra,r2,x@got@tlsgd@ha           R_PPC64_GOT_TLSGD16_HA       x
  addi  r3,ra,x@got@tlsgd@l            R_PPC64_GOT_TLSGD16_L        x
  bl __tls_get_addr(x@tlsgd)           R_PPC64_TLSGD                x
                                       R_PPC64_REL24           __tls_get_addr
  nop
  <use address in r3>

The implementation borrows from the medium code model work for introducing
special forms of ADDIS and ADDI into the DAG representation.  This is made
slightly more complicated by having to introduce a call to the external
function __tls_get_addr.  Using the full call machinery is overkill and,
more importantly, makes it difficult to add a special relocation.  So I've
introduced another opcode GET_TLS_ADDR to represent the function call, and
surrounded it with register copies to set up the parameter and return value.

Most of the code is pretty straightforward.  I ran into one peculiarity
when I introduced a new PPC opcode BL8_NOP_ELF_TLSGD, which is just like
BL8_NOP_ELF except that it takes another parameter to represent the symbol
("x" above) that requires a relocation on the call.  Something in the 
TblGen machinery causes BL8_NOP_ELF and BL8_NOP_ELF_TLSGD to be treated
identically during the emit phase, so this second operand was never
visited to generate relocations.  This is the reason for the slightly
messy workaround in PPCMCCodeEmitter.cpp:getDirectBrEncoding().

Two new tests are included to demonstrate correct external assembly and
correct generation of relocations using the integrated assembler.

Comments welcome!

Thanks,
Bill

llvm-svn: 169910
2012-12-11 20:30:11 +00:00
Bill Schmidt ca4a0c9dbd This patch introduces initial-exec model support for thread-local storage
on 64-bit PowerPC ELF.

The patch includes code to handle external assembly and MC output with the
integrated assembler.  It intentionally does not support the "old" JIT.

For the initial-exec TLS model, the ABI requires the following to calculate
the address of external thread-local variable x:

 Code sequence            Relocation                  Symbol
  ld 9,x@got@tprel(2)      R_PPC64_GOT_TPREL16_DS      x
  add 9,9,x@tls            R_PPC64_TLS                 x

The register 9 is arbitrary here.  The linker will replace x@got@tprel
with the offset relative to the thread pointer to the generated GOT
entry for symbol x.  It will replace x@tls with the thread-pointer
register (13).

The two test cases verify correct assembly output and relocation output
as just described.

PowerPC-specific selection node variants are added for the two
instructions above:  LD_GOT_TPREL and ADD_TLS.  These are inserted
when an initial-exec global variable is encountered by
PPCTargetLowering::LowerGlobalTLSAddress(), and later lowered to
machine instructions LDgotTPREL and ADD8TLS.  LDgotTPREL is a pseudo
that uses the same LDrs support added for medium code model's LDtocL,
with a different relocation type.

The rest of the processing is straightforward.

llvm-svn: 169281
2012-12-04 16:18:08 +00:00
Chandler Carruth 802d755533 Sort includes for all of the .h files under the 'lib' tree. These were
missed in the first pass because the script didn't yet handle include
guards.

Note that the script is now able to handle all of these headers without
manual edits. =]

llvm-svn: 169224
2012-12-04 07:12:27 +00:00
Chandler Carruth ed0881b2a6 Use the new script to sort the includes of every file under lib.
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.

Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]

llvm-svn: 169131
2012-12-03 16:50:05 +00:00
Adhemerval Zanella 812410f2d1 This patch fixes the Altivec addend construction for the fused multiply-add
instruction (vmaddfp) to conform with IEEE to ensure the sign of a zero
result when resulting product is -0.0.

The -0.0 vector addend to vmaddfp is generated by a creating a vector
with full bits sets and then shifting each elements by 31-bits to the
left, resulting in a vector of 0x80000000 (or -0.0 as float).

The 'buildvec_canonicalize.ll' was adjusted to reflect this change and
the 'vec_mul.ll' was complemented with the float vector multiplication
test.

llvm-svn: 168998
2012-11-30 13:05:44 +00:00
Ulrich Weigand 9e159fd7a0 Fix initial frame state on powerpc64.
The createPPCMCAsmInfo routine used PPC::R1 as the initial frame
pointer register, but on PPC64 the 32-bit R1 register does not
have a corresponding DWARF number, causing invalid CIE initial
frame state to be emitted.  Fix by using PPC::X1 instead.

llvm-svn: 168799
2012-11-28 18:21:03 +00:00
Jakob Stoklund Olesen 9de596e650 Remove all references to TargetInstrInfoImpl.
This class has been merged into its super-class TargetInstrInfo.

llvm-svn: 168760
2012-11-28 02:35:17 +00:00
Bill Schmidt e0a68a562b This patch makes medium code model the default for 64-bit PowerPC ELF.
When the CodeGenInfo is to be created for the PPC64 target machine,
a default code-model selection is converted to CodeModel::Medium
provided we are not targeting the Darwin OS.  Defaults for Darwin
are unaffected.

llvm-svn: 168747
2012-11-27 23:36:26 +00:00
Bill Schmidt 34627e3434 This patch implements medium code model support for 64-bit PowerPC.
The default for 64-bit PowerPC is small code model, in which TOC entries
must be addressable using a 16-bit offset from the TOC pointer.  Additionally,
only TOC entries are addressed via the TOC pointer.

With medium code model, TOC entries and data sections can all be addressed
via the TOC pointer using a 32-bit offset.  Cooperation with the linker
allows 16-bit offsets to be used when these are sufficient, reducing the
number of extra instructions that need to be executed.  Medium code model
also does not generate explicit TOC entries in ".section toc" for variables
that are wholly internal to the compilation unit.

Consider a load of an external 4-byte integer.  With small code model, the
compiler generates:

	ld 3, .LC1@toc(2)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc ei[TC],ei

With medium model, it instead generates:

	addis 3, 2, .LC1@toc@ha
	ld 3, .LC1@toc@l(3)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc ei[TC],ei

Here .LC1@toc@ha is a relocation requesting the upper 16 bits of the
32-bit offset of ei's TOC entry from the TOC base pointer.  Similarly,
.LC1@toc@l is a relocation requesting the lower 16 bits.  Note that if
the linker determines that ei's TOC entry is within a 16-bit offset of
the TOC base pointer, it will replace the "addis" with a "nop", and
replace the "ld" with the identical "ld" instruction from the small
code model example.

Consider next a load of a function-scope static integer.  For small code
model, the compiler generates:

	ld 3, .LC1@toc(2)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc test_fn_static.si[TC],test_fn_static.si
	.type	test_fn_static.si,@object
	.local	test_fn_static.si
	.comm	test_fn_static.si,4,4

For medium code model, the compiler generates:

	addis 3, 2, test_fn_static.si@toc@ha
	addi 3, 3, test_fn_static.si@toc@l
	lwz 4, 0(3)

	.type	test_fn_static.si,@object
	.local	test_fn_static.si
	.comm	test_fn_static.si,4,4

Again, the linker may replace the "addis" with a "nop", calculating only
a 16-bit offset when this is sufficient.

Note that it would be more efficient for the compiler to generate:

	addis 3, 2, test_fn_static.si@toc@ha
        lwz 4, test_fn_static.si@toc@l(3)

The current patch does not perform this optimization yet.  This will be
addressed as a peephole optimization in a later patch.

For the moment, the default code model for 64-bit PowerPC will remain the
small code model.  We plan to eventually change the default to medium code
model, which matches current upstream GCC behavior.  Note that the different
code models are ABI-compatible, so code compiled with different models will
be linked and execute correctly.

I've tested the regression suite and the application/benchmark test suite in
two ways:  Once with the patch as submitted here, and once with additional
logic to force medium code model as the default.  The tests all compile
cleanly, with one exception.  The mandel-2 application test fails due to an
unrelated ABI compatibility with passing complex numbers.  It just so happens
that small code model was incredibly lucky, in that temporary values in 
floating-point registers held the expected values needed by the external
library routine that was called incorrectly.  My current thought is to correct
the ABI problems with _Complex before making medium code model the default,
to avoid introducing this "regression."

Here are a few comments on how the patch works, since the selection code
can be difficult to follow:

The existing logic for small code model defines three pseudo-instructions:
LDtoc for most uses, LDtocJTI for jump table addresses, and LDtocCPT for
constant pool addresses.  These are expanded by SelectCodeCommon().  The
pseudo-instruction approach doesn't work for medium code model, because
we need to generate two instructions when we match the same pattern.
Instead, new logic in PPCDAGToDAGISel::Select() intercepts the TOC_ENTRY
node for medium code model, and generates an ADDIStocHA followed by either
a LDtocL or an ADDItocL.  These new node types correspond naturally to
the sequences described above.

The addis/ld sequence is generated for the following cases:
 * Jump table addresses
 * Function addresses
 * External global variables
 * Tentative definitions of global variables (common linkage)

The addis/addi sequence is generated for the following cases:
 * Constant pool entries
 * File-scope static global variables
 * Function-scope static variables

Expanding to the two-instruction sequences at select time exposes the
instructions to subsequent optimization, particularly scheduling.

The rest of the processing occurs at assembly time, in
PPCAsmPrinter::EmitInstruction.  Each of the instructions is converted to
a "real" PowerPC instruction.  When a TOC entry needs to be created, this
is done here in the same manner as for the existing LDtoc, LDtocJTI, and
LDtocCPT pseudo-instructions (I factored out a new routine to handle this).

I had originally thought that if a TOC entry was needed for LDtocL or
ADDItocL, it would already have been generated for the previous ADDIStocHA.
However, at higher optimization levels, the ADDIStocHA may appear in a 
different block, which may be assembled textually following the block
containing the LDtocL or ADDItocL.  So it is necessary to include the
possibility of creating a new TOC entry for those two instructions.

Note that for LDtocL, we generate a new form of LD called LDrs.  This
allows specifying the @toc@l relocation for the offset field of the LD
instruction (i.e., the offset is replaced by a SymbolLo relocation).
When the peephole optimization described above is added, we will need
to do similar things for all immediate-form load and store operations.

The seven "mcm-n.ll" test cases are kept separate because otherwise the
intermingling of various TOC entries and so forth makes the tests fragile
and hard to understand.

The above assumes use of an external assembler.  For use of the
integrated assembler, new relocations are added and used by
PPCELFObjectWriter.  Testing is done with "mcm-obj.ll", which tests for
proper generation of the various relocations for the same sequences
tested with the external assembler.

llvm-svn: 168708
2012-11-27 17:35:46 +00:00
Benjamin Kramer ebf576d31d Decouple MCInstBuilder from the streamer per Eli's request.
llvm-svn: 168597
2012-11-26 18:05:52 +00:00
Benjamin Kramer 4e629f7964 Add MCInstBuilder, a utility class to simplify MCInst creation similar to MachineInstrBuilder.
Simplify some repetitive code with it. No functionality change.

llvm-svn: 168587
2012-11-26 13:34:22 +00:00
Benjamin Kramer 1694f91266 PPC: Reinstate the fatal error when trying to emit a macho file.
llvm-svn: 168543
2012-11-24 15:23:49 +00:00
Benjamin Kramer dd76a93af5 PPC: MCize most of the darwin PIC emission.
The last remaining bit is "bcl 20, 31, AnonSymbol", which I couldn't find the
instruction definition for. Only whitespace changes in assembly output.

llvm-svn: 168541
2012-11-24 13:18:25 +00:00
Benjamin Kramer 8abfec3967 PPC: Share applyFixup between ELF and Darwin.
llvm-svn: 168540
2012-11-24 13:18:17 +00:00
Benjamin Kramer 69dc3fe461 PPC: Simplify code with Twines.
No functionality change.

llvm-svn: 168539
2012-11-24 13:18:11 +00:00
Joe Abbey cceda898b8 Using const cast to alleviate a warning.
A PR is being filed to address some code issues here.

llvm-svn: 168185
2012-11-16 19:38:42 +00:00
Adhemerval Zanella bdface5699 PowerPC: Lowering floor intrinsic for Altivec
This patch lowers the llvm.floor, llvm.ceil, llvm.trunc, and
llvm.nearbyint to Altivec instruction when using 4 single-precision
float vectors.

llvm-svn: 168086
2012-11-15 20:56:03 +00:00
Craig Topper c8a2adf1ca Make a bunch of floating point operations on vectors Expand so that instruction selection won't fail.
llvm-svn: 168028
2012-11-15 08:02:19 +00:00
Craig Topper 61d045781a Add llvm.ceil, llvm.trunc, llvm.rint, llvm.nearbyint intrinsics.
llvm-svn: 168025
2012-11-15 06:51:10 +00:00
Craig Topper c4343f2c45 Set FFLOOR of vectors to expand to keep intruction selection from failing.
llvm-svn: 167922
2012-11-14 08:11:25 +00:00
Ulrich Weigand 8557850053 Add (some) PowerPC TLS relocation types to ELF.h and
generate them from PPCELFObjectWriter::getRelocTypeInner
as appropriate.

llvm-svn: 167864
2012-11-13 19:24:36 +00:00
Ulrich Weigand 0f79500af5 Fix wrong PowerPC instruction opcodes for:
- lwaux
 - lhzux
 - stbu

llvm-svn: 167863
2012-11-13 19:21:31 +00:00
Ulrich Weigand a82389b3d0 Fix wrong PowerPC instruction encodings due to
operand field name mismatches in:
 - AForm_3  (fmul, fmuls)
 - XFXForm_5 (mtcrf)
 - XFLForm (mtfsf)

llvm-svn: 167862
2012-11-13 19:19:46 +00:00
Ulrich Weigand 0117718580 Fix instruction encoding for "bd(n)z" on PowerPC,
by using a new instruction format BForm_1.

llvm-svn: 167861
2012-11-13 19:15:52 +00:00
Ulrich Weigand 84ee76acfe Fix instruction encoding for "isel" on PowerPC,
using a new instruction format AForm_4.

llvm-svn: 167860
2012-11-13 19:14:19 +00:00
Ulrich Weigand 2c93acdfbf Make TOC order deterministic by using MapVector instead of DenseMap.
llvm-svn: 167737
2012-11-12 19:13:24 +00:00
Ulrich Weigand 339d0597d3 On PowerPC64, integer return values (as well as arguments) are supposed
to be extended to a full register.   This is modeled in the IR by marking
the return value (or argument) with a signext or zeroext attribute.

However, while these attributes are respected for function arguments,
they are currently ignored for function return values by the PowerPC
back-end.  This patch updates PPCCallingConv.td to ask for the promotion
to i64, and fixes LowerReturn and LowerCallResult to implement it.

The new test case verifies that both arguments and return values are
properly extended when passing them; and also that the optimizers
understand incoming argument and return values are in fact guaranteed
by the ABI to be extended.

The patch caused a spurious breakage in CodeGen/PowerPC/coalesce-ext.ll,
since the test case used a "ret" instruction to create a use of an i32
value at the end of the function (to set up data flow as required for
what the test is intended to test).  Since there's now an implicit
promotion to i64, that data flow no longer works as expected.  To fix
this, this patch now adds an extra "add" to ensure we have an appropriate
use of the i32 value.

llvm-svn: 167396
2012-11-05 19:39:45 +00:00
Hal Finkel 4f24c621d9 Add support for the PowerPC-specific inline asm Z constraint and y modifier.
The Z constraint specifies an r+r memory address, and the y modifier expands
to the "r, r" in the asm string. For this initial implementation, the base
register is forced to r0 (which has the special meaning of 0 for r+r addressing
on PowerPC) and the full address is taken in the second register. In the
future, this should be improved.

llvm-svn: 167388
2012-11-05 18:18:42 +00:00
Adhemerval Zanella c4182d1890 [PATCH] PowerPC: Expand load extend vector operations
This patch expands the SEXTLOAD, ZEXTLOAD, and EXTLOAD operations for
vector types when altivec is enabled.

llvm-svn: 167386
2012-11-05 17:15:56 +00:00
Chandler Carruth 5da3f0512e Revert the majority of the next patch in the address space series:
r165941: Resubmit the changes to llvm core to update the functions to
         support different pointer sizes on a per address space basis.

Despite this commit log, this change primarily changed stuff outside of
VMCore, and those changes do not carry any tests for correctness (or
even plausibility), and we have consistently found questionable or flat
out incorrect cases in these changes. Most of them are probably correct,
but we need to devise a system that makes it more clear when we have
handled the address space concerns correctly, and ideally each pass that
gets updated would receive an accompanying test case that exercises that
pass specificaly w.r.t. alternate address spaces.

However, from this commit, I have retained the new C API entry points.
Those were an orthogonal change that probably should have been split
apart, but they seem entirely good.

In several places the changes were very obvious cleanups with no actual
multiple address space code added; these I have not reverted when
I spotted them.

In a few other places there were merge conflicts due to a cleaner
solution being implemented later, often not using address spaces at all.
In those cases, I've preserved the new code which isn't address space
dependent.

This is part of my ongoing effort to clean out the partial address space
code which carries high risk and low test coverage, and not likely to be
finished before the 3.2 release looms closer. Duncan and I would both
like to see the above issues addressed before we return to these
changes.

llvm-svn: 167222
2012-11-01 09:14:31 +00:00
Chandler Carruth 7ec5085e01 Revert the series of commits starting with r166578 which introduced the
getIntPtrType support for multiple address spaces via a pointer type,
and also introduced a crasher bug in the constant folder reported in
PR14233.

These commits also contained several problems that should really be
addressed before they are re-committed. I have avoided reverting various
cleanups to the DataLayout APIs that are reasonable to have moving
forward in order to reduce the amount of churn, and minimize the number
of commits that were reverted. I've also manually updated merge
conflicts and manually arranged for the getIntPtrType function to stay
in DataLayout and to be defined in a plausible way after this revert.

Thanks to Duncan for working through this exact strategy with me, and
Nick Lewycky for tracking down the really annoying crasher this
triggered. (Test case to follow in its own commit.)

After discussing with Duncan extensively, and based on a note from
Micah, I'm going to continue to back out some more of the more
problematic patches in this series in order to ensure we go into the
LLVM 3.2 branch with a reasonable story here. I'll send a note to
llvmdev explaining what's going on and why.

Summary of reverted revisions:

r166634: Fix a compiler warning with an unused variable.
r166607: Add some cleanup to the DataLayout changes requested by
         Chandler.
r166596: Revert "Back out r166591, not sure why this made it through
         since I cancelled the command. Bleh, sorry about this!
r166591: Delete a directory that wasn't supposed to be checked in yet.
r166578: Add in support for getIntPtrType to get the pointer type based
         on the address space.
llvm-svn: 167221
2012-11-01 08:07:29 +00:00
Bill Schmidt 9953cf294b This patch addresses an ABI compatibility issue with empty aggregate
parameters.  Examples of these are:

  struct { } a;
  union { } b[256];
  int a[0];

An empty aggregate has an address, although dereferencing that address is
pointless.  When passed as a parameter, an empty aggregate does not consume
a protocol register, nor does it consume a doubleword in the parameter save
area.  Passing an empty aggregate by reference passes an address just as
for any other aggregate.  Returning an empty aggregate uses GPR3 as a hidden
address of the return value location, just as for any other aggregate.

The patch modifies PPCTargetLowering::LowerFormalArguments_64SVR4 and
PPCTargetLowering::LowerCall_64SVR4 to properly skip empty aggregate
parameters passed by value.  The handling of return values and by-reference
parameters was already correct.

Built on powerpc64-unknown-linux-gnu and tested with no new regressions.
A test case is included to test proper handling of empty aggregate
parameters on both sides of the function call protocol.

llvm-svn: 167090
2012-10-31 01:15:05 +00:00
Adhemerval Zanella 5c043aeb1b PowerPC: Expand FSRQT for vector types
This patch expands FSQRT for floating point vector types when altivec is
used.

llvm-svn: 167034
2012-10-30 18:29:42 +00:00
Adhemerval Zanella 56775e0f13 PowerPC: More support for Altivec compare operations
This patch adds more support for vector type comparisons using altivec.
It adds correct support for v16i8, v8i16, v4i32, and v4f32 vector
types for comparison operators ==, !=, >, >=, <, and <=.

llvm-svn: 167015
2012-10-30 13:50:19 +00:00
Bill Schmidt bd4ac26973 This patch solves a problem with passing varargs parameters under the PPC64
ELF ABI.

A varargs parameter consisting of a single-precision floating-point value,
or of a single-element aggregate containing a single-precision floating-point
value, must be passed in the low-order (rightmost) four bytes of the
doubleword stack slot reserved for that parameter.  If there are GPR protocol
registers remaining, the parameter must also be mirrored in the low-order
four bytes of the reserved GPR.

Prior to this patch, such parameters were being passed in the high-order
four bytes of the stack slot and the mirrored GPR.

The patch adds a new test case to verify the correct code generation.

llvm-svn: 166968
2012-10-29 21:18:16 +00:00
Ulrich Weigand 0de4a1e4ae Allow i32/i64 for 'f' constraint on PowerPC.
This fixes PR12757.

llvm-svn: 166943
2012-10-29 17:49:34 +00:00
NAKAMURA Takumi 4bd79920be PPCSubtarget.h: Add explicit braces.
llvm-svn: 166932
2012-10-29 15:51:42 +00:00
NAKAMURA Takumi 70b25de24e PPCSubtarget.h: Whitespace.
llvm-svn: 166931
2012-10-29 15:51:35 +00:00
Bill Schmidt bbc661e572 This patch adds alignment information for long double to the 64-bit PowerPC
ELF subtarget.

The existing logic is used as a fallback to avoid any changes to the Darwin
ABI.  PPC64 ELF now has two possible data layout strings: one for FreeBSD,
which requires 8-byte alignment, and a default string that requires
16-byte alignment.

I've added a test for PPC64 Linux to verify the 16-byte alignment.  If
somebody wants to add a separate test for FreeBSD, that would be great.

Note that there is a companion patch to update the alignment information
in Clang, which I am committing now as well.

llvm-svn: 166928
2012-10-29 14:59:36 +00:00
Adhemerval Zanella 0f9cff1ab8 PowerPC: Fix for rldcl/rldicl/rldicr MC emission
This patch fixes the rldcl/rldicl/rldicr instruction emission. The issue is
the MDForm_1 instruction defines the PowerISA MB field from 'rldicl'
with the name MBE, but RLDCL/RLDICL/RLDICR definition uses as 'MB'.

It end up by generatint the 'rldicl' enconding at 
'lib/Target/PowerPC/PPCGenMCCodeEmitter.inc' to use the fourth argument as the
third. The patch changes it by adjusting to use the fourth argument as
intended.

Fixes PR14180.

llvm-svn: 166770
2012-10-26 12:09:58 +00:00
Adhemerval Zanella 1be10dc732 This patch fixes the MC object emission of 'nop' for external function calls
and also fixes the R_PPC64_TOC16 and R_PPC64_TOC16_DS relocation offset.
The 'nop' is needed so a restore TOC instruction (ld r2,40(r1)) can be placed
by the linker to correct restore the TOC of previous function.

Current code has two issues: it defines in PPCInstr64Bit.td file a LDinto_toc
and LDtoc_restore as a DSForm_1 with DS_RA=0 where it should be
DS=2 (the 8 bytes displacement of the TOC saving). It also wrongly emits a
MC intruction using an uint32_t value while the PPC::BL8_NOP_ELF
and PPC::BLA8_NOP_ELF are both uint64_t (because of the following 'nop').

This patch corrects the remaining ExecutionEngine using MCJIT:

ExecutionEngine/2002-12-16-ArgTest.ll
ExecutionEngine/2003-05-07-ArgumentTest.ll
ExecutionEngine/2005-12-02-TailCallBug.ll
ExecutionEngine/hello.ll
ExecutionEngine/hello2.ll
ExecutionEngine/test-call.ll

llvm-svn: 166682
2012-10-25 14:29:13 +00:00
Bill Schmidt 6ed3b99f43 This patch addresses a PPC64 ELF issue with passing parameters consisting of
structs having size 3, 5, 6, or 7.  Such a struct must be passed and received
as right-justified within its register or memory slot.  The problem is only
present for structs that are passed in registers.

Previously, as part of a patch handling all structs of size less than 8, I
added logic to rotate the incoming register so that the struct was left-
justified prior to storing the whole register.  This was incorrect because
the address of the parameter had already been adjusted earlier to point to
the right-adjusted value in the storage slot.  Essentially I had accidentally
accounted for the right-adjustment twice.

In this patch, I removed the incorrect logic and reorganized the code to make
the flow clearer.

The removal of the rotates changes the expected code generation, so test case
structsinregs.ll has been modified to reflect this.  I also added a new test
case, jaggedstructs.ll, to demonstrate that structs of these sizes can now
be properly received and passed.

I've built and tested the code on powerpc64-unknown-linux-gnu with no new
regressions.  I also ran the GCC compatibility test suite and verified that
earlier problems with these structs are now resolved, with no new regressions.

llvm-svn: 166680
2012-10-25 13:38:09 +00:00
Adhemerval Zanella f2aceda854 Initial TOC support for PowerPC64 object creation
This patch adds initial PPC64 TOC MC object creation using the small mcmodel
(a single 64K TOC) adding the some TOC relocations (R_PPC64_TOC,
R_PPC64_TOC16, and R_PPC64_TOC16DS).

The addition of 'undefinedExplicitRelSym' hook on 'MCELFObjectTargetWriter'
is meant to avoid the creation of an unreferenced ".TOC." symbol (used in
the .odp creation) as well to set the R_PPC64_TOC relocation target as the
temporary ".TOC." symbol. On PPC64 ABI, the R_PPC64_TOC relocation should
not point to any symbol.

llvm-svn: 166677
2012-10-25 12:27:42 +00:00
Nadav Rotem 2289f2c932 Implement a basic VectorTargetTransformInfo interface to be used by the loop and bb vectorizers for modeling the cost of instructions.
llvm-svn: 166593
2012-10-24 17:22:41 +00:00
Micah Villmow 12d9127833 Add in support for getIntPtrType to get the pointer type based on the address space.
This checkin also adds in some tests that utilize these paths and updates some of the
clients.

llvm-svn: 166578
2012-10-24 15:52:52 +00:00
Bill Schmidt 57d6de5fd9 This is another TLC patch for separating code for the Darwin and ELF ABIs
for the PowerPC target, and factoring the results.  This will ease future
maintenance of both subtargets.

PPCTargetLowering::LowerCall_Darwin_Or_64SVR4() has grown a lot of special-case
code for the different ABIs, making maintenance difficult.  This is getting
worse as we repair errors in the 64-bit ELF ABI implementation, while avoiding
changes to the Darwin ABI logic.  This patch splits the routine into
LowerCall_Darwin() and LowerCall_64SVR4(), allowing both versions to be
significantly simplified.  I've factored out chunks of similar code where it
made sense to do so.  I also performed similar factoring on
LowerFormalArguments_Darwin() and LowerFormalArguments_64SVR4().

There are no functional changes in this patch, and therefore no new test
cases have been developed.

Built and tested on powerpc64-unknown-linux-gnu with no new regressions.

llvm-svn: 166480
2012-10-23 15:51:16 +00:00
Nadav Rotem 5dc203e8f4 Reapply the TargerTransformInfo changes, minus the changes to LSR and Lowerinvoke.
llvm-svn: 166248
2012-10-18 23:22:48 +00:00
Ulrich Weigand d34b5bd610 This patch fixes failures in the SingleSource/Regression/C/uint64_to_float
test case on PowerPC caused by rounding errors when converting from a 64-bit
integer to a single-precision floating point. The reason for this are
double-rounding effects, since on PowerPC we have to convert to an
intermediate double-precision value first, which gets rounded to the
final single-precision result.

The patch fixes the problem by preparing the 64-bit integer so that the
first conversion step to double-precision will always be exact, and the
final rounding step will result in the correctly-rounded single-precision
result.  The generated code sequence is equivalent to what GCC would generate.

When -enable-unsafe-fp-math is in effect, that extra effort is omitted
and we accept possible rounding errors (just like GCC does as well).

llvm-svn: 166178
2012-10-18 13:16:11 +00:00
Bob Wilson d6d9ccca38 Temporarily revert the TargetTransform changes.
The TargetTransform changes are breaking LTO bootstraps of clang.  I am
working with Nadav to figure out the problem, but I am reverting it for now
to get our buildbots working.

This reverts svn commits: 165665 165669 165670 165786 165787 165997
and I have also reverted clang svn 165741

llvm-svn: 166168
2012-10-18 05:43:52 +00:00
Bill Schmidt 48081cad0d This patch addresses PR13949.
For the PowerPC 64-bit ELF Linux ABI, aggregates of size less than 8
bytes are to be passed in the low-order bits ("right-adjusted") of the
doubleword register or memory slot assigned to them.  A previous patch
addressed this for aggregates passed in registers.  However, small
aggregates passed in the overflow portion of the parameter save area are
still being passed left-adjusted.

The fix is made in PPCTargetLowering::LowerCall_Darwin_Or_64SVR4 on the
caller side, and in PPCTargetLowering::LowerFormalArguments_64SVR4 on
the callee side.  The main fix on the callee side simply extends
existing logic for 1- and 2-byte objects to 1- through 7-byte objects,
and correcting a constant left over from 32-bit code.  There is also a
fix to a bogus calculation of the offset to the following argument in
the parameter save area.

On the caller side, again a constant left over from 32-bit code is
fixed.  Additionally, some code for 1, 2, and 4-byte objects is
duplicated to handle the 3, 5, 6, and 7-byte objects for SVR4 only.  The
LowerCall_Darwin_Or_64SVR4 logic is getting fairly convoluted trying to
handle both ABIs, and I propose to separate this into two functions in a
future patch, at which time the duplication can be removed.

The patch adds a new test (structsinmem.ll) to demonstrate correct
passing of structures of all seven sizes.  Eight dummy parameters are
used to force these structures to be in the overflow portion of the
parameter save area.

As a side effect, this corrects the case when aggregates passed in
registers are saved into the first eight doublewords of the parameter
save area:  Previously they were stored left-justified, and now are
properly stored right-justified.  This requires changing the expected
output of existing test case structsinregs.ll.

llvm-svn: 166022
2012-10-16 13:30:53 +00:00
Micah Villmow 4bb926d91d Resubmit the changes to llvm core to update the functions to support different pointer sizes on a per address space basis.
llvm-svn: 165941
2012-10-15 16:24:29 +00:00
Adhemerval Zanella ef206f19a4 PowerPC: add EmitTCEntry class for TOC creation
This patch replaces the EmitRawText by a EmitTCEntry class (specialized for
each Streamer) in PowerPC64 TOC entry creation.

llvm-svn: 165940
2012-10-15 15:43:14 +00:00
Micah Villmow 0c61134d8d Revert 165732 for further review.
llvm-svn: 165747
2012-10-11 21:27:41 +00:00
Micah Villmow 083189730e Add in the first iteration of support for llvm/clang/lldb to allow variable per address space pointer sizes to be optimized correctly.
llvm-svn: 165726
2012-10-11 17:21:41 +00:00
Bill Schmidt 22162470ba This patch addresses PR13947.
For function calls on the 64-bit PowerPC SVR4 target, each parameter
is mapped to as many doublewords in the parameter save area as
necessary to hold the parameter.  The first 13 non-varargs
floating-point values are passed in registers; any additional
floating-point parameters are passed in the parameter save area.  A
single-precision floating-point parameter (32 bits) must be mapped to
the second (rightmost, low-order) word of its assigned doubleword
slot.

Currently LLVM violates this ABI requirement by mapping such a
parameter to the first (leftmost, high-order) word of its assigned
doubleword slot.  This is internally self-consistent but will not
interoperate correctly with libraries compiled with an ABI-compliant
compiler.

This patch corrects the problem by adjusting the parameter addressing
on both sides of the calling convention.

llvm-svn: 165714
2012-10-11 15:38:20 +00:00
Nadav Rotem e10328737d Add a new interface to allow IR-level passes to access codegen-specific information.
llvm-svn: 165665
2012-10-10 22:04:55 +00:00
Bill Schmidt b9bc47409d When generating spill and reload code for vector registers on PowerPC,
the compiler makes use of GPR0.  However, there are two flavors of
GPR0 defined by the target:  the 32-bit GPR0 (R0) and the 64-bit GPR0
(X0).  The spill/reload code makes use of R0 regardless of whether we
are generating 32- or 64-bit code.

This patch corrects the problem in the obvious manner, using X0 and
ADDI8 for 64-bit and R0 and ADDI for 32-bit.

llvm-svn: 165658
2012-10-10 21:25:01 +00:00
Bill Schmidt 38d9458720 The PowerPC VRSAVE register has been somewhat of an odd beast since
the Altivec extensions were introduced.  Its use is optional, and
allows the compiler to communicate to the operating system which
vector registers should be saved and restored during a context switch.
In practice, this information is ignored by the various operating
systems using the SVR4 ABI; the kernel saves and restores the entire
register state.  Setting the VRSAVE register is no longer performed by
the AIX XL compilers, the IBM i compilers, or by GCC on Power Linux
systems.  It seems best to avoid this logic within LLVM as well.

This patch avoids generating code to update and restore VRSAVE for the
PowerPC SVR4 ABIs (32- and 64-bit).  The code remains in place for the
Darwin ABI.

llvm-svn: 165656
2012-10-10 20:54:15 +00:00
Bill Wendling c9b22d735a Create enums for the different attributes.
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.

llvm-svn: 165488
2012-10-09 07:45:08 +00:00
Adhemerval Zanella fe3f793cec PR12716: PPC crashes on vector compare
Vector compare using altivec 'vcmpxxx' instructions have as third argument
a vector register instead of CR one, different from integer and float-point
compares. This leads to a failure in code generation, where 'SelectSETCC'
expects a DAG with a CR register and gets vector register instead.

This patch changes the behavior by just returning a DAG with the 
vector compare instruction based on the type. The patch also adds a testcase
for all vector types llvm defines.

It also included a fix on signed 5-bits predicates printing, where
signed values were not handled correctly as signed (char are unsigned by
default for PowerPC). This generates 'vspltisw' (vector splat)
instruction with SIM out of range.

llvm-svn: 165419
2012-10-08 18:59:53 +00:00
Adhemerval Zanella 22b9fd2f2e PowerPC: Fix object creation with PPC::MTCRF8 instruction
llvm-svn: 165411
2012-10-08 18:25:11 +00:00
Adhemerval Zanella 5c6e08435e Add floating-point to and from integer conversion
This patch add altivec support for v4i32 to v4f32 and for v4f32 to
v4i32 vector rounding conversion.

llvm-svn: 165409
2012-10-08 17:27:24 +00:00
Micah Villmow cdfe20b97f Move TargetData to DataLayout.
llvm-svn: 165402
2012-10-08 16:38:25 +00:00
Bill Schmidt d1fa36f903 This patch splits apart PPCISelLowering::LowerFormalArguments_Darwin_Or_64SVR4
into separate versions for the Darwin and 64-bit SVR4 ABIs.  This will
facilitate doing more major surgery on the 64-bit SVR4 ABI in the near future.

llvm-svn: 165336
2012-10-05 21:27:08 +00:00
Will Schmidt 314c6c4c2b - Mark the BCC and BLR defs as isCodeGenOnly per error output from
llvm-tblgen -gen-asm-matcher.

 PPCInstrInfo.td |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

llvm-svn: 165315
2012-10-05 15:16:11 +00:00
Will Schmidt 4a67f2e2a7 - add tokens to PPCInstrInfo.td and PPCInstr64Bit.td to resolve
"Instruction 'foo' has no tokens" errors during llvm-tblgen
-gen-asm-matcher attempts.  At this time, the added
tokens are "#comment" style rather than the actual mnemonic.  This will
be revisited once the rest of the base asmparser bits get straightened
out for ppc64-elf-linux.

llvm-svn: 165237
2012-10-04 18:14:28 +00:00
Will Schmidt 2247f8afbd test commit / whitespace
llvm-svn: 165233
2012-10-04 16:20:24 +00:00
Sylvestre Ledru 91ce36c986 Revert 'Fix a typo 'iff' => 'if''. iff is an abreviation of if and only if. See: http://en.wikipedia.org/wiki/If_and_only_if Commit 164767
llvm-svn: 164768
2012-09-27 10:14:43 +00:00
Sylvestre Ledru 721cffd53a Fix a typo 'iff' => 'if'
llvm-svn: 164767
2012-09-27 09:59:43 +00:00
Bill Wendling 863bab689a Remove the `hasFnAttr' method from Function.
The hasFnAttr method has been replaced by querying the Attributes explicitly. No
intended functionality change.

llvm-svn: 164725
2012-09-26 21:48:26 +00:00
Roman Divacky ca10389bfe Specify MachinePointerInfo as refering to the argument value and offset of the
store when handling byval arguments. Thus preventing reordering of the store
with load with post-RA scheduler.

llvm-svn: 164553
2012-09-24 20:47:19 +00:00
Bill Schmidt 019cc6fe03 Small structs for PPC64 SVR4 must be passed right-justified in registers.
lib/Target/PowerPC/PPCISelLowering.{h,cpp}
 Rename LowerFormalArguments_Darwin to LowerFormalArguments_Darwin_Or_64SVR4.
 Rename LowerFormalArguments_SVR4 to LowerFormalArguments_32SVR4.
 Receive small structs right-justified in LowerFormalArguments_Darwin_Or_64SVR4.
 Rename LowerCall_Darwin to LowerCall_Darwin_Or_64SVR4.
 Rename LowerCall_SVR4 to LowerCall_32SVR4.
 Pass small structs right-justified in LowerCall_Darwin_Or_64SVR4.

test/CodeGen/PowerPC/structsinregs.ll
 New test.

llvm-svn: 164228
2012-09-19 15:42:13 +00:00
Roman Divacky 09adf3decc Fix the isLocalCall() by checking for linker weakness as well.
llvm-svn: 164155
2012-09-18 18:27:49 +00:00
Roman Divacky 0be33598ce Avoid symbol name clash when filling TOC.
Patch by Adhemerval Zanella.

llvm-svn: 164141
2012-09-18 17:10:37 +00:00
Roman Divacky d4f6f421a9 On PPC64 emit the environment pointer. Patch by Adhemerval Zanella.
llvm-svn: 164139
2012-09-18 16:55:29 +00:00
Roman Divacky 762930637c Optimize local func calls to not emit nop for TOC restoration.
Patch by Adhemerval Zanella.

llvm-svn: 164138
2012-09-18 16:47:58 +00:00
Roman Divacky 5dd4ccb402 When creating MCAsmBackend pass the CPU string as well. In X86AsmBackend
store this and use it to not emit long nops when the CPU is geode which
doesnt support them.

Fixes PR11212.

llvm-svn: 164132
2012-09-18 16:08:49 +00:00
Craig Topper 462c31b3da Change unsigned to uint32_t to match base class declaration and other targets.
llvm-svn: 164001
2012-09-16 18:10:23 +00:00
Craig Topper a60c0f1163 Use LLVM_DELETED_FUNCTION in place of 'DO NOT IMPLEMENT' comments.
llvm-svn: 163974
2012-09-15 17:09:36 +00:00
Michael Liao abb87d4857 Fix PR11985
- BlockAddress has no support of BA + offset form and there is no way to
  propagate that offset into machine operand;
- Add BA + offset support and a new interface 'getTargetBlockAddress' to
  simplify target block address forming;
- All targets are modified to use new interface and X86 backend is enhanced to
  support BA + offset addressing.

llvm-svn: 163743
2012-09-12 21:43:09 +00:00
Roman Divacky 10a448d45a Enable exceptions handling on PPC64 now that cr misaligned spilling
was fixed in r163713.

llvm-svn: 163715
2012-09-12 15:29:32 +00:00
Roman Divacky c9e23d93ae This patch corrects logic in PPCFrameLowering for save and restore of
nonvolatile condition register fields across calls under the SVR4 ABIs.                                            
                                                                                                                   
 * With the 64-bit ABI, the save location is at a fixed offset of 8 from                                           
the stack pointer.  The frame pointer cannot be used to access this                                                
portion of the stack frame since the distance from the frame pointer may                                           
change with alloca calls.                                                                                          
                                                                                                                   
 * With the 32-bit ABI, the save location is just below the general
register save area, and is accessed via the frame pointer like the rest
of the save areas.  This is an optional slot, so it must only be created                                           
if any of CR2, CR3, and CR4 were modified.                                                                      
                                                                                                                   
 * For both ABIs, save/restore logic is generated only if one of the     
nonvolatile CR fields were modified.                                   

I also took this opportunity to clean up an extra FIXME in
PPCFrameLowering.h.  Save area offsets for 32-bit GPRs are meaningless
for the 64-bit ABI, so I removed them for correctness and efficiency.


Fixes PR13708 and partially also PR13623. It lets us enable exception handling
on PPC64.

Patch by William J. Schmidt!

llvm-svn: 163713
2012-09-12 14:47:47 +00:00
Benjamin Kramer 47f9ec92cb MC: Overhaul handling of .lcomm
- Darwin lied about not supporting .lcomm and turned it into zerofill in the
  asm parser. Push the zerofill-conversion down into macho-specific code.
- This makes the tri-state LCOMMType enum superfluous, there are no targets
  without .lcomm.
- Do proper error reporting when trying to use .lcomm with alignment on a target
  that doesn't support it.
- .comm and .lcomm alignment was parsed in bytes on COFF, should be power of 2.
- Fixes PR13755 (.lcomm crashes on ELF).

llvm-svn: 163395
2012-09-07 17:25:13 +00:00
Hal Finkel efe4a44106 Move the PPC TOC defs into the PPC64 InstrInfo file.
Since TOC is just defined for PPC64, move its definition to PPC64 td file.

Patch by Adhemerval Zanella.

llvm-svn: 163234
2012-09-05 19:22:27 +00:00
Roman Divacky 2be394bdcd Remove always true checks. Noticed by Adhemerval Zanella.
llvm-svn: 163117
2012-09-03 16:55:42 +00:00
NAKAMURA Takumi ac49029fd9 PPCISelLowering.cpp: Fix r162725.
[Tobias von Koch] What's happening here is that the CR6SET/CR6UNSET is breaking the chain of register copies glued to the function call (BL_SVR4 node). The scheduler then moves other instructions in between those and the function call, which isn't good!

Right. That's the case where there is no chain of register copies before the call, so InFlag == 0... Attached is a new revision of the patch which should fix this for good.

llvm-svn: 162916
2012-08-30 15:52:29 +00:00
NAKAMURA Takumi 8ad54e04d2 PPCISelLowering.cpp: Whitespace.
llvm-svn: 162915
2012-08-30 15:52:23 +00:00
Hal Finkel 1859d26528 Reserve space for the mandatory traceback fields on PPC64.
We need to reserve space for the mandatory traceback fields,
though leaving them as zero is appropriate for now.

Although the ABI calls for these fields to be filled in fully, no
compiler on Linux currently does this, and GDB does not read these
fields.  GDB uses the first word of zeroes during exception handling to
find the end of the function and the size field, allowing it to compute
the beginning of the function.  DWARF information is used for everything
else.  We need the extra 8 bytes of pad so the size field is found in
the right place.

As a comparison, GCC fills in a few of the fields -- language, number
of saved registers -- but ignores the rest.  IBM's proprietary OSes do
make use of the full traceback table facility.

Patch by Bill Schmidt.

llvm-svn: 162854
2012-08-29 20:22:24 +00:00
Roman Divacky 8c4b6a307e Emit word of zeroes after the last instruction as a start of the mandatory
traceback table on PowerPC64. This helps gdb handle exceptions. The other
mandatory fields are ignored by gdb and harder to implement so just add
there a FIXME.

Patch by Bill Schmidt. PR13641.

llvm-svn: 162778
2012-08-28 19:06:55 +00:00
Hal Finkel 742b535e40 Add PPC Freescale e500mc and e5500 subtargets.
Add subtargets for Freescale e500mc (32-bit) and e5500 (64-bit) to
the PowerPC backend.

Patch by Tobias von Koch.

llvm-svn: 162764
2012-08-28 16:12:39 +00:00
Hal Finkel 679c73cb33 Split several PPC instruction classes.
Slight reorganisation of PPC instruction classes for scheduling. No
functionality change for existing subtargets.
 - Clearly separate load/store-with-update instructions from regular loads and stores.
 - Split IntRotateD -> IntRotateD and IntRotateDI
 - Split out fsub and fadd from FPGeneral -> FPAddSub
 - Update existing itineraries

Patch by Tobias von Koch.

llvm-svn: 162729
2012-08-28 02:49:14 +00:00
Hal Finkel 686f2ee226 Allow remat of LI on PPC.
Allow load-immediates to be rematerialised in the register coalescer for
PPC. This makes test/CodeGen/PowerPC/big-endian-formal-args.ll fail,
because it relies on a register move getting emitted. The immediate load is
equivalent, so change this test case.

Patch by Tobias von Koch.

llvm-svn: 162727
2012-08-28 02:10:33 +00:00
Hal Finkel 5ab378037f Eliminate redundant CR moves on PPC32.
The 32-bit ABI requires CR bit 6 to be set if the call has fp arguments and
unset if it doesn't. The solution up to now was to insert a MachineNode to
set/unset the CR bit, which produces a CR vreg. This vreg was then copied
into CR bit 6. When the register allocator saw a bunch of these in the same
function, it allocated the set/unset CR bit in some random CR register (1
extra instruction) and then emitted CR moves before every vararg function
call, rather than just setting and unsetting CR bit 6 directly before every
vararg function call. This patch instead inserts a PPCcrset/PPCcrunset
instruction which are then matched by a dedicated instruction pattern.

Patch by Tobias von Koch.

llvm-svn: 162725
2012-08-28 02:10:27 +00:00
Hal Finkel e39526a789 Optimize zext on PPC64.
The zeroextend IR instruction is lowered to an 'and' node with an immediate
mask operand, which in turn gets legalised to a sequence of ori's & ands.
This can be done more efficiently using the rldicl instruction.

Patch by Tobias von Koch.

llvm-svn: 162724
2012-08-28 02:10:15 +00:00
Richard Smith 228e6d4cf3 Fix integer undefined behavior due to signed left shift overflow in LLVM.
Reviewed offline by chandlerc.

llvm-svn: 162623
2012-08-24 23:29:28 +00:00
Roman Divacky ace4707ea6 Lower constant pools and jump tables via TOC on PPC64/SVR4.
In collaboration with Adhemerval Zanella.

llvm-svn: 162562
2012-08-24 16:26:02 +00:00
Jakob Stoklund Olesen a954e92053 Add missing SDNPSideEffect flags.
llvm-svn: 162557
2012-08-24 14:43:27 +00:00
Roman Divacky 2039a987c4 Revert r162034, r162035 and r162037.
llvm-svn: 162039
2012-08-16 19:07:59 +00:00
Roman Divacky 9d38fc8ddc Define and handle additional fixup kinds. By Adhemerval Zanella.
llvm-svn: 162037
2012-08-16 18:37:52 +00:00
Roman Divacky 1faf5b07c6 Fix typo and grammar. By Adhemerval Zanella.
llvm-svn: 162032
2012-08-16 18:19:29 +00:00
Jakob Stoklund Olesen 978c1280a5 Don't use getNextOperandForReg().
This way of using getNextOperandForReg() was unlikely to work as
intended. We don't give any guarantees about the order of operands in
the use-def chains, so looking only at operands following a given
operand in the chain doesn't make sense.

llvm-svn: 161542
2012-08-08 23:44:04 +00:00
Hal Finkel 895a5f5d12 Add a comment about mftb vs. mfspr on PPC.
Thanks to Alex Rosenberg for the suggestion.

llvm-svn: 161428
2012-08-07 17:04:20 +00:00
Hal Finkel 33e529d56b MFTB on PPC64 should really be encoded using MFSPR.
The MFTB instruction itself is being phased out, and its functionality
is provided by MFSPR. According to the ISA docs, using MFSPR works on all known
chips except for the 601 (which did not have a timebase register anyway)
and the POWER3.

Thanks to Adhemerval Zanella for pointing this out!

llvm-svn: 161346
2012-08-06 21:21:44 +00:00
Hal Finkel 70381a7b18 Add readcyclecounter lowering on PPC64.
On PPC64, this can be done with a simple TableGen pattern.
To enable this, I've added the (otherwise missing) readcyclecounter
SDNode definition to TargetSelectionDAG.td.

llvm-svn: 161302
2012-08-04 14:10:46 +00:00
Gabor Greif a1529b6ca4 allow 'make CPPFLAGS=<something>' work again
this makes this hack a bit more bearable
for poor souls who need to pass custom
preprocessor flags to the build process

llvm-svn: 161240
2012-08-03 13:31:24 +00:00
Jakob Stoklund Olesen ed6c0408fa Remove variable_ops from call instructions in most targets.
Call instructions are no longer required to be variadic, and
variable_ops should only be used for instructions that encode a variable
number of arguments, like the ARM stm/ldm instructions.

llvm-svn: 160189
2012-07-13 20:44:29 +00:00
Evan Cheng 39e90029a2 Target option DisableJumpTables is a gross hack. Move it to TargetLowering instead.
llvm-svn: 159611
2012-07-02 22:39:56 +00:00
Bob Wilson bbd38dd9c0 Add all codegen passes to the PassManager via TargetPassConfig.
This is a preliminary step toward having TargetPassConfig be able to
start and stop the compilation at specified passes for unit testing
and debugging.  No functionality change.

llvm-svn: 159567
2012-07-02 19:48:31 +00:00
Bill Wendling e38859dc8e Move lib/Analysis/DebugInfo.cpp to lib/VMCore/DebugInfo.cpp and
include/llvm/Analysis/DebugInfo.h to include/llvm/DebugInfo.h.

The reasoning is because the DebugInfo module is simply an interface to the
debug info MDNodes and has nothing to do with analysis.

llvm-svn: 159312
2012-06-28 00:05:13 +00:00
Jack Carter 5e69cffed5 There are a number of generic inline asm operand modifiers that
up to r158925 were handled as processor specific. Making them 
generic and putting tests for these modifiers in the CodeGen/Generic
directory caused a number of targets to fail. 

This commit addresses that problem by having the targets call 
the generic routine for generic modifiers that they don't currently
have explicit code for.

For now only generic print operands 'c' and 'n' are supported.vi


Affected files:

    test/CodeGen/Generic/asm-large-immediate.ll
    lib/Target/PowerPC/PPCAsmPrinter.cpp
    lib/Target/NVPTX/NVPTXAsmPrinter.cpp
    lib/Target/ARM/ARMAsmPrinter.cpp
    lib/Target/XCore/XCoreAsmPrinter.cpp
    lib/Target/X86/X86AsmPrinter.cpp
    lib/Target/Hexagon/HexagonAsmPrinter.cpp
    lib/Target/CellSPU/SPUAsmPrinter.cpp
    lib/Target/Sparc/SparcAsmPrinter.cpp
    lib/Target/MBlaze/MBlazeAsmPrinter.cpp
    lib/Target/Mips/MipsAsmPrinter.cpp
    
MSP430 isn't represented because it did not even run with
the long existing 'c' modifier and it was not apparent what
needs to be done to get it inline asm ready.

Contributer: Jack Carter
llvm-svn: 159203
2012-06-26 13:49:27 +00:00
NAKAMURA Takumi 704de074b8 llvm/lib: [CMake] Add explicit dependency to intrinsics_gen.
llvm-svn: 159112
2012-06-24 13:32:01 +00:00
Craig Topper 2361cd9897 Silence an unused variable warning on release builds.
llvm-svn: 159074
2012-06-23 08:09:30 +00:00
Hal Finkel 460e94d842 Add support for the PPC isel instruction.
The isel (integer select) instruction is supported on the 440 and A2
embedded cores and on the POWER7.

llvm-svn: 159045
2012-06-22 23:10:08 +00:00
Hal Finkel 0a479ae7d1 Convert the PPC backend to use the new FMA infrastructure.
The existing contraction patterns are replaced with fma/fneg.
Overall functionality should be the same.

llvm-svn: 158955
2012-06-22 00:49:52 +00:00
Hal Finkel a86b0f20dd Treat TargetGlobalAddress as a constant for the purpose of matching pre-inc stores on PPC.
Thanks to Tobias von Koch for pointing out this problem.

llvm-svn: 158932
2012-06-21 20:10:48 +00:00
Hal Finkel ca542beffe Add support for generating reg+reg (indexed) pre-inc loads on PPC.
llvm-svn: 158823
2012-06-20 15:43:03 +00:00
Lang Hames 39fb1d08dc Add DAG-combines for aggressive FMA formation.
This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or
FSUB + FMUL. The combines are performed when:
(a) Either
      AllowExcessFPPrecision option (-enable-excess-fp-precision for llc)
        OR
      UnsafeFPMath option (-enable-unsafe-fp-math)
    are set, and
(b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of
    the FADD/FSUB, and
(c) The FMUL only has one user (the FADD/FSUB).

If your target has fast FMA instructions you can make use of these combines by
overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for
types supported by your FMA instruction, and adding patterns to match ISD::FMA
to your FMA instructions.

llvm-svn: 158757
2012-06-19 22:51:23 +00:00
Jakob Stoklund Olesen 0f855e4263 Implement PPCInstrInfo::isCoalescableExtInstr().
The PPC::EXTSW instruction preserves the low 32 bits of its input, just
like some of the x86 instructions. Use it to reduce register pressure
when the low 32 bits have multiple uses.

This requires a small change to PeepholeOptimizer since EXTSW takes a
64-bit input register.

This is related to PR5997.

llvm-svn: 158743
2012-06-19 21:14:34 +00:00
Hal Finkel d465810f7c Mark most PPC register classes to avoid write-after-write.
For processors with the G5-like instruction-grouping scheme, this helps avoid
early group termination due to a write-after-write dependency within the group.
It should also help on pipelined embedded cores.

On POWER7, over the test suite, this gives an average 0.5% speedup. The largest
speedups are:

SingleSource/Benchmarks/Stanford/Quicksort - 33%
MultiSource/Applications/d/make_dparser - 21%
MultiSource/Benchmarks/FreeBench/analyzer/analyzer - 12%
MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft - 12%

Largest slowdowns:

SingleSource/Benchmarks/Stanford/Bubblesort - 23%
MultiSource/Benchmarks/Prolangs-C++/city/city - 21%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 16%
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode - 13%

llvm-svn: 158719
2012-06-19 13:57:17 +00:00
Hal Finkel 1cc27e44a4 Add support for generating reg+reg preinc stores on PPC.
PPC will now generate STWUX and friends.

llvm-svn: 158698
2012-06-19 02:34:32 +00:00
Hal Finkel 6261c2dc28 Cleanup trip-count finding for PPC CTR loops (and some bug fixes).
This cleans up the method used to find trip counts in order to form CTR loops on PPC.
This refactoring allows the pass to find loops which have a constant trip count but also
happen to end with a comparison to zero. This also adds explicit FIXMEs to mark two different
classes of loops that are currently ignored.

In addition, we now search through all potential induction operations instead of just the first.
Also, we check the predicate code on the conditional branch and abort the transformation if the
code is not EQ or NE, and we then make sure that the branch to be transformed matches the
condition register defined by the comparison (multiple possible comparisons will be considered).

llvm-svn: 158607
2012-06-16 20:34:07 +00:00
Hal Finkel 9898614854 Add another missing 64-bit itinerary definition for the PPC A2 core.
llvm-svn: 158393
2012-06-13 05:55:09 +00:00
Hal Finkel 79c39da135 Add some missing 64-bit itinerary definitions for the PPC A2 core.
llvm-svn: 158373
2012-06-12 20:32:29 +00:00
Hal Finkel 8c33dde666 Split out the PPC instruction class IntSimple from IntGeneral.
On the POWER7, adds and logical operations can also be handled
in the load/store pipelines. We'll call these IntSimple.

llvm-svn: 158366
2012-06-12 19:01:24 +00:00
Hal Finkel f1cc96ab50 Fixes for PPC host detection and features.
POWER4 is a 64-bit CPU (better matched to the 970).
The g3 is really the 750 (no altivec), the g4+ is the 74xx (not the 750).

Patch by Andreas Tobler.

llvm-svn: 158363
2012-06-12 16:39:23 +00:00
Hal Finkel 59b0ee8a56 Reapply r158337, this time properly protect Darwin/PPC host CPU use with __ppc__.
Original commit message:
Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName().

Both the new Linux functionality and the old Darwin functions have been moved.
This change also allows this information to be queried directly by clang and
other frontends (clang, for example, will now have real -mcpu=native support).

llvm-svn: 158349
2012-06-12 03:03:13 +00:00
Jakob Stoklund Olesen f8f128606c Revert r158337 "Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName()."
This commit broke most of the PowerPC unit tests when running on
Intel/Apple.

llvm-svn: 158345
2012-06-12 00:58:40 +00:00
Hal Finkel 23c699e497 Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName().
Both the new Linux functionality and the old Darwin functions have been moved.
This change also allows this information to be queried directly by clang and
other frontends (clang, for example, will now have real -mcpu=native support).

llvm-svn: 158337
2012-06-11 23:14:31 +00:00
Hal Finkel bddc916f2b Enable MFOCRF generation on the PPC A2 core.
llvm-svn: 158324
2012-06-11 19:57:04 +00:00
Hal Finkel bfd3d08d18 Rename the PPC target feature gpul to mfocrf.
The PPC target feature gpul (IsGigaProcessor) was only used for one thing:
To enable the generation of the MFOCRF instruction. Furthermore, this
instruction is available on other PPC cores outside of the G5 line. This
feature now corresponds to the HasMFOCRF flag.

No functionality change.

llvm-svn: 158323
2012-06-11 19:57:01 +00:00
Hal Finkel 25d4c568d3 Add A2 to the list of PPC CPUs recognized by Linux host CPU-type detection.
llvm-svn: 158322
2012-06-11 19:56:57 +00:00
Hal Finkel 2c09058f19 Emit the two-operand form of the PPC mfcr instruction as mfocrf.
This is necessary on Linux and supported on Darwin, see PR2604.

llvm-svn: 158315
2012-06-11 15:43:15 +00:00
Hal Finkel ba671c0ea7 Add local CPU detection for Linux PPC.
This functionality mirrors that available on PPC/Darwin.

llvm-svn: 158314
2012-06-11 15:43:13 +00:00
Hal Finkel f2b9c38d6f Add POWER6 and POWER7 CPU types to the PPC backend.
No functional change; these will be used by upcoming scheduler enhancements.

llvm-svn: 158313
2012-06-11 15:43:08 +00:00
Hal Finkel 4e9f1a859f Enable ILP scheduling for all nodes by default on PPC.
Over the entire test-suite, this has an insignificantly negative average
performance impact, but reduces some of the worst slowdowns from the
anti-dep. change (r158294).

Largest speedups:
SingleSource/Benchmarks/Stanford/Quicksort - 28%
SingleSource/Benchmarks/Stanford/Towers - 24%
SingleSource/Benchmarks/Shootout-C++/matrix - 23%
MultiSource/Benchmarks/SciMark2-C/scimark2 - 19%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15%
(matrix and automotive-bitcount were both in the top-5 slowdown list from the
anti-dep. change)

Largest slowdowns:
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26%
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21%
SingleSource/Benchmarks/CoyoteBench/lpbench - 20%
MultiSource/Applications/d/make_dparser - 16%

llvm-svn: 158296
2012-06-10 19:32:29 +00:00
Hal Finkel a8100281ae Use critical anti-dep. breaking on all PPC targets, but also add other register classes.
Using 'all' instead of 'critical' would be better because it would make it easier to
satisfy the bundling constraints, but, as noted in the FIXME, that is currently not
possible with the crs.

This yields an average 1% speedup over the entire test suite (on Power 7). Largest speedups:
SingleSource/Benchmarks/Shootout-C++/moments - 40%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
SingleSource/Benchmarks/BenchmarkGame/nsieve-bits - 26%
SingleSource/Benchmarks/McGill/misr - 23%
MultiSource/Applications/JM/ldecod/ldecod - 22%

Largest slowdowns:
SingleSource/Benchmarks/Shootout-C++/matrix - -29%
SingleSource/Benchmarks/Shootout-C++/ary3 - -22%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -18%
SingleSource/Benchmarks/Shootout-C++/ary - -17%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - -15%

llvm-svn: 158294
2012-06-10 11:15:36 +00:00
Hal Finkel 2edfbddcf0 Improve ext/trunc patterns on PPC64.
The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that
would leave self-moves in the final assembly. Replacing those patterns with ones
based on the SUBREG builtins yields better-looking code.

Thanks to Jakob and Owen for their suggestions in this matter.

llvm-svn: 158283
2012-06-09 22:10:19 +00:00
Hal Finkel eb50c2d4a4 Enable tail merging on PPC.
Tail merging had been disabled on PPC because it would disturb bundling decisions
made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions
are made during post-RA scheduling, and tail merging is generally beneficial (the
average test-suite speedup is insignificantly positive).

Largest test-suite speedups:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23%
SingleSource/Benchmarks/Shootout-C++/ary - 21%
SingleSource/Benchmarks/Stanford/Queens - 17%

Largest slowdowns:
MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22%
MultiSource/Applications/JM/ldecod/ldecod - 14%
MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9%

This is improved by using full (instead of just critical) anti-dependency breaking,
but doing so still causes miscompiles and so cannot yet be enabled by default.

llvm-svn: 158259
2012-06-09 03:14:50 +00:00
Hal Finkel 41e6fd1df9 Remove the TODO statement in the PPC README re: CTR loops
As Chris points out, this can now be removed!

TODO: check if the associated section on viterbi's inner loop can also be removed.
llvm-svn: 158224
2012-06-08 20:02:09 +00:00
Hal Finkel c6b5debb40 Enable PPC CTR loop formation by default.
Thanks to Jakob's help, this now causes no new test suite failures!

Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
SingleSource/Benchmarks/Misc/pi - 108%
SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
SingleSource/Benchmarks/Shootout/ary3 - 32%
SingleSource/Benchmarks/Shootout-C++/matrix - 30%

The largest slowdowns are:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
MultiSource/Applications/d/make_dparser - -14%
SingleSource/Benchmarks/Shootout-C++/ary - -13%

In light of these slowdowns, additional profiling work is obviously needed!

llvm-svn: 158223
2012-06-08 19:19:53 +00:00
Hal Finkel 3d32ad3a7f Mark the PPC CTRRC and CTRRC8 register classes as non-allocatable.
Marking these classes as non-alocatable allows CTR loop generation to
work correctly with the block placement passes, etc. These register
classes are currently used only by some unused TCRETURN patterns.
In future cleanup, these will be removed.

Thanks again to Jakob for suggesting this fix to the CTR loop problem!

llvm-svn: 158221
2012-06-08 19:02:08 +00:00
Hal Finkel 821e00121c Disable the PPC CTR-Loops pass by default.
The pass itself works well, but the something in the Machine* infrastructure
does not understand terminators which define registers. Without the ability
to use the block-placement pass, etc. this causes performance regressions (and
so is turned off by default). Turning off the analysis turns off the problems
with the Machine* infrastructure.

llvm-svn: 158206
2012-06-08 15:38:25 +00:00
Hal Finkel 8b01503ee5 Fix a bug in the new PPC CTR-Loops pass.
The code which tests for an induction operation cannot assume that any
ADDI instruction will have a register operand because the operand could
also be a frame index; for example:
    %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16

llvm-svn: 158205
2012-06-08 15:38:23 +00:00
Hal Finkel 96c2d4d945 Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form CTR-based loop branching code.
This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, invalid preheader DebugLoc is not used.

llvm-svn: 158204
2012-06-08 15:38:21 +00:00
Roman Divacky c856653fb3 PPC32 uses R2 as the TLS register. Fix the copy and paste.
llvm-svn: 158004
2012-06-05 17:14:17 +00:00
Roman Divacky e3f15c98d1 Implement local-exec TLS on PowerPC.
llvm-svn: 157935
2012-06-04 17:36:38 +00:00
Hal Finkel 1de9bf01e4 Fix a copy-and-paste duplication error in the PPC 440 and A2 schedules (no functionality change).
llvm-svn: 157912
2012-06-04 02:39:52 +00:00
Hal Finkel 595817eebe Enable generating PPC pre-increment (r+imm) instructions by default.
It seems that this no longer causes test suite failures on PPC64 (after r157159),
and often gives a performance benefit, so it can be enabled by default.

llvm-svn: 157911
2012-06-04 02:21:00 +00:00
Justin Holewinski aa58397b3c Change interface for TargetLowering::LowerCallTo and TargetLowering::LowerCall
to pass around a struct instead of a large set of individual values.  This
cleans up the interface and allows more information to be added to the struct
for future targets without requiring changes to each and every target.

NV_CONTRIB

llvm-svn: 157479
2012-05-25 16:35:28 +00:00
Hal Finkel 601f555eee Add a missing PPC 64-bit stwu pattern.
This seems to fix the remaining compile-time failures on PPC64 when
compiling with -enable-ppc-preinc.

llvm-svn: 157159
2012-05-20 17:11:24 +00:00
Hal Finkel 66b0c93553 Add a FIXME about access to negative stack-pointer offsets on PPC32.
The current code will generate a prologue which starts with something like:
        mflr 0
        stw 31, -4(1)
        stw 0, 4(1)
        stwu 1, -16(1)

But under the PPC32 SVR4 ABI, access to negative offsets from R1 is not allowed.

This was pointed out by Peter Bergner.

llvm-svn: 157133
2012-05-19 21:52:55 +00:00
Jim Grosbach c3b0427921 Allow MCCodeEmitter access to the target MCRegisterInfo.
Add the MCRegisterInfo to the factories and constructors.

Patch by Tom Stellard <Tom.Stellard@amd.com>.

llvm-svn: 156828
2012-05-15 17:35:52 +00:00
Roman Divacky e07cc042f6 Mark .opd @progbits, thus avoiding a warning from asm.
llvm-svn: 156494
2012-05-09 18:24:23 +00:00
Jakob Stoklund Olesen 3c52f0281f Add an MF argument to TRI::getPointerRegClass() and TII::getRegClass().
The getPointerRegClass() hook can return register classes that depend on
the calling convention of the current function (ptr_rc_tailcall).

So far, we have been able to infer the calling convention from the
subtarget alone, but as we add support for multiple calling conventions
per target, that no longer works.

Patch by Yiannis Tsiouris!

llvm-svn: 156328
2012-05-07 22:10:26 +00:00