Commit Graph

4059 Commits

Author SHA1 Message Date
David Blaikie 908f4d4bf5 Spread some const around for non-mutating uses of MCSymbolData.
I discovered this const-hole while attempting to coalesnce the Symbol
and SymbolMap data structures. There's some pending issues with that,
but I figured this change was easy to flush early.

llvm-svn: 207124
2014-04-24 16:59:40 +00:00
Evgeniy Stepanov 0a951b775e Create MCTargetOptions.
For now it contains a single flag, SanitizeAddress, which enables
AddressSanitizer instrumentation of inline assembly.

Patch by Yuri Gorshenin.

llvm-svn: 206971
2014-04-23 11:16:03 +00:00
Chandler Carruth 84e68b2994 [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Target/...
edition.

llvm-svn: 206842
2014-04-22 02:41:26 +00:00
Chandler Carruth d174b72a28 [cleanup] Lift using directives, DEBUG_TYPE definitions, and even some
system headers above the includes of generated '.inc' files that
actually contain code. In a few targets this was already done pretty
consistently, but it wasn't done *really* consistently anywhere. It is
strictly cleaner IMO and necessary in a bunch of places where the
DEBUG_TYPE is referenced from the generated code. Consistency with the
necessary places trumps. Hopefully the build bots are OK with the
movement of intrin.h...

llvm-svn: 206838
2014-04-22 02:03:14 +00:00
Chandler Carruth e96dd8975f [Modules] Make Support/Debug.h modular. This requires it to not change
behavior based on other files defining DEBUG_TYPE, which means it cannot
define DEBUG_TYPE at all. This is actually better IMO as it forces folks
to define relevant DEBUG_TYPEs for their files. However, it requires all
files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't
already. I've updated all such files in LLVM and will do the same for
other upstream projects.

This still leaves one important change in how LLVM uses the DEBUG_TYPE
macro going forward: we need to only define the macro *after* header
files have been #include-ed. Previously, this wasn't possible because
Debug.h required the macro to be pre-defined. This commit removes that.
By defining DEBUG_TYPE after the includes two things are fixed:

- Header files that need to provide a DEBUG_TYPE for some inline code
  can do so by defining the macro before their inline code and undef-ing
  it afterward so the macro does not escape.

- We no longer have rampant ODR violations due to including headers with
  different DEBUG_TYPE definitions. This may be mostly an academic
  violation today, but with modules these types of violations are easy
  to check for and potentially very relevant.

Where necessary to suppor headers with DEBUG_TYPE, I have moved the
definitions below the includes in this commit. I plan to move the rest
of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big
enough.

The comments in Debug.h, which were hilariously out of date already,
have been updated to reflect the recommended practice going forward.

llvm-svn: 206822
2014-04-21 22:55:11 +00:00
Nick Lewycky aad475b324 Break PseudoSourceValue out of the Value hierarchy. It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* and Value* is strongly encouraged to use a MachinePointerInfo instead.
llvm-svn: 206255
2014-04-15 07:22:52 +00:00
Lang Hames a1bc0f5662 [MC] Require an MCContext when constructing an MCDisassembler.
This patch re-introduces the MCContext member that was removed from
MCDisassembler in r206063, and requires that an MCContext be passed in at
MCDisassembler construction time. (Previously the MCContext member had been
initialized in an ad-hoc fashion after construction). The MCCContext member
can be used by MCDisassembler sub-classes to construct constant or
target-specific MCExprs.

This patch updates disassemblers for in-tree targets, and provides the
MCRegisterInfo instance that some disassemblers were using through the
MCContext (previously those backends were constructing their own
MCRegisterInfo instances).

llvm-svn: 206241
2014-04-15 04:40:56 +00:00
Hal Finkel 0192cbac66 [PowerPC] [Constant Hoisting] Enable constant hoisting on PPC
Implements the various TTI functions to enable constant hoisting on PPC. The
only significant test-suite change is this:

MultiSource/Benchmarks/VersaBench/bmm/bmm - 20% speedup
(which essentially reverses the slowdown from r206120).

llvm-svn: 206141
2014-04-13 23:02:40 +00:00
Hal Finkel d9963c75da [PowerPC] Fix rlwimi isel when mask is not constant
We had been using the known-zero values of the operand of the or to construct
the mask for an rlwimi; this is not quite correct, but fine when the mask is
constant. When the mask is constant, then the known zeros of the operand must
be a superset of the zeros in the mask. However, when the mask is not a
constant, then there might be bits in the operand that are not known to be zero
that, at runtime, might be zero in the mask. Therefore, we check that any bits
not known to be zero *are* known to be one in the mask. Otherwise, we can't
fold the mask with the or and shift.

This was revealed as a miscompile of
MultiSource/Benchmarks/BitBench/drop3/drop3 when I started experimenting with
constant hoisting.

llvm-svn: 206136
2014-04-13 17:10:58 +00:00
Hal Finkel 34974ed503 [PowerPC] Implement some additional TLI callbacks
Add implementations of:
  bool isLegalICmpImmediate(int64_t Imm) const
  bool isLegalAddImmediate(int64_t Imm) const
  bool isTruncateFree(Type *Ty1, Type *Ty2) const
  bool isTruncateFree(EVT VT1, EVT VT2) const
  bool shouldConvertConstantLoadToIntImm(const APInt &Imm, Type *Ty) const

Unfortunately, this regresses counter-register-based loop formation because
some of the loops now end up in forms were SE cannot compute loop counts.
However, nevertheless, the test-suite results favor committing:

SingleSource/Benchmarks/BenchmarkGame/puzzle: 26% speedup
MultiSource/Benchmarks/FreeBench/analyzer/analyzer: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: 20% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv: 19% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv: 15% speedup
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: 2% speedup

MultiSource/Benchmarks/VersaBench/bmm/bmm: 26% slowdown

llvm-svn: 206120
2014-04-12 21:52:38 +00:00
NAKAMURA Takumi 98905d3f85 LLVMBuild.txt: Reformat.
llvm-svn: 205961
2014-04-10 11:16:17 +00:00
Hal Finkel a775e51274 [PowerPC] Don't return false from PPC::isVSLDOIShuffleMask
PPC::isVSLDOIShuffleMask should return -1, not false, when the shuffle
predicate should be false.

Noticed by inspection; no test case (yet).

llvm-svn: 205787
2014-04-08 19:00:27 +00:00
Hal Finkel 41e9b1d559 [PowerPC] Remove unused TM member variable to unbreak build
Fix "error: private field 'TM' is not used [-Werror,-Wunused-private-field]"

llvm-svn: 205660
2014-04-05 00:16:28 +00:00
Hal Finkel de0b413ec0 [PowerPC] Adjust load/store costs in PPCTTI
This provides more realistic costs for the insert/extractelement instructions
(which are load/store pairs), accounts for the cheap unaligned Altivec load
sequence, and for unaligned VSX load/stores.

Bad news:
MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation)
SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized)
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown

Good news:
SingleSource/Benchmarks/Shootout/ary3 - 54% speedup
SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup
MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup
MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup

Unfortunately, estimating the costs of the stack-based scalarization sequences
is hard, and adjusting these costs is like a game of whac-a-mole :( I'll
revisit this again after we have better codegen for vector extloads and
truncstores and unaligned load/stores.

llvm-svn: 205658
2014-04-04 23:51:18 +00:00
Hal Finkel b1308d525c [PowerPC] PPCTTI Cleanup
Remove the declaration of an unimplemented function.

llvm-svn: 205657
2014-04-04 23:51:11 +00:00
Hal Finkel fbf7e2a1a1 [PowerPC] Add a full condition code register to make the "cc" clobber work
gcc inline asm supports specifying "cc" as a clobber of all condition
registers. Add just enough modeling of the full register to make this work.
Fixed PR19326.

llvm-svn: 205630
2014-04-04 15:15:57 +00:00
Craig Topper 840beec2d0 Make consistent use of MCPhysReg instead of uint16_t throughout the tree.
llvm-svn: 205610
2014-04-04 05:16:06 +00:00
Hal Finkel f823380a44 [PowerPC] Make PPCTTI::getMemoryOpCost call BasicTTI::getMemoryOpCost
PPCTTI::getMemoryOpCost will now make use of BasicTTI::getMemoryOpCost to
calculate the base cost of the memory access, and then adjust on top of that.
There is no functionality change from this modification, but it will become
important so that PPCTTI can take advantage of scalarization information for which
BasicTTI::getMemoryOpCost will account in the near future.

llvm-svn: 205476
2014-04-02 22:43:49 +00:00
Jim Grosbach 36c4953348 Simplify resolveFrameIndex() signature.
Just pass a MachineInstr reference rather than an MBB iterator.
Creating a MachineInstr& is the first thing every implementation did
anyway.

llvm-svn: 205453
2014-04-02 19:28:18 +00:00
Hal Finkel 9e0baa6d3a [PowerPC] Add some missing VSX bitcast patterns
llvm-svn: 205352
2014-04-01 19:24:27 +00:00
Hal Finkel b4240ca0f4 [PowerPC] Don't ever expand BUILD_VECTOR of v2i64 with shuffles
If we have two unique values for a v2i64 build vector, this will always result
in two vector loads if we expand using shuffles. Only one is necessary.

llvm-svn: 205231
2014-03-31 17:48:16 +00:00
Hal Finkel 4c8f634f23 [PowerPC] Correct P7 dispatch unit allocation for vector instructions
llvm-svn: 205222
2014-03-31 17:02:10 +00:00
Hal Finkel 5c0d1454d6 [PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG
sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node
(and similarly for v2i16 and v2i8). Even though there are no sign-extension (or
algebraic shifts) for v2i64 types, we can handle v2i32 sign extensions by
converting two and from v2i64. The small trick necessary here is to shift the
i32 elements into the right lanes before the i32 -> f64 step. This is because
of the big Endian nature of the system, we need the i32 portion in the high
word of the i64 elements.

For v2i16 and v2i8 we can do the same, but we first use the default Altivec
shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and
then apply the above procedure.

llvm-svn: 205146
2014-03-30 13:22:59 +00:00
Hal Finkel 777c9dd90a [PowerPC] Handle v2i64 comparisons
v2i64 is a legal type under VSX, however we don't have native vector
comparisons. We can handle eq/ne by casting it to an Altivec type, but
everything else must be expanded.

llvm-svn: 205106
2014-03-29 16:04:40 +00:00
Hal Finkel e8fba98735 [PowerPC] VSX instruction latency corrections
The vector divide and sqrt instructions have high latencies, and the scalar
comparisons are like all of the others. On the P7, permutations take an extra
cycle over purely-simple vector ops.

llvm-svn: 205096
2014-03-29 13:20:31 +00:00
Rafael Espindola 5904e12bfa Completely rewrite ELFObjectWriter::RecordRelocation.
I started trying to fix a small issue, but this code has seen a small fix too
many.

The old code was fairly convoluted. Some of the issues it had:

* It failed to check if a symbol difference was in the some section when
  converting a relocation to pcrel.
* It failed to check if the relocation was already pcrel.
* The pcrel value computation was wrong in some cases (relocation-pc.s)
* It was missing quiet a few cases where it should not convert symbol
  relocations to section relocations, leaving the backends to patch it up.
* It would not propagate the fact that it had changed a relocation to pcrel,
  requiring a quiet nasty work around in ARM.
* It was missing comments.

llvm-svn: 205076
2014-03-29 06:26:49 +00:00
Hal Finkel 19be506a5e [PowerPC] Add subregister classes for f64 VSX values
We had stored both f64 values and v2f64, etc. values in the VSX registers. This
worked, but was suboptimal because we would always spill 16-byte values even
through we almost always had scalar 8-byte values. This resulted in an
increase in stack-size use, extra memory bandwidth, etc. To fix this, I've
added 64-bit subregisters of the Altivec registers, and combined those with the
existing scalar floating-point registers to form a class of VSX scalar
floating-point registers. The ABI code has also been enhanced to use this
register class and some other necessary improvements have been made.

llvm-svn: 205075
2014-03-29 05:29:01 +00:00
Hal Finkel 2583b06310 [PowerPC] Fix VSX permutation isel
Not only did I invert the indices when I wrote the code, but I also did the
same thing when I wrote the regression test. Oops.

llvm-svn: 205046
2014-03-28 20:24:55 +00:00
Hal Finkel 7811c6188e [PowerPC] v2[fi]64 need to be explicitly passed in VSX registers
v2[fi]64 values need to be explicitly passed in VSX registers. This is because
the code in TRI that finds the minimal register class given a register and a
value type will assert if given an Altivec register and a non-Altivec type.

llvm-svn: 205041
2014-03-28 19:58:11 +00:00
Hal Finkel c6fc9b8960 [PowerPC] Use a small cleanup pass to remove VSX self copies
As explained in r204976, because of how the allocation of VSX registers
interacts with the call-lowering code, we sometimes end up generating self VSX
copies. Specifically, things like this:
  %VSL2<def> = COPY %F2, %VSL2<imp-use,kill>
(where %F2 is really a sub-register of %VSL2, and so this copy is a nop)

This adds a small cleanup pass to remove these prior to post-RA scheduling.

llvm-svn: 204980
2014-03-27 23:12:31 +00:00
Hal Finkel 9dcb3583d5 [PowerPC] Don't remove self VSX copies in PPCInstrInfo::copyPhysReg
Because of how the allocation of VSX registers interacts with the call-lowering
code, we sometimes end up generating self VSX copies. Specifically, things like
this:
  %VSL2<def> = COPY %F2, %VSL2<imp-use,kill>
(where %F2 is really a sub-register of %VSL2, and so this copy is a nop)

The problem is that ExpandPostRAPseudos always assumes that *some* instruction
has been inserted, and adds implicit defs to it. This is a problem if no copy
was inserted because it can cause subtle problems during post-RA scheduling.
These self copies will have to be removed some other way.

llvm-svn: 204976
2014-03-27 22:46:28 +00:00
Hal Finkel 82569b6366 [PowerPC] Fix v2f64 vector extract and related patterns
First, v2f64 vector extract had not been declared legal (and so the existing
patterns were not being used). Second, the patterns for that, and for
scalar_to_vector, should really be a regclass copy, not a subregister
operation, because the VSX registers directly hold both the vector and scalar data.

llvm-svn: 204971
2014-03-27 22:22:48 +00:00
Hal Finkel ad801b7459 [PowerPC] Expand v2i64 shifts
These operations need to be expanded during legalization so that isel does not
crash. In theory, we might be able to custom lower some of these. That,
however, would need to be follow-up work.

llvm-svn: 204963
2014-03-27 21:26:33 +00:00
Rafael Espindola c03f44ca8a Remove another unused argument.
llvm-svn: 204961
2014-03-27 20:49:35 +00:00
Rafael Espindola 9ab380122a Remove unused argument.
llvm-svn: 204956
2014-03-27 20:41:17 +00:00
Rafael Espindola 24a669d225 Prevent alias from pointing to weak aliases.
This adds back r204781.

Original message:

Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204934
2014-03-27 15:26:56 +00:00
Hal Finkel df3e34d944 [PowerPC] Generate VSX permutations for v2[fi]64 vectors
llvm-svn: 204873
2014-03-26 22:58:37 +00:00
Hal Finkel 6e28e6aaaf [PowerPC] VSX loads and stores support unaligned access
I've not yet updated PPCTTI because I'm not sure what the actual relative cost
is compared to the aligned uses.

llvm-svn: 204848
2014-03-26 19:39:09 +00:00
Hal Finkel 7279f4b00d [PowerPC] Use v2f64 <-> v2i64 VSX conversion instructions
llvm-svn: 204843
2014-03-26 19:13:54 +00:00
Hal Finkel ea76a44584 [PowerPC] Remove some dead VSX v4f32 store patterns
These patterns are dead (because v4f32 stores are currently promoted to v4i32
and stored using Altivec instructions), and also are likely not correct
(because they'd store the vector elements in the opposite order from that
assumed by the rest of the Altivec code).

llvm-svn: 204839
2014-03-26 18:26:36 +00:00
Hal Finkel 9281c9a38b [PowerPC] Use VSX vector load/stores for v2[fi]64
These instructions have access to the complete VSX register file. In addition,
they "swap" the order of the elements so that element 0 (the scalar part) comes
first in memory and element 1 follows at a higher address.

llvm-svn: 204838
2014-03-26 18:26:30 +00:00
Hal Finkel a6c8b51212 [PowerPC] Add v2i64 as a legal VSX type
v2i64 needs to be a legal VSX type because it is the SetCC result type from
v2f64 comparisons. We need to expand all non-arithmetic v2i64 operations.

This fixes the lowering for v2f64 VSELECT.

llvm-svn: 204828
2014-03-26 16:12:58 +00:00
Hal Finkel 732f0f73a7 [PowerPC] Lower VSELECT using xxsel when VSX is available
With VSX there is a real vector select instruction, and so we should use it.
Note that VSELECT will still scalarize for v2f64 because the corresponding
SetCC result type (v2i64) is not currently a legal type.

llvm-svn: 204801
2014-03-26 12:49:28 +00:00
Rafael Espindola 65481d7b97 Revert "Prevent alias from pointing to weak aliases."
This reverts commit r204781.

I will follow up to with msan folks to see what is what they
were trying to do with aliases to weak aliases.

llvm-svn: 204784
2014-03-26 06:14:40 +00:00
Hal Finkel bd4de9d478 [PowerPC] Generate logical vector VSX instructions
These instructions are essentially the same as their Altivec counterparts, but
have access to the larger VSX register file.

llvm-svn: 204782
2014-03-26 04:55:40 +00:00
Rafael Espindola 3b712a84a9 Prevent alias from pointing to weak aliases.
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204781
2014-03-26 04:48:47 +00:00
Hal Finkel 174e590966 [PowerPC] Select between VSX A-type and M-type FMA instructions just before RA
The VSX instruction set has two types of FMA instructions: A-type (where the
addend is taken from the output register) and M-type (where one of the product
operands is taken from the output register). This adds a small pass that runs
just after MI scheduling (and, thus, just before register allocation) that
mutates A-type instructions (that are created during isel) into M-type
instructions when:

 1. This will eliminate an otherwise-necessary copy of the addend

 2. One of the product operands is killed by the instruction

The "right" moment to make this decision is in between scheduling and register
allocation, because only there do we know whether or not one of the product
operands is killed by any particular instruction. Unfortunately, this also
makes the implementation somewhat complicated, because the MIs are not in SSA
form and we need to preserve the LiveIntervals analysis.

As a simple example, if we have:

%vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
%vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
                        %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
  ...
  %vreg9<def,tied1> = XSMADDADP %vreg9<tied0>, %vreg17, %vreg19,
                        %RM<imp-use>; VSLRC:%vreg9,%vreg17,%vreg19
  ...

We can eliminate the copy by changing from the A-type to the
M-type instruction. This means:

  %vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
                        %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16

is replaced by:

  %vreg16<def,tied1> = XSMADDMDP %vreg16<tied0>, %vreg18, %vreg9,
                        %RM<imp-use>; VSLRC:%vreg16,%vreg18,%vreg9

and we remove: %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9

llvm-svn: 204768
2014-03-25 23:29:21 +00:00
Hal Finkel 6c32ff31d0 [PowerPC] Correct commutable indices for VSX FMA instructions
Although the first two operands are the ones that can be swapped, the tied
input operand is listed before them, so we need to adjust for that.

I have a test case for this, but it goes along with an upcoming commit (so it
will come soon).

llvm-svn: 204748
2014-03-25 19:26:43 +00:00
Hal Finkel 25e0454f10 [PowerPC] Add a TableGen relation for A-type and M-type VSX FMA instructions
TableGen will create a lookup table for the A-type FMA instructions providing
their corresponding M-form opcodes. This will be used by upcoming commits.

llvm-svn: 204746
2014-03-25 18:55:11 +00:00
Ulrich Weigand cae3a17a21 [PowerPC] Generate little-endian object files
As a first step towards real little-endian code generation, this patch
changes the PowerPC MC layer to actually generate little-endian object
files.  This involves passing the little-endian flag through the various
layers, including down to createELFObjectWriter so we actually get basic
little-endian ELF objects, emitting instructions in little-endian order,
and handling fixups and relocations as appropriate for little-endian.

The bulk of the patch is to update most test cases in test/MC/PowerPC
to verify both big- and little-endian encodings.  (The only test cases
*not* updated are those that create actual big-endian ABI code, like
the TLS tests.)

Note that while the object files are now little-endian, the generated
code itself is not yet updated, in particular, it still does not adhere
to the ELFv2 ABI.

llvm-svn: 204634
2014-03-24 18:16:09 +00:00
Will Schmidt 114777e47f [PPC64LE] ELFv2 ABI updates for the .opd section
[PPC64LE] ELFv2 ABI updates for the .opd section
The PPC64 Little Endian (PPC64LE) target supports the ELFv2 ABI, and as
such, does not have a ".opd" section.  This is keyed off a _CALL_ELF=2
macro check.

The CALL_ELF check is not clearly documented at this time.  The basis
for usage in this patch is from the gcc thread here:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01144.html

> Adding comment from Uli:
Looks good to me.  I think the old-style JIT doesn't really work
anyway for 64-bit, but at least with this patch LLVM will compile
and link again on a ppc64le host ...

llvm-svn: 204614
2014-03-24 16:04:15 +00:00
Hal Finkel e01d32107c [PowerPC] Mark many instructions as commutative
I'm under the impression that we used to infer the isCommutable flag from the
instruction-associated pattern. Regardless, we don't seem to do this (at least
by default) any more. I've gone through all of our instruction definitions, and
marked as commutative all of those that should be trivial to commute (by
exchanging the first two operands). There has been special code for the RL*
instructions, and that's not changed.

Before this change, we had the following commutative instructions:

 RLDIMI
 RLDIMIo
 RLWIMI
 RLWIMI8
 RLWIMI8o
 RLWIMIo
 XSADDDP
 XSMULDP
 XVADDDP
 XVADDSP
 XVMULDP
 XVMULSP

After:

 ADD4
 ADD4o
 ADD8
 ADD8o
 ADDC
 ADDC8
 ADDC8o
 ADDCo
 ADDE
 ADDE8
 ADDE8o
 ADDEo
 AND
 AND8
 AND8o
 ANDo
 CRAND
 CREQV
 CRNAND
 CRNOR
 CROR
 CRXOR
 EQV
 EQV8
 EQV8o
 EQVo
 FADD
 FADDS
 FADDSo
 FADDo
 FMADD
 FMADDS
 FMADDSo
 FMADDo
 FMSUB
 FMSUBS
 FMSUBSo
 FMSUBo
 FMUL
 FMULS
 FMULSo
 FMULo
 FNMADD
 FNMADDS
 FNMADDSo
 FNMADDo
 FNMSUB
 FNMSUBS
 FNMSUBSo
 FNMSUBo
 MULHD
 MULHDU
 MULHDUo
 MULHDo
 MULHW
 MULHWU
 MULHWUo
 MULHWo
 MULLD
 MULLDo
 MULLW
 MULLWo
 NAND
 NAND8
 NAND8o
 NANDo
 NOR
 NOR8
 NOR8o
 NORo
 OR
 OR8
 OR8o
 ORo
 RLDIMI
 RLDIMIo
 RLWIMI
 RLWIMI8
 RLWIMI8o
 RLWIMIo
 VADDCUW
 VADDFP
 VADDSBS
 VADDSHS
 VADDSWS
 VADDUBM
 VADDUBS
 VADDUHM
 VADDUHS
 VADDUWM
 VADDUWS
 VAND
 VAVGSB
 VAVGSH
 VAVGSW
 VAVGUB
 VAVGUH
 VAVGUW
 VMADDFP
 VMAXFP
 VMAXSB
 VMAXSH
 VMAXSW
 VMAXUB
 VMAXUH
 VMAXUW
 VMHADDSHS
 VMHRADDSHS
 VMINFP
 VMINSB
 VMINSH
 VMINSW
 VMINUB
 VMINUH
 VMINUW
 VMLADDUHM
 VMULESB
 VMULESH
 VMULEUB
 VMULEUH
 VMULOSB
 VMULOSH
 VMULOUB
 VMULOUH
 VNMSUBFP
 VOR
 VXOR
 XOR
 XOR8
 XOR8o
 XORo
 XSADDDP
 XSMADDADP
 XSMAXDP
 XSMINDP
 XSMSUBADP
 XSMULDP
 XSNMADDADP
 XSNMSUBADP
 XVADDDP
 XVADDSP
 XVMADDADP
 XVMADDASP
 XVMAXDP
 XVMAXSP
 XVMINDP
 XVMINSP
 XVMSUBADP
 XVMSUBASP
 XVMULDP
 XVMULSP
 XVNMADDADP
 XVNMADDASP
 XVNMSUBADP
 XVNMSUBASP
 XXLAND
 XXLNOR
 XXLOR
 XXLXOR

This is a by-inspection change, and I'm not sure how to write a reliable test
case. I would like advice on this, however.

llvm-svn: 204609
2014-03-24 15:07:28 +00:00
Hal Finkel 32854b0439 [PowerPC] Don't schedule VSX copy legalization unless VSX is enabled
There is no need to schedule this extra pass if it will have nothing to do.

llvm-svn: 204594
2014-03-24 09:51:41 +00:00
Hal Finkel bbad2332e3 [PowerPC] Update comment re: VSX copy-instruction selection
I've done some experimentation with this, and it looks like using the
lower-latency (but lower throughput) copy instruction is essentially always the
right thing to do.

My assumption is that, in order to be relatively sure that the higher-latency
copy will increase throughput, we'd want to have it unlikely to be in-flight
with its use. On the P7, the global completion table (GCT) can hold a maximum
of 120 instructions, shared among all active threads (up to 4), giving 30
instructions per thread.  So specifically, I'd require at least that many
instructions between the copy and the use before the high-latency variant is
used.

Trying this, however, over the entire test suite resulted in zero cases where
the high-latency form would be preferable. This may be a consequence of the
fact that the scheduler views copies as free, and so they tend to end up close
to their uses. For this experiment I created a function:

  unsigned chooseVSXCopy(MachineBasicBlock &MBB,
                         MachineBasicBlock::iterator I,
                         unsigned DestReg, unsigned SrcReg,
                         unsigned StartDist = 1,
                         unsigned Depth = 3) const;

with an implementation like:

  if (!Depth)
    return PPC::XXLOR;

  const unsigned MaxDist = 30;
  unsigned Dist = StartDist;
  for (auto J = I, JE = MBB.end(); J != JE && Dist <= MaxDist; ++J) {
    if (J->isTransient() && !J->isCopy())
      continue;

    if (J->isCall() || J->isReturn() || J->readsRegister(DestReg, TRI))
      return PPC::XXLOR;

    ++Dist;
  }

  // We've exceeded the required distance for the high-latency form, use it.
  if (Dist > MaxDist)
    return PPC::XVCPSGNDP;

  // If this is only an exit block, use the low-latency form.
  if (MBB.succ_empty())
    return PPC::XXLOR;

  // We've reached the end of the block, check the successor blocks (up to some
  // depth), and use the high-latency form if that is okay with all successors.
  for (auto J = MBB.succ_begin(), JE = MBB.succ_end(); J != JE; ++J) {
    if (chooseVSXCopy(**J, (*J)->begin(), DestReg, SrcReg,
                      Dist, --Depth) == PPC::XXLOR)
      return PPC::XXLOR;
  }

  // All of our successor blocks seem okay with the high-latency variant, so
  // we'll use it.
  return PPC::XVCPSGNDP;

and then changed the copy opcode selection from:
    Opc = PPC::XXLOR;
to:
    Opc = chooseVSXCopy(MBB, std::next(I), DestReg, SrcReg);

In conclusion, I'm removing the FIXME from the comment, because I believe that
there is, at least absent other examples, nothing to fix.

llvm-svn: 204591
2014-03-24 09:36:36 +00:00
Nuno Lopes 31617266ea remove a bunch of unused private methods
found with a smarter version of -Wunused-member-function that I'm playwing with.
Appologies in advance if I removed someone's WIP code.

 include/llvm/CodeGen/MachineSSAUpdater.h            |    1 
 include/llvm/IR/DebugInfo.h                         |    3 
 lib/CodeGen/MachineSSAUpdater.cpp                   |   10 --
 lib/CodeGen/PostRASchedulerList.cpp                 |    1 
 lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp    |   10 --
 lib/IR/DebugInfo.cpp                                |   12 --
 lib/MC/MCAsmStreamer.cpp                            |    2 
 lib/Support/YAMLParser.cpp                          |   39 ---------
 lib/TableGen/TGParser.cpp                           |   16 ---
 lib/TableGen/TGParser.h                             |    1 
 lib/Target/AArch64/AArch64TargetTransformInfo.cpp   |    9 --
 lib/Target/ARM/ARMCodeEmitter.cpp                   |   12 --
 lib/Target/ARM/ARMFastISel.cpp                      |   84 --------------------
 lib/Target/Mips/MipsCodeEmitter.cpp                 |   11 --
 lib/Target/Mips/MipsConstantIslandPass.cpp          |   12 --
 lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp              |   21 -----
 lib/Target/NVPTX/NVPTXISelDAGToDAG.h                |    2 
 lib/Target/PowerPC/PPCFastISel.cpp                  |    1 
 lib/Transforms/Instrumentation/AddressSanitizer.cpp |    2 
 lib/Transforms/Instrumentation/BoundsChecking.cpp   |    2 
 lib/Transforms/Instrumentation/MemorySanitizer.cpp  |    1 
 lib/Transforms/Scalar/LoopIdiomRecognize.cpp        |    8 -
 lib/Transforms/Scalar/SCCP.cpp                      |    1 
 utils/TableGen/CodeEmitterGen.cpp                   |    2 
 24 files changed, 2 insertions(+), 261 deletions(-)

llvm-svn: 204560
2014-03-23 17:09:26 +00:00
Hal Finkel 4a912250fa [PowerPC] Make use of VSX f64 <-> i64 conversion instructions
When VSX is available, these instructions should be used in preference to the
older variants that only have access to the scalar floating-point registers.

llvm-svn: 204559
2014-03-23 05:35:00 +00:00
Hal Finkel 55805eb562 [PowerPC] Fix the VSX v2f64 return register
v2f64 values, like other 128-bit values, are returned under VSX in register
vs34 (Altivec register v2).

llvm-svn: 204543
2014-03-22 18:24:43 +00:00
Alexey Samsonov 94bc422d7a Add llvm_unreachable after fully-covered switches to appease GCC
llvm-svn: 204318
2014-03-20 07:30:40 +00:00
Rafael Espindola 7fadc0ea7d Look through variables when computing relocations.
Given

bar = foo + 4
	.long bar

MC would eat the 4. GNU as includes it in the relocation. The rule seems to be
that a variable that defines a symbol is used in the relocation and one that
does not define a symbol is evaluated and the result included in the relocation.

Fixing this unfortunately required some other changes:

* Since the variable is now evaluated, it would prevent the ELF writer from
  noticing the weakref marker the elf streamer uses. This patch then replaces
  that with a VariantKind in MCSymbolRefExpr.

* Using VariantKind then requires us to look past other VariantKind to see

	.weakref	bar,foo
	call	bar@PLT

  doing this also fixes

	zed = foo +2
	call zed@PLT

  so that is a good thing.

* Looking past VariantKind means that the relocation selection has to use
  the fixup instead of the target.

This is a reboot of the previous fixes for MC. I will watch the sanitizer
buildbot and wait for a build before adding back the previous fixes.

llvm-svn: 204294
2014-03-20 02:12:01 +00:00
Bill Schmidt ff9622ef0e Fix PR19144: Incorrect offset generated for int-to-fp conversion at -O0.
When converting a signed 32-bit integer to double-precision floating point on
hardware without a lfiwax instruction, we have to instead use a lfd followed
by fcfid.  We were erroneously offsetting the address by 4 bytes in
preparation for either a lfiwax or lfiwzx when generating the lfd.  This fixes
that silly error.

This was not caught in the test suite since the conversion tests were run with
-mcpu=pwr7, which implies availability of lfiwax.  I've added another test
case for older hardware that checks the code we expect in the absence of
lfiwax and other flavors of fcfid.  There are fewer tests in this test case
because we punt to DAG selection in more cases on older hardware.  (We must
generate complex fiddly sequences in those cases, and there is marginal
benefit in duplicating that logic in fast-isel.)

llvm-svn: 204155
2014-03-18 14:32:50 +00:00
Craig Topper 26696314d5 [C++11] Mark the target fast isel classes as 'final' so that the compiler can de-virtualize some of the internal calls.
llvm-svn: 204123
2014-03-18 07:27:13 +00:00
Ulrich Weigand f445399870 [ppc64] Avoid copy relocs in named rodata sections
Commit r181723 introduced code to avoid placing initialized variables
needing relocations into the .rodata section, which avoid copy relocs
that do not work as expected on ppc64 function references.

The same treatment is also needed for *named* .rodata.XXX sections.
This patch changes PPC64LinuxTargetObjectFile::SelectSectionForGlobal
to modify "Kind" *before* calling the default SelectSectionForGlobal
routine, instead of first calling the default routine and then just
checking for the (main) .rodata section afterwards.

llvm-svn: 203921
2014-03-14 12:45:22 +00:00
Owen Anderson 16c6bf49b7 Phase 2 of the great MachineRegisterInfo cleanup. This time, we're changing
operator* on the by-operand iterators to return a MachineOperand& rather than
a MachineInstr&.  At this point they almost behave like normal iterators!

Again, this requires making some existing loops more verbose, but should pave
the way for the big range-based for-loop cleanups in the future.

llvm-svn: 203865
2014-03-13 23:12:04 +00:00
Hal Finkel 27774d9274 [PowerPC] Initial support for the VSX instruction set
VSX is an ISA extension supported on the POWER7 and later cores that enhances
floating-point vector and scalar capabilities. Among other things, this adds
<2 x double> support and generally helps to reduce register pressure.

The interesting part of this ISA feature is the register configuration: there
are 64 new 128-bit vector registers, the 32 of which are super-registers of the
existing 32 scalar floating-point registers, and the second 32 of which overlap
with the 32 Altivec vector registers. This makes things like vector insertion
and extraction tricky: this can be free but only if we force a restriction to
the right register subclass when needed. A new "minipass" PPCVSXCopy takes care
of this (although it could do a more-optimal job of it; see the comment about
unnecessary copies below).

Please note that, currently, VSX is not enabled by default when targeting
anything because it is not yet ready for that.  The assembler and disassembler
are fully implemented and tested. However:

 - CodeGen support causes miscompiles; test-suite runtime failures:
      MultiSource/Benchmarks/FreeBench/distray/distray
      MultiSource/Benchmarks/McCat/08-main/main
      MultiSource/Benchmarks/Olden/voronoi/voronoi
      MultiSource/Benchmarks/mafft/pairlocalalign
      MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
      SingleSource/Benchmarks/CoyoteBench/almabench
      SingleSource/Benchmarks/Misc/matmul_f64_4x4

 - The lowering currently falls back to using Altivec instructions far more
   than it should. Worse, there are some things that are scalarized through the
   stack that shouldn't be.

 - A lot of unnecessary copies make it past the optimizers, and this needs to
   be fixed.

 - Many more regression tests are needed.

Normally, I'd fix these things prior to committing, but there are some
students and other contributors who would like to work this, and so it makes
sense to move this development process upstream where it can be subject to the
regular code-review procedures.

llvm-svn: 203768
2014-03-13 07:58:58 +00:00
Hal Finkel 5457bd08cb [TableGen] Optionally forbid overlap between named and positional operands
There are currently two schemes for mapping instruction operands to
instruction-format variables for generating the instruction encoders and
decoders for the assembler and disassembler respectively: a) to map by name and
b) to map by position.

In the long run, we'd like to remove the position-based scheme and use only
name-based mapping. Unfortunately, the name-based scheme currently cannot deal
with complex operands (those with suboperands), and so we currently must use
the position-based scheme for those. On the other hand, the position-based
scheme cannot deal with (register) variables that are split into multiple
ranges. An upcoming commit to the PowerPC backend (adding VSX support) will
require this capability. While we could teach the position-based scheme to
handle that, since we'd like to move away from the position-based mapping
generally, it seems silly to teach it new tricks now. What makes more sense is
to allow for partial transitioning: use the name-based mapping when possible,
and only use the position-based scheme when necessary.

Now the problem is that mixing the two sensibly was not possible: the
position-based mapping would map based on position, but would not skip those
variables that were mapped by name. Instead, the two sets of assignments would
overlap. However, I cannot currently change the current behavior, because there
are some backends that rely on it [I think mistakenly, but I'll send a message
to llvmdev about that]. So I've added a new TableGen bit variable:
noNamedPositionallyEncodedOperands, that can be used to cause the
position-based mapping to skip variables mapped by name.

llvm-svn: 203767
2014-03-13 07:57:54 +00:00
Roman Divacky a26f9a6a42 Allow exclamation and tilde to be parsed as a part of the ppc asm operand.
llvm-svn: 203699
2014-03-12 19:25:57 +00:00
Rafael Espindola 3d5d464df8 Try harder to evaluate expressions when printing assembly.
When printing assembly we don't have a Layout object, but we can still
try to fold some constants.

Testcase by Ulrich Weigand.

llvm-svn: 203677
2014-03-12 16:55:59 +00:00
Will Schmidt acae468c8e Update the datalayout string for ppc64LE.
Update the datalayout string for ppc64LE.

llvm-svn: 203664
2014-03-12 14:59:17 +00:00
Patrik Hagglund 1da3512166 Replace '#include ValueTypes.h' with forward declarations.
In some cases the include is pushed "downstream" (or removed if
unused).

llvm-svn: 203644
2014-03-12 08:00:24 +00:00
Chandler Carruth aee3ca6cfd [TTI] There is actually no realistic way to pop TTI implementations off
the stack of the analysis group because they are all immutable passes.
This is made clear by Craig's recent work to use override
systematically -- we weren't overriding anything for 'finalizePass'
because there is no such thing.

This is kind of a lame restriction on the API -- we can no longer push
and pop things, we just set up the stack and run. However, I'm not
invested in building some better solution on top of the existing
(terrifying) immutable pass and legacy pass manager.

llvm-svn: 203437
2014-03-10 02:45:14 +00:00
Rafael Espindola 24a542fd5c Don't avoid cfi instructions on the bg/p.
The integrated assembler now works for ppc. Since this was the last use of the
bg/p predicate and Hal says that it is now dead, drop the predicate too.

llvm-svn: 203269
2014-03-07 19:04:12 +00:00
David Majnemer 7b58305ff6 MC: Remove superfluous section attribute flag definitions
Summary:
llvm/MC/MCSectionMachO.h and llvm/Support/MachO.h both had the same
definitions for the section flags.  Instead, grab the definitions out of
support.

No functionality change.

Reviewers: grosbach, Bigcheese, rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2998

llvm-svn: 203211
2014-03-07 07:36:05 +00:00
Rafael Espindola b1f25f1b93 Replace PROLOG_LABEL with a new CFI_INSTRUCTION.
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.

The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.

The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.

I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.

The net result is that we don't create temporary labels that are never used.

llvm-svn: 203204
2014-03-07 06:08:31 +00:00
Hal Finkel 6daf2aa140 The PPC global base register cannot be r0
The global base register cannot be r0 because it might end up as the first
argument to addi or addis. Fixes PR18316.

I don't have a small stable test case.

llvm-svn: 203054
2014-03-06 01:28:23 +00:00
Chandler Carruth 9a4c9e597b [Layering] Move DebugInfo.h into the IR library where its implementation
already lives.

llvm-svn: 203046
2014-03-06 00:46:21 +00:00
Hal Finkel 7f908e8ef4 Fixup PPC Darwin i1 argument handling
Like on other targets, we need to zero_extend/truncate i1 args before copying
them to GPRs.

llvm-svn: 203045
2014-03-06 00:45:19 +00:00
Hal Finkel 2a9d318e4a When using CR bit registers on PPC32, handle the i1 vaarg case
When copying an i1 value into a GPR for a vaarg call, we need to explicitly
zero-extend the i1 value (otherwise an invalid CRBIT -> GPR copy will be
generated).

llvm-svn: 203041
2014-03-06 00:23:33 +00:00
Hal Finkel 6a56b21729 With PPC CR bit registers, handle int_to_fp on older cores
On cores without fpcvt support, we cannot promote int_to_fp i1 operations,
because there is nothing to promote them to. The most straightforward
implementation of this uses a select to choose between the two possible
resulting floating-point values (and that's what is done here).

llvm-svn: 203015
2014-03-05 22:14:00 +00:00
Joerg Sonnenberger cce644a633 Enable integrated assembler on OpenBSD/PPC32 by default, too.
From Brad Smith.

llvm-svn: 202967
2014-03-05 11:37:04 +00:00
Will Schmidt 3503018e19 [PowerPC] support powerpc64le as syntax-checking target (pass2)
Register the Asm Printer for the ppc64le target.

This fills in a spot that was missed in an earlier change (r187179).

llvm-svn: 202861
2014-03-04 16:51:52 +00:00
Chandler Carruth 4220e9c154 [Modules] Move ValueHandle into the IR library where Value itself lives.
Move the test for this class into the IR unittests as well.

This uncovers that ValueMap too is in the IR library. Ironically, the
unittest for ValueMap is useless in the Support library (honestly, so
was the ValueHandle test) and so it already lives in the IR unittests.
Mmmm, tasty layering.

llvm-svn: 202821
2014-03-04 11:17:44 +00:00
Chandler Carruth 03eb0de93d [Modules] Move GetElementPtrTypeIterator into the IR library. As its
name might indicate, it is an iterator over the types in an instruction
in the IR.... You see where this is going.

Another step of modularizing the support library.

llvm-svn: 202815
2014-03-04 10:40:04 +00:00
Hal Finkel 6aca2373f2 Add a PPC inline asm constraint type for single CR bits
Now that the PowerPC backend can track individual CR bits as first-class
registers, we should also have a way of allocating them for inline asm
statements. Because these registers are only one bit, if an output variable is
implicitly cast to a larger integer size, we'll get an any_extend to that
larger type (this is part of the existing target-independent logic). As a
result, regardless of the size of the output type, only the first bit is
meaningful.

The constraint identifier "wc" has been chosen for this purpose. Although gcc
does not currently support allocating individual CR bits, this identifier
choice has been coordinated with the gcc PowerPC team, and will be marked as
reserved for this purpose in the gcc constraints.md file.

llvm-svn: 202657
2014-03-02 18:23:39 +00:00
Benjamin Kramer b6d0bd48bd [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.
Remove the old functions.

llvm-svn: 202636
2014-03-02 12:27:27 +00:00
Craig Topper 73156025e0 Switch all uses of LLVM_OVERRIDE to just use 'override' directly.
llvm-svn: 202621
2014-03-02 09:09:27 +00:00
Craig Topper 77dfe45f81 Switch all uses of LLVM_FINAL to just use 'final', and remove the macro.
llvm-svn: 202618
2014-03-02 08:08:51 +00:00
Hal Finkel 46043edc56 Remove extra truncs/exts around i32 bit operations on PPC64
This generalizes the code to eliminate extra truncs/exts around i1 bit
operations to also do the same on PPC64 for i32 bit operations. This eliminates
a fairly prevalent code wart:

int foo(int a) {
  return a == 5 ? 7 : 8;
}

On PPC64, because of the extension implied by the ABI, this would generate:

	cmplwi 0, 3, 5
	li 12, 8
	li 4, 7
	isel 3, 4, 12, 2
	rldicl 3, 3, 0, 32
	blr

where the 'rldicl 3, 3, 0, 32', the extension, is completely unnecessary. At
least for the single-BB case (which is all that the DAG combine mechanism can
handle), this unnecessary extension is no longer generated.

llvm-svn: 202600
2014-03-01 21:36:57 +00:00
Hal Finkel b998915ee1 Swap PPC isel operands to allow for 0-folding
The PPC isel instruction can fold 0 into the first operand (thus eliminating
the need to materialize a zero-containing register when the 'true' result of
the isel is 0). When the isel is fed by a bit register operation that we can
invert, do so as part of the bit-register-operation peephole routine.

llvm-svn: 202469
2014-02-28 06:11:16 +00:00
Hal Finkel 5cae2168c7 Trying to unbreak the darwin11 builder
The CR bit tracking code broke PPC/Darwin; trying to get it working again...

(the darwin11 builder, which defaults to the darwin ABI when running PPC tests,
asserted when running test/CodeGen/PowerPC/inverted-bool-compares.ll)

llvm-svn: 202459
2014-02-28 01:17:25 +00:00
Hal Finkel b39a0475c0 Try to unbreak the C++11 build
Cannot use negative numbers in case statements without running afoul of -Wc++11-narrowing.

llvm-svn: 202455
2014-02-28 00:45:27 +00:00
Hal Finkel 940ab934d4 Add CR-bit tracking to the PowerPC backend for i1 values
This change enables tracking i1 values in the PowerPC backend using the
condition register bits. These bits can be treated on PowerPC as separate
registers; individual bit operations (and, or, xor, etc.) are supported.
Tracking booleans in CR bits has several advantages:

 - Reduction in register pressure (because we no longer need GPRs to store
   boolean values).

 - Logical operations on booleans can be handled more efficiently; we used to
   have to move all results from comparisons into GPRs, perform promoted
   logical operations in GPRs, and then move the result back into condition
   register bits to be used by conditional branches. This can be very
   inefficient, because the throughput of these CR <-> GPR moves have high
   latency and low throughput (especially when other associated instructions
   are accounted for).

 - On the POWER7 and similar cores, we can increase total throughput by using
   the CR bits. CR bit operations have a dedicated functional unit.

Most of this is more-or-less mechanical: Adjustments were needed in the
calling-convention code, support was added for spilling/restoring individual
condition-register bits, and conditional branch instruction definitions taking
specific CR bits were added (plus patterns and code for generating bit-level
operations).

This is enabled by default when running at -O2 and higher. For -O0 and -O1,
where the ability to debug is more important, this feature is disabled by
default. Individual CR bits do not have assigned DWARF register numbers,
and storing values in CR bits makes them invisible to the debugger.

It is critical, however, that we don't move i1 values that have been promoted
to larger values (such as those passed as function arguments) into bit
registers only to quickly turn around and move the values back into GPRs (such
as happens when values are returned by functions). A pair of target-specific
DAG combines are added to remove the trunc/extends in:
  trunc(binary-ops(binary-ops(zext(x), zext(y)), ...)
and:
  zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)
In short, we only want to use CR bits where some of the i1 values come from
comparisons or are used by conditional branches or selects. To put it another
way, if we can do the entire i1 computation in GPRs, then we probably should
(on the POWER7, the GPR-operation throughput is higher, and for all cores, the
CR <-> GPR moves are expensive).

POWER7 test-suite performance results (from 10 runs in each configuration):

SingleSource/Benchmarks/Misc/mandel-2: 35% speedup
MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup
SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup
SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup
SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup

SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown
MultiSource/Applications/lemon/lemon: 8% slowdown

llvm-svn: 202451
2014-02-28 00:27:01 +00:00
Aaron Ballman 3d69c5cae4 Silencing an MSVC signed comparison warning.
llvm-svn: 202295
2014-02-26 20:22:20 +00:00
Hal Finkel 22304046c7 Account for 128-bit integer operations in PPCCTRLoops
We need to abort the formation of counter-register-based loops where there are
128-bit integer operations that might become function calls.

llvm-svn: 202192
2014-02-25 20:51:50 +00:00
Rafael Espindola 935125126c Make DataLayout a plain object, not a pass.
Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM
don't don't handle passes to also use DataLayout.

llvm-svn: 202168
2014-02-25 17:30:31 +00:00
Rafael Espindola aeff8a9c05 Make some DataLayout pointers const.
No functionality change. Just reduces the noise of an upcoming patch.

llvm-svn: 202087
2014-02-24 23:12:18 +00:00
Rafael Espindola 612886fc8c Rename a few more DataLayout variables.
llvm-svn: 201833
2014-02-21 01:53:35 +00:00
Rafael Espindola a3ad4e693c move getNameWithPrefix and getSymbol to TargetMachine.
TargetLoweringBase is implemented in CodeGen, so before this patch we had
a dependency fom Target to CodeGen. This would show up as a link failure of
llvm-stress when building with -DBUILD_SHARED_LIBS=ON.

This fixes pr18900.

llvm-svn: 201711
2014-02-19 20:30:41 +00:00
Rafael Espindola daeafb4c2a Add back r201608, r201622, r201624 and r201625
r201608 made llvm corretly handle private globals with MachO. r201622 fixed
a bug in it and r201624 and r201625 were changes for using private linkage,
assuming that llvm would do the right thing.

They all got reverted because r201608 introduced a crash in LTO. This patch
includes a fix for that. The issue was that TargetLoweringObjectFile now has
to be initialized before we can mangle names of private globals. This is
trivially true during the normal codegen pipeline (the asm printer does it),
but LTO has to do it manually.

llvm-svn: 201700
2014-02-19 17:23:20 +00:00
Daniel Jasper 7e198ad862 Revert r201622 and r201608.
This causes the LLVMgold plugin to segfault. More information on the
replies to r201608.

llvm-svn: 201669
2014-02-19 12:26:01 +00:00
Rafael Espindola 09dcc6a536 Fix PR18743.
The IR
@foo = private constant i32 42

is valid, but before this patch we would produce an invalid MachO from it. It
was invalid because it would use an L label in a section where the liker needs
the labels in order to atomize it.

One way of fixing it would be to just reject this IR in the backend, but that
would not be very front end friendly.

What this patch does is use an 'l' prefix in sections that we know the linker
requires symbols for atomizing them. This allows frontends to just use
private and not worry about which sections they go to or how the linker handles
them.

One small issue with this strategy is that now a symbol name depends on the
section, which is not available before codegen. This is not a problem in
practice. The reason is that it only happens with private linkage, which will
be ignored by the non codegen users (llvm-nm and llvm-ar).

llvm-svn: 201608
2014-02-18 22:24:57 +00:00
Rafael Espindola ea09c595a6 Rename a DebugLoc variable to DbgLoc and a DataLayout to DL.
This is quiet a bit less confusing now that TargetData was renamed DataLayout.

llvm-svn: 201606
2014-02-18 22:05:46 +00:00
Daniel Sanders 753e17629d Re-commit: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.

The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.

All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.

Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
  (fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
  (should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
  (should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
  to be enabled regardless of default setting or -no-integrated-as.
  (should fix SystemZ buildbots)

Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2686

llvm-svn: 201333
2014-02-13 14:44:26 +00:00
Daniel Sanders abe212a3b8 Revert r201237+r201238: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
It introduced multiple test failures in the buildbots.

llvm-svn: 201241
2014-02-12 15:39:20 +00:00
Daniel Sanders a7d504cf58 Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.

The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.

All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.

Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2686

llvm-svn: 201237
2014-02-12 14:44:54 +00:00
Rafael Espindola fa0f72837f Pass the Mangler by reference.
It is never null and it is not used in casts, so there is no reason to use a
pointer. This matches how we pass TM.

llvm-svn: 201025
2014-02-08 14:53:28 +00:00
Rafael Espindola 1070501586 Add LLVM_OVERRIDE to a few declarations.
llvm-svn: 201022
2014-02-08 06:07:27 +00:00
Rafael Espindola 4998280fdf Just returning false is the default.
llvm-svn: 200890
2014-02-06 00:03:15 +00:00
Matt Arsenault 25793a3f22 Add address space argument to allowsUnalignedMemoryAccess.
On R600, some address spaces have more strict alignment
requirements than others.

llvm-svn: 200887
2014-02-05 23:15:53 +00:00
Rafael Espindola b4eec1daa1 Remove support for not using .loc directives.
Clang itself was not using this. The only way to access it was via llc.

llvm-svn: 200862
2014-02-05 18:00:21 +00:00
Hal Finkel a7bbaf6de6 Replace PPC instruction-size code with MCInstrDesc getSize
As part of the cleanup done to enable the disassembler, the PPC instructions
now have a valid Size description field. This can now be used to replace some
custom logic in a few places to compute instruction sizes.

Patch by David Wiberg!

llvm-svn: 200623
2014-02-02 06:12:27 +00:00
David Woodhouse d2cca113df Delete MCSubtargetInfo data members from target MCCodeEmitter classes
The subtarget info is explicitly passed to the EncodeInstruction
method and we should use that subtarget info to influence any
encoding decisions.

llvm-svn: 200350
2014-01-28 23:13:25 +00:00
David Woodhouse 3fa98a65e9 Propagate MCSubtargetInfo through TableGen's getBinaryCodeForInstr()
llvm-svn: 200349
2014-01-28 23:13:18 +00:00
David Woodhouse 9784cef38d Explictly pass MCSubtargetInfo to MCCodeEmitter::EncodeInstruction()
llvm-svn: 200348
2014-01-28 23:13:07 +00:00
David Woodhouse e6c13e4abd Change MCStreamer EmitInstruction interface to take subtarget info
llvm-svn: 200345
2014-01-28 23:12:42 +00:00
Iain Sandoe 625b65a90c Provide a stub Target Streamer implementation for PPC MachO
At present, this handles .tc (error) and needs to be expanded to deal properly with .machine

llvm-svn: 200309
2014-01-28 11:03:17 +00:00
Hal Finkel 4e703bcecd Handle spilling the PPC GPRC_NOR0 register class
GPRC_NOR0 is not a subclass of GPRC (because it also contains the ZERO pseudo
register). As a result, we also need to check for it in the spilling code.

llvm-svn: 200288
2014-01-28 05:32:58 +00:00
Eric Christopher 2037caf8b9 Revert r199871 and replace it with a simple check in the debug info
code to see if we're emitting a function into a non-default
text section. This is still a less-than-ideal solution, but more
contained than r199871 to determine whether or not we're emitting
code into an array of comdat sections.

llvm-svn: 200269
2014-01-28 00:49:26 +00:00
Rafael Espindola e41383f899 Pass a MCSubtargetInfo down to the TargetStreamer creation.
With this the target streamers will be able to know the target features that
are in use.

llvm-svn: 200135
2014-01-26 06:38:58 +00:00
Rafael Espindola 24ea09ef7d Construct the MCStreamer before constructing the MCTargetStreamer.
This has a few advantages:
* Only targets that use a MCTargetStreamer have to worry about it.
* There is never a MCTargetStreamer without a MCStreamer, so we can use a
  reference.
* A MCTargetStreamer can talk to the MCStreamer in its constructor.

llvm-svn: 200129
2014-01-26 06:06:37 +00:00
Rafael Espindola 6b9ee9bce3 Remove an easy use of EmitRawText from PPC.
This makes lib/Target/PowerPC EmitRawText free.

llvm-svn: 200065
2014-01-25 02:35:56 +00:00
Juergen Ributzka 3e752e7af9 Add final and owerride keywords to TargetTransformInfo's subclasses.
llvm-svn: 200021
2014-01-24 18:22:59 +00:00
Alp Toker cb40291100 Fix known typos
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.

llvm-svn: 200018
2014-01-24 17:20:08 +00:00
Eric Christopher 15abef6df9 Add a variable to track whether or not we've used a unique section,
e.g. linkonce, to TargetMachine and set it when we've done so
for ELF targets currently. This involved making TargetMachine
non-const in a TLOF use and propagating that change around - I'm
open to other ideas.

This will be used in a future commit to handle emitting debug
information with ranges.

llvm-svn: 199871
2014-01-23 06:47:25 +00:00
Rafael Espindola 28a85a84ac Fix pr18515.
My understanding (from reading just the llvm code) is that
* most ppc cpus have a "sync n" instruction and an msync alias that is "sync 0".
* "book e" cpus instead have a msync instruction and not the more
general "sync n"

This patch reflects that in the .td files, allowing a single codepath for
asm ond obj streamer and incidentelly fixes a crash when EmitRawText was
called on a obj streamer.

llvm-svn: 199832
2014-01-22 20:20:52 +00:00
Hal Finkel 3e4a34c8c3 Fix pointer info on PPC byval stores
For PPC64 SVR (and Darwin), the stores that take byval aggregate parameters
from registers into the stack frame had MachinePointerInfo objects with
incorrect offsets. These offsets are relative to the object itself, not to the
stack frame base.

This fixes self hosting on PPC64 when compiling with -enable-aa-sched-mi.

llvm-svn: 199763
2014-01-21 20:15:58 +00:00
Rafael Espindola 4a1a360634 Make getTargetStreamer return a possibly null pointer.
This will allow it to be called from target independent parts of the main
streamer that don't know if there is a registered target streamer or not. This
in turn will allow targets to perform extra actions at specified points in the
interface: add extra flags for some labels, extra work during finalization, etc.

llvm-svn: 199174
2014-01-14 01:21:46 +00:00
Chandler Carruth 73523021d0 [PM] Split DominatorTree into a concrete analysis result object which
can be used by both the new pass manager and the old.

This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.

The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.

Also in numerous places things became cleaner because I switched from
re-running the pass (!!! mid way through some other passes run!!!) to
directly recomputing the domtree.

llvm-svn: 199104
2014-01-13 13:07:17 +00:00
Chandler Carruth 5ad5f15cff [cleanup] Move the Dominators.h and Verifier.h headers into the IR
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.

Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.

But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.

llvm-svn: 199082
2014-01-13 09:26:24 +00:00
Saleem Abdulrasool a6505ca4c2 correct target directive handling error handling
The target specific parser should return `false' if the target AsmParser handles
the directive, and `true' if the generic parser should handle the directive.
Many of the target specific directive handlers would `return Error' which does
not follow these semantics.  This change simply changes the target specific
routines to conform to the semantis of the ParseDirective correctly.

Conformance to the semantics improves diagnostics emitted for the invalid
directives.  X86 is taken as a sample to ensure that multiple diagnostics are
not presented for a single error.

llvm-svn: 199068
2014-01-13 01:15:39 +00:00
Chandler Carruth d48cdbf0c3 Put the functionality for printing a value to a raw_ostream as an
operand into the Value interface just like the core print method is.
That gives a more conistent organization to the IR printing interfaces
-- they are all attached to the IR objects themselves. Also, update all
the users.

This removes the 'Writer.h' header which contained only a single function
declaration.

llvm-svn: 198836
2014-01-09 02:29:41 +00:00
Rafael Espindola 894843cb4e Move the llvm mangler to lib/IR.
This makes it available to tools that don't link with target (like llvm-ar).

llvm-svn: 198708
2014-01-07 21:19:40 +00:00
Chandler Carruth 9aca918df9 Move the LLVM IR asm writer header files into the IR directory, as they
are part of the core IR library in order to support dumping and other
basic functionality.

Rename the 'Assembly' include directory to 'AsmParser' to match the
library name and the only functionality left their -- printing has been
in the core IR library for quite some time.

Update all of the #includes to match.

All of this started because I wanted to have the layering in good shape
before I started adding support for printing LLVM IR using the new pass
infrastructure, and commandline support for the new pass infrastructure.

llvm-svn: 198688
2014-01-07 12:34:26 +00:00
Chandler Carruth 8a8cd2bab9 Re-sort all of the includes with ./utils/sort_includes.py so that
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.

Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.

llvm-svn: 198685
2014-01-07 11:48:04 +00:00
Bill Wendling 13199b17f8 Remove unnecessary #includes.
llvm-svn: 198585
2014-01-06 06:00:00 +00:00
Bill Wendling 908bf814e7 Refactor function that checks that __builtin_returnaddress's argument is constant.
This moves the check up into the parent class so that all targets can use it
without having to copy (and keep in sync) the same error message.

llvm-svn: 198579
2014-01-06 00:43:20 +00:00
Bill Wendling df7dd28dc8 Emit an error message if the value passed to __builtin_returnaddress isn't a constant
__builtin_returnaddress requires that the value passed into is be a constant.
However, at -O0 even a constant expression may not be converted to a constant.
Emit an error message intead of crashing.

llvm-svn: 198531
2014-01-05 01:47:20 +00:00
Rafael Espindola 58873566b3 Make the llvm mangler depend only on DataLayout.
Before this patch any program that wanted to know the final symbol name of a
GlobalValue had to link with Target.

This patch implements a compromise solution where the mangler uses DataLayout.
This way, any tool that already links with Target (llc, clang) gets the exact
behavior as before and new IR files can be mangled without linking with Target.

With this patch the mangler is constructed with just a DataLayout and DataLayout
is extended to include the information the Mangler needs.

llvm-svn: 198438
2014-01-03 19:21:54 +00:00
Hal Finkel 860fa9052e [PPC] Fix comment to match function name
llvm-svn: 198362
2014-01-02 22:09:39 +00:00
Hal Finkel 1d429f2ee0 [PPC] Fix the scheduling of CR logicals on the P7
CR logicals (crand, crxor, etc.) on the P7 need to be in the first slot of each
dispatch group. The old itinerary entry was just wrong (but has not mattered
because we don't generate these instructions).

This will matter when, in an upcoming commit, we start generating these
instructions.

llvm-svn: 198359
2014-01-02 21:38:26 +00:00
Hal Finkel 77c8dc1da3 [PPC] Use the correct immediate operands on 64-bit instructions
Several of the 64-bit fixed-point instructions with immediate operands were
using the 32-bit (i32) operand nodes instead of the corresponding 64-bit (i64)
operand definitions (u16imm instead of u16imm64, for example).

This error has had no effect so far, but would have caused type-checking
violations with an upcoming change.

llvm-svn: 198356
2014-01-02 21:26:59 +00:00
Roman Divacky bc1655b4b0 Use r2 when encoding tls on ppc32. Fixes PR18305.
llvm-svn: 197878
2013-12-22 10:45:37 +00:00
Roman Divacky 8854e76944 Add some comments.
llvm-svn: 197875
2013-12-22 09:48:38 +00:00
Roman Divacky 32143e2bda Implement initial-exec TLS for PPC32.
llvm-svn: 197824
2013-12-20 18:08:54 +00:00
Rafael Espindola 9ec26f395b Long doubles are required to be aligned to 128 bits and svr4 32 bits.
Clang was already getting this right.

llvm-svn: 197694
2013-12-19 16:23:59 +00:00
Hal Finkel 2345347eb9 Add a disassembler to the PowerPC backend
The tests for the disassembler were adapted from the encoder tests, and for the
most part, the output from the disassembler matches that encoder-test inputs.
There are some places where more-informative mnemonics could be produced
(notably for the branch instructions), and those cases are noted in the tests
with FIXMEs.

Future work includes:

 - Generating more-informative mnemonics when possible (this may also be done
   in the printer).

 - Remove the dependence on positional "numbered" operand-to-variable mapping
   (for both encoding and decoding).

 - Internally using 64-bit instruction variants in 64-bit mode (if this turns
   out to matter).

llvm-svn: 197693
2013-12-19 16:13:01 +00:00
Rafael Espindola 988f35e999 Fix f64 and f128 for ppc-darwin.
This patch adds -f64:32:64 to 32 bit ppc darwin since a f64 inside a
structure are only 32 bit aligned.

The patch also drop -f128:64:128 from all ppc darwin, since f128 is
128 bit aligned.

llvm-svn: 197574
2013-12-18 15:06:25 +00:00
Rafael Espindola 382ee385fd One ppc32-darwin, a i64 inside a structure can have 32 bit alignment.
Thanks for Iain Sandoe for testing this with the original gcc.

Clang was already getting this right.

llvm-svn: 197572
2013-12-18 14:35:37 +00:00
Hal Finkel b4b99e545b Eliminate PPC instruction decoding ambiguities
The instruction definitions in the PPC backend have a number of variants
defined for the same instruction to represent differences between 64-bit and
32-bit semantics. In order to generate a disassembler for the PPC backend, we
need to mark all but one of these as CodeGen only.

No functionality change intended; this is prep work for PPC disassembly
support.

llvm-svn: 197535
2013-12-17 23:05:18 +00:00
Rafael Espindola 345d718d16 Fix the pointer size for the PS3 datalayout.
This will be tested from clang.

llvm-svn: 197501
2013-12-17 15:29:48 +00:00
Andrew Trick e339828b90 Allow MachineCSE to coalesce trivial subregister copies the same way that it coalesces normal copies.
Without this, MachineCSE is powerless to handle redundant operations with truncated source operands.

This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled:

     %vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1
     %vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2
     %vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def>

Test case: cse-add-with-overflow.ll.

This exposed an existing bug in
PPCInstrInfo::commuteInstruction. Thanks to Rafael for the test case:
PowerPC/crash.ll.

llvm-svn: 197465
2013-12-17 04:50:45 +00:00
Andrew Trick 9defbd882b whitespace
llvm-svn: 197464
2013-12-17 04:50:40 +00:00
Rafael Espindola bccb9d45ad The preferred alignment defaults to the abi alignment. Omit if it is the same.
llvm-svn: 197400
2013-12-16 18:01:51 +00:00
Rafael Espindola 8afbb28cea On DataLayout, omit the default of p:64:64:64.
llvm-svn: 197397
2013-12-16 17:15:29 +00:00
Hal Finkel 0a576d52fa Set has_asmparser in PowerPC/LLVMBuild.txt
PowerPC now has an asm parser (and has for many months now); indicate this in
PowerPC/LLVMBuild.txt.

llvm-svn: 197393
2013-12-16 15:48:09 +00:00
Iain Sandoe e0b4cb62f5 [Powerpc darwin] AsmParser Base implementation.
This is a base implementation of the powerpc-apple-darwin asm parser dialect.

* Enables infrastructure (essentially isDarwin()) and fixes up the parsing of asm directives to separate out ELF and MachO/Darwin additions.
* Enables parsing of {r,f,v}XX as register identifiers.
* Enables parsing of lo16() hi16() and ha16() as modifiers.

The changes to the test case are from David Fang (fangism).

llvm-svn: 197324
2013-12-14 13:34:02 +00:00
Rafael Espindola 1caa693a7b Assume defaults to produce smaller datalayout strings.
llvm-svn: 197249
2013-12-13 17:56:11 +00:00
Iain Sandoe 680385830f test commit.
Amend a comment.

llvm-svn: 197237
2013-12-13 15:46:48 +00:00
Gabor Greif 5fde43bf2e typo in comment
llvm-svn: 197136
2013-12-12 08:00:34 +00:00
Hal Finkel fa50630e43 Remove unused multiclass from PPCInstrInfo.td
llvm-svn: 197100
2013-12-12 00:23:29 +00:00
Hal Finkel ceb1f12d9a Improve instruction scheduling for the PPC POWER7
Aside from a few minor latency corrections, the major change here is a new
hazard recognizer which focuses on better dispatch-group formation on the
POWER7. As with the PPC970's hazard recognizer, the most important thing it
does is avoid load-after-store hazards within the same dispatch group. It uses
the POWER7's special dispatch-group-terminating nop instruction (instead of
inserting multiple regular nop instructions). This new hazard recognizer makes
use of the scheduling dependency graph itself, built using AA information, to
robustly detect the possibility of load-after-store hazards.

significant test-suite performance changes (the error bars are 99.5% confidence
intervals based on 5 test-suite runs both with and without the change --
speedups are negative):

speedups:

MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
	-0.55171% +/- 0.333168%

MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl
	-17.5576% +/- 14.598%

MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
	-29.5708% +/- 7.09058%

MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt
	-34.9471% +/- 11.4391%

SingleSource/Benchmarks/BenchmarkGame/puzzle
	-25.1347% +/- 11.0104%

SingleSource/Benchmarks/Misc/flops-8
	-17.7297% +/- 9.79061%

SingleSource/Benchmarks/Shootout-C++/ary3
	-35.5018% +/- 23.9458%

SingleSource/Regression/C/uint64_to_float
	-56.3165% +/- 25.4234%

SingleSource/UnitTests/Vectorizer/gcc-loops
	-18.5309% +/- 6.8496%

regressions:

MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000
	18.351% +/- 12.156%

SingleSource/Benchmarks/Shootout-C++/methcall
	27.3086% +/- 14.4733%

llvm-svn: 197099
2013-12-12 00:19:11 +00:00
Hal Finkel 94a6f380bb Fix the PPC subsumes-predicate check
For one predicate to subsume another, they must both check the same condition
register. Failure to check this prerequisite was causing miscompiles.

Fixes PR18003.

llvm-svn: 197089
2013-12-11 23:12:25 +00:00
NAKAMURA Takumi 8bc9bfaa5a Prune redundant dependencies in LLVMBuild.txt.
llvm-svn: 196988
2013-12-11 00:30:57 +00:00
Rafael Espindola 5b3585871b Move PPC's getDataLayoutString out of line and document it better.
llvm-svn: 196987
2013-12-11 00:09:06 +00:00
David Fang 1b01849f2d on darwin<10, fallback to .weak_definition (PPC,X86)
.weak_def_can_be_hidden was not yet supported by the system assembler

llvm-svn: 196970
2013-12-10 21:37:41 +00:00
NAKAMURA Takumi 396d4d3c7e Add proper dependencies to LLVMBuild.txt in llvm/lib.
I'll prune redundant deps in LLVMBuild.txt, later.

llvm-svn: 196881
2013-12-10 05:39:34 +00:00
Rafael Espindola 117b20c492 Remove the isImplicitlyPrivate argument of getNameWithPrefix.
getSymbolWithGlobalValueBase use is to create a name of a new symbol based
on the name of an existing GV. Assert that and then remove the last call
to pass true to isImplicitlyPrivate.

This gives the mangler API a 1:1 mapping from GV to names, which is what we
need to drop the mangler dependency on the target (and use an extended
datalayout instead).

llvm-svn: 196472
2013-12-05 05:53:12 +00:00
Alp Toker f907b891da Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471
2013-12-05 05:44:44 +00:00
Hal Finkel 563cc05cae Remove PPCScoreboardHazardRecognizer
PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer
which did only one thing: filtered out nodes in EmitInstruction for which
DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo
instructions. As far as I can tell, this is no longer true, and so we can use
ScoreboardHazardRecognizer directly.

llvm-svn: 196171
2013-12-02 23:52:46 +00:00
Rafael Espindola 5113d166f5 Refactor the setting of PrivateGlobalPrefix.
No functionality change.

llvm-svn: 196170
2013-12-02 23:39:26 +00:00
Rafael Espindola f4e6b29a03 Move getSymbolWithGlobalValueBase to TargetLoweringObjectFile.
This allows it to be used in TargetLoweringObjectFileImpl.cpp.

llvm-svn: 196117
2013-12-02 16:25:47 +00:00
Rafael Espindola 957cf6f9e1 Remove dead code.
MO_JumpTableIndex and MO_ExternalSymbol don't show up on inline asm.

Keeping parts of the old asm printer just to print inline asm to a string that
we then parse back looks like a hack.

llvm-svn: 196111
2013-12-02 15:36:37 +00:00
Rafael Espindola 50712a456d Change the default of AsmWriterClassName and isMCAsmWriter.
llvm-svn: 196065
2013-12-02 04:55:42 +00:00
Rafael Espindola 32635781bb Refactor for clarity and efficiency.
The PPC GetSymbolFromOperand already prefixed stubs of MO_ExternalSymbol, so
this should be a nop.

llvm-svn: 196059
2013-12-02 03:26:43 +00:00
Hal Finkel 42daeae9bd Add a scheduling model (with itinerary) for the PPC POWER7
This adds a scheduling model for the POWER7 (P7) core, and enables the
machine-instruction scheduler when targeting the P7. Scheduling for the P7,
like earlier ooo PPC cores, requires considering both dispatch group hazards,
and functional unit resources and latencies. These are both modeled in a
combined itinerary. Dispatch group formation is still handled by the post-RA
scheduler (which still needs to be updated for the P7, but nevertheless does a
pretty good job).

One interesting aspect of this change is that I've also enabled to use of AA
duing CodeGen for the P7 (just as it is for the embedded cores). The benchmark
results seem to support this decision (see below), and while this is normally
useful for in-order cores, and not for ooo cores like the P7, I think that the
dispatch slot hazards are enough like in-order resources to make the AA useful.

Test suite significant performance differences (where negative is a speedup,
and positive is a regression) vs. the current situation:

MultiSource/Benchmarks/BitBench/drop3/drop3
  with AA: N/A
  without AA: -28.7614% +/- 19.8356%
(significantly against AA)

MultiSource/Benchmarks/FreeBench/neural/neural
  with AA: -17.7406% +/- 11.2712%
  without AA: N/A
(significantly in favor of AA)

MultiSource/Benchmarks/SciMark2-C/scimark2
  with AA: -11.2079% +/- 1.80543%
  without AA: -11.3263% +/- 2.79651%

MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt
  with AA: -41.8649% +/- 17.0053%
  without AA: -34.5256% +/- 23.7072%

MultiSource/Benchmarks/mafft/pairlocalalign
  with AA: 25.3016% +/- 17.8614%
  without AA: 38.6629% +/- 14.9391%
(significantly in favor of AA)

MultiSource/Benchmarks/sim/sim
  with AA: N/A
  without AA: 13.4844% +/- 7.18195%
(significantly in favor of AA)

SingleSource/Benchmarks/BenchmarkGame/Large/fasta
  with AA: 15.0664% +/- 6.70216%
  without AA: 12.7747% +/- 8.43043%

SingleSource/Benchmarks/BenchmarkGame/puzzle
  with AA: 82.2713% +/- 26.3567%
  without AA: 75.7525% +/- 41.1842%

SingleSource/Benchmarks/Misc/flops-2
  with AA: -37.1621% +/- 20.7964%
  without AA: -35.2342% +/- 20.2999%
(significantly in favor of AA)

These are 99.5% confidence intervals from 5 runs per configuration. Regarding
the choice to turn on AA during CodeGen, of these results, four seem
significantly in favor of using AA, and one seems significantly against. I'm
not making this decision based on these numbers alone, but these results
seem consistent with results I have from other tests, and so I think that, on
balance, using AA is a win.

llvm-svn: 195981
2013-11-30 20:55:12 +00:00
Hal Finkel 46402a4211 Split some PPC itinerary classes
In preparation for adding scheduling definitions for the POWER7, split some PPC
itinerary classes so that the P7's latencies and hazards can be better
described. For the most part, this means differentiating indexed from non-index
pre-increment loads and stores. Also, differentiate single from
double-precision sqrt.

No functionality change intended (except for a more-specific latency for
single-precision sqrt on the A2).

llvm-svn: 195980
2013-11-30 20:41:13 +00:00
Hal Finkel 1df3205e8c Adjust PPC A2 input operand latencies
On the PPC A2, instructions are only issued after their input operands are
ready. Model this by specifying that input operands are read at dispatch (0
cycles after issue). This changes all input operand latencies from 1 to 0.

Significant test-suite performance changes (these are 99.5% confidence
intervals on 6 runs for both before and after):

speedups:
MultiSource/Benchmarks/sim/sim
	-1.21915% +/- 0.175063%
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt
	-1.23946% +/- 1.05133%
SingleSource/Benchmarks/Misc/flops-2
	-1.24237% +/- 0.681362%
MultiSource/Applications/JM/lencod/lencod
	-1.33992% +/- 0.757498%
MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt
	-1.51802% +/- 1.21468%
MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt
	-2.18818% +/- 1.28605%
MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt
	-2.21977% +/- 1.19499%
SingleSource/Benchmarks/BenchmarkGame/spectral-norm
	-2.29822% +/- 0.671871%
MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl
	-2.40975% +/- 0.355931%
SingleSource/Benchmarks/Misc/fp-convert
	-2.41899% +/- 1.04751%
MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl
	-2.50349% +/- 0.126765%
SingleSource/Benchmarks/Misc/flops-3
	-3.00214% +/- 0.700795%
MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt
	-3.56995% +/- 3.2929%
MultiSource/Applications/sgefa/sgefa
	-4.24908% +/- 2.00413%
MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk
	-18.1294% +/- 3.96489%

regressions:
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
	1.03249% +/- 0.178547%
MultiSource/Applications/hexxagon/hexxagon
	1.16597% +/- 0.285235%
MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt
	1.39576% +/- 1.07855%
SingleSource/Benchmarks/Misc-C++/stepanov_v1p2
	1.71539% +/- 0.173182%
MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1
	1.90013% +/- 0.866472%
MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl
	2.39854% +/- 1.05914%
MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl
	2.4402% +/- 0.817904%
MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl
	5.87997% +/- 3.3172%
MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc
	9.02643% +/- 5.79591%
MultiSource/Benchmarks/VersaBench/bmm/bmm
	10.3517% +/- 1.227%

Obviously, there are data points on both sides of this; but I think, overall,
this supports making the change.

llvm-svn: 195951
2013-11-29 07:04:59 +00:00
Hal Finkel 5a7162f36b Create a PPC440 SchedMachineModel
Some of the older PPC processor definitions don't have associated
SchedMachineModels; correct this for the PPC440.

llvm-svn: 195949
2013-11-29 06:32:17 +00:00
Hal Finkel 4035e8d86a Fixup PPC440 load/store operand latencies
The operand latencies for loads and stores in the PPC440 itinerary were wrong
(the store operands are all inputs, and the "with update" (pre-increment)
instructions need a latency for the additional output).

llvm-svn: 195948
2013-11-29 06:19:43 +00:00
Hal Finkel a10bd1d23a Adjust PPC440 operand latencies
The operand latencies for the PPC440 should be specified relative to dispatch,
not relative to the initial fetch-and-decode stages. Because most instructions
(ignoring bypass) wait in dispatch until their operands are ready, this is
modeled as reading input operands "at dispatch" (0 cycles after issue), and so
every input and output operand has 4 cycles subtracted from it.

This could alter scheduling slightly, but I don't expect a large effect.

llvm-svn: 195947
2013-11-29 05:59:00 +00:00
Hal Finkel dd06369913 Don't model the fetch and decode units for the PPC440
Modeling the fetch and decode units in the PPC440 itinerary does not add
anything to the hazard detection capability (and so modeling them just wastes
compile time).

No functionality change intended.

llvm-svn: 195946
2013-11-29 05:58:38 +00:00
NAKAMURA Takumi 226e10edff [CMake] Let add_public_tablegen_target() provide intrinsics_gen, too.
I think, in principle, intrinsics_gen may be added explicitly.
That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen.
Almost all source files depend on both CommonTaleGen and intrinsics_gen.

Explicit add_dependencies() have been pruned under lib/Target.

llvm-svn: 195929
2013-11-28 17:04:31 +00:00
NAKAMURA Takumi ce746c6c49 [CMake] Let add_public_tablegen_target responsible to provide dependency to CommonTableGen.
add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS.
LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope.

llvm-svn: 195927
2013-11-28 17:04:04 +00:00
NAKAMURA Takumi 413518f1f8 [CMake] Prune include_directories() in llvm/lib/Target. add_llvm_target() sets them.
llvm-svn: 195921
2013-11-28 14:53:30 +00:00
Rafael Espindola 3e3a3f1f85 Use the mangler consistently instead of using getGlobalPrefix directly.
llvm-svn: 195911
2013-11-28 08:59:52 +00:00
Hal Finkel 92720ab1b2 Don't share functional units among the PPC itineraries
Instead of sharing functional unit names between the various PPC itineraries,
give each core its own unit names prefixed with the core name.  This follows
the convention used by other backends (such as ARM), and removes a non-obvious
ordering dependency between the various PPCSchedule*.td files.

No functionality change intended.

llvm-svn: 195908
2013-11-28 06:05:59 +00:00
Hal Finkel 3e5a360ba3 Add IIC_ prefix to PPC instruction-class names
This adds the IIC_ prefix to the instruction itinerary class names, giving the
PPC backend a naming convention for itinerary classes that is more consistent
with that used by the X86 and ARM backends.

Instruction scheduling in the PPC backend needs a bunch of cleanup and
improvement (especially for the ooo cores). This is just a preliminary step.

No functionality change intended.

llvm-svn: 195890
2013-11-27 23:26:09 +00:00
Rafael Espindola c90584b6f6 Don't set GlobalPrefix to the default value.
llvm-svn: 195884
2013-11-27 21:57:54 +00:00
Hal Finkel 8081ae9134 Fix comment in PPCA2Model
llvm-svn: 195807
2013-11-27 03:12:56 +00:00
Hal Finkel 884bde3031 PPC popcnt[dw] do not have record forms
The instruction definitions incorrectly specified that popcntd and popcntw have
record forms; they do not. This mistake was causing invalid code generation.

llvm-svn: 195272
2013-11-20 20:54:55 +00:00
Hal Finkel 22498fa6e3 PPC: Optimize rldicl generation for masked shifts
Masking operations (where only some number of the low bits are being kept) are
selected to rldicl(x, 0, mb). If x is a logical right shift (which would become
rldicl(y, 64-n, n)), we might be able to fold the two instructions together:

  rldicl(rldicl(x, 64-n, n), 0, mb) -> rldicl(x, 64-n, mb) for n <= mb

The right shift is really a left rotate followed by a mask, and if the explicit
mask is a more-restrictive sub-mask of the mask implied by the shift, only one
rldicl is needed.

llvm-svn: 195185
2013-11-20 01:10:15 +00:00
Juergen Ributzka d12ccbd343 [weak vtables] Remove a bunch of weak vtables
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file. The memory leaks in this version have been fixed. Thanks
Alexey for pointing them out.

Differential Revision: http://llvm-reviews.chandlerc.com/D2068

Reviewed by Andy

llvm-svn: 195064
2013-11-19 00:57:56 +00:00
Alexey Samsonov 49109a279c Revert r194865 and r194874.
This change is incorrect. If you delete virtual destructor of both a base class
and a subclass, then the following code:
  Base *foo = new Child();
  delete foo;
will not cause the destructor for members of Child class. As a result, I observe
plently of memory leaks. Notable examples I investigated are:
ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl.

llvm-svn: 194997
2013-11-18 09:31:53 +00:00
Juergen Ributzka dbedae89b9 [weak vtables] Remove a bunch of weak vtables
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file.

Differential Revision: http://llvm-reviews.chandlerc.com/D2068

Reviewed by Andy

llvm-svn: 194865
2013-11-15 22:34:48 +00:00
Bob Wilson 9f3e6b25ee Avoid illegal integer promotion in fastisel
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow.  Results are different when:

    sext(a) + sext(b) != sext(a + b)

Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.

<rdar://problem/15292280>

Patch by Duncan Exon Smith!

llvm-svn: 194840
2013-11-15 19:09:27 +00:00
Hal Finkel c6a243987d Add PPC option for full register names in asm
On non-Darwin PPC systems, we currently strip off the register name prefix
prior to instruction printing. So instead of something like this:

  mr r3, r4

we print this:

  mr 3, 4

The first form is the default on Darwin, and is understood by binutils, but not
yet understood by our integrated assembler. Once our integrated-as understands
full register names as well, this temporary option will be replaced by tying
this functionality to the verbose-asm option. The numeric-only form is
compatible with legacy assemblers and tools, and is also gcc's default on most
PPC systems. On the other hand, it is harder to read, and there are some
analysis tools that expect full register names.

llvm-svn: 194384
2013-11-11 14:58:40 +00:00
Rui Ueyama 29d2910875 Use StringRef::startswith_lower. No functionality change.
llvm-svn: 193796
2013-10-31 19:59:55 +00:00
Rafael Espindola 79858aa3df Add a helper getSymbol to AsmPrinter.
llvm-svn: 193627
2013-10-29 17:07:16 +00:00
Eric Christopher 081efcc3ac Add support for the VSX target attribute. No functional change
as we don't actually use it to emit any code yet.

llvm-svn: 192837
2013-10-16 20:38:58 +00:00
Rafael Espindola 43c4e24fad Add a MCAsmInfoELF class and factor some code into it.
We had a MCAsmInfoCOFF, but no common class for all the ELF MCAsmInfos before.

llvm-svn: 192760
2013-10-16 01:34:32 +00:00
Rafael Espindola a17151ad5a Add a MCTargetStreamer interface.
This patch fixes an old FIXME by creating a MCTargetStreamer interface
and moving the target specific functions for ARM, Mips and PPC to it.

The ARM streamer is still declared in a common place because it is
used from lib/CodeGen/ARMException.cpp, but the Mips and PPC are
completely hidden in the corresponding Target directories.

I will send an email to llvmdev with instructions on how to use this.

llvm-svn: 192181
2013-10-08 13:08:17 +00:00
Rafael Espindola e90fd9c5e0 Remove getEHExceptionRegister and getEHHandlerRegister.
They haven't been used for a long time. Patch by MathOnNapkins.

llvm-svn: 192099
2013-10-07 13:39:22 +00:00
Bill Schmidt cea1596205 [PowerPC] Fix PR17354: Generate nop after local calls for PIC code.
When generating code for shared libraries, even local calls may be
intercepted, so we need a nop after the call for the linker to fix up the
TOC.  Test case adapted from the one provided in PR17354.

llvm-svn: 191440
2013-09-26 17:09:28 +00:00
David Majnemer 7137420d94 PPC: Allow partial fills in writeNopData()
When asked to pad an irregular number of bytes, we should fill with
zeros.  This is consistent with the behavior specified in the AIX
Assembler Language Reference as well as other LLVM and binutils
assemblers.

N.B. There is a small deviation from binutils' PPC assembler:
when handling pads which are greater than 4 bytes but not mod 4,
binutils will not emit any NOP sequences at all and only use zeros.
This may or may not be a bug but there is no excellent rationale as to
why that behavior is important to emulate.  If that behavior is needed,
we can change writeNopData() to behave in the same way.

This fixes PR17352.

llvm-svn: 191426
2013-09-26 09:18:48 +00:00
David Majnemer 08249a31b2 PPC: Do not introduce ISD nodes for fctid and fctiw
llvm-svn: 191421
2013-09-26 05:22:11 +00:00
David Majnemer 6ad26d3364 PPC: Add support for fctid and fctiw
Encodings were checked against the Power ISA documents and double
checked against binutils.

This fixes PR17350.

llvm-svn: 191419
2013-09-26 04:11:24 +00:00
David Majnemer 0c58bc64a4 MC: Add support for treating $ as a reference to the PC
The binutils assembler supports a mode called DOLLAR_DOT which treats
the dollar sign token as a reference to the current program counter if
the dollar sign doesn't precede a constant or identifier.

This commit adds a new MCAsmInfo flag stating whether or not a given
target supports this interpretation of the dollar sign token; by
default, this flag is not enabled.

Further, enable this flag for PPC. The system assembler for AIX and
binutils both support using the dollar sign in this manner.

This fixes PR17353.

llvm-svn: 191368
2013-09-25 10:47:21 +00:00
David Majnemer 1ccd2f2aee MC: Remove vestigial PCSymbol field from AsmInfo
llvm-svn: 191362
2013-09-25 09:36:11 +00:00
Tim Northover 31d093c705 ISelDAG: spot chain cycles involving MachineNodes
Previously, the DAGISel function WalkChainUsers was spotting that it
had entered already-selected territory by whether a node was a
MachineNode (amongst other things). Since it's fairly common practice
to insert MachineNodes during ISelLowering, this was not the correct
check.

Looking around, it seems that other nodes get their NodeId set to -1
upon selection, so this makes sure the same thing happens to all
MachineNodes and uses that characteristic to determine whether we
should stop looking for a loop during selection.

This should fix PR15840.

llvm-svn: 191165
2013-09-22 08:21:56 +00:00
Hal Finkel 25415c2b83 Correct the pre-increment load latencies in the PPC A2 itinerary
Pre-increment loads are microcoded on the A2, and the address increment occurs
only after the load completes. As a result, the latency of the GPR address
update is an additional 2 cycles on top of the load latency.

llvm-svn: 191156
2013-09-22 00:08:14 +00:00
Bill Schmidt bdae03f227 [PowerPC] Add a FIXME.
Documenting a design choice to generate only medium model sequences for TLS
addresses at this time.  Small and large code models could be supported if
necessary.

llvm-svn: 190883
2013-09-17 20:22:05 +00:00
Bill Schmidt bb381d7063 [PowerPC] Fix problems with large code model (PR17169).
Large code model on PPC64 requires creating and referencing TOC entries when
using the addis/ld form of addressing.  This was not being done in all cases.
The changes in this patch to PPCAsmPrinter::EmitInstruction() fix this.  Two
test cases are also modified to reflect this requirement.

Fast-isel was not creating correct code for loading floating-point constants
using large code model.  This also requires the addis/ld form of addressing.
Previously we were using the addis/lfd shortcut which is only applicable to
medium code model.  One test case is modified to reflect this requirement.

llvm-svn: 190882
2013-09-17 20:03:25 +00:00
Bill Schmidt c763c22469 [PowerPC] Fix PR17155 - Ignore COPY_TO_REGCLASS during emit.
Fast-isel generates a COPY_TO_REGCLASS for widening f32 to f64, which
is a nop on PPC64.  This is needed to keep the register class system
happy, but on the fast-isel path it is not removed before emit as it
is for DAG select.  Ignore this op when emitting instructions.

llvm-svn: 190795
2013-09-16 17:25:12 +00:00
Hal Finkel 40c34781b5 PPC: Don't restrict lvsl generation to after type legalization
This is a re-commit of r190764, with an extra check to make sure that we're not
performing the transformation on illegal types (a small test case has been
added for this as well).

Original commit message:

The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190771
2013-09-15 22:09:58 +00:00
Hal Finkel 31025a6325 Revert r190764: PPC: Don't restrict lvsl generation to after type legalization
This is causing test-suite failures.

Original commit message:

The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190765
2013-09-15 15:41:11 +00:00
Hal Finkel 2945d4e916 PPC: Don't restrict lvsl generation to after type legalization
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190764
2013-09-15 15:20:54 +00:00
Hal Finkel c3cfbf8677 Add missing break statement in PPCISelLowering
As it turns out, not a problem in practice, but it should be there.

llvm-svn: 190720
2013-09-13 20:09:02 +00:00
Chandler Carruth 51428e363f Remove an unused variable, fixing -Werror build with latest Clang.
llvm-svn: 190640
2013-09-12 23:30:48 +00:00
Hal Finkel 262a224712 Fix PPC ABI for ByVal structs with vector members
When a structure is passed by value, and that structure contains a vector
member, according to the PPC ABI, the structure will receive enhanced alignment
(so that the vector within the structure will always be aligned).

This should resolve PR16641.

llvm-svn: 190636
2013-09-12 23:20:06 +00:00
Hal Finkel 1e2e3ea584 Make the PPC fast-math sqrt expansion safe at 0
In fast-math mode sqrt(x) is calculated using the fast expansion of the
reciprocal of the reciprocal sqrt expansion. The reciprocal and reciprocal
sqrt expansions use the associated estimate instructions along with some Newton
iterations. Unfortunately, as a result, sqrt(0) was being calculated as NaN,
which is not correct. Now we explicitly return a result of zero if the input is
zero.

llvm-svn: 190624
2013-09-12 19:04:12 +00:00
Roman Divacky 62cb63543b Implement asm support for a few PowerPC bookIII that are needed for assembling
FreeBSD kernel.

llvm-svn: 190618
2013-09-12 17:50:54 +00:00
Hal Finkel 0096dbd50d Mark PPC MFTB and DST (and friends) as deprecated
Use the new instruction deprecation feature to mark mftb (now replaced with
mfspr) and dst (along with the other Altivec cache control instructions) as
deprecated when targeting cores supporting at least ISA v2.03.

llvm-svn: 190605
2013-09-12 14:40:06 +00:00
Joey Gouly 0e76fa7df5 Add an instruction deprecation feature to TableGen.
The 'Deprecated' class allows you to specify a SubtargetFeature that the
instruction is deprecated on.

The 'ComplexDeprecationPredicate' class allows you to define a custom
predicate that is called to check for deprecation.
For example:
  ComplexDeprecationPredicate<"MCR">

would mean you would have to define the following function:
  bool getMCRDeprecationInfo(MCInst &MI, MCSubtargetInfo &STI,
                             std::string &Info)

Which returns 'false' for not deprecated, and 'true' for deprecated
and store the warning message in 'Info'.

The MCTargetAsmParser constructor was chaned to take an extra argument of
the MCInstrInfo class, so out-of-tree targets will need to be changed.

llvm-svn: 190598
2013-09-12 10:28:05 +00:00
Hal Finkel 7fe6a5390f PPC: Enable aggressive anti-dependency breaking
Aggressive anti-dependency breaking is enabled by default for all PPC cores.
This provides a general speedup on the P7 and other platforms (among other
factors, the instruction group formation for the non-embedded PPC cores is done
during post-RA scheduling). In order to do this safely, the incompatibility
between uses of the MFOCRF instruction and anti-dependency breaking are
resolved by marking MFOCRF with hasExtraSrcRegAllocReq. As noted in the removed
FIXME, the problem was that MFOCRF's output is sensitive to the identify of the
source register, and always paired with a shift to undo this effect. Because
anti-dependency breaking is unaware of this hidden dependency of the shift
amount on the source register of the MFOCRF instruction, changing that register
must be inhibited.

Two test cases were adjusted: The SjLj test was made more insensitive to
register choices and scheduling; the saveCR test disabled anti-dependency
breaking because part of what it is testing is proper register reuse.

llvm-svn: 190587
2013-09-12 05:24:49 +00:00
Hal Finkel f574c27769 Greatly simplify the PPC A2 scheduling itinerary
As Andy pointed out to me a long time ago, there are no structural hazards in
the later pipeline stages of the A2, and so modeling them is useless. Also,
modeling the top pre-dispatch stages is deceiving because, when multiple
hardware threads are active, those resources are shared among the threads. The
bypass definitions were mostly wrong, and so those have been removed. The
resulting itinerary is much simpler, and more accurate.

llvm-svn: 190562
2013-09-11 23:25:21 +00:00
Hal Finkel 21442b24fb Enable MI scheduling (and CodeGen AA) by default for embedded PPC cores
For embedded PPC cores (especially the A2 core), using the MI scheduler with AA
is far superior to the other scheduling options.

llvm-svn: 190558
2013-09-11 23:05:25 +00:00
Hal Finkel 71780ec4fd Implement TTI getUnrollingPreferences for PowerPC
The PowerPC A2 core greatly benefits from aggressive concatenation unrolling;
use the new getUnrollingPreferences to enable this by default when targeting
the PPC A2 core.

llvm-svn: 190549
2013-09-11 21:20:40 +00:00
Bill Wendling 58e2d3d856 Generate compact unwind encoding from CFI directives.
We used to generate the compact unwind encoding from the machine
instructions. However, this had the problem that if the user used `-save-temps'
or compiled their hand-written `.s' file (with CFI directives), we wouldn't
generate the compact unwind encoding.

Move the algorithm that generates the compact unwind encoding into the
MCAsmBackend. This way we can generate the encoding whether the code is from a
`.ll' or `.s' file.

<rdar://problem/13623355>

llvm-svn: 190290
2013-09-09 02:37:14 +00:00
Charles Davis 8bdfafd505 Move everything depending on Object/MachOFormat.h over to Support/MachO.h.
llvm-svn: 189728
2013-09-01 04:28:48 +00:00
Bill Schmidt eb8d6f7da0 [PowerPC] Fast-isel cleanup patch.
Here are a few miscellaneous things to tidy up the PPC64 fast-isel
implementation.  I corrected a couple of commentary lapses, and added
documentation of future opportunities.  I also implemented
TargetMaterializeAlloca, which I somehow forgot when I split up the
original huge patch.

Finally, I decided to delete SelectCmp.  I hadn't previously hooked it
in to TargetSelectInstruction(), and when I did I realized it wasn't
serving any useful purpose.  This is only useful for compares that
don't feed a branch in the same block, and to handle that we would
have to have logic to interpret i1 as a condition register.  This
could probably be done, but would require Unseemly Hackery, and
honestly does not seem worth the hassle.

This ends the current patch series.

llvm-svn: 189715
2013-08-31 02:33:40 +00:00
Bill Schmidt 9d9510d806 [PowerPC] Add integer truncation support to fast-isel.
This is the last substantive patch I'm planning for fast-isel in the
near future, adding fast selection of integer truncates.  There are
certainly more things that can be improved (many of which are called
out in FIXMEs), but for now we are catching most of the important
cases.

I'll document some of the remaining work in a cleanup patch shortly.

llvm-svn: 189706
2013-08-30 23:31:33 +00:00
Bill Schmidt 0954ea1b5e Correct partially defined variable
llvm-svn: 189705
2013-08-30 23:25:30 +00:00
Bill Schmidt 8470b0f96c [PowerPC] Call support for fast-isel.
This patch adds fast-isel support for calls (but not intrinsic calls
or varargs calls).  It also removes a badly-formed assert.  There are
some new tests just for calls, and also for folding loads into
arguments on calls to avoid extra extends.

llvm-svn: 189701
2013-08-30 22:18:55 +00:00
Bill Schmidt 8d86fe7d6f [PowerPC] Add handling for conversions to fast-isel.
Yet another chunk of fast-isel code.  This one handles various
conversions involving floating-point.  (It also includes some
miscellaneous handling throughout the back end for LWA_32 and LWAX_32
that should have been part of the load-store patch.)

llvm-svn: 189677
2013-08-30 15:18:11 +00:00
Bill Schmidt 057b04f662 [PowerPC] Handle selection of compare instructions in fast-isel.
Mostly trivial patch adding support for compares.  The meat of the
work was added with the branch support.

llvm-svn: 189639
2013-08-30 03:16:48 +00:00
Bill Schmidt 72e3d55a76 Remove bogus debug statement. Sheesh.
llvm-svn: 189638
2013-08-30 03:07:11 +00:00
Bill Schmidt ccecf26157 [PowerPC] Add loads, stores, and related things to fast-isel.
This is the next big chunk of fast-isel code.  The primary purpose is
to implement selection of loads and stores, but there is a lot of
drag-along to support this.  The common code to analyze addresses for
both loads and stores is substantial.  It's also necessary to add the
materialization code for global values.

Related to load-store processing is the code to fold loads into
integer extends, since otherwise we generate lots of redundant
instructions.  We also need to add some overrides to some FastEmit
routines to ensure we don't assign GPR 0 to a virtual register when
this would change the meaning of an instruction.

I added handling selection of a few binary arithmetic instructions, to
enable committing some test cases I wrote a while back.

Finally, ap couple of miscellaneous changes:
 * I cleaned up some poor style from a previous patch in
   PPCISelLowering.cpp, pointed out by David Blaikie.
 * I enlarged the Addr.Offset field to avoid sign problems with 32-bit
   offsets. 

llvm-svn: 189636
2013-08-30 02:29:45 +00:00
Alexey Samsonov a3a037df63 Fix use of uninitialized value added in r189400 (found by MemorySanitizer)
llvm-svn: 189456
2013-08-28 08:30:47 +00:00
Joerg Sonnenberger b822af4721 Given target assembler parsers a chance to handle variant expressions
first. Use this to turn the PPC modifiers into PPC specific expressions,
allowing them to work on constants.

llvm-svn: 189400
2013-08-27 20:23:19 +00:00
Charles Davis 1827bd8a6c Revert "Fix the build broken by r189315." and "Move everything depending on Object/MachOFormat.h over to Support/MachO.h."
This reverts commits r189319 and r189315. r189315 broke some tests on what I
believe are big-endian platforms.

llvm-svn: 189321
2013-08-27 05:38:30 +00:00
Charles Davis 0c6f71b40d Move everything depending on Object/MachOFormat.h over to Support/MachO.h.
llvm-svn: 189315
2013-08-27 05:00:43 +00:00
Bill Schmidt 8c3976eca1 Dummy code to silence warning from 4189266
llvm-svn: 189272
2013-08-26 20:11:46 +00:00
Bill Schmidt d89f678cfd [PowerPC] More fast-isel chunks (returns and integer extends)
Incremental improvement to fast-isel for PPC64.  This allows us to
select on ret, sext, and zext.  Filling in sext/zext improves some of
the existing logic in handling compare-immediates that needed extends.

A simplified return convention for fast-isel is also added to the
PPC64 calling conventions.  All call/return processing for DAG
selection is handled with custom code, so there isn't an existing CC
to rely on here.  The include of PPCGenCallingConv.inc causes compiler
warnings due to the 32-bit calling conventions that are not used, so
the dummy function "usePPC32CCs()" is added here to silence those.

Test cases for the return and extend logic are added.

llvm-svn: 189266
2013-08-26 19:42:51 +00:00
Bill Schmidt 0300813d6a [PowerPC] Add fast-isel branch and compare selection.
First chunk of actual fast-isel selection code.  This handles direct
and indirect branches, as well as feeding compares for direct
branches.  PPCFastISel::PPCEmitIntExt() is just roughed in and will be
expanded in a future patch.  This also corrects a problem with
selection for constant pool entries in JIT mode or with small code
model.

llvm-svn: 189202
2013-08-25 22:33:42 +00:00
Bill Schmidt f381afc906 [PowerPC] More refactoring prior to real PPC emitPrologue/Epilogue changes.
(Patch committed on behalf of Mark Minich, whose log entry follows.)

This is a continuation of the refactorings performed in svn rev 188573
(see that rev's comments for more detail).

This is my stage 2 refactoring: I combined the emitPrologue() &
emitEpilogue() PPC32 & PPC64 code into a single flow, simplifying a
lot of the code since in essence the PPC32 & PPC64 code generation
logic is the same, only the instruction forms are different (in most
cases). This simplification is necessary because my functional changes
(yet to come) add significant complexity, and without the
simplification of my stage 2 refactoring, the overall complexity of
both emitPrologue() & emitEpilogue() would have become almost
intractable for most mortal programmers (like me).

This submission was intended to be a pure refactoring (no functional
changes whatsoever). However, in the process of combining the PPC32 &
PPC64 flows, I spotted a difference that I believe is a bug (see svn
rev 186478 line 863, or svn rev 188573 line 888): This line appears to
be restoring the BP with the original FP content, not the original BP
content. When I merged the 32-bit and 64-bit code, I used the
corresponding code from the 64-bit flow, which I believe uses the
correct offset (BPOffset) for this operation.

llvm-svn: 188741
2013-08-20 03:12:23 +00:00
Hal Finkel 0c5c01aa4a Add a llvm.copysign intrinsic
This adds a llvm.copysign intrinsic; We already have Libfunc recognition for
copysign (which is turned into the FCOPYSIGN SDAG node). In order to
autovectorize calls to copysign in the loop vectorizer, we need a corresponding
intrinsic as well.

In addition to the expected changes to the language reference, the loop
vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into
an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a
few lists in LegalizeVector{Ops,Types} so that vector copysigns can be
expanded.

In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN
be Expand for vector types. This seems correct for all in-tree targets, and I
think is the right thing to do because, previously, there was no way to generate
vector-values FCOPYSIGN nodes (and most targets don't specify an action for
vector-typed FCOPYSIGN).

llvm-svn: 188728
2013-08-19 23:35:46 +00:00
Hal Finkel 1cf48ab811 Don't form PPC CTR-based loops around a copysignl call
copysign/copysignf never become function calls (because the SDAG expansion code
does not lower to the corresponding function call, but rather directly
implements the associated logic), but copysignl almost always is lowered into a
call to the requested libm functon (and, thus, might clobber CTR).

llvm-svn: 188727
2013-08-19 23:35:24 +00:00
Hal Finkel dbc78e1f73 Add the PPC fcpsgn instruction
Modern PPC cores support a floating-point copysign instruction, and we can use
this to lower the FCOPYSIGN node (which is created from calls to the libm
copysign function). A couple of extra patterns are necessary because the
operand types of FCOPYSIGN need not agree.

llvm-svn: 188653
2013-08-19 05:01:02 +00:00
Bill Schmidt 8893a3d159 [PowerPC] Preparatory refactoring for making prologue and epilogue
safe on PPC32 SVR4 ABI

[Patch and following text by Mark Minich; committing on his behalf.]

There are FIXME's in PowerPC/PPCFrameLowering.cpp, method
PPCFrameLowering::emitPrologue() related to "negative offsets of R1"
on PPC32 SVR4. They're true, but the real issue is that on PPC32 SVR4
(and any ABI without a Red Zone), no spills may be made until after
the stackframe is claimed, which also includes the LR spill which is
at a positive offset. The same problem exists in emitEpilogue(),
though there's no FIXME for it. I intend to fix this issue, making
LLVM-compiled code finally safe for use on SVR4/EABI/e500 32-bit
platforms (including in particular, OS-free embedded systems & kernel
code, where interrupts may share the same stack as user code).

In preparation for making these changes, to make the diffs for the
functional changes less cluttered, I am providing the non-functional
refactorings in two stages:

Stage 1 does some minor fluffy refactorings to pull multiple method
calls up into a single bool, creating named bools for repeated uses of
obscure logic, moving some code up earlier because either stage 2 or
my final version will require it earlier, and rewording/adding some
comments. My stage 1 changes can be characterized as primarily fluffy
cleanup, the purpose of which may be unclear until the stage 2 or
final changes are made.

My stage 2 refactorings combine the separate PPC32 & PPC64 logic,
which is currently performed by largely duplicate code, into a single
flow, with the differences handled by a group of constants initialized
early in the methods.

This submission is for my stage 1 changes. There should be no
functional changes whatsoever; this is a pure refactoring.

llvm-svn: 188573
2013-08-16 20:05:04 +00:00
Craig Topper 5671010cbb Replace getValueType().getSimpleVT() with getSimpleValueType(). Also remove one weird cast from MVT->EVT just to call getSimpleVT().
llvm-svn: 188441
2013-08-15 02:33:50 +00:00
Hal Finkel b3ca00d2a3 Actually fix PPC64 64-bit GPR inline asm constraint matching
This is a follow-up to r187693, correcting that code to request the correct
register class. The previous version, with the wrong register class, was not
really correcting the constraints, but rather was removing them. Coincidentally,
this fixed the failing test case in r187693, but obviously created other
problems.

llvm-svn: 188407
2013-08-14 20:05:04 +00:00