Commit Graph

4740 Commits

Author SHA1 Message Date
Daniel Sanders 335487ad87 Replace string GNU Triples with llvm::Triple in TargetMachine::getTargetTriple(). NFC.
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.

Reviewers: rengolin

Reviewed By: rengolin

Subscribers: llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10381

llvm-svn: 239815
2015-06-16 13:15:50 +00:00
Matthias Braun 39a2afc941 Rename TargetSubtargetInfo::enablePostMachineScheduler() to enablePostRAScheduler()
r213101 changed the behaviour of this method to not only affect the
PostMachineScheduler scheduler but also the PostRAScheduler scheduler,
renaming should make this fact clear. Also document that the preferred
way is to specify this in the scheduling model instead of overriding
this method.

Differential Revision: http://reviews.llvm.org/D10427

llvm-svn: 239659
2015-06-13 03:42:16 +00:00
Matthias Braun 88e213159a MachineLICM: Use TargetSchedModel instead of just itineraries
This will use Itinieraries if available, but will also work if just a
MCSchedModel is available.

Differential Revision: http://reviews.llvm.org/D10428

llvm-svn: 239658
2015-06-13 03:42:11 +00:00
Daniel Sanders 3e5de88dac Replace string GNU Triples with llvm::Triple in TargetMachine. NFC.
Summary:
For the moment, TargetMachine::getTargetTriple() still returns a StringRef.

This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.

Reviewers: rengolin

Reviewed By: rengolin

Subscribers: ted, llvm-commits, rengolin, jholewinski

Differential Revision: http://reviews.llvm.org/D10362

llvm-svn: 239554
2015-06-11 19:41:26 +00:00
Ahmed Bougacha c88bf54366 [CodeGen] ArrayRef'ize cond/pred in various TII APIs. NFC.
llvm-svn: 239553
2015-06-11 19:30:37 +00:00
Nemanja Ivanovic ea1db8a697 LLVM support for vector quad bit permute and gather instructions through builtins
This patch corresponds to review:
http://reviews.llvm.org/D10096

This is the back end portion of the patch related to D10095.
The patch adds the instructions and back end intrinsics for:
vbpermq
vgbbd

llvm-svn: 239505
2015-06-11 06:21:25 +00:00
Daniel Sanders a73f1fdb19 Replace string GNU Triples with llvm::Triple in MCSubtargetInfo and create*MCSubtargetInfo(). NFC.
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.

Reviewers: rafael

Reviewed By: rafael

Subscribers: rafael, ted, jfb, llvm-commits, rengolin, jholewinski

Differential Revision: http://reviews.llvm.org/D10311

llvm-svn: 239467
2015-06-10 12:11:26 +00:00
Daniel Sanders 418caf5002 Replace string GNU Triples with llvm::Triple in MCAsmBackend subclasses and create*AsmBackend(). NFC.
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.

Reviewers: echristo, rafael

Reviewed By: rafael

Subscribers: rafael, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10243

llvm-svn: 239464
2015-06-10 10:35:34 +00:00
Matt Arsenault 8b643559d4 MC: Add target hook to control symbol quoting
llvm-svn: 239370
2015-06-09 00:31:39 +00:00
Jim Grosbach 56ed0bb111 MC: Clean up the naming for MCMachObjectWriter. NFC.
s/ExecutePostLayoutBinding/executePostLayoutBinding/
s/ComputeSymbolTable/computeSymbolTable/
s/BindIndirectSymbols/bindIndirectSymbols/
s/RecordTLVPRelocation/recordTLVPRelocation/
s/RecordScatteredRelocation/recordScatteredRelocation/
s/WriteLinkerOptionsLoadCommand/writeLinkerOptionsLoadCommand/
s/WriteLinkeditLoadCommand/writeLinkeditLoadCommand/
s/WriteNlist/writeNlist/
s/WriteDysymtabLoadCommand/writeDysymtabLoadCommand/
s/WriteSymtabLoadCommand/writeSymtabLoadCommand/
s/WriteSection/writeSection/
s/WriteSegmentLoadCommand/writeSegmentLoadCommand/
s/WriteHeader/writeHeader/

llvm-svn: 239119
2015-06-04 23:25:54 +00:00
Jim Grosbach 36e60e9127 MC: Clean up naming in MCObjectWriter. NFC.
s/WriteObject/writeObject/
s/RecordRelocation/recordRelocation/
s/IsSymbolRefDifferenceFullyResolved/isSymbolRefDifferenceFullyResolved/
s/Write8/write8/
s/WriteLE16/writeLE16/
s/WriteLE32/writeLE32/
s/WriteLE64/writeLE64/
s/WriteBE16/writeBE16/
s/WriteBE32/writeBE32/
s/WriteBE64/writeBE64/
s/Write16/write16/
s/Write32/write32/
s/Write64/write64/
s/WriteZeroes/writeZeroes/
s/WriteBytes/writeBytes/

llvm-svn: 239108
2015-06-04 22:24:41 +00:00
Jim Grosbach 7c76b4cc6e MC: Remove obsolete MachO UseAggressiveSymbolFolding.
Fix the FIXME and remove this old as(1) compat option. It was useful for
bringup of the integrated assembler to diff object files, but now it's
just causing more relocations than strictly necessary to be generated.

rdar://21201804

llvm-svn: 239084
2015-06-04 20:27:42 +00:00
Benjamin Kramer 50e2a29385 Replace custom fixed endian to raw_ostream emission with EndianStream.
Less code, clearer and more efficient. No functionality change intended.

llvm-svn: 239040
2015-06-04 15:03:02 +00:00
Daniel Sanders 7813ae879e Replace string GNU Triples with llvm::Triple in MCAsmInfo subclasses and create*AsmInfo(). NFC.
Summary:
This is the first of several patches to eliminate StringRef forms of GNU
triples from the internals of LLVM. After this is complete, GNU triples
will be replaced by a more authoratitive representation in the form of
an LLVM TargetTuple.

Reviewers: rengolin

Reviewed By: rengolin

Subscribers: ted, llvm-commits, rengolin, jholewinski

Differential Revision: http://reviews.llvm.org/D10236

llvm-svn: 239036
2015-06-04 13:12:25 +00:00
Rafael Espindola 8c006ee385 Bring back r239006 with a fix.
The fix is just that getOther had not been updated for packing the st_other
values in fewer bits and could return spurious values:

-  unsigned Other = (getFlags() & (0x3f << ELF_STO_Shift)) >> ELF_STO_Shift;
+  unsigned Other = (getFlags() & (0x7 << ELF_STO_Shift)) >> ELF_STO_Shift;

Original message:

Pack the MCSymbolELF bit fields into MCSymbol's Flags.

This reduces MCSymolfELF from 64 bytes to 56 bytes on x86_64.

While at it, also make getOther/setOther easier to use by accepting unshifted
STO_* values.

llvm-svn: 239012
2015-06-04 05:59:23 +00:00
Rafael Espindola a86ecee52b Revert "Pack the MCSymbolELF bit fields into MCSymbol's Flags."
This reverts commit r239006.

I am debugging the powerpc failures.

llvm-svn: 239010
2015-06-04 05:00:12 +00:00
Rafael Espindola d31203ae21 Pack the MCSymbolELF bit fields into MCSymbol's Flags.
This reduces MCSymolfELF from 64 bytes to 56 bytes on x86_64.

While at it, also make getOther/setOther easier to use by accepting unshifted
STO_* values.

llvm-svn: 239006
2015-06-04 02:32:20 +00:00
Rafael Espindola 95fb9b93ed Merge MCELF.h into MCSymbolELF.h.
Now that we have a dedicated type for ELF symbol, these helper functions can
become member function of MCSymbolELF.

llvm-svn: 238864
2015-06-02 20:38:46 +00:00
Matt Arsenault bd7d80a4a6 Add address space argument to isLegalAddressingMode
This is important because of different addressing modes
depending on the address space for GPU targets.

This only adds the argument, and does not update
any of the uses to provide the correct address space.

llvm-svn: 238723
2015-06-01 05:31:59 +00:00
Jim Grosbach 13760bd152 MC: Clean up MCExpr naming. NFC.
llvm-svn: 238634
2015-05-30 01:25:56 +00:00
Rafael Espindola 4d37b2a259 Remove getData.
This completes the mechanical part of merging MCSymbol and MCSymbolData.

llvm-svn: 238617
2015-05-29 21:45:01 +00:00
Rafael Espindola beb6060a51 Remove the MCSymbolData typedef.
The getData member function is next.

llvm-svn: 238611
2015-05-29 20:41:47 +00:00
Rafael Espindola e3b2acf274 Pass MCSymbols to the helper functions in MCELF.h.
llvm-svn: 238596
2015-05-29 18:47:23 +00:00
Rafael Espindola ece40ca43d Pass a MCSymbol to needsRelocateWithSymbol.
llvm-svn: 238589
2015-05-29 18:26:09 +00:00
Nemanja Ivanovic 376e17364f Add support for VSX FMA single-precision instructions to the PPC back end
This patch corresponds to review:
http://reviews.llvm.org/D9941

It adds the various FMA instructions introduced in the version 2.07 of
the ISA along with the testing for them. These are operations on single
precision scalar values in VSX registers.

llvm-svn: 238578
2015-05-29 17:13:25 +00:00
Rafael Espindola 3a5d3cce80 Remove a trivial forwarding function. NFC.
llvm-svn: 238506
2015-05-28 21:36:02 +00:00
Rafael Espindola f4a1365387 Use operator<< instead of print in a few more places.
llvm-svn: 238315
2015-05-27 13:05:42 +00:00
Michael Kuperstein db0712f986 Use std::bitset for SubtargetFeatures.
Previously, subtarget features were a bitfield with the underlying type being uint64_t. 
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.
No functional change.

The first several times this was committed (e.g. r229831, r233055), it caused several buildbot failures.
Apparently the reason for most failures was both clang and gcc's inability to deal with large numbers (> 10K) of bitset constructor calls in tablegen-generated initializers of instruction info tables. 
This should now be fixed.

llvm-svn: 238192
2015-05-26 10:47:10 +00:00
Rafael Espindola 61e724a8c5 Stop using MCSectionData in MCMachObjectWriter.h.
llvm-svn: 238165
2015-05-26 01:15:30 +00:00
Rafael Espindola 079027ea90 Stop using MCSectionData in MCExpr.h.
llvm-svn: 238163
2015-05-26 00:52:18 +00:00
Rafael Espindola 7549f87672 Return a MCSection from MCFragment::getParent().
Another step in merging MCSectionData and MCSection.

llvm-svn: 238162
2015-05-26 00:36:57 +00:00
Kit Barton 6646033e6e This patch adds support for the vector quadword add/sub instructions introduced
in POWER8:

vadduqm
vaddeuqm
vaddcuq
vaddecuq
vsubuqm
vsubeuqm
vsubcuq
vsubecuq
In addition to adding the instructions themselves, it also adds support for the
v1i128 type for intrinsics (Intrinsics.td, Function.cpp, and
IntrinsicEmitter.cpp).

http://reviews.llvm.org/D9081

llvm-svn: 238144
2015-05-25 15:49:26 +00:00
Rafael Espindola 6e6820a7e6 Stop forwarding getOrdinal and setOrdinal.
llvm-svn: 238139
2015-05-25 14:12:48 +00:00
Hal Finkel 5f2a1379ef [PowerPC] Fix fast-isel when compare is split from branch
When the compare feeding a branch was in a different BB from the branch, we'd
try to "regenerate" the compare in the block with the branch, possibly trying
to make use of values not available there. Copy a page from AArch64's play book
here to fix the problem (at least in terms of correctness).

Fixes PR23640.

llvm-svn: 238097
2015-05-23 12:18:10 +00:00
Bill Schmidt e26236eed9 [PPC64] Add support for clrbhrb, mfbhrbe, rfebb.
This patch adds support for the ISA 2.07 additions involving the
branch history rolling buffer and event-based branching.  These will
not be used by typical applications, so built-in support is not
required.  They will only be available via inline assembly.

Assembly/disassembly tests are included in the patch.

llvm-svn: 238032
2015-05-22 16:44:10 +00:00
Hal Finkel 82e1fc5fc7 [PPC] Correct iterator bug in PPCTLSDynamicCall
Unfortunately, I can't reduce a small test case for this (although compiling
mpfr-3.1.2 with -O2 -mcpu=a2 would fairly reliably trigger a crash), but the
problem is fairly clear (at least once you know you're looking for one). If the
TLS instruction being replaced was at the end of the block, we'd increment the
iterator past it (so it would then point to MBB.end()), and then we'd increment
it again as part of the for statement, thus overrunning the end of the list.
Don't do that.

llvm-svn: 237974
2015-05-21 23:45:49 +00:00
Bill Schmidt e13ac91c5d [PPC64] Handle vpkudum mask pattern correctly when vpkudum isn't available
My recent patch to add support for ISA 2.07 vector pack/unpack
instructions didn't properly check for availability of the vpkudum
instruction when recognizing it as a special vector shuffle case.
This causes us to leave the vector shuffle in place (rather than
converting it to a vector permute) so that it can be recognized later
as a vpkudum, but that pattern is invalid for processors prior to
POWER8.  Thus LLVM crashes with an "unable to select" message.  We
observed this since one of our buildbots is configured to generate
code for a POWER7.

This patch fixes the problem by checking for availability of the
vpkudum instruction during custom lowering of vector shuffles.

I've added a test case variant for the vpkudum pattern when the
instruction isn't available.

llvm-svn: 237952
2015-05-21 20:48:49 +00:00
Hal Finkel 3b3c9c3e44 [PPC/LoopUnrollRuntime] Don't avoid high-cost trip count computation on the PPC/A2
On X86 (and similar OOO cores) unrolling is very limited, and even if the
runtime unrolling is otherwise profitable, the expense of a division to compute
the trip count could greatly outweigh the benefits. On the A2, we unroll a lot,
and the benefits of unrolling are more significant (seeing a 5x or 6x speedup
is not uncommon), so we're more able to tolerate the expense, on average, of a
division to compute the trip count.

llvm-svn: 237947
2015-05-21 20:30:23 +00:00
Nemanja Ivanovic f02def6cbc Add support for VSX scalar single-precision arithmetic in the PPC target
http://reviews.llvm.org/D9891
Following up on the VSX single precision loads and stores added earlier, this
adds support for elementary arithmetic operations on single precision values
in VSX registers. These instructions utilize the new VSSRC register class.
Instructions added:
xsaddsp
xsdivsp
xsmulsp
xsresp
xsrsqrtesp
xssqrtsp
xssubsp

llvm-svn: 237937
2015-05-21 19:32:49 +00:00
Rafael Espindola 0709a7bd1a Move alignment from MCSectionData to MCSection.
This starts merging MCSection and MCSectionData.

There are a few issues with the current split between MCSection and
MCSectionData.

* It optimizes the the not as important case. We want the production
of .o files to be really fast, but the split puts the information used
for .o emission in a separate data structure.

* The ELF/COFF/MachO hierarchy is not represented in MCSectionData,
leading to some ad-hoc ways to represent the various flags.

* It makes it harder to remember where each item is.

The attached patch starts merging the two by moving the alignment from
MCSectionData to MCSection.

Most of the patch is actually just dropping 'const', since
MCSectionData is mutable, but MCSection was not.

llvm-svn: 237936
2015-05-21 19:20:38 +00:00
Duncan P. N. Exon Smith 08b8726de3 MC: Use MCSymbol in MachObjectWriter, NFC
Replace uses of `MCSymbolData` with `MCSymbol` where both are needed, so
we can remove the backpointer.

llvm-svn: 237799
2015-05-20 15:16:14 +00:00
Duncan P. N. Exon Smith 99d8a8e8ac MC: Take MCSymbol in MachObjectWriter::getSymbolAddress(), NFC
Pass through an `MCSymbol` instead of an `MCSymbolData` so we can get
rid of the back pointer.

llvm-svn: 237750
2015-05-20 00:02:39 +00:00
Duncan P. N. Exon Smith 2a40483418 MC: Use MCSymbol in MCAsmLayout::getSymbolOffset(), NFC
Continue to canonicalize on MCSymbol instead of MCSymbolData when both
are needed.

llvm-svn: 237749
2015-05-19 23:53:20 +00:00
David Blaikie ff6409d096 Simplify IRBuilder::CreateCall* by using ArrayRef+initializer_list/braced init only
llvm-svn: 237624
2015-05-18 22:13:54 +00:00
Jim Grosbach 6f482000e9 MC: Clean up method names in MCContext.
The naming was a mish-mash of old and new style. Update to be consistent
with the new. NFC.

llvm-svn: 237594
2015-05-18 18:43:14 +00:00
Hal Finkel 8340de142c [PowerPC] Add extra r2 read deps on @toc@l relocations
If some commits are happy, and some commits are sad, this is a sad commit. It
is sad because it restricts instruction scheduling to work around a binutils
linker bug, and moreover, one that may never be fixed. On 2012-05-21, GCC was
updated not to produce code triggering this bug, and now we'll do the same...

When resolving an address using the ELF ABI TOC pointer, two relocations are
generally required: one for the high part and one for the low part. Only
the high part generally explicitly depends on r2 (the TOC pointer). And, so,
we might produce code like this:

.Ltmp526:
        addis 3, 2, .LC12@toc@ha
.Ltmp1628:
        std 2, 40(1)
        ld 5, 0(27)
        ld 2, 8(27)
        ld 11, 16(27)
        ld 3, .LC12@toc@l(3)
        rldicl 4, 4, 0, 32
        mtctr 5
        bctrl
        ld 2, 40(1)

And there is nothing wrong with this code, as such, but there is a linker bug
in binutils (https://sourceware.org/bugzilla/show_bug.cgi?id=18414) that will
misoptimize this code sequence to this:
        nop
        std     r2,40(r1)
        ld      r5,0(r27)
        ld      r2,8(r27)
        ld      r11,16(r27)
        ld      r3,-32472(r2)
        clrldi  r4,r4,32
        mtctr   r5
        bctrl
        ld      r2,40(r1)
because the linker does not know (and does not check) that the value in r2
changed in between the instruction using the .LC12@toc@ha (TOC-relative)
relocation and the instruction using the .LC12@toc@l(3) relocation.
Because it finds these instructions using the relocations (and not by
scanning the instructions), it has been asserted that there is no good way
to detect the change of r2 in between. As a result, this bug may never be
fixed (i.e. it may become part of the definition of the ABI). GCC was
updated to add extra dependencies on r2 to instructions using the @toc@l
relocations to avoid this problem, and we'll do the same here.

This is done as a separate pass because:
 1. These extra r2 dependencies are not really properties of the
    instructions, but rather due to a linker bug, and maybe one day we'll be
    able to get rid of them when targeting linkers without this bug (and,
    thus, keeping the logic centralized here will make that
    straightforward).
 2. There are ISel-level peephole optimizations that propagate the @toc@l
    relocations to some user instructions, and so the exta dependencies do
    not apply only to a fixed set of instructions (without undesirable
    definition replication).

The test case was reduced with the help of bugpoint, with minimal cleaning. I'm
looking forward to our upcoming MI serialization support, and with that, much
better tests can be created.

llvm-svn: 237556
2015-05-18 06:25:59 +00:00
Duncan P. N. Exon Smith 6e23e5a680 MC: Use MCSymbol in RelAndSymbol, NFC
Switch from `MCSymbolData` to `MCSymbol`.

llvm-svn: 237502
2015-05-16 01:14:19 +00:00
Bill Schmidt 5ed84cdba8 [PPC64] Add vector pack/unpack support from ISA 2.07
This patch adds support for the following new instructions in the
Power ISA 2.07:

  vpksdss
  vpksdus
  vpkudus
  vpkudum
  vupkhsw
  vupklsw

These instructions are available through the vec_packs, vec_packsu,
vec_unpackh, and vec_unpackl built-in interfaces.  These are
lane-sensitive instructions, so the built-ins have different
implementations for big- and little-endian, and the instructions must
be marked as killing the vector swap optimization for now.

The first three instructions perform saturating pack operations.  The
fourth performs a modulo pack operation, which means it can be
represented with a vector shuffle, and conversely the appropriate
vector shuffles may cause this instruction to be generated.  The other
instructions are only generated via built-in support for now.

Appropriate tests have been added.

There is a companion patch to clang for the rest of this support.

llvm-svn: 237499
2015-05-16 01:02:12 +00:00
Pete Cooper 3de83e4098 Remove 3 includes from MCInstrDesc.h and explicitly include them where needed
llvm-svn: 237481
2015-05-15 21:58:42 +00:00
Jim Grosbach 4c98cf77d9 MC: MCCodeGenInfo naming update. NFC.
s/InitMCCodeGenInfo/initMCCodeGenInfo/

llvm-svn: 237471
2015-05-15 19:13:31 +00:00
Jim Grosbach 91df21f740 MC: Update MCCodeEmitter naming. NFC.
s/EncodeInstruction/encodeInstruction/

llvm-svn: 237469
2015-05-15 19:13:16 +00:00
Jim Grosbach 63661f8d73 MC: Update MCFixup naming. NFC.
s/MCFixup::Create/MCFixup::create/

llvm-svn: 237468
2015-05-15 19:13:05 +00:00
Jim Grosbach e9119e41ef MC: Modernize MCOperand API naming. NFC.
MCOperand::Create*() methods renamed to MCOperand::create*().

llvm-svn: 237275
2015-05-13 18:37:00 +00:00
Michael Kuperstein c3434b390d Reverting r237234, "Use std::bitset for SubtargetFeatures"
The buildbots are still not satisfied.
MIPS and ARM are failing (even though at least MIPS was expected to pass).

llvm-svn: 237245
2015-05-13 10:28:46 +00:00
Michael Kuperstein aba4a34ef2 Use std::bitset for SubtargetFeatures
Previously, subtarget features were a bitfield with the underlying type being uint64_t. 
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.
No functional change.

The first two times this was committed (r229831, r233055), it caused several buildbot failures. 
At least some of the ARM and MIPS ones were due to gcc/binutils issues, and should now be fixed.

llvm-svn: 237234
2015-05-13 08:27:08 +00:00
Douglas Katzman 03dfca04df Strip trailing whitespace. NFC
llvm-svn: 237165
2015-05-12 19:42:31 +00:00
Arnold Schwaighofer dc2711446e Fix compile error
llvm-svn: 236921
2015-05-09 00:10:25 +00:00
Arnold Schwaighofer f54b73d681 ScheduleDAGInstrs: In functions with tail calls PseudoSourceValues are not non-aliasing distinct objects
The code that builds the dependence graph assumes that two PseudoSourceValues
don't alias. In a tail calling function two FixedStackObjects might refer to the
same location. Worse 'immutable' fixed stack objects like function arguments are
not immutable and will be clobbered.

Change this so that a load from a FixedStackObject is not invariant in a tail
calling function and don't return a PseudoSourceValue for an instruction in tail
calling functions when building the dependence graph so that we handle function
arguments conservatively.

Fix for PR23459.

rdar://20740035

llvm-svn: 236916
2015-05-08 23:52:00 +00:00
Matthias Braun d04893fa36 Change getTargetNodeName() to produce compiler warnings for missing cases, fix them
llvm-svn: 236775
2015-05-07 21:33:59 +00:00
Nemanja Ivanovic f3c94b1e3c Add VSX Scalar loads and stores to the PPC back end
This patch corresponds to review:
http://reviews.llvm.org/D9440

It adds a new register class to the PPC back end to contain single precision
values in VSX registers. Additionally, it adds scalar loads and stores for
VSX registers.

llvm-svn: 236755
2015-05-07 18:24:05 +00:00
Wei Mi 062c74484d [X86] Disable loop unrolling in loop vectorization pass when VF is 1.
The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture,
by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce
the cost of overflow check, memory boundary check and extra prologue/epilogue code when
regular unroller will unroll the loop another time. Disable it when VF==1 remove the
unnecessary cost on x86. The same can be done for other platforms after verifying
interleaving/memory bound checking to be not perf critical on those platforms.

Differential Revision: http://reviews.llvm.org/D9515

llvm-svn: 236613
2015-05-06 17:12:25 +00:00
Bill Schmidt 5fe2e25f7c [PPC64LE] Adjust vector splats during VSX swap optimization
The initial code drop for VSX swap optimization permitted the
optimization only when all operations in a web of related computation
are lane-insensitive.  For some lane-sensitive operations, we can
still permit the optimization provided that we make adjustments to
those operations.  This patch adds special handling for vector splats
so that their presence doesn't kill the optimization.

Vector splats are lane-sensitive since they identify by number a
vector element to be used as the source of a splat.  When swap
optimizations take place, the desired vector element will move to the
opposite doubleword of the quadword vector.  We thus replace the index
I by (I + N/2) % N, where N is the number of elements in the vector.

A new test case is added to test that swap optimization succeeds when
vector splats are present, and that the proper input element is used
as the source of the splat.

An ancillary change removes SH_BUILDVEC as one of the kinds of special
handling that may be required by VSX swap optimization.  From
experience with GCC, I had expected to need some modifications for
vector build operations, but I did not find that to be the case.

llvm-svn: 236606
2015-05-06 15:40:46 +00:00
Quentin Colombet 61b305edfd [ShrinkWrap] Add (a simplified version) of shrink-wrapping.
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.

As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.


** Context **

Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.


** Motivating example **

Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b)  {
 %tmp = alloca i32, align 4
 %tmp2 = icmp slt i32 %a, %b
 br i1 %tmp2, label %true, label %false

true:
 store i32 %a, i32* %tmp, align 4
 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
 br label %false

false:
 %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
 ret i32 %tmp.0
}

On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f:                                     ; @f
; BB#0:
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
LBB0_2:                                 ; %false
  mov  sp, x29
  ldp x29, x30, [sp], #16
  ret

With shrink-wrapping we could generate:
_f:                                     ; @f
; BB#0:
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
  add sp, x29, #16            ; =16
  ldp x29, x30, [sp], #16
LBB0_2:                                 ; %false
  ret

Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.


** Proposed Solution **

This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.

Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.

The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.

Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.


** Design Decisions **

1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
  pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.

Differential Revision: http://reviews.llvm.org/D9210

<rdar://problem/3201744>

llvm-svn: 236507
2015-05-05 17:38:16 +00:00
Kit Barton d4eb73c00e This patch adds ABI support for v1i128 data type.
It adds v1i128 to the appropriate register classes and checks parameter passing
and return values.

This is related to http://reviews.llvm.org/D9081, which will add instructions
that exploit the v1i128 datatype.

Phabricator review: http://reviews.llvm.org/D9475

llvm-svn: 236503
2015-05-05 16:10:44 +00:00
Sergey Dmitrouk 842a51bad8 Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes"
[DebugInfo] Add debug locations to constant SD nodes

This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.

This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

llvm-svn: 235989
2015-04-28 14:05:47 +00:00
Daniel Jasper 48e93f7181 Revert "[DebugInfo] Add debug locations to constant SD nodes"
This breaks a test:
http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870

llvm-svn: 235987
2015-04-28 13:38:35 +00:00
Sergey Dmitrouk adb4c69d5c [DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.

This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

llvm-svn: 235977
2015-04-28 11:56:37 +00:00
Bill Schmidt e71db85bed Silence unused variable errors for no-asserts builds
llvm-svn: 235913
2015-04-27 20:22:35 +00:00
Bill Schmidt fe723b9a6d [PPC64LE] Remove unnecessary swaps from lane-insensitive vector computations
This patch adds a new SSA MI pass that runs on little-endian PPC64
code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors
without alignment constraints are accomplished for little-endian using
lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional
xxswapd instructions hurts performance in comparison with big-endian
code, but they are necessary in the general case to support correct
semantics.

However, the general case does not apply to most vector code. Many
vector instructions are lane-insensitive; they do not "care" which
lanes the parallel computations are performed within, provided that
the resulting data is stored into the correct locations. Thus this
pass looks for computations that perform only lane-insensitive
operations, and remove the unnecessary swaps from loads and stores in
such computations.

Future improvements will allow computations using certain
lane-sensitive operations to also be optimized in this manner, by
modifying the lane-sensitive operations to account for the permuted
order of the lanes. However, this patch only adds the infrastructure
to permit this; no lane-sensitive operations are optimized at this
time.

This code is heavily exercised by the various vectorizing applications
in the projects/test-suite tree. For the time being, I have only added
one simple test case to demonstrate what the pass is doing. Although
it is quite simple, it provides coverage for much of the code,
including the special case handling of copies and subreg-to-reg
operations feeding the swaps. I plan to add additional tests in the
future as I fill in more of the "special handling" code.

Two existing tests were affected, because they expected the swaps to
be present, but they are now removed.

llvm-svn: 235910
2015-04-27 19:57:34 +00:00
Lang Hames 9ff69c8f4d [AsmPrinter] Make AsmPrinter's OutStreamer member a unique_ptr.
AsmPrinter owns the OutStreamer, so an owning pointer makes sense here. Using a
reference for this is crufty.

llvm-svn: 235752
2015-04-24 19:11:51 +00:00
Hal Finkel 4dc8fcc224 [PowerPC] Support register name prefixes for vector registers
Match binutils by supporting the optional register name prefix for new vector
registers ("vs" for VSX registers and "q" for QPX registers).

llvm-svn: 235665
2015-04-23 23:16:22 +00:00
Hal Finkel d86e90abdd [PowerPC] Use sync inst alias when printing
So long as the choice between printing msync and sync is not ambiguous, we can
print 'sync 0' and just 'sync'.

llvm-svn: 235663
2015-04-23 23:05:08 +00:00
Hal Finkel fefcfffe68 [PowerPC] Add asm/disasm support for dcbt with hint
Add assembler/disassembler support for dcbt/dcbtst (and aliases) with the hint
field specified (non-zero). Unforunately, the syntax for this instruction is
special in that it differs for server vs. embedded cores:
   dcbt ra, rb, th [server]
   dcbt th, ra, rb [embedded]
where th can be omitted when it is 0. dcbtst is the same. Thus we need to play
games in the parser and the printer to flip the operands around on the embedded
cores. We'll use the server syntax as the default (binutils currently uses the
embedded form by default, but IBM is changing that).

We also stop marking dcbtst as having unmodeled side effects (this is not
necessary, it is just a hint like dcbt -- noticed by inspection, so no separate
test case).

llvm-svn: 235657
2015-04-23 22:47:57 +00:00
Hal Finkel 7c5cb066d0 [PowerPC] Enable printing instructions using aliases
TableGen had been nicely generating code to print a number of instructions using
shorter aliases (and PowerPC has plenty of short mnemonics), but we were not
calling it. For some of the aliases we support in the parser, TableGen can't
infer the "inverse" alias relationship, so there is still more to do.

Thus, after some hours of updating test cases...

llvm-svn: 235616
2015-04-23 18:30:38 +00:00
Hans Wennborg 0867b151c9 Re-commit r235560: Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)
Third time's the charm. The previous commit was reverted as a
reverse for-loop in SelectionDAGBuilder::lowerWorkItem did 'I--'
on an iterator at the beginning of a vector, causing asserts
when using debugging iterators. This commit fixes that.

llvm-svn: 235608
2015-04-23 16:45:24 +00:00
Aaron Ballman 0be238cebd Revert r235560; this commit was causing several failed assertions in Debug builds using MSVC's STL. The iterator is being used outside of its valid range.
llvm-svn: 235597
2015-04-23 13:41:59 +00:00
Hans Wennborg 15823d49b6 Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)
This is a re-commit of r235101, which also fixes the problems with the previous patch:

- Switches with only a default case and non-fallthrough were handled incorrectly

- The previous patch tickled a bug in PowerPC Early-Return Creation which is fixed here.

> This is a major rewrite of the SelectionDAG switch lowering. The previous code
> would lower switches as a binary tre, discovering clusters of cases
> suitable for lowering by jump tables or bit tests as it went along. To increase
> the likelihood of finding jump tables, the binary tree pivot was selected to
> maximize case density on both sides of the pivot.
>
> By not selecting the pivot in the middle, the binary trees would not always
> be balanced, leading to performance problems in the generated code.
>
> This patch rewrites the lowering to search for clusters of cases
> suitable for jump tables or bit tests first, and then builds the binary
> tree around those clusters. This way, the binary tree will always be balanced.
>
> This has the added benefit of decoupling the different aspects of the lowering:
> tree building and jump table or bit tests finding are now easier to tweak
> separately.
>
> For example, this will enable us to balance the tree based on profile info
> in the future.
>
> The algorithm for finding jump tables is quadratic, whereas the previous algorithm
> was O(n log n) for common cases, and quadratic only in the worst-case. This
> doesn't seem to be major problem in practice, e.g. compiling a file consisting
> of a 10k-case switch was only 30% slower, and such large switches should be rare
> in practice. Compiling e.g. gcc.c showed no compile-time difference.  If this
> does turn out to be a problem, we could limit the search space of the algorithm.
>
> This commit also disables all optimizations during switch lowering in -O0.
>
> Differential Revision: http://reviews.llvm.org/D8649

llvm-svn: 235560
2015-04-22 23:14:56 +00:00
Bill Schmidt 6779075c44 [PowerPC] Flow oversized lines for r235309
llvm-svn: 235310
2015-04-20 15:58:46 +00:00
Bill Schmidt 1962f709c7 [PowerPC] Add future work for vector insert/extract to README_ALTIVEC.txt
llvm-svn: 235309
2015-04-20 15:54:26 +00:00
Benjamin Kramer 97fbdd5a39 [mc] Clean up emission of byte sequences
No functional change intended.

llvm-svn: 235178
2015-04-17 11:12:43 +00:00
Rafael Espindola 5560a4cfbd Use raw_pwrite_stream in the object writer/streamer.
The ELF object writer will take advantage of that in the next commit.

llvm-svn: 234950
2015-04-14 22:14:34 +00:00
Ed Maste 8ed40ce56d Correct 'teh' and other typos / repeated words.
Patch by Eitan Adler.

Differential Revision:	http://reviews.llvm.org/D8514

llvm-svn: 234939
2015-04-14 20:52:58 +00:00
Krzysztof Parzyszek a46c36b8f4 Allow memory intrinsics to be tail calls
llvm-svn: 234764
2015-04-13 17:16:45 +00:00
Hal Finkel 5551f2528c [PowerPC] Really iterate over all loops in PPCLoopDataPrefetch/PPCLoopPreIncPrep
When I fixed these a couple of days ago to iterate over all loops, not just
depth == 1 loops, I inadvertently made it such that we'd only look at the first
top-level loop. Make sure that we really look at all of them.

llvm-svn: 234705
2015-04-12 17:18:56 +00:00
Hal Finkel 58f5f9c393 [PowerPC] Disable part-word atomics on the P7
As it turns out, even though these are part of ISA 2.06, the P7 does not
support them (or, at least, not any P7s we're tested so far).

llvm-svn: 234686
2015-04-11 13:40:36 +00:00
Nemanja Ivanovic c38b5311cb Add direct moves to/from VSR and exploit them for FP/INT conversions
This patch corresponds to review:
http://reviews.llvm.org/D8928

It adds direct move instructions to/from VSX registers to GPR's. These are
exploited for FP <-> INT conversions.

llvm-svn: 234682
2015-04-11 10:40:42 +00:00
Alexander Kornienko f817c1cb9a Use 'override/final' instead of 'virtual' for overridden methods
The patch is generated using clang-tidy misc-use-override check.

This command was used:

  tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
    -checks='-*,misc-use-override' -header-filter='llvm|clang' \
    -j=32 -fix -format

http://reviews.llvm.org/D8925

llvm-svn: 234679
2015-04-11 02:11:45 +00:00
Hal Finkel ffb460fdf0 [PowerPC] Fix PPCLoopPreIncPrep for depth > 1 loops
This pass had the same problem as the data-prefetching pass: it was only
checking for depth == 1 loops in practice. Fix that, add some debugging
statements, and make sure that, when we grab an AddRec, it is for the loop we
expect.

llvm-svn: 234670
2015-04-11 00:33:08 +00:00
Hal Finkel a9fceb803d [PowerPC] Prefetching should also consider depth > 1 loops
Iterating over loops from the LoopInfo instance only provides top-level loops.
We need to search the whole tree of loops to find the inner ones.

llvm-svn: 234603
2015-04-10 15:05:02 +00:00
Hal Finkel 93138503ae [PowerPC] Don't crash on PPC32 i64 fp_to_uint on modern cores
When we have an instruction for this (and, thus, don't generate a runtime
call), we need to custom type legalize this (in a trivial way, just as we do
for fp_to_sint).

Fixes PR23173.

llvm-svn: 234561
2015-04-10 03:39:00 +00:00
Nemanja Ivanovic c09047916a Add LLVM support for remaining integer divide and permute instructions from ISA 2.06
This is the patch corresponding to review:
http://reviews.llvm.org/D8406

It adds some missing instructions from ISA 2.06 to the PPC back end.

llvm-svn: 234546
2015-04-09 23:54:37 +00:00
Rafael Espindola 49286e9f4a clang-format bits of code to make a followup patch easy to read.
llvm-svn: 234519
2015-04-09 18:32:58 +00:00
Rafael Espindola df7305a438 Don't repeat name in comment. NFC.
llvm-svn: 234506
2015-04-09 17:10:57 +00:00
Rafael Espindola b91455b5c0 Refactor a lot of duplicated code for stub output.
This also moves it earlier so that it they are produced before we print
an end symbol for the data section.

llvm-svn: 234315
2015-04-07 13:42:44 +00:00
Bill Schmidt 91dd765a04 [PowerPC] Enable splat generation for BUILD_VECTOR with little endian
When enabling PPC64LE, I disabled some optimizations of BUILD_VECTOR
nodes for little endian because wrong results were produced.  I've
subsequently investigated and found this is due to a call to
BuildVectorSDNode::isConstantSplat that was always specifying
big-endian.  With this changed to correctly identify the target
endianness, the optimizations work as expected.

I found another case of a call to the same method with big-endian
hardcoded, in PPC::isAllNegativeZeroVector().  I discovered this was
an orphaned method with no callers, so I've just removed it.

The existing test/CodeGen/PowerPC/vec_constants.ll checks these
optimizations, so for testing I've just added a variant for little
endian.

llvm-svn: 234011
2015-04-03 13:48:24 +00:00
Hal Finkel 50271aae7e [PowerPC] FastISel can't handle i1 return values when using CR bits
Under normal circumstances, use of CR bits is disabled when running at -O0, but
it is enabled by default otherwise, and if you have optnone functions, they'll
still generally be generated with crbits turned on (because nothing else turns
them off). FastISel can't handle most things dealing with i1 values when using
CR bits, and checks for that, but was not checking the return type on
functions; we can't fast-isel function calls with i1 return values either when
using CR bits for boolean values.

Fixes PR22664.

llvm-svn: 233775
2015-04-01 00:40:48 +00:00
Hal Finkel 52368d4437 [PowerPC] Don't use a vector preferred memory type at -O0
Even at -O0, we fall back to SDAG when we hit intrinsics, and if the intrinsic
is a memset/memcpy/etc. we might normally use vector types. At -O0, this is
probably not a good idea (because, if there is a bug in the lowering code,
there would be no good way to turn it off). At -O0, only use scalar preferred
types.

Related to PR22754.

llvm-svn: 233755
2015-03-31 20:56:09 +00:00
Ulrich Weigand 7aed6890fd [PowerPC] Remove TargetMachine CPU auto-detection
As was done for X86 in r206094.

llvm-svn: 233684
2015-03-31 12:01:06 +00:00
Eric Christopher f8019408dc Replace the MCSubtargetInfo parameter with a Triple when creating
an MCInstPrinter. Update all callers and use where we wanted a Triple
previously.

llvm-svn: 233648
2015-03-31 00:10:04 +00:00
Eric Christopher c7c5592b7e Remove unused Target argument from MCInstPrinter ctor functions.
llvm-svn: 233607
2015-03-30 21:52:21 +00:00
Hal Finkel 6e9110abe9 [PowerPC] Add asm parser support for bitmask forms of rotate-and-mask instructions
The asm syntax for the 32-bit rotate-and-mask instructions can take a 32-bit
bitmask instead of an (mb, me) pair. This syntax is not specified in the Power
ISA manual, but is accepted by GNU as, and is documented in IBM's Assembler
Language Reference. The GNU Multiple Precision Arithmetic Library (gmp)
contains assembly that uses this syntax.

To implement this, I moved the isRunOfOnes utility function from
PPCISelDAGToDAG.cpp to PPCMCTargetDesc.h.

llvm-svn: 233483
2015-03-28 19:42:41 +00:00
Akira Hatanaka b46d0234a6 [MCInstPrinter] Enable MCInstPrinter to change its behavior based on the
per-function subtarget.

Currently, code-gen passes the default or generic subtarget to the constructors
of MCInstPrinter subclasses (see LLVMTargetMachine::addPassesToEmitFile), which
enables some targets (AArch64, ARM, and X86) to change their instprinter's
behavior based on the subtarget feature bits. Since the backend can now use
different subtargets for each function, instprinter has to be changed to use the
per-function subtarget rather than the default subtarget.

This patch takes the first step towards enabling instprinter to change its
behavior based on the per-function subtarget. It adds a bit "PassSubtarget" to
AsmWriter which tells table-gen to pass a reference to MCSubtargetInfo to the
various print methods table-gen auto-generates. 

I will follow up with changes to instprinters of AArch64, ARM, and X86.

llvm-svn: 233411
2015-03-27 20:36:02 +00:00
Yaron Keren 75e0c4b060 Remove superfluous .str() and replace std::string concatenation with Twine.
llvm-svn: 233392
2015-03-27 17:51:30 +00:00
Eric Christopher ed1042b97c Add computeFSAdditions to the function based subtarget creation
for PPC due to some unfortunate default setting via TargetMachine
creation. I've added a FIXME on how this can be unraveled in the
backend and a test to make sure we successfully legalize 64-bit things
if we say we're 64-bits.

llvm-svn: 233239
2015-03-26 00:50:23 +00:00
Kit Barton 535e69de34 Add Hardware Transactional Memory (HTM) Support
This patch adds Hardware Transaction Memory (HTM) support supported by ISA 2.07
(POWER8). The intrinsic support is based on GCC one [1], but currently only the
'PowerPC HTM Low Level Built-in Function' are implemented.

The HTM instructions follows the RC ones and the transaction initiation result
is set on RC0 (with exception of tcheck). Currently approach is to create a
register copy from CR0 to GPR and comapring. Although this is suboptimal, since
the branch could be taken directly by comparing the CR0 value, it generates code
correctly on both test and branch and just return value. A possible future
optimization could be elimitate the MFCR instruction to branch directly.

The HTM usage requires a recently newer kernel with PPC HTM enabled. Tested on
powerpc64 and powerpc64le.

This is send along a clang patch to enabled the builtins and option switch.

[1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html

Phabricator Review: http://reviews.llvm.org/D8247

llvm-svn: 233204
2015-03-25 19:36:23 +00:00
Benjamin Kramer b4b5150dfc [APInt] Add an isSplat helper and use it in some places.
To complement getSplat. This is more general than the binary
decomposition method as it also handles non-pow2 splat sizes.

llvm-svn: 233195
2015-03-25 16:49:59 +00:00
Andrew Kaylor 5c73e1f85c Disabling warnings for MSVC build to enable /W4 use.
Differential Revision: http://reviews.llvm.org/D8572

llvm-svn: 233133
2015-03-24 23:37:10 +00:00
Michael Kuperstein 29704e7fb4 Revert "Use std::bitset for SubtargetFeatures"
This reverts commit r233055.

It still causes buildbot failures (gcc running out of memory on several platforms, and a self-host failure on arm), although less than the previous time.

llvm-svn: 233068
2015-03-24 12:56:59 +00:00
Michael Kuperstein 774b441b5e Use std::bitset for SubtargetFeatures
Previously, subtarget features were a bitfield with the underlying type being uint64_t. 
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.
No functional change.

The first time this was committed (r229831), it caused several buildbot failures. 
At least some of the ARM ones were due to gcc/binutils issues, and should now be fixed.

Differential Revision: http://reviews.llvm.org/D8542

llvm-svn: 233055
2015-03-24 09:17:25 +00:00
Eric Christopher 83eb13c967 Remove the bare getSubtargetImpl call from the PPC port. As part
of this add a test that shows we can generate code with
for functions that differ by subtarget feature.

llvm-svn: 232882
2015-03-21 03:36:02 +00:00
John Brawn 1f26a47630 [ARM] Fix handling of thumb1 out-of-range frame offsets
LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its
answer when the base register changes. Unfortunately this isn't true
in thumb1, where SP-based loads allow a larger offset than
non-SP-based loads, and this causes the base register reuse code to
generate instructions that are unencodable, causing an assertion
failure. 

Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which
ARMBaseRegisterInfo can then make use of to give the correct answer. 

Differential Revision: http://reviews.llvm.org/D8419

llvm-svn: 232825
2015-03-20 17:20:07 +00:00
Rafael Espindola cd584a809d Split the object streamer callback in one per file format.
There are two main advantages to doing this

* Targets that only need to handle one of the formats specially don't have
  to worry about the others. For example, x86 now only registers a
  constructor for the COFF streamer.

* Changes to the arguments passed to one format constructor will not impact
  the other formats.

llvm-svn: 232699
2015-03-19 01:50:16 +00:00
Rafael Espindola 69244c3e78 two or more, use a for.
llvm-svn: 232688
2015-03-18 23:15:49 +00:00
Bill Schmidt 1723525e01 [PowerPC] Correct typo in PPCInstrAltivec.td
llvm-svn: 232681
2015-03-18 22:13:03 +00:00
Rafael Espindola 9ab09237dc Centralize the handling of unique ids for temporary labels.
Before this patch code wanting to create temporary labels for a given entity
(function, cu, exception range, etc) had to keep its own counter to have stable
symbol names.

createTempSymbol would still add a suffix to make sure a new symbol was always
returned, but it kept a single counter. Because of that, if we were to use
just createTempSymbol("cu_begin"), the label could change from cu_begin42 to
cu_begin43 because some other code started using temporary labels.

Simplify this by just keeping one counter per prefix and removing the various
specialized counters.

llvm-svn: 232535
2015-03-17 20:07:06 +00:00
Samuel Antao 0d59f31d12 Add assertion to detect invalid registers in the PowerPC MC instruction lowering.
We have observed that noreg was being generated due to a bug in FastIsel and was not being detected during emission. It happens that in the Asm emission there is an assertion that detects this in getRegisterName() from the tbl-generated file PPCGenAsmWriter.inc. However, when emitting an Obj file, invalid registers can be emitted given that no check are made in getBinaryCodeFromInstr() from PPCGenMCCodeEmitter.inc. In order to cover all cases this adds an assertion for reg operands in LowerPPCMachineInstrToMCInst.

llvm-svn: 232525
2015-03-17 19:31:19 +00:00
Samuel Antao f68156015d Fix R0 use in PowerPC VSX store for FastIsel.
The VSX stores are sometimes generated with a undefined index register, causing %noreg to be used and R0 to be emitted later on. The semantics of the VSX store (e.g. stdsdx) requires R0 to be used as base if we want zero to be used in the computation of the effective address instead of the content of R0. This patch checks if no index register was generated and forces R0 to be used as base address.

llvm-svn: 232486
2015-03-17 15:00:57 +00:00
Rafael Espindola ccfbbd5596 Use createTempSymbol to avoid collisions instead of an ad hoc method.
llvm-svn: 232483
2015-03-17 14:50:32 +00:00
Daniel Sanders 914b947e9b Fix r232466 by adding 'i' to the mappings for inline assembly memory constraints.
It's not completely clear why 'i' has historically been treated as a memory
constraint. According to the documentation, it represents a constant immediate.

llvm-svn: 232470
2015-03-17 12:00:04 +00:00
Daniel Sanders 0828860694 [ppc] Distinguish the 'es', 'o', 'm', 'Q', 'Z', and 'Zy' inline assembly memory constraints.
Summary:
But still handle them the same way since I don't know how they differ on
this target.

Of these, 'es', and 'Q' do not have backend tests but are accepted by
clang.

No functional change intended. Depends on D8173.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8213

llvm-svn: 232466
2015-03-17 11:09:13 +00:00
Rafael Espindola f696df1148 Pass in a "const Triple &T" instead of a raw StringRef.
llvm-svn: 232429
2015-03-16 22:29:29 +00:00
Rafael Espindola 9bcf2fcb89 Remove unused argument. NFC.
llvm-svn: 232428
2015-03-16 22:06:15 +00:00
Rafael Espindola 73870dd438 There is only one Asm streamer, there is no need for targets to register it.
Instead, have the targets register a TargetStreamer to be use with the
asm streamer (if any).

llvm-svn: 232423
2015-03-16 21:43:42 +00:00
David Blaikie 9f380a3ca0 Fix uses of reserved identifiers starting with an underscore followed by an uppercase letter
This covers essentially all of llvm's headers and libs. One or two weird
cases I wasn't sure were worth/appropriate to fix.

llvm-svn: 232394
2015-03-16 18:06:57 +00:00
Daniel Sanders bf5b80f5f9 Make each target map all inline assembly memory constraints to InlineAsm::Constraint_m. NFC.
Summary:
This is instead of doing this in target independent code and is the last
non-functional change before targets begin to distinguish between
different memory constraints when selecting code for the ISD::INLINEASM
node.

Next, each target will individually move away from the idea that all
memory constraints behave like 'm'.

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D8173

llvm-svn: 232373
2015-03-16 13:13:41 +00:00
David Blaikie c95c3aee15 [opaque pointer type] gep API migration
llvm-svn: 232279
2015-03-14 21:20:51 +00:00
Daniel Sanders 60f1db0525 Recommit r232027 with PR22883 fixed: Add infrastructure for support of multiple memory constraints.
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.

This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break
anything.

The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate
Constraint_* values.

PR22883 was caused the matching operands copying the whole of the operand flags
for the matched operand. This included the constraint id which needed to be
replaced with the operand number. This has been fixed with a conversion
function. Following on from this, matching operands also used the operand
number as the constraint id. This has been fixed by looking up the matched
operand and taking it from there. 

llvm-svn: 232165
2015-03-13 12:45:09 +00:00
Hal Finkel e78e52ba9b Revert "r232027 - Add infrastructure for support of multiple memory constraints"
This (r232027) has caused PR22883; so it seems those bits might be used by
something else after all. Reverting until we can figure out what else to do.

Original commit message:

The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.

This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.

The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.

llvm-svn: 232093
2015-03-12 20:09:39 +00:00
Daniel Sanders 41c072e63b Add infrastructure for support of multiple memory constraints.
Summary:
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.

This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.

The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D8171

llvm-svn: 232027
2015-03-12 11:00:48 +00:00
Eric Christopher 234a1ec404 Remove some unnecessary forward declarations and put a couple more
where they're supposed to reside.

llvm-svn: 232014
2015-03-12 06:07:16 +00:00
Eric Christopher ea178cf48f Remove the need to cache the subtarget in the PowerPC TargetRegisterInfo
classes. Replace it with a cache to the TargetMachine and use that
where applicable at the moment.

llvm-svn: 232002
2015-03-12 01:42:51 +00:00
Mehdi Amini 93e1ea167e Move the DataLayout to the generic TargetMachine, making it mandatory.
Summary:
I don't know why every singled backend had to redeclare its own DataLayout.
There was a virtual getDataLayout() on the common base TargetMachine, the
default implementation returned nullptr. It was not clear from this that
we could assume at call site that a DataLayout will be available with
each Target.

Now getDataLayout() is no longer virtual and return a pointer to the
DataLayout member of the common base TargetMachine. I plan to turn it into
a reference in a future patch.

The only backend that didn't have a DataLayout previsouly was the CPPBackend.
It now initializes the default DataLayout. This commit is NFC for all the
other backends.

Test Plan: clang+llvm ninja check-all

Reviewers: echristo

Subscribers: jfb, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D8243

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231987
2015-03-12 00:07:24 +00:00
Hal Finkel 6a778fb7c2 [PowerPC] Remove canFoldAsLoad from instruction definitions
The PowerPC backend had a number of loads that were marked as canFoldAsLoad
(and I'm partially at fault here for copying around the relevant line of
TableGen definitions without really looking at what it meant). This is not
right; PPC (non-memory) instructions don't support direct memory operands, and
so there is nothing a 'foldable' instruction could be folded into.

Noticed by inspection, no test case.

The one thing we might lose by doing this is ability to fold some loads into
stackmap/patchpoint pseudo-instructions. However, this was untested, and would
not obviously have worked for extending loads, and I'd rather re-add support
for that once it can be tested.

llvm-svn: 231982
2015-03-11 23:28:38 +00:00
Eric Christopher 9deb75d176 Have getCallPreservedMask and getThisCallPreservedMask take a
MachineFunction argument so that we can grab subtarget specific
features off of it.

llvm-svn: 231979
2015-03-11 22:42:13 +00:00
Eric Christopher ec8cda5668 One more getCalleeSavedRegs prototype with nullptr.
llvm-svn: 231977
2015-03-11 22:24:37 +00:00
Kit Barton 116e18be41 Updated with list of possible improvements we are tracking internally
llvm-svn: 231946
2015-03-11 17:43:43 +00:00
Eric Christopher 433c432b7e Have TargetRegisterInfo::getLargestLegalSuperClass take a
MachineFunction argument so that it can look up the subtarget
rather than using a cached one in some Targets.

llvm-svn: 231888
2015-03-10 23:46:01 +00:00
Eric Christopher 0169e42c3b Remove the use of the subtarget in MCCodeEmitter creation and
update all ports accordingly. Required a couple of small rewrites
in handling subtarget features during creation in PPC.

llvm-svn: 231861
2015-03-10 22:03:14 +00:00
Nemanja Ivanovic 0adf26b9b0 Add support for part-word atomics for PPC
http://reviews.llvm.org/D8090#inline-67337

llvm-svn: 231843
2015-03-10 20:51:07 +00:00
Kit Barton 20d3981e15 Change the generation of the vmuluwm instruction to be based on the MUL opcode.
Phabricator review: http://reviews.llvm.org/D8185

llvm-svn: 231827
2015-03-10 19:49:38 +00:00
Mehdi Amini a28d91d81b DataLayout is mandatory, update the API to reflect it with references.
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.

This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.

I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.

I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.

Test Plan:

Reviewers: echristo

Subscribers: llvm-commits

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
2015-03-10 02:37:25 +00:00
Benjamin Kramer 7bd1f7cb58 Remove the remaining uses of abs64 and nuke it.
std::abs works just fine and we're already using it in many places. NFC intended.

llvm-svn: 231696
2015-03-09 20:20:16 +00:00
Benjamin Kramer 867bfc53ee Make constant arrays that are passed to functions as const.
In theory this allows the compiler to skip materializing the array on
the stack. In practice clang often fails to do that, but that's a
different story. NFC.

llvm-svn: 231571
2015-03-07 17:41:00 +00:00
Olivier Sallenave 049d803ce0 Do not restrict interleaved unrolling to small loops, depending on the target.
llvm-svn: 231528
2015-03-06 23:12:04 +00:00
Rafael Espindola 092b619e55 Use the correct func begin symbol in all places in ppc.
I missed an occurrence of the old symbol in my previous patch.

llvm-svn: 231398
2015-03-05 19:47:50 +00:00
Rafael Espindola 86bd6a1202 Use the generic Lfunc_begin label on ppc.
This removes yet another custom label to mark the start of a function.

llvm-svn: 231390
2015-03-05 18:55:50 +00:00
Kit Barton e48b1e1c4f While reviewing the changes to Clang to add builtin support for the vsld, vsrd, and vsrad instructions, it was pointed out that the builtins are generating the LLVM opcodes (shl, lshr, and ashr) not calls to the intrinsics. This patch changes the implementation of the vsld, vsrd, and vsrad instructions from from intrinsics to VXForm_1 instructions and makes them legal with P8 Altivec. It also removes the definition of the int_ppc_altivec_vsld, int_ppc_altivec_vsrd, and int_ppc_altivec_vsrad intrinsics.
llvm-svn: 231378
2015-03-05 16:24:38 +00:00
Nemanja Ivanovic e8effe1edb Add LLVM support for PPC cryptography builtins
Review: http://reviews.llvm.org/D7955

llvm-svn: 231285
2015-03-04 20:44:33 +00:00
Mehdi Amini 46a43556db Make DataLayout Non-Optional in the Module
Summary:
DataLayout keeps the string used for its creation.

As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().

Get rid of DataLayoutPass: the DataLayout is in the Module

The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.

Make DataLayout Non-Optional in the Module

Module->getDataLayout() will never returns nullptr anymore.

Reviewers: echristo

Subscribers: resistor, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D7992

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
2015-03-04 18:43:29 +00:00
Nemanja Ivanovic d384cd9907 Test commit. Removed an unnecessary space
llvm-svn: 231257
2015-03-04 17:09:12 +00:00
Bill Schmidt d90aff2c4f [PowerPC] Remove unnecessary and incomplete commentary
This "itinerary class map" in PPCSchedule.td is incomplete and
redundant with the actual code.  As it provides no value, we've
decided to remove it.

No functional change.

llvm-svn: 231246
2015-03-04 14:56:05 +00:00
Kit Barton 0cfa7b7ad0 Add the following 64-bit vector integer arithmetic instructions added in POWER8:
vaddudm
vsubudm
vmulesw
vmulosw
vmuleuw
vmulouw
vmuluwm
vmaxsd
vmaxud
vminsd
vminud
vcmpequd
vcmpequd.
vcmpgtsd
vcmpgtsd.
vcmpgtud
vcmpgtud.
vrld
vsld
vsrd
vsrad

Phabricator review: http://reviews.llvm.org/D7959

llvm-svn: 231115
2015-03-03 19:55:45 +00:00
Benjamin Kramer 7149aabf8b Make some non-constant static variables non-static or fully const.
Otherwise we have to emit thread-safe initialization for them. NFC.

llvm-svn: 230894
2015-03-01 18:09:56 +00:00
Bill Schmidt e3959eb54e [PowerPC] Fix PR22711 - Misaligned .toc section
Straightforward patch to emit an alignment directive when emitting a
TOC entry.  The test case was generated from the test in PR22711 that
demonstrated a misaligned .toc section.  The object code is run
through llvm-readobj to verify that the correct alignment has been
applied to the .toc section.

Thanks to Ulrich Weigand for running down where the fix was needed.

llvm-svn: 230801
2015-02-27 22:14:10 +00:00
Hal Finkel 5c3cacf5c0 [PowerPC] Use vector types for memcpy and friends (sometimes)
When using Altivec, we can use vector loads and stores for aligned memcpy and
friends. Starting with the P7 and VXS, we have reasonable unaligned vector
stores. Starting with the P8, we have fast unaligned loads too.

For QPX, we use vector loads are stores, but only for aligned memory accesses.

llvm-svn: 230788
2015-02-27 19:58:28 +00:00
Eric Christopher 11e4df73c8 getRegForInlineAsmConstraint wants to use TargetRegisterInfo for
a lookup, pass that in rather than use a naked call to getSubtargetImpl.
This involved passing down and around either a TargetMachine or
TargetRegisterInfo. Update all callers/definitions around the targets
and SelectionDAG.

llvm-svn: 230699
2015-02-26 22:38:43 +00:00
Eric Christopher 23a3a7c871 Remove an argument-less call to getSubtargetImpl from TargetLoweringBase.
This required plumbing a TargetRegisterInfo through computeRegisterProperties
and into findRepresentativeClass which uses it for register class
iteration. This required passing a subtarget into a few target specific
initializations of TargetLowering.

llvm-svn: 230583
2015-02-26 00:00:24 +00:00
Hal Finkel cf59921670 [PowerPC] Make LDtocL and friends invariant loads
LDtocL, and other loads that roughly correspond to the TOC_ENTRY SDAG node,
represent loads from the TOC, which is invariant. As a result, these loads can
be hoisted out of loops, etc. In order to do this, we need to generate
GOT-style MMOs for TOC_ENTRY, which requires treating it as a legitimate memory
intrinsic node type. Once this is done, the MMO transfer is automatically
handled for TableGen-driven instruction selection, and for nodes generated
directly in PPCISelDAGToDAG, we need to transfer the MMOs manually.

Also, we were not transferring MMOs associated with pre-increment loads, so do
that too.

Lastly, this fixes an exposed bug where R30 was not added as a defined operand of
UpdateGBR.

This problem was highlighted by an example (used to generate the test case)
posted to llvmdev by Francois Pichet.

llvm-svn: 230553
2015-02-25 21:36:59 +00:00
Hal Finkel 0746211811 [PowerPC] Cleanup unused target-specific SDAG nodes
We had somehow accumulated a few target-specific SDAG nodes dealing with PPC64
TOC access that were referenced only in TableGen patterns. The associated
(pseudo-)instructions are used, but are being generated directly. NFC.

llvm-svn: 230518
2015-02-25 18:06:45 +00:00
Aaron Ballman 5561ed448b Silencing a "result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)" warning in MSVC; NFC.
llvm-svn: 230489
2015-02-25 13:05:24 +00:00
Aaron Ballman 70c27ded97 Silencing a -Wsign-compare warning triggered in MSVC; NFC.
llvm-svn: 230488
2015-02-25 13:02:23 +00:00
Hal Finkel c93a9a2cb4 [PowerPC] Add support for the QPX vector instruction set
This adds support for the QPX vector instruction set, which is used by the
enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bytes
wide, holding 4 double-precision floating-point values. Boolean values, modeled
here as <4 x i1> are actually also represented as floating-point values
(essentially  { -1, 1 } for { false, true }). QPX shares many features with
Altivec and VSX, but is distinct from both of them. One major difference is
that, instead of adding completely-separate vector registers, QPX vector
registers are extensions of the scalar floating-point registers (lane 0 is the
corresponding scalar floating-point value). The operations supported on QPX
vectors mirrors that supported on the scalar floating-point values (with some
additional ones for permutations and logical/comparison operations).

I've been maintaining this support out-of-tree, as part of the bgclang project,
for several years. This is not the entire bgclang patch set, but is most of the
subset that can be cleanly integrated into LLVM proper at this time. Adding
this to the LLVM backend is part of my efforts to rebase bgclang to the current
LLVM trunk, but is independently useful (especially for codes that use LLVM as
a JIT in library form).

The assembler/disassembler test coverage is complete. The CodeGen test coverage
is not, but I've included some tests, and more will be added as follow-up work.

llvm-svn: 230413
2015-02-25 01:06:45 +00:00
Tim Northover 3b6b7ca2bc CodeGen: convert CCState interface to using ArrayRefs
Everyone except R600 was manually passing the length of a static array
at each callsite, calculated in a variety of interesting ways. Far
easier to let ArrayRef handle that.

There should be no functional change, but out of tree targets may have
to tweak their calls as with these examples.

llvm-svn: 230118
2015-02-21 02:11:17 +00:00
Eric Christopher db51e2a52a Fix an asan use-after-free bug introduced by the asm printer
changes to remove non-Function based subtargets out of the asm
printer. For module level emission we'll need to construct up
an MCSubtargetInfo so that we can encode instructions for
emission.

llvm-svn: 230050
2015-02-20 19:54:07 +00:00
Eric Christopher 9cda4b7ed9 Remove a use of the Subtarget in the darwin ppc asm printer.
EmitFunctionStubs is called from doFinalization and so can't
depend on the Subtarget existing. It's also irrelevant as
we know we're darwin since we're in the darwin asm printer.

llvm-svn: 230039
2015-02-20 18:53:42 +00:00
Eric Christopher 1df0c519fc Get the cached subtarget off the MachineFunction rather than
inquiring for a new one from the TargetMachine.

llvm-svn: 230037
2015-02-20 18:44:15 +00:00
Kit Barton 263edb99ab I incorrectly marked the VORC instruction as isCommutable when I added it.
This fix removes the VORC instruction definition from the isCommutable block.

Phabricator review: http://reviews.llvm.org/D7772

llvm-svn: 230020
2015-02-20 15:54:58 +00:00
Eric Christopher 155290edf9 Get the cached subtarget off the MachineFunction rather than
inquiring for a new one from the TargetMachine.

llvm-svn: 229998
2015-02-20 08:24:34 +00:00
Eric Christopher 1947a9e2e2 Make the TargetMachine::getSubtarget that takes a Function argument
take a reference to match the getSubtargetImpl that takes a Function
argument.

llvm-svn: 229994
2015-02-20 07:32:59 +00:00
Hal Finkel e5aaf3f2cd [PowerPC] Loop Data Prefetching for the BG/Q
The IBM BG/Q supercomputer's A2 cores have a hardware prefetching unit, the
L1P, but it does not prefetch directly into the A2's L1 cache. Instead, it
prefetches into its own L1P buffer, and the latency to access that buffer is
significantly higher than that to the L1 cache (although smaller than the
latency to the L2 cache). As a result, especially when multiple hardware
threads are not actively busy, explicitly prefetching data into the L1 cache is
advantageous.

I've been using this pass out-of-tree for data prefetching on the BG/Q for well
over a year, and it has worked quite well. It is enabled by default only for
the BG/Q, but can be enabled for other cores as well via a command-line option.

Eventually, we might want to add some TTI interfaces and move this into
Transforms/Scalar (there is nothing particularly target dependent about it,
although only machines like the BG/Q will benefit from its simplistic
strategy).

llvm-svn: 229966
2015-02-20 05:08:21 +00:00
Kit Barton 298beb5e86 This patch adds the VSX logical instructions introduced in the Power ISA 2.07. It also removes the added complexity that favors VMX versions of the three instructions.
Phabricator review: http://reviews.llvm.org/D7616

Commiting on Nemanja's behalf.

llvm-svn: 229694
2015-02-18 16:21:46 +00:00
Eric Christopher 5c0e009d3a Make the PowerPC AsmPrinter independent of global subtarget
initialization. Initialize the subtarget once per function and
migrate EmitStartOfAsmFile to either use attributes on the
TargetMachine or get information from all of the various
subtargets.

llvm-svn: 229475
2015-02-17 07:21:21 +00:00
Eric Christopher 75dc3904a5 Add a FIXME to move IsLittleEndian to the target machine.
llvm-svn: 229472
2015-02-17 06:45:17 +00:00
Eric Christopher fee6aaf683 Move ABI handling and 64-bitness to the PowerPC target machine.
This required changing how the computation of the ABI is handled
and how some of the checks for ABI/target are done.

llvm-svn: 229471
2015-02-17 06:45:15 +00:00
Hal Finkel 5cedafb8cd [PowerPC] Support non-direct-sub/superclass VSX copies
Our register allocation has become better recently, it seems, and is now
starting to generate cross-block copies into inflated register classes. These
copies are not transformed into subregister insertions/extractions by the
PPCVSXCopy class, and so need to be handled directly by
PPCInstrInfo::copyPhysReg. The code to do this was *almost* there, but not
quite (it was unnecessarily restricting itself to only the direct
sub/super-register-class case (not copying between, for example, something in
VRRC and the lower-half of VSRC which are super-registers of F8RC).

Triggering this behavior manually is difficult; I'm including two
bugpoint-reduced test cases from the test suite.

llvm-svn: 229457
2015-02-16 23:46:30 +00:00
Andrew Trick 05938a5481 AArch64: Safely handle the incoming sret call argument.
This adds a safe interface to the machine independent InputArg struct
for accessing the index of the original (IR-level) argument. When a
non-native return type is lowered, we generate the hidden
machine-level sret argument on-the-fly. Before this fix, we were
representing this argument as OrigArgIndex == 0, which is an outright
lie. In particular this crashed in the AArch64 backend where we
actually try to access the type of the original argument.

Now we use a sentinel value for machine arguments that have no
original argument index. AArch64, ARM, Mips, and PPC now check for this
case before accessing the original argument.

Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering

llvm-svn: 229413
2015-02-16 18:10:47 +00:00
Aaron Ballman f9a1897c72 Removing LLVM_DELETED_FUNCTION, as MSVC 2012 was the last reason for requiring the macro. NFC; LLVM edition.
llvm-svn: 229340
2015-02-15 22:54:22 +00:00
Chandler Carruth 003ed332bf Remove a variable only used in an assert and sink its initializer into
the assert. Fixes -Wunused-variable on non-asserts builds.

llvm-svn: 229250
2015-02-14 09:14:44 +00:00
Duncan P. N. Exon Smith 5bedaf934f PowerPC: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
  => getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
  => hasFnAttribute(Kind)

llvm-svn: 229224
2015-02-14 02:54:07 +00:00
Eric Christopher fcd3d87ad8 The base pointer save offset can be computed at initialization time,
do so and fix up the calls.

llvm-svn: 229169
2015-02-13 22:48:53 +00:00
Eric Christopher a10d58dba8 Move the target machine variable so that it's initialized early
enough we can use it to initialize frame lowering.

llvm-svn: 229168
2015-02-13 22:48:51 +00:00
Eric Christopher e8dbfe1cf8 Stash the TargetMachine on the subtarget so we can access it later.
Clean up a subtarget function that has it passed in while we're at it.

llvm-svn: 229164
2015-02-13 22:23:04 +00:00
Eric Christopher a4ae213193 PPC LinkageSize can be computed at initialization time, do so.
llvm-svn: 229163
2015-02-13 22:22:57 +00:00
Chandler Carruth 30d69c2e36 [PM] Remove the old 'PassManager.h' header file at the top level of
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.

This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.

The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".

llvm-svn: 229094
2015-02-13 10:01:29 +00:00
Chandler Carruth 71f308adb7 Re-sort #include lines using my handy dandy ./utils/sort_includes.py
script. This is in preparation for changes to lots of include lines.

llvm-svn: 229088
2015-02-13 09:09:03 +00:00
Eric Christopher dc3a8a4a66 PPCFrameLowering's FramePointerOffset can be computed at initialization
time. Do so.

llvm-svn: 228998
2015-02-13 00:39:38 +00:00
Eric Christopher 736d39e189 The TOC save offset can be computed at compile time, do so and
propagate changes.

llvm-svn: 228997
2015-02-13 00:39:36 +00:00
Eric Christopher f71609b5dd The return save offset can be computed at initialization time - do
so and save the value.

llvm-svn: 228996
2015-02-13 00:39:27 +00:00
Olivier Sallenave 05e69157b6 Change max interleave factor to 12 for POWER7 and POWER8.
llvm-svn: 228973
2015-02-12 22:57:58 +00:00
Benjamin Kramer 5f6a907288 MathExtras: Bring Count(Trailing|Leading)Ones and CountPopulation in line with countTrailingZeros
Update all callers.

llvm-svn: 228930
2015-02-12 15:35:40 +00:00
Hal Finkel 7a0516ea66 [PowerPC] Mark jumps as expensive (using using CR bits)
On PowerPC, which has a full set of logical operations on (its multiple sets
of) condition-register bits, it is not profitable to break of complex
conditions feeding a jump into multiple jumps. We can turn off this feature of
CGP/SDAGBuilder by marking jumps as "expensive".

P7 test-suite speedups (no regressions):
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
	-0.626647% +/- 0.323583%
MultiSource/Benchmarks/Olden/power/power
	-18.2821% +/- 8.06481%

llvm-svn: 228895
2015-02-12 01:02:52 +00:00
Bill Schmidt 67f36bd0d8 Fix up r228725, missed change in PPCSubtarget definition
llvm-svn: 228728
2015-02-10 19:31:55 +00:00
Bill Schmidt 82f1c775a0 [PowerPC] Fix reverted patch r227976 to avoid register assignment issues
See full discussion in http://reviews.llvm.org/D7491.

We now hide the add-immediate and call instructions together in a
separate pseudo-op, which is tagged to define GPR3 and clobber the
call-killed registers.  The PPCTLSDynamicCall pass prior to RA now
expands this op into the two separate addi and call ops, with explicit
definitions of GPR3 on both instructions, and explicit clobbers on the
call instruction.  The pass is now marked as requiring and preserving
the LiveIntervals and SlotIndexes analyses, and fixes these up after
the replacement sequences are introduced.

Self-hosting has been verified on LE P8 and BE P7 with various
optimization levels, etc.  It has also been verified with the
--no-tls-optimize flag workaround removed.

llvm-svn: 228725
2015-02-10 19:09:05 +00:00
Hal Finkel 57c6ac5e41 [PowerPC] Support the (old) cntlz instruction alias
Some old assembly code uses the cntlz alias for cntlzw, binutils supports this,
and we should too. Fixes PR22519.

llvm-svn: 228719
2015-02-10 18:45:02 +00:00
Eric Christopher d49868080e Migrate PPCAsmPrinter's subtarget from reference to pointer in
preparation for making it MachineFunction dependent.

llvm-svn: 228638
2015-02-10 00:44:17 +00:00
Kit Barton 0b0cdb1cd4 This change implements the following three logical vector operations:
veqv (vector equivalence)
vnand
vorc
I increased the AddedComplexity for these instructions to 500 to ensure they are generated instead of issuing other VSX instructions.


Phabricator review: http://reviews.llvm.org/D7469

llvm-svn: 228580
2015-02-09 17:03:18 +00:00
Hal Finkel 291cc7bacd [PowerPC] Handle loop predecessor invokes
If a loop predecessor has an invoke as its terminator, and the return value
from that invoke is used to determine the loop iteration space, then we can't
insert a computation based on that value in the loop predecessor prior to the
terminator (oops). If there's such an invoke, or just no predecessor for that
matter, insert a new loop preheader.

llvm-svn: 228488
2015-02-07 07:32:58 +00:00
Hal Finkel 0d2a1515d5 Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and related fixups
Unfortunately, even with the workaround of disabling the linker TLS
optimizations in Clang restored (which has already been done), this still
breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native).

Bill is currently working on an alternate implementation to address the TLS
issue in a way that also fully elides the linker bug (which, unfortunately,
this approach did not fully), so I'm reverting this now.

llvm-svn: 228460
2015-02-06 23:07:40 +00:00
Benjamin Kramer 970eac40bf Make helper functions/classes/globals static. NFC.
llvm-svn: 228410
2015-02-06 17:51:54 +00:00
Sylvestre Ledru 9be0b77f3f Fix an incorrect identifier
Summary:
EIEIO is not a correct declaration and breaks the build under Debian HURD.
Instead, E_IEIO is used.

//
http://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
Some additional classes of identifier names are reserved for future
extensions to the C language or the POSIX.1 environment. While using
these names for your own purposes right now might not cause a problem,
they do raise the possibility of conflict with future versions of the C
or POSIX standards, so you should avoid these names.
...
Names beginning with a capital ‘E’ followed a digit or uppercase letter
may be used for additional error code names. See Error Reporting.//

Reported here:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=776965
And patch wrote by Svante Signell 
With this patch, LLVM, Clang & LLDB build under Debian HURD:
https://buildd.debian.org/status/fetch.php?pkg=llvm-toolchain-3.6&arch=hurd-i386&ver=1%3A3.6~%2Brc2-2&stamp=1423040039

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7437

llvm-svn: 228331
2015-02-05 18:57:02 +00:00
Hal Finkel c9dd02066c [PowerPC] Prepare loops for pre-increment loads/stores
PowerPC supports pre-increment load/store instructions (except for Altivec/VSX
vector load/stores). Using these on embedded cores can be very important, but
most loops are not naturally set up to use them. We can often change that,
however, by placing loops into a non-canonical form. Generically, this means
transforming loops like this:

  for (int i = 0; i < n; ++i)
    array[i] = c;

to look like this:

  T *p = array[-1];
  for (int i = 0; i < n; ++i)
    *++p = c;

the key point is that addresses accessed are pulled into dedicated PHIs and
"pre-decremented" in the loop preheader. This allows the use of pre-increment
load/store instructions without loop peeling.

A target-specific late IR-level pass (running post-LSR), PPCLoopPreIncPrep, is
introduced to perform this transformation. I've used this code out-of-tree for
generating code for the PPC A2 for over a year. Somewhat to my surprise,
running the test suite + externals on a P7 with this transformation enabled
showed no performance regressions, and one speedup:

External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk
	-2.32514% +/- 1.03736%

So I'm going to enable it on everything for now. I was surprised by this
because, on the POWER cores, these pre-increment load/store instructions are
cracked (and, thus, harder to schedule effectively). But seeing no regressions,
and feeling that it is generally easier to split instructions apart late than
it is to combine them late, this might be the better approach regardless.

In the future, we might want to integrate this functionality into LSR (but
currently LSR does not create new PHI nodes, so (for that and other reasons)
significant work would need to be done).

llvm-svn: 228328
2015-02-05 18:43:00 +00:00
Hal Finkel 65d1cbf9df [PowerPC] Generate pre-increment floating-point ld/st instructions
PowerPC supports pre-increment floating-point load/store instructions, both r+r
and r+i, and we had patterns for them, but they were not marked as legal. Mark
them as legal (and add a test case).

llvm-svn: 228327
2015-02-05 18:42:53 +00:00
Bill Schmidt 433b1c3aae [PowerPC] Implement the vclz instructions for PWR8
Patch by Kit Barton.

Add the vector count leading zeros instruction for byte, halfword,
word, and doubleword sizes.  This is a fairly straightforward addition
after the changes made for vpopcnt:

 1. Add the correct definitions for the various instructions in
    PPCInstrAltivec.td
 2. Make the CTLZ operation legal on vector types when using P8Altivec
    in PPCISelLowering.cpp 

Test Plan

Created new test case in test/CodeGen/PowerPC/vec_clz.ll to check the
instructions are being generated when the CTLZ operation is used in
LLVM.

Check the encoding and decoding in test/MC/PowerPC/ppc_encoding_vmx.s
and test/Disassembler/PowerPC/ppc_encoding_vmx.txt respectively.

llvm-svn: 228301
2015-02-05 15:24:47 +00:00
Bill Schmidt 81638eabe6 Replace tabs with spaces from r228116. Oops.
llvm-svn: 228117
2015-02-04 06:14:38 +00:00
Bill Schmidt 1354f7c5fa [PowerPC] Handle 32-bit targets properly in PPCTLSDynamicCall.cpp
llvm-svn: 228116
2015-02-04 05:51:56 +00:00
Bill Schmidt fe88b18990 [PowerPC] Implement the vpopcnt instructions for POWER8
Patch by Kit Barton.

Add the vector population count instructions for byte, halfword, word,
and doubleword sizes.  There are two major changes here:

    PPCISelLowering.cpp: Make CTPOP legal for vector types.
    PPCRegisterInfo.td: Added v2i64 to the VRRC register
      definition. This is needed for the doubleword variations of the
      integer ops that were added in P8. 

Test Plan

Test the instruction vpcnt* encoding/decoding in ppc64-encoding-vmx.s

Test the generation of the vpopcnt instructions for various vector
data types.  When adding the v2i64 type to the Vector Register set, I
also needed to add the appropriate bit conversion patterns between
v2i64 and the existing vector types.  Testing for these conversions
were also added in the test case by passing a different vector type as
a parameter into the test functions.  There is also a run step that
will ensure the vpopcnt instructions are generated when the vsx
feature is disabled.

llvm-svn: 228046
2015-02-03 21:58:23 +00:00
Bill Schmidt 685aa8b0c5 [PowerPC] Yet another approach to __tls_get_addr
This patch is a third attempt to properly handle the local-dynamic and
global-dynamic TLS models.

In my original implementation, calls to __tls_get_addr were hidden
from view until the asm-printer phase, at which point the underlying
branch-and-link instruction was created with proper relocations.  This
mostly worked well, but I used some repellent techniques to ensure
that the TLS_GET_ADDR nodes at the SD and MI levels correctly received
input from GPR3 and produced output into GPR3.  This proved to work
badly in the presence of multiple TLS variable accesses, with the
copies to and from GPR3 being scheduled incorrectly and generally
creating havoc.

In r221703, I addressed that problem by representing the calls to
__tls_get_addr as true calls during instruction lowering.  This had
the advantage of removing all of the bad hacks and relying on the
existing call machinery to properly glue the copies in place. It
looked like this was going to be the right way to go.

However, as a side effect of the recent discovery of problems with
linker optimizations for TLS, we discovered cases of suboptimal code
generation with this strategy.  The problem comes when tls_get_addr is
called for the same address, and there is a resulting CSE
opportunity.  It turns out that in such cases MachineCSE will common
the addis/addi instructions that set up the input value to
tls_get_addr, but will not common the calls themselves.  MachineCSE
does not have any machinery to common idempotent calls.  This is
perfectly sensible, since presumably this would be done at the IR
level, and introducing calls in the back end isn't commonplace.  In
any case, we end up with two calls to __tls_get_addr when one would
suffice, and that isn't good.

I presumed that the original design would have allowed commoning of
the machine-specific nodes that hid the __tls_get_addr calls, so as
suggested by Ulrich Weigand, I went back to that design and cleaned it
up so that the copies were properly held together by glue
nodes.  However, it turned out that this didn't work either...the
presence of copies to physical registers kept the machine-specific
nodes from being commoned also.

All of which leads to the design presented here.  This is a return to
the original design, except that no attempt is made to introduce
copies to and from GPR3 during instruction lowering.  Virtual registers
are used until prior to register allocation.  At that point, a special
pass is run that identifies the machine-specific nodes that hide the
tls_get_addr calls and introduces the copies to and from GPR3 around
them.  The register allocator then coalesces these copies away.  With
this design, MachineCSE succeeds in commoning tls_get_addr calls where
possible, and we get nice optimal code generation (better than GCC at
the moment, which does not common these calls).

One additional problem must be dealt with:  After introducing the
mentions of the physical register GPR3, the aggressive anti-dependence
breaker sees opportunities to improve scheduling by selecting a
different register instead.  Flags must be used on the instruction
descriptions to tell the anti-dependence breaker to keep its hands in
its pockets.

One thing missing from the original design was recording a definition
of the link register on the GET_TLS_ADDR nodes.  Doing this was found
to be insufficient to force a stack frame to be created, which led to
looping behavior because two different LR values were stored at the
same address.  This appears to have been an oversight in
PPCFrameLowering::determineFrameLayout(), which is repaired here.

Because MustSaveLR() returns true for calls to builtin_return_address,
this changed the expected behavior of
test/CodeGen/PowerPC/retaddr2.ll, which now stacks a frame but
formerly did not.  I've fixed the test case to reflect this.

There are existing TLS tests to catch regressions; the checks in
test/CodeGen/PowerPC/tls-store2.ll proved to be too restrictive in the
face of instruction scheduling with these changes, so I fixed that
up.

I've added a new test case based on the PrettyStackTrace module that
demonstrated the original problem. This checks that we get correct
code generation and that CSE of the calls to __get_tls_addr has taken
place.

llvm-svn: 227976
2015-02-03 16:16:01 +00:00
Hal Finkel 23a3eab65a [PowerPC] Put PPCEarlyReturn into its own source file
PPCInstrInfo.cpp has ended up containing several small MI-level passes, and
this is making the file harder to read than necessary. Split out
PPCEarlyReturn into its own source file. NFC.

Now that PPCInstrInfo.cpp does not also contain pass implementations, I hope
that it will be slightly less unwieldy.

llvm-svn: 227775
2015-02-01 22:58:46 +00:00
Hal Finkel 7c6ab93fba [PowerPC] Remove unnecessary include
llvm-svn: 227772
2015-02-01 22:03:13 +00:00
Hal Finkel 76c751dcef [PowerPC] Put PPCVSXCopy into its own source file
PPCInstrInfo.cpp has ended up containing several small MI-level passes, and
this is making the file harder to read than necessary. Split out
PPCVSXCopy into its own source file. NFC.

llvm-svn: 227771
2015-02-01 22:01:29 +00:00
Hal Finkel 089b8ecf8e [PowerPC] Put PPCVSXFMAMutate into its own source file
PPCInstrInfo.cpp has ended up containing several small MI-level passes, and
this is making the file harder to read than necessary. Split out
PPCVSXFMAMutate into its own source file. NFC.

llvm-svn: 227770
2015-02-01 21:51:22 +00:00
Hal Finkel 83d695419c [PowerPC] Remove the PPCVSXCopyCleanup pass
This MI-level pass was necessary when VSX support was first being developed,
specifically, before the ABI code had been updated to use VSX registers for
arguments (the register assignments did not change, in a physical sense, but
the VSX super-registers are now used). Unfortunately, I never went back and
removed this pass after that was done. I believe this code is now effectively
dead.

llvm-svn: 227767
2015-02-01 21:20:58 +00:00
Hal Finkel 1296f071e0 [PowerPC] Add implicit ops to conditional returns in PPCEarlyReturn
When PPCEarlyReturn, it should really copy implicit ops from the old return
instruction to the new one. This currently does not matter much, because we run
PPCEarlyReturn very late in the pipeline (there is nothing to do DCE on
definitions of those registers). However, for completeness, we should do it
anyway.

Noticed by inspection (and there should be no functional change); thus, no
test case.

llvm-svn: 227763
2015-02-01 20:16:10 +00:00
Hal Finkel e3d2b20c2b [PowerPC] VSX stores don't also read
The VSX store instructions were also picking up an implicit "may read" from the
default pattern, which was an intrinsic (and we don't currently have a way of
specifying write-only intrinsics).

This was causing MI verification to fail for VSX spill restores.

llvm-svn: 227759
2015-02-01 19:07:41 +00:00
Hal Finkel 11d3c561a4 [PowerPC] Better scheduling for isel on P7/P8
isel is actually a cracked instruction on the P7/P8, and must start a dispatch
group. The scheduling model should reflect this so that we don't bunch too many
of them together when possible.

Thanks to Bill Schmidt and Pat Haugen for helping to sort this out.

llvm-svn: 227758
2015-02-01 17:52:16 +00:00
Hal Finkel e6698d5305 [PowerPC] Make r2 allocatable on PPC64/ELF for some leaf functions
The TOC base pointer is passed in r2, and we normally reserve this register so
that we can depend on it being there. However, for leaf functions, and
specifically those leaf functions that don't do any TOC access of their own
(which is generally due to accessing the constant pool, using TLS, etc.),
we can treat r2 as an ordinary callee-saved register (it must be callee-saved
because, for local direct calls, the linker will not insert any save/restore
code).

The allocation order has been changed slightly for PPC64/ELF systems to put r2
at the end of the list (while leaving it near the beginning for Darwin systems
to prevent unnecessary output changes). While r2 is allocatable, using it still
requires spill/restore traffic, and thus comes at the end of the list.

llvm-svn: 227745
2015-02-01 15:03:28 +00:00
Chandler Carruth ab5cb36c40 [multiversion] Remove the function parameter from the unrolling
preferences interface on TTI now that all of TTI is per-function.

llvm-svn: 227741
2015-02-01 14:31:23 +00:00
Chandler Carruth c956ab6603 [multiversion] Switch the TTI queries from TargetMachine to Subtarget
now that we have a correct and cached subtarget specific to the
function.

Also, finish providing a cached per-function subtarget in the core
LLVMTargetMachine -- that layer hadn't switched over yet.

The only use of the TargetMachine was to re-lookup a subtarget for
a particular function to work around the fact that TTI was immutable.
Now that it is per-function and we haved a cached subtarget, use it.

This still leaves a few interfaces with real warts on them where we were
passing Function objects through the TTI interface. I'll remove these
and clean their usage up in subsequent commits now that this isn't
necessary.

llvm-svn: 227738
2015-02-01 14:22:17 +00:00
Chandler Carruth c340ca839c [multiversion] Remove the cached TargetMachine pointer from the
intermediate TTI implementation template and instead query up to the
derived class for both the TargetMachine and the TargetLowering.

Most of the derived types had a TLI cached already and there is no need
to store a less precisely typed target machine pointer.

This will in turn make it much cleaner to look up the TLI via
a per-function subtarget instead of the generic subtarget, and it will
pave the way toward pulling the subtarget used for unroll preferences
into the same form once we are *always* using the function to look up
the correct subtarget.

llvm-svn: 227737
2015-02-01 14:01:15 +00:00
Chandler Carruth 8b04c0d26a [multiversion] Switch all of the targets over to use the
TargetIRAnalysis access path directly rather than implementing getTTI.

This even removes getTTI from the interface. It's more efficient for
each target to just register a precise callback that creates their
specific TTI.

As part of this, all of the targets which are building their subtargets
individually per-function now build their TTI instance with the function
and thus look up the correct subtarget and cache it. NVPTX, R600, and
XCore currently don't leverage this functionality, but its trivial for
them to add it now.

llvm-svn: 227735
2015-02-01 13:20:00 +00:00
Chandler Carruth ee642690ea [multiversion] Remove a false freedom to leave the TargetMachine pointer
null.

For some reason some of the original TTI code supported a null target
machine. This seems to have been legacy, and I made matters worse when
refactoring this code by spreading that pattern further through the
various targets.

The TargetMachine can't actually be null, and it doesn't make sense to
support that use case. I've now consistently removed it and removed all
of the code trying to cope with that situation. This is probably good,
as several targets *didn't* cope with it being null despite the null
default argument in their constructors. =]

llvm-svn: 227734
2015-02-01 12:38:24 +00:00
Chandler Carruth d8b3e9a420 [PM] Remove a bunch of stale TTI creation method declarations. I nuked
their definitions, but forgot to clean up all the declarations which are
in different files.

llvm-svn: 227698
2015-02-01 00:22:15 +00:00
Chandler Carruth 93dcdc47db [PM] Switch the TargetMachine interface from accepting a pass manager
base which it adds a single analysis pass to, to instead return the type
erased TargetTransformInfo object constructed for that TargetMachine.

This removes all of the pass variants for TTI. There is now a single TTI
*pass* in the Analysis layer. All of the Analysis <-> Target
communication is through the TTI's type erased interface itself. While
the diff is large here, it is nothing more that code motion to make
types available in a header file for use in a different source file
within each target.

I've tried to keep all the doxygen comments and file boilerplate in line
with this move, but let me know if I missed anything.

With this in place, the next step to making TTI work with the new pass
manager is to introduce a really simple new-style analysis that produces
a TTI object via a callback into this routine on the target machine.
Once we have that, we'll have the building blocks necessary to accept
a function argument as well.

llvm-svn: 227685
2015-01-31 11:17:59 +00:00
Chandler Carruth 705b185f90 [PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.

The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.

I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.

There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.

The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.

Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.

The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]

Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:

1) Improving the TargetMachine interface by having it directly return
   a TTI object. Because we have a non-pass object with value semantics
   and an internal type erasure mechanism, we can narrow the interface
   of the TargetMachine to *just* do what we need: build and return
   a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
   This will include splitting off a minimal form of it which is
   sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
   target machine for each function. This may actually be done as part
   of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
   easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
   easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
   just a bit messy and exacerbating the complexity of implementing
   the TTI in each target.

Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.

Differential Revision: http://reviews.llvm.org/D7293

llvm-svn: 227669
2015-01-31 03:43:40 +00:00
Eric Christopher b1874ebd32 Remove unused function.
llvm-svn: 227624
2015-01-30 22:02:36 +00:00
Eric Christopher 6d5c47c3c2 Remove extraneous forward declaration.
llvm-svn: 227623
2015-01-30 22:02:34 +00:00
Eric Christopher cccae7951c Use the cached subtargets and remove calls to getSubtarget/getSubtargetImpl
without a Function argument.

llvm-svn: 227622
2015-01-30 22:02:31 +00:00
Eric Christopher 38522b8867 Use the cached subtarget in PPCFrameLowering.
llvm-svn: 227548
2015-01-30 02:11:26 +00:00
Eric Christopher 85806141fd Migrate some of PPC away from the use of bare getSubtarget/getSubtargetImpl.
llvm-svn: 227547
2015-01-30 02:11:24 +00:00
Eric Christopher 065d16bcbe Migrage PPCRegisterInfo to use the cached subtarget.
llvm-svn: 227546
2015-01-30 02:11:21 +00:00
Rafael Espindola ba31e27f0a Compute the ELF SectionKind from the flags.
Any code creating an MCSectionELF knows ELF and already provides the flags.

SectionKind is an abstraction used by common code that uses a plain
MCSection.

Use the flags to compute the SectionKind. This removes a lot of
guessing and boilerplate from the MCSectionELF construction.

llvm-svn: 227476
2015-01-29 17:33:21 +00:00
Bill Schmidt 8cf15ced8c [PowerPC] Complete setting the baseline for ppc64le
Patch by Nemanja Ivanovic.

As was uncovered by the failing test case (when run on non-PPC
platforms), the feature set when compiling with -march=ppc64le was not
being picked up. This change ensures that if the -mcpu option is not
specified, the correct feature set is picked up regardless of whether
we are on PPC or not.

llvm-svn: 227455
2015-01-29 15:59:09 +00:00
Eric Christopher 8b7706517c Move DataLayout back to the TargetMachine from TargetSubtargetInfo
derived classes.

Since global data alignment, layout, and mangling is often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be layed out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.

*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.

llvm-svn: 227113
2015-01-26 19:03:15 +00:00
Bill Schmidt 279cabb450 [PowerPC] Reset the baseline for ppc64le to be equivalent to pwr8
Test by Nemanja Ivanovic.

Since ppc64le implies POWER8 as a minimum, it makes sense that the
same features are included. Since the pwr8 processor model will likely
be getting new features until the implementation is complete, I
created a new list to add these updates to. This will include them in
both pwr8 and ppc64le.

Furthermore, it seems that it would make sense to compose the feature
lists for other processor models (pwr3 and up). Per discussion in the
review, I will make this change in a subsequent patch.

In order to test the changes, I've added an additional run step to
test cases that specify -march=ppc64le -mcpu=pwr8 to omit the -mcpu
option. Since the feature lists are the same, the behaviour should be
unchanged.

llvm-svn: 227053
2015-01-25 18:05:42 +00:00
Rafael Espindola 2658554aec Add r224985 back with fixes.
The fixes are to note that AArch64 has additional restrictions on when local
relocations can be used. In particular, ld64 requires that relocations to
cstring/cfstrings use linker visible symbols.

Original message:

In an assembly expression like

bar:
  .long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF to use relocation with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

llvm-svn: 226503
2015-01-19 21:11:14 +00:00
Hal Finkel c3168129af [PowerPC] Minor correction to r226432
We don't need to exclude patchpoints from the implicit r2 dependence in
FastISel because it is added as an implicit operand and, thus, should not
confuse that StackMap code.

By inspection / no test case.

llvm-svn: 226434
2015-01-19 07:44:45 +00:00
Hal Finkel af51993ee1 [PowerPC] Add r2 as an operand for all calls under both PPC64 ELF V1 and V2
Our PPC64 ELF V2 call lowering logic added r2 as an operand to all direct call
instructions in order to represent the dependency on the TOC base pointer
value. Restricting this to ELF V2, however, does not seem to make sense: calls
under ELF V1 have the same dependence, and indirect calls have an r2 dependence
just as direct ones. Make sure the dependence is noted for all calls under both
ELF V1 and ELF V2.

llvm-svn: 226432
2015-01-19 07:20:27 +00:00
David Blaikie 9459832ebd std::unique_ptrify the MCStreamer argument to createAsmPrinter
llvm-svn: 226414
2015-01-18 20:29:04 +00:00
Hal Finkel 58884f9fe6 [PowerPC] Don't hard-code R2 as register when processing TOC relocations
Instructions that have high-order TOC relocations always carry R2 as their base
register, so it does not matter whether we take the register from the
instruction or just hard-code it in PPCAsmPrinter. In the future, however, we
might want to apply these relocations to instructions using a different
register, so taking the register from the instruction is a better thing to do.
No change in functionality here, however.

llvm-svn: 226403
2015-01-18 15:59:44 +00:00
Hal Finkel 8ea446b6a4 [PowerPC] Add some FIXMEs for fastcc and FPR <-> GPR moves
So we don't forget, once we support FPR <-> GPR moves on the P8, we'll likely
want to re-visit this part of the calling convention.

llvm-svn: 226401
2015-01-18 14:31:10 +00:00
Hal Finkel f81b6dd7a2 [PowerPC] Initial PPC64 calling-convention changes for fastcc
The default calling convention specified by the PPC64 ELF (V1 and V2) ABI is
designed to work with both prototyped and non-prototyped/varargs functions. As
a result, GPRs and stack space are allocated for every argument, even those
that are passed in floating-point or vector registers.

GlobalOpt::OptimizeFunctions will transform local non-varargs functions (that
do not have their address taken) to use the 'fast' calling convention.

When functions are using the 'fast' calling convention, don't allocate GPRs for
arguments passed in other types of registers, and don't allocate stack space for
arguments passed in registers. Other changes for the fast calling convention
may be added in the future.

llvm-svn: 226399
2015-01-18 12:08:47 +00:00
Chandler Carruth 4f8f307c77 [PM] Split the LoopInfo object apart from the legacy pass, creating
a LoopInfoWrapperPass to wire the object up to the legacy pass manager.

This switches all the clients of LoopInfo over and paves the way to port
LoopInfo to the new pass manager. No functionality change is intended
with this iteration.

llvm-svn: 226373
2015-01-17 14:16:18 +00:00
Hal Finkel c19805a75d [PowerPC] Don't list R11 as a patchpoint scratch register
R11's status is the same under both the PPC64 ELF V1 and V2 ABIs: it is
reserved for use as an "environment pointer" for compilation models that
require such a thing. We don't, we also don't need a second scratch register,
and because we support only "local" patchpoint call targets, we might as well
let R11 be used for anyregcc patchpoints.

llvm-svn: 226369
2015-01-17 03:57:34 +00:00
Hal Finkel 52f7c018d3 [PowerPC] Adjust PatchPoints for ppc64le
Bill Schmidt pointed out that some adjustments would be needed to properly
support powerpc64le (using the ELF V2 ABI). For one thing, R11 is not available
as a scratch register, so we need to use R12. R12 is also available under ELF
V1, so to maintain consistency, I flipped the order to make R12 the first
scratch register in the array under both ABIs.

llvm-svn: 226247
2015-01-16 04:40:58 +00:00
Hal Finkel e2ab0f17cf [PowerPC] Loosen ELFv1 PPC64 func descriptor loads for indirect calls
Function pointers under PPC64 ELFv1 (which is used on PPC64/Linux on the
POWER7, A2 and earlier cores) are really pointers to a function descriptor, a
structure with three pointers: the actual pointer to the code to which to jump,
the pointer to the TOC needed by the callee, and an environment pointer. We
used to chain these loads, and make them opaque to the rest of the optimizer,
so that they'd always occur directly before the call. This is not necessary,
and in fact, highly suboptimal on embedded cores. Once the function pointer is
known, the loads can be performed ahead of time; in fact, they can be hoisted
out of loops.

Now these function descriptors are almost always generated by the linker, and
thus the contents of the descriptors are invariant. As a result, by default,
we'll mark the associated loads as invariant (allowing them to be hoisted out
of loops). I've added a target feature to turn this off, however, just in case
someone needs that option (constructing an on-stack descriptor, casting it to a
function pointer, and then calling it cannot be well-defined C/C++ code, but I
can imagine some JIT-compilation system doing so).

Consider this simple test:
  $ cat call.c

  typedef void (*fp)();
  void bar(fp x) {
    for (int i = 0; i < 1600000000; ++i)
      x();
  }

  $ cat main.c

  typedef void (*fp)();
  void bar(fp x);
  void foo() {}
  int main() {
    bar(foo);
  }

On the PPC A2 (the BG/Q supercomputer), marking the function-descriptor loads
as invariant brings the execution time down to ~8 seconds from ~32 seconds with
the loads in the loop.

The difference on the POWER7 is smaller. Compiling with:

  gcc -std=c99 -O3 -mcpu=native call.c main.c : ~6 seconds [this is 4.8.2]

  clang -O3 -mcpu=native call.c main.c : ~5.3 seconds

  clang -O3 -mcpu=native call.c main.c -mno-invariant-function-descriptors : ~4 seconds
  (looks like we'd benefit from additional loop unrolling here, as a first
   guess, because this is faster with the extra loads)

The -mno-invariant-function-descriptors will be added to Clang shortly.

llvm-svn: 226207
2015-01-15 21:17:34 +00:00
Chandler Carruth b98f63dbdb [PM] Separate the TargetLibraryInfo object from the immutable pass.
The pass is really just a means of accessing a cached instance of the
TargetLibraryInfo object, and this way we can re-use that object for the
new pass manager as its result.

Lots of delta, but nothing interesting happening here. This is the
common pattern that is developing to allow analyses to live in both the
old and new pass manager -- a wrapper pass in the old pass manager
emulates the separation intrinsic to the new pass manager between the
result and pass for analyses.

llvm-svn: 226157
2015-01-15 10:41:28 +00:00
Chandler Carruth 62d4215baa [PM] Move TargetLibraryInfo into the Analysis library.
While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.

This is in preparation for porting this analysis to the new pass
manager.

No functionality changed, and updates inbound for Clang and Polly.

llvm-svn: 226078
2015-01-15 02:16:27 +00:00
Hal Finkel 64202167c5 [PowerPC] Add assembler support for mcrfs and friends
Fill out our support for the floating-point status and control register
instructions (mcrfs and friends). As it turns out, these are necessary for
compiling src/test/harness_fp.h in TBB for PowerPC.

Thanks to Raf Schietekat for reporting the issue!

llvm-svn: 226070
2015-01-15 01:00:53 +00:00
Bill Schmidt 082cfc05f1 [PPC64] Add support for the ICBT instruction on POWER8.
Patch by Kit Barton.

Support for the ICBT instruction is currently present, but limited to
embedded processors. This change adds a new FeatureICBT that can be used
to identify whether the ICBT instruction is available on a specific processor.

Two new tests are added:
 * Positive test to ensure the icbt instruction is present when using
-mcpu=pwr8
 * Negative test to ensure the icbt instruction is not generated when
using -mcpu=pwr7

Both test cases use the Prefetch opcode in LLVM. They are based on the
ppc64-prefetch.ll test case.

llvm-svn: 226033
2015-01-14 20:17:10 +00:00
Rafael Espindola 7244bb3c17 Revert "Add r224985 back with two fixes."
This reverts commit r225644 while I debug a regression.

llvm-svn: 226022
2015-01-14 19:07:23 +00:00
Chandler Carruth d9903888d9 [cleanup] Re-sort all the #include lines in LLVM using
utils/sort_includes.py.

I clearly haven't done this in a while, so more changed than usual. This
even uncovered a missing include from the InstrProf library that I've
added. No functionality changed here, just mechanical cleanup of the
include order.

llvm-svn: 225974
2015-01-14 11:23:27 +00:00
Hal Finkel 934361a4b8 Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support""
This re-applies r225808, fixed to avoid problems with SDAG dependencies along
with the preceding fix to ScheduleDAGSDNodes::RegDefIter::InitNodeNumDefs.
These problems caused the original regression tests to assert/segfault on many
(but not all) systems.

Original commit message:

This commit does two things:

 1. Refactors PPCFastISel to use more of the common infrastructure for call
    lowering (this lets us take advantage of this common code for lowering some
    common intrinsics, stackmap/patchpoint among them).

 2. Adds support for stackmap/patchpoint lowering. For the most part, this is
    very similar to the support in the AArch64 target, with the obvious differences
    (different registers, NOP instructions, etc.). The test cases are adapted
    from the AArch64 test cases.

One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).

StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!

llvm-svn: 225909
2015-01-14 01:07:51 +00:00
Ulrich Weigand 6b577e26f0 Use the integrated assembler as default on PowerPC
This was already done in clang, this commit now uses the integrated
assembler as default when using LLVM tools directly.

A number of test cases using inline asm had to be adapted, either by
updating the expected output, or by using -no-integrated-as (for such
tests that deliberately use an invalid instruction in inline asm).

llvm-svn: 225819
2015-01-13 19:43:45 +00:00
Hal Finkel 63fb928109 Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support"
Reverting this while I investiage buildbot failures (segfaulting in
GetCostForDef at ScheduleDAGRRList.cpp:314).

llvm-svn: 225811
2015-01-13 18:25:05 +00:00
Hal Finkel 76a31f8c12 [PowerPC] Add missing override keyword
llvm-svn: 225809
2015-01-13 18:02:22 +00:00
Hal Finkel 821befd52b [PowerPC] Add StackMap/PatchPoint support
This commit does two things:

 1. Refactors PPCFastISel to use more of the common infrastructure for call
    lowering (this lets us take advantage of this common code for lowering some
    common intrinsics, stackmap/patchpoint among them).

 2. Adds support for stackmap/patchpoint lowering. For the most part, this is
    very similar to the support in the AArch64 target, with the obvious differences
    (different registers, NOP instructions, etc.). The test cases are adapted
    from the AArch64 test cases.

One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).

StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!

llvm-svn: 225808
2015-01-13 17:48:12 +00:00
Hal Finkel f4a22c0d48 [PowerPC] Split the blr definition into BLR and BLR8
We really need a separate 64-bit version of this instruction so that it can be
marked as clobbering LR8 (instead of just LR). No change in functionality
(although the verifier might be slightly happier), however, it is required for
stackmap/patchpoint support. Thus, this will be covered by stackmap test cases
once those are added.

llvm-svn: 225804
2015-01-13 17:47:54 +00:00
Hal Finkel 7d3d50bcb2 [PowerPC] Add DWARF numbers for CA (XER), etc.
For registers that have DWARF numbers (like CA, which is really part of XER),
add them. Also, RM is not an SPR, and the declaration hack (where it is
declared as an SPR with an arbitrary number) is not needed, so just declare it
as a register.

NFC; although CA's register number will be needed when stackmap/patchpoint
support is added.

llvm-svn: 225800
2015-01-13 17:45:11 +00:00
Olivier Sallenave 325096980b Added TLI hook for isFPExtFree. Some of the FMA combine heuristics are now guarded with that hook.
llvm-svn: 225795
2015-01-13 15:06:36 +00:00
Rafael Espindola d9c3e308f5 Add r224985 back with two fixes.
One is that AArch64 has additional restrictions on when local relocations can
be used. We have to take those into consideration when deciding to put a L
symbol in the symbol table or not.

The other is that ld64 requires the relocations to cstring to use linker
visible symbols on AArch64.

Thanks to Michael Zolotukhin for testing this!

Remove doesSectionRequireSymbols.

In an assembly expression like

bar:
.long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF to use relocation with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

llvm-svn: 225644
2015-01-12 18:13:07 +00:00
Hal Finkel 87deb0b8e3 [PowerPC] Fix calls to non-function objects
Looking at r225438 inspired me to see how the PowerPC backend handled the
situation (calling a bitcasted TLS global), and it turns out we also produced
an error (cannot select ...). What it means to "call" something that is not a
function is implementation and platform specific, but in the name of doing
something (besides crashing), this makes sure we do what GCC does (treat all
such calls as calls through a function pointer -- meaning that the pointer is
assumed, as is the convention on PPC, to point to a function descriptor
structure holding the actual code address along with the function's TOC pointer
and environment pointer). As GCC does, we now do the same for calling regular
(non-TLS) non-function globals too.

I'm not sure whether this is the most useful way to define the behavior, but at
least we won't be alone.

llvm-svn: 225617
2015-01-12 04:34:47 +00:00
Hal Finkel 5d5d1539cc [PowerPC] Mark zext of a small scalar load as free
This initial implementation of PPCTargetLowering::isZExtFree marks as free
zexts of small scalar loads (that are not sign-extending). This callback is
used by SelectionDAGBuilder's RegsForValue::getCopyToRegs, and thus to
determine whether a zext or an anyext is used to lower illegally-typed PHIs.
Because later truncates of zero-extended values are nops, this allows for the
elimination of later unnecessary truncations.

Fixes the initial complaint associated with PR22120.

llvm-svn: 225584
2015-01-10 08:21:59 +00:00
Justin Hibbits 17744c1e0d Remove some whitespace.
llvm-svn: 225583
2015-01-10 07:50:31 +00:00
Justin Hibbits 654346e6f9 Fully fix Bug #22115.
Summary:
In the previous commit, the register was saved, but space was not allocated.
This resulted in the parameter save area potentially clobbering r30, leading to
nasty results.

Test Plan: Tests updated

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6906

llvm-svn: 225573
2015-01-10 01:57:21 +00:00
Hal Finkel 611b127ad8 [PowerPC] Readjust the loop unrolling threshold
Now that the way that the partial unrolling threshold for small loops is used
to compute the unrolling factor as been corrected, a slightly smaller threshold
is preferable. This is expected; other targets may need to re-tune as well.

llvm-svn: 225566
2015-01-10 00:31:10 +00:00
Lang Hames 1e923ec122 Recommit r224935 with a fix for the ObjC++/AArch64 bug that that revision
introduced.

A test case for the bug was already committed in r225385.

Patch by Rafael Espindola.

llvm-svn: 225534
2015-01-09 18:55:42 +00:00
Hal Finkel b359b735d6 [PowerPC] Enable late partial unrolling on the POWER7
The P7 benefits from not have really-small loops so that we either have
multiple dispatch groups in the loop and/or the ability to form more-full
dispatch groups during scheduling. Setting the partial unrolling threshold to
44 seems good, empirically, for the P7. Compared to using no late partial
unrolling, this yields the following test-suite speedups:

SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding
	-66.3253% +/- 24.1975%
SingleSource/Benchmarks/Misc-C++/oopack_v1p8
	-44.0169% +/- 29.4881%
SingleSource/Benchmarks/Misc/pi
	-27.8351% +/- 12.2712%
SingleSource/Benchmarks/Stanford/Bubblesort
	-30.9898% +/- 22.4647%

I've speculatively added a similar setting for the P8. Also, I've noticed that
the unroller does not quite calculate the unrolling factor correctly for really
tiny loops because it neglects to account for the fact that not every loop body
replicant contains an ending branch and counter increment. I'll fix that later.

llvm-svn: 225522
2015-01-09 15:51:16 +00:00
Hal Finkel 5ff00b4350 [PowerPC] Add a flag for experimenting with subreg liveness tracking
This cannot yet be enabled by default, it causes ~50 miscompiles in the test
suite.

llvm-svn: 225497
2015-01-09 02:03:11 +00:00
Hal Finkel 6c39269a4c [PowerPC] Fold [sz]ext with fp_to_int lowering where possible
On modern cores with lfiw[az]x, we can fold a sign or zero extension from i32
to i64 into the load necessary for an i64 -> fp conversion.

llvm-svn: 225493
2015-01-09 01:34:30 +00:00
Hal Finkel 3c0952b072 [PowerPC] Mark all instructions as non-cheap for MachineLICM
MachineLICM uses a callback named hasLowDefLatency to determine if an
instruction def operand has a 'low' latency. If all relevant operands have a
'low' latency, the instruction is considered too cheap to hoist out of loops
even in low-register-pressure situations. On PowerPC cores, both the embedded
cores and the others, there is no reason to believe that this is a good choice:
all instructions have a cost inside a loop, and hoisting them when not limited
by register pressure is a reasonable default.

llvm-svn: 225471
2015-01-08 22:11:49 +00:00
Justin Hibbits 98a532dd8e Add saving and restoring of r30 to the prologue and epilogue, respectively
Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register).  This does that.

Test Plan: Tests updated.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6876

llvm-svn: 225450
2015-01-08 15:47:19 +00:00
Ahmed Bougacha 2b6917b020 [SelectionDAG] Allow targets to specify legality of extloads' result
type (in addition to the memory type).

The *LoadExt* legalization handling used to only have one type, the
memory type.  This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.

However, this isn't always the case.  For instance, on X86, with AVX,
this is legal:
    v4i32 load, zext from v4i8
but this isn't:
    v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.

Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.

Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.

Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior.  The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)

No functional change intended.

Differential Revision: http://reviews.llvm.org/D6532

llvm-svn: 225421
2015-01-08 00:51:32 +00:00
Ahmed Bougacha 67dd2d25a3 [CodeGen] Use MVT iterator_ranges in legality loops. NFC intended.
A few loops do trickier things than just iterating on an MVT subset,
so I'll leave them be for now.
Follow-up of r225387.

llvm-svn: 225392
2015-01-07 21:27:10 +00:00
Hal Finkel b0e9b35bc3 [PowerPC] Transform a README.txt entry into a FIXME
Remove the README.txt entry regarding register allocation of CR logical ops,
and replace it with a FIXME in PPCInstrInfo.td. The text in the README.txt was
not really accurate, and thanks goes to Pat Haugen (and Bill Schmidt) from IBM
for clarifying what was intended and highlighting the relevant text in the ISA
specification.

llvm-svn: 225325
2015-01-07 00:15:29 +00:00
Lang Hames 66f755f84f Revert r224935 "Refactor duplicated code. No intended functionality change."
This is affecting the behavior of some ObjC++ / AArch64 test cases on Darwin.
Reverting to get the bots green while I track down the source of the changed
behavior.

llvm-svn: 225311
2015-01-06 23:04:36 +00:00
Hal Finkel ed844c4ad1 [PowerPC] Reuse a load operand in int->fp conversions
int->fp conversions on PPC must be done through memory loads and stores. On a
modern core, this process begins by storing the int value to memory, then
loading it using a (sometimes special) FP load instruction. Unfortunately, we
would do this even when the value to be converted was itself a load, and we can
just use that same memory location instead of copying it to another first.
There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs,
because the fp_to_int operand has not been lowered when the int_to_fp is being
lowered. We handle this specially by invoking fp_to_int's lowering logic
(partially) and getting the necessary memory location (some trivial refactoring
was done to make this possible).

This is all somewhat ugly, and it would be nice if some later CodeGen stage
could just clean this stuff up, but because doing so would involve modifying
target-specific nodes (or instructions), it is not immediately clear how that
would work.

Also, remove a related entry from the README.txt for which we now generate
reasonable code.

llvm-svn: 225301
2015-01-06 22:31:02 +00:00
Hal Finkel bde27836ce [PowerPC] Remove old README.txt entry regarding struct passing
Because of how Clang represents structs as arrays (at least on non-Darwin
platforms), and what SROA does, etc. this is no longer a problem.

llvm-svn: 225251
2015-01-06 07:23:13 +00:00
Hal Finkel 3fe09ea4d9 [PowerPC] Add some missing names in getTargetNodeName
These are used for debugging output; NFC.

llvm-svn: 225249
2015-01-06 07:02:15 +00:00
Hal Finkel 5efb918844 [PowerPC] Improve int_to_fp(fp_to_int(x)) combining
The old target DAG combine that allowed for performing int_to_fp(fp_to_int(x))
without a load/store pair is updated here with support for unsigned integers,
and to support single-precision values without a third rounding step, on newer
cores with the appropriate instructions.

llvm-svn: 225248
2015-01-06 06:01:57 +00:00
Lang Hames 04b37c4043 Revert r225048: It broke ObjC on AArch64.
I've filed http://llvm.org/PR22100 to track this issue.

llvm-svn: 225228
2015-01-06 00:54:32 +00:00
Duncan P. N. Exon Smith 6e6428d981 Revert "Use the integrated assembler by default on 32-bit PowerPC and SPARC"
This reverts commit r225213.  It's failing on multiple buildbots [1][2].

[1]: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/22032
[2]: http://lab.llvm.org:8080/green/view/Clang/job/clang-stage1-cmake-RA-incremental_check/2357/

llvm-svn: 225222
2015-01-05 23:31:51 +00:00
Hal Finkel 6837077fcf [PowerPC] Remove old README.txt entry
We no longer generate horrible code for the stated function:

void f(signed char *a, _Bool b, _Bool c) {
  signed char t = 0;
  if (b)  t = *a;
  if (c)  *a = t;
}

for which we now generate:

.L.f:
        andi. 5, 5, 1
        cmpldi 1, 4, 0
        li 5, 0
        beq 1, .LBB0_2
        lbz 5, 0(3)
.LBB0_2:                                # %if.end
        bclr 4, 1, 0
        stb 5, 0(3)
        blr

so we don't need the README.txt entry.

llvm-svn: 225217
2015-01-05 22:20:22 +00:00
Hal Finkel 9187711f08 [PowerPC] Convert a README.txt entry into a better test
We now produce the desired code as noted in the README.txt file (no spurious
or). Remove the README entry and improve the regression test.

llvm-svn: 225214
2015-01-05 21:53:52 +00:00
Brad Smith 9cf1e26201 Use the integrated assembler by default on 32-bit PowerPC and SPARC
llvm-svn: 225213
2015-01-05 21:48:16 +00:00
Hal Finkel f4044b02a5 [PowerPC] Remove README.txt entry
This entry has been rendered irrelevant now that we have proper CR bit
tracking.

llvm-svn: 225211
2015-01-05 21:41:26 +00:00
Hal Finkel c7d35bb5b1 [PowerPC] Add a test for truncating a shifted load
We now produce the desired code as noted in the README.txt file. Remove the
README entry and add a regression test.

llvm-svn: 225209
2015-01-05 21:33:14 +00:00
Hal Finkel a4750dec99 [PowerPC] Add another test for load/store with update
We now produce the desired code as noted in the README.txt file. Remove the
README entry and add a regression test.

llvm-svn: 225205
2015-01-05 21:22:42 +00:00
Hal Finkel 200d2ad188 [PowerPC] Fold i1 extensions with other ops
Consider this function from our README.txt file:

  int foo(int a, int b) { return (a < b) << 4; }

We now explicitly track CR bits by default, so the comment in the README.txt
about not really having a SETCC is no longer accurate, but we did generate this
somewhat silly code:

        cmpw 0, 3, 4
        li 3, 0
        li 12, 1
        isel 3, 12, 3, 0
        sldi 3, 3, 4
        blr

which generates the zext as a select between 0 and 1, and then shifts the
result by a constant amount. Here we preprocess the DAG in order to fold the
results of operations on an extension of an i1 value into the SELECT_I[48]
pseudo instruction when the resulting constant can be materialized using one
instruction (just like the 0 and 1). This was not implemented as a DAGCombine
because the resulting code would have been anti-canonical and depends on
replacing chained user nodes, which does not fit well into the lowering
paradigm. Now we generate:

        cmpw 0, 3, 4
        li 3, 0
        li 12, 16
        isel 3, 12, 3, 0
        blr

which is less silly.

llvm-svn: 225203
2015-01-05 21:10:24 +00:00
Hal Finkel 49557f1b42 [PowerPC] Remove zexts after i32 ctlz
The 64-bit semantics of cntlzw are not special, the 32-bit population count is
stored as a 64-bit value in the range [0,32]. As a result, it is always zero
extended, and it can be added to the PPCISelDAGToDAG peephole optimization as a
frontier instruction for the removal of unnecessary zero extensions.

llvm-svn: 225192
2015-01-05 18:52:29 +00:00
Hal Finkel 4e2c78228a [PowerPC] Remove zexts after byte-swapping loads
lhbrx and lwbrx not only load their data with byte swapping, but also clear the
upper 32 bits (at least). As a result, they can be added to the PPCISelDAGToDAG
peephole optimization as frontier instructions for the removal of unnecessary
zero extensions.

llvm-svn: 225189
2015-01-05 18:09:06 +00:00
Hal Finkel 9bb61de1be [PowerPC] Enable speculation of cttz/ctlz
PPC has an instruction for ctlz with defined zero behavior, and our lowering of
cttz (provided by DAGCombine) is also efficient and branchless, so speculating
these makes sense.

llvm-svn: 225150
2015-01-05 05:24:42 +00:00
Hal Finkel 2f61879ff4 [PowerPC] Materialize i64 constants using rotation with masking
r225135 added the ability to materialize i64 constants using rotations in order
to reduce the instruction count. Sometimes we can use a rotation only with some
extra masking, so that we take advantage of the fact that generating a bunch of
extra higher-order 1 bits is easy using li/lis.

llvm-svn: 225147
2015-01-05 03:41:38 +00:00
Hal Finkel 241ba79f95 [PowerPC] Materialize i64 constants using rotation
Materializing full 64-bit constants on PPC64 can be expensive, requiring up to
5 instructions depending on the locations of the non-zero bits. Sometimes
materializing a rotated constant, and then applying the inverse rotation, requires
fewer instructions than the direct method. If so, do that instead.

In r225132, I added support for forming constants using bit inversion. In
effect, this reverts that commit and replaces it with rotation support. The bit
inversion is useful for turning constants that are mostly ones into ones that
are mostly zeros (thus enabling a more-efficient shift-based materialization),
but the same effect can be obtained by using negative constants and a rotate,
and that is at least as efficient, if not more.

llvm-svn: 225135
2015-01-04 15:43:55 +00:00
Hal Finkel ca6375fb75 [PowerPC] Materialize i64 constants using bit inversion
Materializing full 64-bit constants on PPC64 can be expensive, requiring up to
5 instructions depending on the locations of the non-zero bits. Sometimes
materializing the bit-reversed constant, and then flipping the bits, requires
fewer instructions than the direct method. If so, do that instead.

llvm-svn: 225132
2015-01-04 12:35:03 +00:00
Hal Finkel 5772566ed6 [PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preference
The existing code provided for specifying a global loop alignment preference.
However, the preferred loop alignment might depend on the loop itself. For
recent POWER cores, loops between 5 and 8 instructions should have 32-byte
alignment (while the others are better with 16-byte alignment) so that the
entire loop will fit in one i-cache line.

To support this, getPrefLoopAlignment has been made virtual, and can be
provided with an optional MachineLoop* so the target can inspect the loop
before answering the query. The default behavior, as before, is to return the
value set with setPrefLoopAlignment. MachineBlockPlacement now queries the
target for each loop instead of only once per function. There should be no
functional change for other targets.

llvm-svn: 225117
2015-01-03 17:58:24 +00:00
Hal Finkel d73bfba7eb [PowerPC] Use 16-byte alignment for modern cores for functions/loops
Most modern PowerPC cores prefer that functions and loops start on
16-byte-aligned boundaries (*), so instruct block placement, etc. to make this
happen. The branch selector has also been adjusted so account for the extra
nops that might now be inserted before loop headers.

(*) Some cores actually prefer other alignments for small loops, but that will
    be addressed in a follow-up commit.

llvm-svn: 225115
2015-01-03 14:58:25 +00:00
Craig Topper 589ceee7f4 Minor cleanup to all the switches after MatchInstructionImpl in all the AsmParsers.
Make sure they all have llvm_unreachable on the default path out of the switch. Remove unnecessary "default: break". Remove a 'return' after unreachable. Fix some indentation.

llvm-svn: 225114
2015-01-03 08:16:34 +00:00
Hal Finkel 4edc66b8de [PowerPC] Add support for the CMPB instruction
Newer POWER cores, and the A2, support the cmpb instruction. This instruction
compares its operands, treating each of the 8 bytes in the GPRs separately,
returning a 'mask' result of 0 (for false) or -1 (for true) in each byte.

Code generation support is added, in the form of a PPCISelDAGToDAG
DAG-preprocessing routine, that recognizes patterns close to what the
instruction computes (either exactly, or related by a constant masking
operation), and generates the cmpb instruction (along with any necessary
constant masking operation). This can be expanded if use cases arise.

llvm-svn: 225106
2015-01-03 01:16:37 +00:00
Hal Finkel ddf8d7d155 [PowerPC] use UINT64_C instead of ul
Attempting to fix PR22078 (building on 32-bit systems) by replacing my careless
use of 1ul to be a uint64_t constant with UINT64_C(1).

llvm-svn: 225066
2015-01-01 19:33:59 +00:00
Hal Finkel c58ce4132a [PowerPC] Improve instruction selection bit-permuting operations (64-bit)
This is the second installment of improvements to instruction selection for "bit
permutation" instruction sequences. r224318 added logic for instruction
selection for 32-bit bit permutation sequences, and this adds lowering for
64-bit sequences. The 64-bit sequences are more complicated than the 32-bit
ones because:
  a) the 64-bit versions of the 32-bit rotate-and-mask instructions
     work by replicating the lower 32-bits of the value-to-be-rotated into the
     upper 32 bits -- and integrating this into the cost modeling for the various
     bit group operations is non-trivial
  b) unlike the 32-bit instructions in 32-bit mode, the rotate-and-mask instructions
     cannot, in one instruction, specify the
     mask starting index, the mask ending index, and the rotation factor. Also,
     forming arbitrary 64-bit constants is more complicated than in 32-bit mode
     because the number of instructions necessary is value dependent.

Plus, support for 'late masking' was added: it is sometimes more efficient to
treat the overall value as if it had no mandatory zero bits when planning the
bit-group insertions, and then mask them in at the very end. Unfortunately, as
the structure of the bit groups is different in the two cases, the more
feasible implementation technique was to generate both instruction sequences,
and then pick the shorter one.

And finally, we now generate reasonable code for i64 bswap:

        rldicl 5, 3, 16, 0
        rldicl 4, 3, 8, 0
        rldicl 6, 3, 24, 0
        rldimi 4, 5, 8, 48
        rldicl 5, 3, 32, 0
        rldimi 4, 6, 16, 40
        rldicl 6, 3, 48, 0
        rldimi 4, 5, 24, 32
        rldicl 5, 3, 56, 0
        rldimi 4, 6, 40, 16
        rldimi 4, 5, 48, 8
        rldimi 4, 3, 56, 0

vs. what we used to produce:

        li 4, 255
        rldicl 5, 3, 24, 40
        rldicl 6, 3, 40, 24
        rldicl 7, 3, 56, 8
        sldi 8, 3, 8
        sldi 10, 3, 24
        sldi 12, 3, 40
        rldicl 0, 3, 8, 56
        sldi 9, 4, 32
        sldi 11, 4, 40
        sldi 4, 4, 48
        andi. 5, 5, 65280
        andis. 6, 6, 255
        andis. 7, 7, 65280
        sldi 3, 3, 56
        and 8, 8, 9
        and 4, 12, 4
        and 9, 10, 11
        or 6, 7, 6
        or 5, 5, 0
        or 3, 3, 4
        or 7, 9, 8
        or 4, 6, 5
        or 3, 3, 7
        or 3, 3, 4

which is 12 instructions, instead of 25, and seems optimal (at least in terms
of code size).

llvm-svn: 225056
2015-01-01 02:53:29 +00:00
Rafael Espindola 54b435ec3c Add r224985 back with a fix.
The issues was that AArch64 has additional restrictions on when local
relocations can be used. We have to take those into consideration when
deciding to put a L symbol in the symbol table or not.

Original message:

Remove doesSectionRequireSymbols.

In an assembly expression like

bar:
.long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF to use relocation with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

llvm-svn: 225048
2014-12-31 17:19:34 +00:00
Rafael Espindola d4da9040de Revert "Remove doesSectionRequireSymbols."
This reverts commit r224985.

I am investigating why it made an Apple bot unhappy.

llvm-svn: 225044
2014-12-31 16:06:48 +00:00
Rafael Espindola b22d5aa49a Remove doesSectionRequireSymbols.
In an assembly expression like

bar:
.long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF to use relocation with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

llvm-svn: 224985
2014-12-30 13:13:27 +00:00
Rafael Espindola bed67f3adc Refactor duplicated code.
No intended functionality change.

llvm-svn: 224935
2014-12-29 15:18:31 +00:00
David Majnemer d0bcef2040 PowerPC: CTR shouldn't fire if a TLS call is in the loop
Determining the address of a TLS variable results in a function call in
certain TLS models.  This means that a simple ICmpInst might actually
result in invalidating the CTR register.

In such cases, do not attempt to rely on the CTR register for loop
optimization purposes.

This fixes PR22034.

Differential Revision: http://reviews.llvm.org/D6786

llvm-svn: 224890
2014-12-27 19:45:38 +00:00
Hal Finkel 0c505b08a5 [PowerPC] [FastISel] i1 constants must be zero extended
When materializing constant i1 values, they must be zero extended. We represent
i1 values as [0, 1], not [0, -1], in i32 registers. As it turns out, this code
path was dead for i1 values prior to r216006 (which is why this did not manifest in
miscompiles until recently).

Fixes -O0 self-hosting on PPC64/Linux.

llvm-svn: 224842
2014-12-25 23:08:25 +00:00
Hal Finkel fc096c98f3 [PowerPC] Ensure that the TOC reload directly follows bctrl on PPC64
On non-Darwin PPC64, the TOC reload needs to come directly after the bctrl
instruction (for indirect calls) because the 'bctrl/ld 2, 40(1)' instruction
sequence is interpreted by the unwinding code in libgcc. To make sure these
occur as a pair, as with other pairings interpreted by the linker, fuse the two
instructions into one instruction (for code generation only).

In the future, we might wish to do this by emitting CFI directives instead,
but this solution is simpler, and mirrors what GCC does. Additional discussion
on this point is contained in the PR.

Fixes PR22015.

llvm-svn: 224788
2014-12-23 22:29:40 +00:00
Hal Finkel 6e27c6d450 [PowerPC] Don't mark the return-address slot as immutable
It is tempting to mark the fixed stack slot used to store the return address as
immutable when lowering @llvm.returnaddress(i32 0). Unfortunately, within the
function, it is not completely immutable: it is written during the function
prologue. When using post-RA instruction scheduling, the prologue instructions
are available for scheduling, and we're not free to interchange the order of a
particular store in the prologue with loads from that stack location.

Fixes PR21976.

llvm-svn: 224761
2014-12-23 09:45:06 +00:00
Hal Finkel 04b16b51ec [PowerPC] Don't attempt a 64-bit pow2 division on PPC32
In r224033, in moving the signed power-of-2 division expansion into
BuildSDIVPow2, I accidentally made it possible to attempt the lowering for a
64-bit division on PPC32. This later asserts.

Fixes PR21928.

llvm-svn: 224758
2014-12-23 08:38:50 +00:00
Craig Topper f7df7221d1 [PowerPC] Use MCPhysReg for tables of registers. Const-correct the tables. Only put the anonymous namespace around classes. NFC.
llvm-svn: 224498
2014-12-18 05:02:14 +00:00
Will Schmidt 428488c594 Enable the P8Model entry
This was missed last time around, for the P8 Instruction Scheduling
changes (223257). This will hook the P8Model entry in so those
changes will actually be used.

llvm-svn: 224452
2014-12-17 19:56:29 +00:00
Hal Finkel 8adf2254ef [PowerPC] Improve instruction selection bit-permuting operations (32-bit)
The PowerPC backend, somewhat embarrassingly, did not generate an
optimal-length sequence of instructions for a 32-bit bswap. While adding a
pattern for the bswap intrinsic to fix this would not have been terribly
difficult, doing so would not have addressed the real problem: we had been
generating poor code for many bit-permuting operations (by which I mean things
like byte-swap that permute the bits of one or more inputs around in various
ways). Here are some initial steps toward solving this deficiency.

Bit-permuting operations are represented, at the SDAG level, using ISD::ROTL,
SHL, SRL, AND and OR (mostly with constant second operands). Looking back
through these operations, we can build up a description of the bits in the
resulting value in terms of bits of one or more input values (and constant
zeros). For each bit, we compute the rotation amount from the original value,
and then group consecutive (value, rotation factor) bits into groups. Groups
sharing these attributes are then collected and sorted, and we can then
instruction select the entire permutation using a combination of masked
rotations (rlwinm), imm ands (andi/andis), and masked rotation inserts
(rlwimi).

The result is that instead of lowering an i32 bswap as:

	rlwinm 5, 3, 24, 16, 23
	rlwinm 4, 3, 24, 0, 7
	rlwimi 4, 3, 8, 8, 15
	rlwimi 5, 3, 8, 24, 31
	rlwimi 4, 5, 0, 16, 31

we now produce:

	rlwinm 4, 3, 8, 0, 31
	rlwimi 4, 3, 24, 16, 23
	rlwimi 4, 3, 24, 0, 7

and for the 'test6' example in the PowerPC/README.txt file:

 unsigned test6(unsigned x) {
   return ((x & 0x00FF0000) >> 16) | ((x & 0x000000FF) << 16);
 }

we used to produce:

	lis 4, 255
	rlwinm 3, 3, 16, 0, 31
	ori 4, 4, 255
	and 3, 3, 4

and now we produce:

	rlwinm 4, 3, 16, 24, 31
	rlwimi 4, 3, 16, 8, 15

and, as a nice bonus, this fixes the FIXME in
test/CodeGen/PowerPC/rlwimi-and.ll.

This commit does not include instruction-selection for i64 operations, those
will come later.

llvm-svn: 224318
2014-12-16 05:51:41 +00:00
Hal Finkel 4104a1a346 [PowerPC] Handle cmp op promotion for SELECT[_CC] nodes in PPCTL::DAGCombineExtBoolTrunc
PPCTargetLowering::DAGCombineExtBoolTrunc contains logic to remove unwanted
truncations and extensions when dealing with nodes of the form:
  zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)

There was a FIXME in the implementation (now removed) regarding the fact that
the function would abort the transformations if any of the non-output operands
of a SELECT or SELECT_CC node would need to be promoted (because they were
also output operands, for example). As a result, we continued to generate
unnecessary zero-extends for code such as this:

  unsigned foo(unsigned a, unsigned b) {
    return  (a <= b) ? a : b;
  }

which would produce:

  cmplw 0, 3, 4
  isel 3, 4, 3, 1
  rldicl 3, 3, 0, 32
  blr

and now we produce:

  cmplw 0, 3, 4
  isel 3, 4, 3, 1
  blr

which is better in the obvious way.

llvm-svn: 224213
2014-12-14 05:53:19 +00:00
Hal Finkel 4c6658feb0 [PowerPC] Add a DAGToDAG peephole to remove unnecessary zero-exts
On PPC64, we end up with lots of i32 -> i64 zero extensions, not only from all
of the usual places, but also from the ABI, which specifies that values passed
are zero extended. Almost all 32-bit PPC instructions in PPC64 mode are defined
to do *something* to the higher-order bits, and for some instructions, that
action clears those bits (thus providing a zero-extended result). This is
especially common after rotate-and-mask instructions. Adding an additional
instruction to zero-extend the results of these instructions is unnecessary.

This PPCISelDAGToDAG peephole optimization examines these zero-extensions, and
looks back through their operands to see if all instructions will implicitly
zero extend their results. If so, we convert these instructions to their 64-bit
variants (which is an internal change only, the actual encoding of these
instructions is the same as the original 32-bit ones) and remove the
unnecessary zero-extension (changing where the INSERT_SUBREG instructions are
to make everything internally consistent).

llvm-svn: 224169
2014-12-12 23:59:36 +00:00
Hal Finkel b5e9b0426a [PowerPC] Better lowering for add/or of a FrameIndex
If we have an add (or an or that is really an add), where one operand is a
FrameIndex and the other operand is a small constant, we can combine the
lowering of the FrameIndex (which is lowered as an add of the FI and a zero
offset) with the constant operand.

Amusingly, this is an old potential improvement entry from
lib/Target/PowerPC/README.txt which had never been resolved. In short, we used
to lower:

        %X = alloca { i32, i32 }
        %Y = getelementptr {i32,i32}* %X, i32 0, i32 1
        ret i32* %Y

as:

        addi 3, 1, -8
        ori 3, 3, 4
        blr

and now we produce:

        addi 3, 1, -4
        blr

which is much more sensible.

llvm-svn: 224071
2014-12-11 22:51:06 +00:00
Matthias Braun 7e37a5f523 [CodeGen] Add print and verify pass after each MachineFunctionPass by default
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging as you miss intermediate steps and allows bugs to stay
unnotice when no verification is performed.

To make this change practical I added the possibility to explicitely disable
verification. I used this option on all places where no verification was
performed previously (because alot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.

This is the 2nd attempt at this after realizing that PassManager::add() may
actually delete the pass.

llvm-svn: 224059
2014-12-11 21:26:47 +00:00
Rafael Espindola 01c73610d0 This reverts commit r224043 and r224042.
check-llvm was failing.

llvm-svn: 224045
2014-12-11 20:03:57 +00:00
Matthias Braun a7c82a9f1d [CodeGen] Add print and verify pass after each MachineFunctionPass by default
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging as you miss intermediate steps and allows bugs to stay
unnotice when no verification is performed.

To make this change practical I added the possibility to explicitely disable
verification. I used this option on all places where no verification was
performed previously (because alot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.

llvm-svn: 224042
2014-12-11 19:42:05 +00:00
Hal Finkel 13d104bf78 [PowerPC] Implement BuildSDIVPow2, lower i64 pow2 sdiv using sradi
PPCISelDAGToDAG contained existing code to lower i32 sdiv by a power-of-2 using
srawi/addze, but did not implement the i64 case. DAGCombine now contains a
callback specifically designed for this purpose (BuildSDIVPow2), and part of
the logic has been moved to an implementation of that callback. Doing this
lowering using BuildSDIVPow2 likely does not matter, compared to handling
everything in PPCISelDAGToDAG, for the positive divisor case, but the negative
divisor case, which generates an additional negation, can potentially benefit
from additional folding from DAGCombine. Now, both the i32 and the i64 cases
have been implemented.

Fixes PR20732.

llvm-svn: 224033
2014-12-11 18:37:52 +00:00
Bill Schmidt efe9ce216e [PowerPC 4/4] Enable little-endian support for VSX.
With the foregoing three patches, VSX instructions can be used for
little endian.  This patch removes the restriction that prevented
this, and re-enables the test cases from the first three patches.

llvm-svn: 223792
2014-12-09 16:59:57 +00:00
Bill Schmidt 3014435ca9 [PowerPC 3/4] Little-endian adjustments for VSX vector shuffle
When performing instruction selection for ISD::VECTOR_SHUFFLE, there
is special code for handling v2f64 and v2i64 using VSX instructions.
This code must be adjusted for little-endian.  Because the two inputs
are treated as a double-wide register, we must swap their order for
little endian.  To get the appropriate mask elements to use with the
big-endian biased XXPERMDI instruction, we must reverse their order
and invert the bits.

A new test is added to test the 16 possible values of the shuffle
mask.  It is initially disabled for reasons specified in the test.  It
is re-enabled by patch 4/4.

llvm-svn: 223791
2014-12-09 16:52:29 +00:00
Bill Schmidt 10f6eb91a0 [PowerPC 2/4] Little-endian adjustments for VSX insert/extract operations
For little endian, we need to make some straightforward adjustments in
the code expansions for scalar_to_vector and vector_extract of v2f64.
First, scalar_to_vector must place the scalar into vector element
zero.  However, our implementation of SUBREG_TO_REG will place it into
big-element vector element zero (high-order bits), and for little
endian we need it in the low-order bits.  The LE implementation splats
the high-order doubleword into the low-order doubleword.

Second, the meaning of (vector_extract x, 0) and (vector_extract x, 1)
must be reversed for similar reasons.

A new test is added that tests code generation for insertelement and
extractelement for both element 0 and element 1.  It is disabled in
this patch but enabled in patch 4/4, for reasons stated in the test.

llvm-svn: 223788
2014-12-09 16:43:32 +00:00
Bill Schmidt fae5d71584 [PowerPC 1/4] Little-endian adjustments for VSX loads/stores
This patch addresses the inherent big-endian bias in the lxvd2x,
lxvw4x, stxvd2x, and stxvw4x instructions.  These instructions load
vector elements into registers left-to-right (with the first element
loaded into the high-order bits of the register), regardless of the
endian setting of the processor.  However, these are the only
vector memory instructions that permit unaligned storage accesses, so
we want to use them for little-endian.

To make this work, a lxvd2x or lxvw4x is replaced with an lxvd2x
followed by an xxswapd, which swaps the doublewords.  This works for
lxvw4x as well as lxvd2x, because for lxvw4x on an LE system the
vector elements are in LE order (right-to-left) within each
doubleword.  (Thus after lxvw2x of a <4 x float> the elements will
appear as 1, 0, 3, 2.  Following the swap, they will appear as 3, 2,
0, 1, as desired.)   For stores, an stxvd2x or stxvw4x is replaced
with an stxvd2x preceded by an xxswapd.

Introduction of extra swap instructions provides correctness, but
obviously is not ideal from a performance perspective.  Future patches
will address this with optimizations to remove most of the introduced
swaps, which have proven effective in other implementations.

The introduction of the swaps is performed during lowering of LOAD,
STORE, INTRINSIC_W_CHAIN, and INTRINSIC_VOID operations.  The latter
are used to translate intrinsics that specify the VSX loads and stores
directly into equivalent sequences for little endian.  Thus code that
uses vec_vsx_ld and vec_vsx_st does not have to be modified to be
ported from BE to LE.

We introduce new PPCISD opcodes for LXVD2X, STXVD2X, and XXSWAPD for
use during this lowering step.  In PPCInstrVSX.td, we add new SDType
and SDNode definitions for these (PPClxvd2x, PPCstxvd2x, PPCxxswapd).
These are recognized during instruction selection and mapped to the
correct instructions.

Several tests that were written to use -mcpu=pwr7 or pwr8 are modified
to disable VSX on LE variants because code generation changes with
this and subsequent patches in this set.  I chose to include all of
these in the first patch than try to rigorously sort out which tests
were broken by one or another of the patches.  Sorry about that.

The new test vsx-ldst-builtin-le.ll, and the changes to vsx-ldst.ll,
are disabled until LE support is enabled because of breakages that
occur as noted in those tests.  They are re-enabled in patch 4/4.

llvm-svn: 223783
2014-12-09 16:35:51 +00:00
Bill Schmidt 0913500021 Restore r223709 as it was meant to be, and enable FeatureP8Vector for P8
llvm-svn: 223751
2014-12-09 03:02:48 +00:00
NAKAMURA Takumi cc4487eb8b Revert r223709, "[PowerPC]Activate FeatureVSX for the Power target", to unbreak bots.
CodeGen/PowerPC/vsx-p8.ll was failing.

  '+power8-vector' is not a recognized feature for this target (ignoring feature)
  llvm/test/CodeGen/PowerPC/vsx-p8.ll:33:14: error: expected string not found in input
  ; CHECK-REG: lxvw4x 34, 0, 3
               ^
  <stdin>:50:2: note: scanning from here
   .align 3
   ^
  <stdin>:61:2: note: possible intended match here
   lvx 3, 0, 3
   ^

llvm-svn: 223729
2014-12-09 01:03:27 +00:00
Bill Seurer 05663d8589 [PowerPC]Activate FeatureVSX for the Power target
This change activates FeatureVSX for Power 7 and Power 8 in PPC.td.

http://reviews.llvm.org/D6570

llvm-svn: 223709
2014-12-08 23:07:12 +00:00
Hal Finkel aa10b3caaf [PowerPC] Don't use a non-allocatable register to implement the 'cc' alias
GCC accepts 'cc' as an alias for 'cr0', and we need to do the same when
processing inline asm constraints. This had previously been implemented using a
non-allocatable register, named 'cc', that was listed as an alias of 'cr0', but
the infrastructure does not seem to support this properly (neither the register
allocator nor the scheduler properly accounts for the alias). Instead, we can
just process this as a naming alias inside of the inline asm
constraint-processing code, so we'll do that instead.

There are two regression tests, one where the post-RA scheduler did the wrong
thing with the non-allocatable alias, and one where the register allocator did
the wrong thing. Fixes PR21742.

llvm-svn: 223708
2014-12-08 22:54:22 +00:00
Bill Seurer 8c728ae9fb [PowerPC]Add VSX loads/stores to fastisel for PPC target
This patch adds VSX floating point loads and stores to fastisel.

Along with the change to tablegen (D6220), VSX instructions are now fully supported in fastisel.

http://reviews.llvm.org/D6274

llvm-svn: 223507
2014-12-05 20:15:56 +00:00
Hal Finkel 029042b278 [PowerPC] 'cc' should be an alias only to 'cr0'
We had mistakenly believed that GCC's 'cc' referred to the entire
condition-code register (cr0 through cr7) -- and implemented this in r205630 to
fix PR19326, but 'cc' is actually an alias only to 'cr0'. This is causing LLVM
to clobber too much with legacy code with inline asm using the 'cc' clobber.

Fixes PR21451.

llvm-svn: 223328
2014-12-04 00:46:20 +00:00
Hal Finkel d433838adf [PowerPC] Fix inline asm memory operands not to use r0
On PowerPC, inline asm memory operands might be expanded as 0($r), where $r is
a register containing the address. As a result, this register cannot be r0, and
we need to enforce this register subclass constraint to prevent miscompiling
the code (we'd get this constraint for free with the usual instruction
definitions, but that scheme has no knowledge of how we end up printing inline
asm memory operands, and so here we need to do it 'by hand'). We can accomplish
this within the current address-mode selection framework by introducing an
explicit COPY_TO_REGCLASS node.

Fixes PR21443.

llvm-svn: 223318
2014-12-03 23:40:13 +00:00
Will Schmidt eba49233c3 Add TableGen info for Power8.
This is based on the Power7 version, with units added and renamed to match P8.

Differential Revision: http://reviews.llvm.org/D6358

llvm-svn: 223257
2014-12-03 18:46:30 +00:00
Hal Finkel c91fc11181 [PowerPC] Print all inline-asm consts as signed numbers
Almost all immediates in PowerPC assembly (both 32-bit and 64-bit) are signed
numbers, and it is important that we print them as such. To make sure that
happens, we change PPCTargetLowering::LowerAsmOperandForConstraint so that it
does all intermediate checks on a signed-extended int64_t value, and then
creates the resulting target constant using MVT::i64. This will ensure that all
negative values are printed as negative values (mirroring what is done in other
backends to achieve the same sign-extension effect).

This came up in the context of inline assembly like this:
  "add%I2   %0,%0,%2", ..., "Ir"(-1ll)
where we used to print:
  addi   3,3,4294967295
and gcc would print:
  addi   3,3,-1
and gas accepts both forms, but our builtin assembler (correctly) does not. Now
we print -1 like gcc does.

While here, I replaced a bunch of custom integer checks with isInt<16> and
friends from MathExtras.h.

Thanks to Paul Hargrove for the bug report.

llvm-svn: 223220
2014-12-03 09:37:50 +00:00
Hal Finkel 01fa7701e6 [PowerPC] Fix readcyclecounter to be custom expanded for all 32-bit targets
We need to use the custom expansion of readcyclecounter on all 32-bit targets
(even those with 64-bit registers). This should fix the ppc64 buildbot.

llvm-svn: 223182
2014-12-03 00:19:17 +00:00
Hal Finkel bbdee93638 [PowerPC] Implement readcyclecounter for PPC32
We've long supported readcyclecounter on PPC64, but it is easier there (the
read of the 64-bit time-base register can be accomplished via a single
instruction). This now provides an implementation for PPC32 as well. On PPC32,
the time-base register is still 64 bits, but can only be read 32 bits at a time
via two separate SPRs. The ISA manual explains how to do this properly (it
involves re-reading the upper bits and looping if the counter has wrapped while
being read).

This requires PPC to implement a custom integer splitting legalization for the
READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then
gets turned into a pseudo-instruction, which is then expanded to the necessary
sequence (which has three SPR reads, the comparison and the branch).

Thanks to Paul Hargrove for pointing out to me that this was still unimplemented.

llvm-svn: 223161
2014-12-02 22:01:00 +00:00
Jay Foad 1f0a44e662 [PowerPC] Fix unwind info with dynamic stack realignment
Summary:
PowerPC DWARF unwind info defined CFA as SP + offset even in a function
where the stack had been dynamically realigned. This clearly doesn't
work because the offset from SP to CFA is not a constant. Fix it by
defining CFA as BP instead.

This was causing the AddressSanitizer null_deref test to fail 50% of
the time, depending on whether SP happened to be 32-byte aligned on
entry to a particular function or not.

Reviewers: willschm, uweigand, hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6410

llvm-svn: 222996
2014-12-01 09:42:32 +00:00
Hal Finkel 378107daa4 [PowerPC] Add asm support for cache-inhibited ld/st instructions
Add assembler support for the fixed-point cache-inhibited load/store
instructions. These are hypervisor-level only, so don't get too excited ;)

Fixes PR21650.

llvm-svn: 222976
2014-11-30 10:15:56 +00:00
Simon Pilgrim 2bfd9129f4 Target triple OS detection tidyup. NFC
Use Triple::isOS*() helpers where possible.

llvm-svn: 222960
2014-11-29 19:18:21 +00:00
Craig Topper c50d64b07b Replace neverHasSideEffects=1 with hasSideEffects=0 in all .td files.
llvm-svn: 222801
2014-11-26 00:46:26 +00:00
Hal Finkel 5901676581 [PowerPC] Add the 'attn' instruction
The attn instruction is not part of the Power ISA, but is documented in the A2
user manual, and is accepted by the GNU assembler for the A2 and the POWER4+.
Reported as part of PR21650.

llvm-svn: 222712
2014-11-25 00:30:11 +00:00
Hal Finkel 360f213d03 [PowerPC] Implement combineRepeatedFPDivisors
This does not matter on newer cores (where we can use reciprocal estimates in
fast-math mode anyway), but for older cores this allows us to generate better
fast-math code where we have multiple FDIVs with a common divisor.

llvm-svn: 222710
2014-11-24 23:45:21 +00:00
Ulrich Weigand a69bcd5ed0 [PowerPC] Fix PR 21652 - copy st_other bits on symbol assignment
When processing an assignment in the integrated assembler that sets
a symbol to the value of another symbol, we need to copy the st_other
bits that encode the local entry point offset.

Modeled after MipsTargetELFStreamer::emitAssignment handling of the
ELF::STO_MIPS_MICROMIPS flag.

llvm-svn: 222672
2014-11-24 18:09:47 +00:00
NAKAMURA Takumi 7495dc76e8 Add LLVMScalarOpts to LLVMPowerPCCodeGen.
llvm-svn: 222516
2014-11-21 09:14:45 +00:00
Craig Topper 61e88f44f9 Remove a bunch of unnecessary typecasts to 'const TargetRegisterClass *'
llvm-svn: 222509
2014-11-21 05:58:21 +00:00
Hal Finkel f413be11f0 [PPC] Use SeparateConstOffsetFromGEP
This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in
the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on
SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM,
there is a store moved out of the inner loop) and a potential speedup on
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it
makes some code look cleaner, and synchronizing the backends in this regard
seems like a generally good thing.

llvm-svn: 222504
2014-11-21 04:35:51 +00:00
Reid Kleckner 357600eab5 Add out of line virtual destructors to all LLVMTargetMachine subclasses
These recently all grew a unique_ptr<TargetLoweringObjectFile> member in
r221878.  When anyone calls a virtual method of a class, clang-cl
requires all virtual methods to be semantically valid. This includes the
implicit virtual destructor, which triggers instantiation of the
unique_ptr destructor, which fails because the type being deleted is
incomplete.

This is just part of the ongoing saga of PR20337, which is affecting
Blink as well. Because the MSVC ABI doesn't have key functions, we end
up referencing the vtable and implicit destructor on any virtual call
through a class. We don't actually end up emitting the dtor, so it'd be
good if we could avoid this unneeded type completion work.

llvm-svn: 222480
2014-11-20 23:37:18 +00:00
David Blaikie 70573dcd9f Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
Bill Schmidt 7674692961 [PowerPC] Add VSX builtins for vec_div
This patch adds builtin support for xvdivdp and xvdivsp, along with a
test case.  Straightforward stuff.

There's a companion patch for Clang.

llvm-svn: 221983
2014-11-14 12:10:40 +00:00
Aditya Nandakumar 3053155652 We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed.
llvm-svn: 221926
2014-11-13 21:29:21 +00:00
Aditya Nandakumar a27193297f This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively
llvm-svn: 221878
2014-11-13 09:26:31 +00:00
Justin Hibbits a88b605721 Add support for small-model PIC for PowerPC.
Summary:
Large-model was added first.  With the addition of support for multiple PIC
models in LLVM, now add small-model PIC for 32-bit PowerPC, SysV4 ABI.  This
generates more optimal code, for shared libraries with less than about 16380
data objects.

Test Plan: Test cases added or updated

Reviewers: joerg, hfinkel

Reviewed By: hfinkel

Subscribers: jholewinski, mcrosier, emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D5399

llvm-svn: 221791
2014-11-12 15:16:30 +00:00