Commit Graph

4539 Commits

Author SHA1 Message Date
NAKAMURA Takumi 8061e8645f PPCFrameLowering::emitEpilogue(): Avoid manipulating MBBI on iterator end.
It caused crash in MachineInstr::hasPropertyInBundle() since r247237.

llvm-svn: 247395
2015-09-11 08:20:56 +00:00
Cong Hou c536bd9e73 Pass BranchProbability/BlockMass by value instead of const& as they are small. NFC.
llvm-svn: 247357
2015-09-10 23:10:42 +00:00
Kit Barton d3b904d440 Enable the shrink wrapping optimization for PPC64.
The changes in this patch are as follows:
  1. Modify the emitPrologue and emitEpilogue methods to work properly when the prologue and epilogue blocks are not the first/last blocks in the function
  2. Fix a bug in PPCEarlyReturn optimization caused by an empty entry block in the function
  3. Override the runShrinkWrap PredicateFtor (defined in TargetMachine) to check whether shrink wrapping should run:
      Shrink wrapping will run on PPC64 (Little Endian and Big Endian) unless -enable-shrink-wrap=false is specified on command line

A new test case, ppc-shrink-wrapping.ll was created based on the existing shrink wrapping tests for x86, arm, and arm64.

Phabricator review: http://reviews.llvm.org/D11817

llvm-svn: 247237
2015-09-10 01:55:44 +00:00
Chandler Carruth 7b560d40bd [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible
with the new pass manager, and no longer relying on analysis groups.

This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:

- FunctionAAResults is a type-erasing alias analysis results aggregation
  interface to walk a single query across a range of results from
  different alias analyses. Currently this is function-specific as we
  always assume that aliasing queries are *within* a function.

- AAResultBase is a CRTP utility providing stub implementations of
  various parts of the alias analysis result concept, notably in several
  cases in terms of other more general parts of the interface. This can
  be used to implement only a narrow part of the interface rather than
  the entire interface. This isn't really ideal, this logic should be
  hoisted into FunctionAAResults as currently it will cause
  a significant amount of redundant work, but it faithfully models the
  behavior of the prior infrastructure.

- All the alias analysis passes are ported to be wrapper passes for the
  legacy PM and new-style analysis passes for the new PM with a shared
  result object. In some cases (most notably CFL), this is an extremely
  naive approach that we should revisit when we can specialize for the
  new pass manager.

- BasicAA has been restructured to reflect that it is much more
  fundamentally a function analysis because it uses dominator trees and
  loop info that need to be constructed for each function.

All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.

The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.

This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.

Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.

One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.

Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.

Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.

Differential Revision: http://reviews.llvm.org/D12080

llvm-svn: 247167
2015-09-09 17:55:00 +00:00
Eric Christopher 71f6e2f568 Fix the PPC CTR Loop pass to look for calls to the intrinsics that
read CTR and count them as reading the CTR.

llvm-svn: 247083
2015-09-08 22:14:58 +00:00
Hal Finkel ccf9259c00 [PowerPC] Don't commute trivial rlwimi instructions
To commute a trivial rlwimi instructions (meaning one with a full mask and zero
shift), we'd need to ability to form an all-zero mask (instead of an all-one
mask) using rlwimi. We can't represent this, however, and we'll miscompile code
if we try.

The code quality problem that this highlights (that SDAG simplification can
lead to us generating an ISD::OR node with a constant zero LHS) will be fixed
as a follow-up.

Fixes PR24719.

llvm-svn: 246937
2015-09-06 04:17:30 +00:00
Hal Finkel b1518d6c24 [PowerPC] Fix and(or(x, c1), c2) -> rlwimi generation
PPCISelDAGToDAG has a transformation that generates a rlwimi instruction from
an input pattern that looks like this:

  and(or(x, c1), c2)

but the associated logic does not work if there are bits that are 1 in c1 but 0
in c2 (these are normally canonicalized away, but that can't happen if the 'or'
has other users. Make sure we abort the transformation if such bits are
discovered.

Fixes PR24704.

llvm-svn: 246900
2015-09-05 00:02:59 +00:00
Hal Finkel 4a7be23976 [PowerPC] Enable interleaved-access vectorization
This adds a basic cost model for interleaved-access vectorization (and a better
default for shuffles), and enables interleaved-access vectorization by default.
The relevant difference from the default cost model for interleaved-access
vectorization, is that on PPC, the shuffles that end up being used are *much*
cheaper than modeling the process with insert/extract pairs (which are
quite expensive, especially on older cores).

llvm-svn: 246824
2015-09-04 00:10:41 +00:00
Hal Finkel 75afa2b6b6 [PowerPC] Always use aggressive interleaving on the A2
On the A2, with an eye toward QPX unaligned-load merging, we should always use
aggressive interleaving. It is generally superior to only using concatenation
unrolling.

llvm-svn: 246819
2015-09-03 23:23:00 +00:00
Hal Finkel e6702ca0e2 [PowerPC] Try harder to find a base+offset when looking for consecutive accesses
When forming permutation-based unaligned vector loads, we need to know whether
it is valid to read ahead of the requested address by a full vector length.
Doing so is more efficient (and allows for more CSE with later loads), but
could trigger a page fault if invalid. To determine validity, we look for other
loads in the same block that access the relevant address range.

The relevant point here is that we need to do this as part of the process of
forming permutation-based vector loads, and this happens quite early in the
SDAG pipeline - specifically before many of the address calculations are fully
canonicalized. As a result, we need to try harder to recognize base+offset
address computations, because they still might appear as chain of adds
(base+offset+offset, for example). To account for this, we'll look through
chains of adds, accumulating the constant offsets.

llvm-svn: 246813
2015-09-03 22:37:44 +00:00
Hal Finkel f11bc761d8 [PowerPC] Include the permutation cost for unaligned vector loads
Pre-P8, when we generate code for unaligned vector loads (for Altivec and QPX
types), even when accounting for the combining that takes place for multiple
consecutive such loads, there is at least one load instructions and one
permutation for each load. Make sure the cost reported reflects the cost of the
permutes as well.

llvm-svn: 246807
2015-09-03 21:23:18 +00:00
Hal Finkel 99d95328d6 [PowerPC] Compute the MMO offset for an unaligned load with signed arithmetic
If you compute the MMO offset using unsigned arithmetic, you end up with a
large positive offset instead of a small negative one. In theory, this could
cause bad instruction-scheduling decisions later.

I noticed this by inspection from the debug output, and using that for the
regression test is the best I can do right now.

llvm-svn: 246805
2015-09-03 21:12:15 +00:00
Hal Finkel 79dbf5b562 [PowerPC] Cleanup cost model for unaligned vector loads/stores
I'm adding a regression test to better cover code generation for unaligned
vector loads and stores, but there's no functional change to the code
generation here. There is an improvement to the cost model for unaligned vector
loads and stores, mostly for QPX (for which we were not previously accounting
for the permutation-based loads), and the cost model implementation is cleaner.

llvm-svn: 246712
2015-09-02 21:03:28 +00:00
Hal Finkel 77c8b7ffd3 [PowerPC] Don't always consider P8Altivec-only masks in LowerVECTOR_SHUFFLE
LowerVECTOR_SHUFFLE needs to decide whether to pass a vector shuffle off to the
TableGen-generated matching code, and it does this by testing the same
predicates used by the TableGen files. Unfortunately, when we added new
P8Altivec-only predicates, we started universally testing them in
LowerVECTOR_SHUFFLE, and if then matched when targeting a system prior to a P8,
we'd end up with a selection failure.

llvm-svn: 246675
2015-09-02 16:52:37 +00:00
Reid Kleckner e00faf8ce1 [EH] Handle non-Function personalities like unknown personalities
Also delete and simplify a lot of MachineModuleInfo code that used to be
needed to handle personalities on landingpads.  Now that the personality
is on the LLVM Function, we no longer need to track it this way on MMI.
Certainly it should not live on LandingPadInfo.

llvm-svn: 246478
2015-08-31 20:02:16 +00:00
Hal Finkel a2cdbce661 [PowerPC] Fixup SELECT_CC (and SETCC) patterns with i1 comparison operands
There were really two problems here. The first was that we had the truth tables
for signed i1 comparisons backward. I imagine these are not very common, but if
you have:
  setcc i1 x, y, LT
this has the '0 1' and the '1 0' results flipped compared to:
  setcc i1 x, y, ULT
because, in the signed case, '1 0' is really '-1 0', and the answer is not the
same as in the unsigned case.

The second problem was that we did not have patterns (at all) for the unsigned
comparisons select_cc nodes for i1 comparison operands. This was the specific
cause of PR24552. These had to be added (and a missing Altivec promotion added
as well) to make sure these function for all types. I've added a bunch more
test cases for these patterns, and there are a few FIXMEs in the test case
regarding code-quality.

Fixes PR24552.

llvm-svn: 246400
2015-08-30 22:12:50 +00:00
Hal Finkel 982e8d48f8 [MIR Serialization] static -> static const in getSerializable*MachineOperandTargetFlags
Make the arrays 'static const' instead of just 'static'. Post-commit review
comment from Roman Divacky on IRC. NFC.

llvm-svn: 246376
2015-08-30 08:07:29 +00:00
Hal Finkel 2d55698ed7 [PowerPC/MIR Serialization] Target flags serialization support
Add support for MIR serialization of PowerPC-specific operand target flags
(based on the generic infrastructure added in r244185 and r245383).

I won't even pretend that this is good test coverage, but this includes the
regression test associated with r246372. Adding an MIR test for that fix is far
superior to adding an IR-level test because particular instruction-scheduling
decisions are necessary in order to expose the bug, and using an MIR test we
can start the pipeline post-scheduling.

llvm-svn: 246373
2015-08-30 07:50:35 +00:00
Hal Finkel d2fd9becf4 [PowerPC] Don't assume ADDISdtprelHA's source is r3
Even through ADDISdtprelHA generally has r3 as its source register, it is
possible for the instruction scheduler to move things around such that some
other register is the source. We need to print the actual source register, not
always r3. Fixes PR24394.

The test case will come in a follow-up commit because it depends on MIR
target-flags parsing.

llvm-svn: 246372
2015-08-30 07:44:05 +00:00
Hal Finkel 7ffe55ae9d [PowerPC] Remove unnecessary braces in PPCVSXFMAMutate
Address Eric's post-commit review of r245741. NFC.

llvm-svn: 246121
2015-08-26 23:41:53 +00:00
Matthias Braun ccfc9c8d6d FastISel: Use finishCondBranch() for ARM,Mips,PowerPC FastISel
Note that after this change branch probabilities are preserved now.

llvm-svn: 245998
2015-08-26 01:55:47 +00:00
Hal Finkel 0f2ddcb83f [PowerPC] PPCVSXFMAMutate should ignore trivial-copy addends
We might end up with a trivial copy as the addend, and if so, we should ignore
the corresponding FMA instruction. The trivial copy can be coalesced away later,
so there's nothing to do here. We should not, however, assert. Fixes PR24544.

llvm-svn: 245907
2015-08-24 23:48:28 +00:00
Bill Schmidt 32fd189de2 [PPC64LE] Fix PR24546 - Swap optimization and debug values
This patch fixes PR24546, which demonstrates a segfault during the VSX
swap removal pass.  The problem is that debug value instructions were
not excluded from the list of instructions to be analyzed for webs of
related computation.  I've added the test case from the PR as a crash
test in test/CodeGen/PowerPC.

llvm-svn: 245862
2015-08-24 19:27:27 +00:00
Hal Finkel ff9639d6b7 [PowerPC] PPCVSXFMAMutate should not segfault on undef input registers
When PPCVSXFMAMutate would look at the input addend register, it would get its
input value number. This would fail, however, if the register was undef,
causing a segfault. Don't segfault (just skip such FMA instructions).

Fixes the test case from PR24542 (although that may have been over-reduced).

llvm-svn: 245741
2015-08-21 21:34:24 +00:00
Hal Finkel 9fdce9adee [PowerPC] Fix value type on XVCMPEQDP for v2f64 comparisons
XVCMPEQDP is used for VSX v2f64 equality comparisons, but the value type needs
to be v2i64 (as that's the corresponding SETCC type).

Fixes PR24225.

llvm-svn: 245535
2015-08-20 03:02:02 +00:00
Hal Finkel be78c25acb [PowerPC] Fix the int2fp(fp2int(x)) DAGCombine to ignore ppc_fp128
This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128
operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)),
but shouldn't (it should only apply to f32/f64 types). The result was a crash.

llvm-svn: 245530
2015-08-20 01:18:20 +00:00
Nemanja Ivanovic 5f1cea4141 Temporary fix for the self-host failures introduced by rL244921.
This revision has introduced an issue that only affects bootstrapped compiler
when it is printing the ASM. I am working on resolving the issue, but in the
meantime, I'm disabling the legalization of scalar_to_vector operation for v2i64
and the associated testing until I can get this fixed.

llvm-svn: 245481
2015-08-19 19:04:47 +00:00
Chandler Carruth 2f1fd1658f [PM] Port ScalarEvolution to the new pass manager.
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.

I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.

But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't realy called during normal runs of the pipeline as
far as I can see.

To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.

To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.

With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.

Differential Revision: http://reviews.llvm.org/D12063

llvm-svn: 245193
2015-08-17 02:08:17 +00:00
Saleem Abdulrasool 3e190cb098 PowerPC: remove dead initialization (NFC)
Identified by the clang static analyzer.  No functional change intended.

llvm-svn: 245022
2015-08-14 03:48:35 +00:00
Nemanja Ivanovic 1c39ca6501 Scalar to vector conversions using direct moves
This patch corresponds to review:
http://reviews.llvm.org/D11471

It improves the code generated for converting a scalar to a vector value. With
direct moves from GPRs to VSRs, we no longer require expensive stack operations
for this. Subsequent patches will handle the reverse case and more general
operations between vectors and their scalar elements.

llvm-svn: 244921
2015-08-13 17:40:44 +00:00
Alex Lorenz e40c8a2b26 PseudoSourceValue: Replace global manager with a manager in a machine function.
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.

This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.

This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.

This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.

Reviewers: Akira Hatanaka
llvm-svn: 244693
2015-08-11 23:09:45 +00:00
Cameron Esfahani f97999dc46 Explicitly clear the MI operand list when getInstruction() is called. Call MI.clear() within MCD::OPC_Decode case and inside of translateInstruction() for the X86 target. Remove now unnecessary MI.clear() from ARMDisassembler.
Summary: Explicitly clear the MI operand list when getInstruction() is called.

Reviewers: hfinkel, t.p.northover, hvarga, kparzysz, jyknight, qcolombet, uweigand

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11665

llvm-svn: 244557
2015-08-11 01:15:07 +00:00
Benjamin Kramer df005cbe19 Fix some comment typos.
llvm-svn: 244402
2015-08-08 18:27:36 +00:00
Pete Cooper ebcd748927 Convert a bunch of loops to foreach. NFC.
After r244074, we now have a successors() method to iterate over
all the successors of a TerminatorInst.  This commit changes a bunch
of eligible loops to use it.

llvm-svn: 244260
2015-08-06 20:22:46 +00:00
Chandler Carruth 93205eb966 [TTI] Make the cost APIs in TargetTransformInfo consistently use 'int'
rather than 'unsigned' for their costs.

For something like costs in particular there is a natural "negative"
value, that of savings or saved cost. As a consequence, there is a lot
of code that subtracts or creates negative values based on cost, all of
which is prone to awkwardness or bugs when dealing with an unsigned
type. Similarly, we *never* want these values to wrap, as that would
cause Very Bad code generation (likely percieved as an infinite loop as
we try to emit over 2^32 instructions or some such insanity).

All around 'int' seems a much better fit for these basic metrics. I've
added asserts to ensure that at least the TTI interface never returns
negative numbers here. If we ever have a use case for negative numbers,
we can remove this, but this way a bug where someone used '-1' to
produce a 'very large' cost will be caught by the assert.

This passes all tests, and is also UBSan clean.

No functional change intended.

Differential Revision: http://reviews.llvm.org/D11741

llvm-svn: 244080
2015-08-05 18:08:10 +00:00
Bill Schmidt 42ddd71120 [PPC] Fix PR24216: Don't generate splat for misaligned shuffle mask
Given certain shuffle-vector masks, LLVM emits splat instructions
which splat the wrong bytes from the source register.  The issue is
that the function PPC::isSplatShuffleMask() in PPCISelLowering.cpp
does not ensure that the splat pattern found is requesting bytes that
are aligned on an EltSize boundary.  This patch detects this situation
as not a valid splat mask, resulting in a permute being generated
instead of a splat.

Patch and test case by Tyler Kenney, cleaned up a bit by me.

This is a simple bug fix that would be good to incorporate into 3.7.

llvm-svn: 243519
2015-07-29 14:31:57 +00:00
Sanjay Patel 1dd15598cf fix TLI's combineRepeatedFPDivisors interface to return the minimum user threshold
This fix was suggested as part of D11345 and is part of fixing PR24141.

With this change, we can avoid walking the uses of a divisor node if the target
doesn't want the combineRepeatedFPDivisors transform in the first place.

There is no NFC-intended other than that.

Differential Revision: http://reviews.llvm.org/D11531

llvm-svn: 243498
2015-07-28 23:05:48 +00:00
Chih-Hung Hsieh 1e859582d6 Implement target independent TLS compatible with glibc's emutls.c.
The 'common' section TLS is not implemented.
Current C/C++ TLS variables are not placed in common section.
DWARF debug info to get the address of TLS variables is not generated yet.

clang and driver changes in http://reviews.llvm.org/D10524

  Added -femulated-tls flag to select the emulated TLS model,
  which will be used for old targets like Android that do not
  support ELF TLS models.

Added TargetLowering::LowerToTLSEmulatedModel as a target-independent
function to convert a SDNode of TLS variable address to a function call
to __emutls_get_address.

Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel
for TLSModel::Emulated. Although all targets supporting ELF TLS models are
enhanced, emulated TLS model has been tested only for Android ELF targets.
Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for
emulated TLS variables.
Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls.

TODO: Add proper DIE for emulated TLS variables.
      Added new unit tests with emulated TLS.

Differential Revision: http://reviews.llvm.org/D10522

llvm-svn: 243438
2015-07-28 16:24:05 +00:00
Colin LeMahieu fe2c8b8015 [llvm-mc] Pushing plumbing through for --fatal-warnings flag.
llvm-svn: 243334
2015-07-27 21:56:53 +00:00
Pete Cooper 2e20147403 Revert "Add const to some Type* parameters which didn't need to be mutable. NFC."
This reverts commit r243146.

Feedback from Craig Topper and David Blaikie was that we don't put const on Type as it has no mutable state.

llvm-svn: 243282
2015-07-27 17:15:24 +00:00
Eric Christopher f0024d14f1 Fix PPCMaterializeInt to check the size of the integer based on the
extension property we're requesting - zero or sign extended.

This fixes cases where we want to return a zero extended 32-bit -1
and not be sign extended for the entire register. Also updated the
already out of date comment with the current behavior.

llvm-svn: 243192
2015-07-25 00:48:08 +00:00
Eric Christopher 03df7ac8a9 PPCMaterializeInt should only take a ConstantInt so represent this in the prototype
and fix up all uses.

llvm-svn: 243191
2015-07-25 00:48:06 +00:00
Pete Cooper 098f7c1fcb Add const to some Type* parameters which didn't need to be mutable. NFC.
We were only getting the size of the type which doesn't need to modify
the type.

llvm-svn: 243146
2015-07-24 19:19:26 +00:00
Pete Cooper 0debbdc872 Use foreach loops for StructType::elements(). NFC.
We had a few places where we did

for (unsigned i = 0, e = STy->getNumElements(); i != e; ++i) {

but those could instead do

for (auto *EltTy : STy->elements()) {

llvm-svn: 243136
2015-07-24 18:55:49 +00:00
Bill Schmidt 2be8054b49 [PPC64LE] More vector swap optimization TLC
This makes one substantive change and a few stylistic changes to the
VSX swap optimization pass.

The substantive change is to permit LXSDX and LXSSPX instructions to
participate in swap optimization computations.  The previous change to
insert a swap following a SUBREG_TO_REG widening operation makes this
almost trivial.

I experimented with also permitting STXSDX and STXSSPX instructions.
This can be done using similar techniques:  we could insert a swap
prior to a narrowing COPY operation, and then permit these stores to
participate.  I prototyped this, but discovered that the pattern of a
narrowing COPY followed by an STXSDX does not occur in any of our
test-suite code.  So instead, I added commentary indicating that this
could be done.

Other TLC:
 - I changed SH_COPYSCALAR to SH_COPYWIDEN to more clearly indicate
 the direction of the copy.
 - I factored the insertion of swap instructions into a separate
 function.

Finally, I added a new test case to check that the scalar-to-vector
loads are working properly with swap optimization.

llvm-svn: 242838
2015-07-21 21:40:17 +00:00
JF Bastien e4d22d59d1 Targets: commonize some stack realignment code
This patch does the following:
* Fix FIXME on `needsStackRealignment`: it is now shared between multiple targets, implemented in `TargetRegisterInfo`, and isn't `virtual` anymore. This will break out-of-tree targets, silently if they used `virtual` and with a build error if they used `override`.
* Factor out `canRealignStack` as a `virtual` function on `TargetRegisterInfo`, by default only looks for the `no-realign-stack` function attribute.

Multiple targets duplicated the same `needsStackRealignment` code:
 - Aarch64.
 - ARM.
 - Mips almost: had extra `DEBUG` diagnostic, which the default implementation now has.
 - PowerPC.
 - WebAssembly.
 - x86 almost: has an extra `-force-align-stack` option, which the default implementation now has.

The default implementation of `needsStackRealignment` used to just return `false`. My current patch changes the behavior by simply using the above shared behavior. This affects:
 - AMDGPU
 - BPF
 - CppBackend
 - MSP430
 - NVPTX
 - Sparc
 - SystemZ
 - XCore
 - Out-of-tree targets
This is a breaking change! `make check` passes.

The only implementation of the `virtual` function (besides the slight different in x86) was Hexagon (which did `MF.getFrameInfo()->getMaxAlignment() > 8`), and potentially some out-of-tree targets. Hexagon now uses the default implementation.

`needsStackRealignment` was being overwritten in `<Target>GenRegisterInfo.inc`, to return `false` as the default also did. That was odd and is now gone.

Reviewers: sunfish

Subscribers: aemerson, llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11160

llvm-svn: 242727
2015-07-20 22:51:32 +00:00
Bill Schmidt 54cced54a6 [PowerPC] v4i32 is a VSRCRegClass
I was looking at some vector code generation and kept seeing
unnecessary vector copies into the Altivec half of the VSX registers.
I discovered that we overlooked v4i32 when adding the register classes
for VSX; we only added v4f32 and v2f64.  This means that anything that
canonicalizes into v4i32 (which is a LOT of stuff) ends up being
forced into VRRC on its way to VSRC.

The fix is one line.  The rest of the patch is fixing up some test
cases whose code generation has changed as a result.

This seems like it would be a good candidate for backport to 3.7.

llvm-svn: 242442
2015-07-16 21:14:07 +00:00
Mehdi Amini bd7287ebe5 Move most user of TargetMachine::getDataLayout to the Module one
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

This patch is quite boring overall, except for some uglyness in
ASMPrinter which has a getDataLayout function but has some clients
that use it without a Module (llmv-dsymutil, llvm-dwarfdump), so
some methods are taking a DataLayout as parameter.

Reviewers: echristo

Subscribers: yaron.keren, rafael, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11090

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 242386
2015-07-16 06:11:10 +00:00
Bill Schmidt 1e77bb12b4 [PPC64LE] Fix vec_sld semantics for little endian
The vec_sld interface provides access to the vsldoi instruction.
Unlike most of the vec_* interfaces, we do not attempt to change the
generated code for vec_sld based on the endian mode.  It is too
difficult to correctly infer the desired semantics because of
different element types, and the corrected instruction sequence is
expensive, involving loading a permute control vector and performing a
generalized permute.

For GCC, this was implemented as "Don't touch the vec_sld"
implementation.  When it came time for the LLVM implementation, I did
the same thing.  However, this was hasty and incorrect.  In LLVM's
version of altivec.h, vec_sld was previously defined in terms of the
vec_perm interface.  Because vec_perm semantics are adjusted for
little endian, this means that leaving vec_sld untouched causes it to
generate something different for LE than for BE.  Not good.

This back-end patch accompanies the changes to altivec.h that change
vec_sld's behavior for little endian.  Those changes mean that we see
slightly different code in the back end when trying to recognize a
VSLDOI instruction in isVSLDOIShuffleMask.  In particular, a
ShuffleKind of 1 (where the two inputs are identical) must now be
treated the same way as a ShuffleKind of 2 (little endian with
different inputs) when little endian mode is in force.  This is
because ShuffleKind of 1 is defined using big-endian numbering.

This has a ripple effect on LowerBUILD_VECTOR, where we create our own
internal VSLDOI instructions.  Because these are a ShuffleKind of 1,
they will now have their shift amounts subtracted from 16 when
recognizing the shuffle mask.  To avoid problems we have to subtract
them from 16 again before creating the VSLDOI instructions.

There are a couple of other uses of BuildVSLDOI, but these do not need
to be modified because the shift amount is 8, which is unchanged when
subtracted from 16.

llvm-svn: 242296
2015-07-15 15:45:30 +00:00
Benjamin Kramer c11fd3e775 [PPC] Disassemble little endian ppc instructions in the right byte order
PR24122. The test is simply a byte swapped version of ppc64-encoding.txt.

llvm-svn: 242288
2015-07-15 12:56:19 +00:00