Commit Graph

1085 Commits

Author SHA1 Message Date
Simon Dardis 659c43f11a Revert "[MIPS] Add support to match more patterns for DINS instruction"
This reverts commit rL302512. This broke the mips buildbots.

llvm-svn: 302526
2017-05-09 13:18:48 +00:00
Strahinja Petrovic 27ae4c3259 [MIPS] Add support to match more patterns for DINS instruction
This patch adds support for recognizing patterns to match
DINS instruction.

Differential Revision: https://reviews.llvm.org/D31465

llvm-svn: 302512
2017-05-09 10:02:00 +00:00
Zoran Jovanovic ffef3e3c6a [mips][microMIPS] Adding code size reduction pass for MicroMIPS
Author: milena.vujosevic.janicic
Reviewers: sdardis
The code implements size reduction pass for MicroMIPS.
Load and store instructions are examined and transformed, if possible.
lw32 instruction is transformed into 16-bit instruction lwsp
sw32 instruction is transformed into 16-bit instruction swsp
Arithmetic instrcutions are examined and transformed, if possible.
addu32 instruction is transformed into 16-bit instruction addu16
subu32 instruction is transformed into 16-bit instruction subu16
Differential Revision: https://reviews.llvm.org/D15144

llvm-svn: 301540
2017-04-27 13:10:48 +00:00
Sagar Thakur b458b468a2 [mips] Fix test mips64fpldst.ll with machine verifier enabled
Removed micro mips register classes for gp initialization because gp initialization uses pure mips64 instruction. Even when compiling for micro mips, gp initialization can be done with pure mips64 instructions.

Reviewed by Simon Dardis
Differential: D32286

llvm-svn: 301394
2017-04-26 11:40:12 +00:00
Petar Jovanovic 2b6fe3ffa6 [mips][msa] Mask vectors holding shift amounts
Masked vectors which hold shift amounts when creating the following nodes:
ISD::SHL, ISD::SRL or ISD::SRA.
Instructions that use said nodes, which have had their arguments altered are
sll, srl, sra, bneg, bclr and bset.

For said instructions, the shift amount or the bit position that is
specified in the corresponding vector elements will be interpreted as the
shift amount/bit position modulo the size of the element in bits.

The problem lies in compiling with -O2 enabled, where the instructions for
formats .w and .d are not generated, but are instead optimized away.
In this case, having shift amounts that are either negative or greater than
the element bit size results in generation of incorrect results when
constant folding.

We remedy this by masking the operands for the nodes mentioned above before
actually creating them, so that the final result is correct before placed
into the constant pool.

Patch by Stefan Maksimovic.

Differential Revision: https://reviews.llvm.org/D31331

llvm-svn: 300839
2017-04-20 13:26:46 +00:00
Matt Arsenault f10061ec70 Add address space mangling to lifetime intrinsics
In preparation for allowing allocas to have non-0 addrspace.

llvm-svn: 299876
2017-04-10 20:18:21 +00:00
Simon Dardis f7e4388e3b Revert "[SelectionDAG] Enable target specific vector scalarization of calls and returns"
This reverts commit r299766. This change appears to have broken the MIPS
buildbots. Reverting while I investigate.

Revert "[mips] Remove usage of debug only variable (NFC)"

This reverts commit r299769. Follow up commit.

llvm-svn: 299788
2017-04-07 17:25:05 +00:00
Petar Jovanovic bc54eb89ad [mips][msa] Fix generation of bm(n)zi and bins[lr]i instructions
We have two cases here, the first one being the following instruction
selection from the builtin function:
bm(n)zi builtin -> vselect node -> bins[lr]i machine instruction

In case of bm(n)zi having an immediate which has either its high or low bits
set, a bins[lr] instruction can be selected through the selectVSplatMask[LR]
function. The function counts the number of bits set, and that value is
being passed to the bins[lr]i instruction as its immediate, which in turn
copies immediate modulo the size of the element in bits plus 1 as per specs,
where we get the off-by-one-error.

The other case is:
bins[lr]i -> vselect node -> bsel.v

In this case, a bsel.v instruction gets selected with a mask having one bit
less set than required.

Patch by Stefan Maksimovic.

Differential Revision: https://reviews.llvm.org/D30579

llvm-svn: 299768
2017-04-07 13:31:36 +00:00
Simon Dardis 6470ff0b24 [SelectionDAG] Enable target specific vector scalarization of calls and returns
By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown,
backends can request that LLVM to scalarize vector types for calls
and returns.

The MIPS vector ABI requires that vector arguments and returns are passed in
integer registers. With SelectionDAG's new hooks, the MIPS backend can now
handle LLVM-IR with vector types in calls and returns. E.g.
'call @foo(<4 x i32> %4)'.

Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for
calls and returns if vector types were not legal. If vector types were legal,
a single 128bit vector argument would be assigned to a single 32 bit / 64 bit
integer register.

By teaching the MIPS backend to inspect the original types, it can now
implement the MIPS vector ABI which requires a particular method of
scalarizing vectors.

Previously, the MIPS backend relied on clang to scalarize types such as "call
@foo(<4 x float> %a) into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3,
i32 inreg %4)".

This patch enables the MIPS backend to take either form for vector types.

Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur

Differential Revision: https://reviews.llvm.org/D27845

llvm-svn: 299766
2017-04-07 13:03:52 +00:00
Simon Dardis 0a47edb153 [mips] Deal with empty blocks in the mips hazard scheduler
This patch teaches the hazard scheduler how to handle empty blocks
when search for the next real instruction when dealing with forbidden
slots.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D31293

llvm-svn: 299427
2017-04-04 11:28:53 +00:00
Petar Jovanovic 9bff3b7818 [mips][msa] Prevent output operand from commuting for dpadd_[su].df ins
Implementation of TargetInstrInfo::findCommutedOpIndices for MIPS target,
restricting commutativity to second and third operand only for
dpaadd_[su].df instructions therein.

Prior to this change, there were cases where the vector that is to be added
to the dot product of the other two could take a position other than the
first one in the instruction, generating false output in the destination
vector.

Such behavior has been noticed in the two functions generating v2i64 output
values so far. Other ones may exhibit such behavior as well, just not for
the vector operands which are present in the test at the moment.

Tests altered so that the function's first operand is a constant splat so
that it can be loaded with a ldi instruction, since that is the case in
which the erroneous instruction operand placement has occurred. We check
that the register which is present in the ldi instruction is placed as the
first operand in the corresponding dpadd instruction.

Patch by Stefan Maksimovic.

Differential Revision: https://reviews.llvm.org/D30827

llvm-svn: 299223
2017-03-31 14:31:55 +00:00
Ahmed Bougacha 6dd6082472 [CodeGen] Pass SDAG an ORE, and replace FastISel stats with remarks.
In the long-term, we want to replace statistics with something
finer-grained that lets us gather per-function data.
Remarks are that replacement.

Create an ORE instance in SelectionDAGISel, and pass it to
SelectionDAG.

SelectionDAG was used so that we can emit remarks from all
SelectionDAG-related code, including TargetLowering and DAGCombiner.
This isn't used in the current patch but Adam tells me he's interested
for the fp-contract combines.

Use the ORE instance to emit FastISel failures as remarks (instead of
the mix of dbgs() dumps and statistics that we currently have).

Eventually, we want to have an API that tells us whether remarks are
enabled (http://llvm.org/PR32352) so that we don't emit expensive
remarks (in this case, dumping IR) when it's not needed.  For now, use
'isEnabled' as a crude replacement.

This does mean that the replacement for '-fast-isel-verbose' is now
'-pass-remarks-missed=isel'.  Additionally, clang users also need to
enable remark diagnostics, using '-Rpass-missed=isel'.

This also removes '-fast-isel-verbose2': there are no static statistics
that we want to only enable in asserts builds, so we can always use
the remarks regardless of the build type.

Differential Revision: https://reviews.llvm.org/D31405

llvm-svn: 299093
2017-03-30 17:49:58 +00:00
Kyle Butt 08655997eb CodeGen: BlockPlacement: Reduce TriangleChainCount to 2
This produces a 1% speedup on an important internal Google benchmark
(protocol buffers), with no other regressions in google or in the llvm
test-suite. Only 5 targets in the entire llvm test-suite are affected,
and on those 5 targets the size increase is 0.027%

llvm-svn: 297925
2017-03-16 01:32:29 +00:00
Petar Jovanovic b71386a4a4 [Mips] Add support to match more patterns for DEXT and CINS
This patch adds support for recognizing more patterns to match to DEXT and
CINS instructions.
It finds cases where multiple instructions could be replaced with a single
DEXT or CINS instruction.

For example, for the following:

define i64 @dext_and32(i64 zeroext %a) {
entry:

 %and = and i64 %a, 4294967295
 ret i64 %and
}

instead of generating:

 0000000000000088 <dext_and32>:

 88:   64010001        daddiu  at,zero,1
 8c:   0001083c        dsll32  at,at,0x0
 90:   6421ffff        daddiu  at,at,-1
 94:   03e00008        jr      ra
 98:   00811024        and     v0,a0,at
 9c:   00000000        nop

the following gets generated:

 0000000000000068 <dext_and32>:

 68:   03e00008        jr      ra
 6c:   7c82f803        dext    v0,a0,0x0,0x20

Cases that are covered:

DEXT:

 1. and $src, mask where mask > 0xffff
 2. zext $src zero extend from i32 to i64

CINS:

 1. and (shl $src, pos), mask
 2. shl (and $src, mask), pos
 3. zext (shl $src, pos) zero extend from i32 to i64

Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D30464

llvm-svn: 297832
2017-03-15 13:10:08 +00:00
Nirav Dave 54e22f33d9 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements

    Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 297695
2017-03-14 00:34:14 +00:00
Simon Dardis 7090d145e8 [mips][msa] Accept more values for constant splats
This patches teaches the MIPS backend to accept more values for constant
splats. Previously, only 10 bit signed immediates or values that could be
loaded using an ldi.[bhwd] instruction would be acceptted. This patch relaxes
that constraint so that any constant value that be splatted is accepted.

As a result, the constant pool is used less for vector operations, and the
suite of bit manipulation instructions b(clr|set|neg)i can now be used with
the full range of their immediate operand.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30640

llvm-svn: 297457
2017-03-10 13:27:14 +00:00
Simon Dardis 7577ce2140 [mips] Revert fixes for PR32020.
The fix introduces segfaults and clobbers the value to be stored when
the atomic sequence loops.

Revert "[Target/MIPS] Kill dead code, no functional change intended."

This reverts commit r296153.

Revert "Recommit "[mips] Fix atomic compare and swap at O0.""

This reverts commit r296134.

llvm-svn: 297380
2017-03-09 14:03:26 +00:00
Simon Dardis 158956c6cc [mips] Fix return lowering
Fix a machine verifier issue where a instruction was using a invalid
register. The return pseudo is expanded and has the return address
register added to it. The return register may have been spuriously
mark as killed earlier.

This partially resolves PR/27458

Thanks to Quentin Colombet for reporting the issue!

llvm-svn: 297372
2017-03-09 11:19:48 +00:00
Chandler Carruth ce52b80744 [SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

llvm-svn: 296862
2017-03-03 10:02:25 +00:00
Reid Kleckner f7c0980c10 Elide argument copies during instruction selection
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped.  This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.

This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.

Supersedes D28388

Fixes PR26328

Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29668

llvm-svn: 296683
2017-03-01 21:42:00 +00:00
Nirav Dave f830dec3f2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296476
2017-02-28 14:24:15 +00:00
Nirav Dave 73cd0194cf Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.

llvm-svn: 296279
2017-02-26 01:27:32 +00:00
Nirav Dave beabf456df In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296252
2017-02-25 11:43:58 +00:00
Simon Dardis ae6f2bcb25 Recommit "[mips] Fix atomic compare and swap at O0."
This time with the missing files.

Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the store can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30257

llvm-svn: 296134
2017-02-24 16:32:18 +00:00
Simon Dardis 3c58c18ff0 Revert "[mips] Fix atomic compare and swap at O0."
This reverts r296132. I forgot to include the tests.

llvm-svn: 296133
2017-02-24 16:30:27 +00:00
Simon Dardis cf0e06d375 [mips] Fix atomic compare and swap at O0.
Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the store can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30257

llvm-svn: 296132
2017-02-24 16:27:45 +00:00
Sagar Thakur ec65792910 [LLVM][XRAY][MIPS] Support xray on mips/mipsel/mips64/mips64el
Summary: Adds support for xray instrumentation on mips for both 32-bit and 64-bit.

Reviewed by sdardis, dberris
Differential: D27697

llvm-svn: 295164
2017-02-15 10:48:11 +00:00
Sanne Wouda 57b63d6ade [LLC] Add an inline assembly diagnostics handler.
Summary:
llc would hit a fatal error for errors in inline assembly. The
diagnostics message is now printed.

Reviewers: rengolin, MatzeB, javed.absar, anemet

Reviewed By: anemet

Subscribers: jyknight, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D29408

llvm-svn: 293999
2017-02-03 11:14:39 +00:00
Nirav Dave 93f9d5ce04 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.

llvm-svn: 293915
2017-02-02 18:24:55 +00:00
Nirav Dave 4442667fc5 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixing X86 inc/dec chain bug.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293893
2017-02-02 14:39:42 +00:00
Kyle Butt b15c06677c CodeGen: Allow small copyable blocks to "break" the CFG.
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well, subject to some simple frequency calculations.

Differential Revision: https://reviews.llvm.org/D28583

llvm-svn: 293716
2017-01-31 23:48:32 +00:00
Simon Dardis ca74dd79e9 [mips] Recommit: "N64 static relocation model support"
This patch makes one change to GOT handling and two changes to N64's
relocation model handling. Furthermore, the jumptable encodings have
been corrected for static N64.

Big GOT handling is now done via a new SDNode MipsGotHi - this node is
unconditionally lowered to an lui instruction.

The first change to N64's relocation handling is the lifting of the
restriction that N64 always uses PIC. Now it is possible to target static
environments.

The second change adds support for 64 bit symbols and enables them by
default. Previously N64 had patterns for sym32 mode only. In this mode all
symbols are assumed to have 32 bit addresses. sym32 mode support
is selectable with attribute 'sym32'. A follow on patch for clang will
add the necessary frontend parameter.

This partially resolves PR/23485.

Thanks to Brooks Davis for reporting the issue!

This version corrects a "Conditional jump or move depends on uninitialised
value(s)" error detected by valgrind present in the original commit.

Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris

Differential Revision: https://reviews.llvm.org/D23652

llvm-svn: 293279
2017-01-27 11:36:52 +00:00
Nirav Dave d32a421f75 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293184 which is failing in LTO builds

llvm-svn: 293188
2017-01-26 16:46:13 +00:00
Nirav Dave de6516c466 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
* Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293184
2017-01-26 16:02:24 +00:00
Simon Dardis 5b67a4f75f Revert "[mips] N64 static relocation model support"
This reverts commit r293164. There are multiple tests failing.

llvm-svn: 293170
2017-01-26 10:46:07 +00:00
Simon Dardis 09e65efd09 [mips] N64 static relocation model support
This patch makes one change to GOT handling and two changes to N64's
relocation model handling. Furthermore, the jumptable encodings have
been corrected for static N64.

Big GOT handling is now done via a new SDNode MipsGotHi - this node is
unconditionally lowered to an lui instruction.

The first change to N64's relocation handling is the lifting of the
restriction that N64 always uses PIC. Now it is possible to target static
environments.

The second change adds support for 64 bit symbols and enables them by
default. Previously N64 had patterns for sym32 mode only. In this mode all
symbols are assumed to have 32 bit addresses. sym32 mode support
is selectable with attribute 'sym32'. A follow on patch for clang will
add the necessary frontend parameter.

This partially resolves PR/23485.

Thanks to Brooks Davis for reporting the issue!

Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris

Differential Revision: https://reviews.llvm.org/D23652

llvm-svn: 293164
2017-01-26 10:19:02 +00:00
Benjamin Kramer db9e0b659d Fix some broken CHECK lines.
The colon is important.

llvm-svn: 292761
2017-01-22 20:28:56 +00:00
Serge Pavlov ed5eb93384 Reverted: Track validity of pass results
Commits r291882 and related r291887.

llvm-svn: 292062
2017-01-15 10:23:18 +00:00
Serge Pavlov d409411ef1 Track validity of pass results
Running tests with expensive checks enabled exhibits some problems with
verification of pass results.

First, the pass verification may require results of analysis that are not
available. For instance, verification of loop info requires results of dominator
tree analysis. A pass may be marked as conserving loop info but does not need to
be dependent on DominatorTreePass. When a pass manager tries to verify that loop
info is valid, it needs dominator tree, but corresponding analysis may be
already destroyed as no user of it remained.

Another case is a pass that is skipped. For instance, entities with linkage
available_externally do not need code generation and such passes are skipped for
them. In this case result verification must also be skipped.

To solve these problems this change introduces a special flag to the Pass
structure to mark passes that have valid results. If this flag is reset,
verifications dependent on the pass result are skipped.

Differential Revision: https://reviews.llvm.org/D27190

llvm-svn: 291882
2017-01-13 06:09:54 +00:00
Kyle Butt efe56fed12 Revert "CodeGen: Allow small copyable blocks to "break" the CFG."
This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded.

This needs a simple probability check because there are some cases where it is
not profitable.

llvm-svn: 291695
2017-01-11 19:55:19 +00:00
Kyle Butt df27aa8c89 CodeGen: Allow small copyable blocks to "break" the CFG.
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well.

Differential revision: https://reviews.llvm.org/D27742

llvm-svn: 291609
2017-01-10 23:04:30 +00:00
Matt Arsenault 0b382a7cb8 DAG: Avoid OOB when legalizing vector indexing
If a vector index is out of bounds, the result is supposed to be
undefined but is not undefined behavior. Change the legalization
for indexing the vector on the stack so that an out of bounds
index does not create an out of bounds memory access.

llvm-svn: 291604
2017-01-10 22:02:30 +00:00
Simon Dardis 548a53f5ee [mips] Fix Mips MSA instrinsics
The usage of some MIPS MSA instrinsics that took immediates could crash LLVM
during lowering. This patch addresses that behaviour. Crucially this patch
also makes the use of intrinsics with out of range immediates as producing an
internal error.

The ld,st instrinsics would trigger an assertion failure for MIPS64 as their
lowering would attempt to add an i32 offset to a i64 pointer.

Reviewers: vkalintiris, slthakur

Differential Revision: https://reviews.llvm.org/D25438

llvm-svn: 291571
2017-01-10 16:40:57 +00:00
Simon Dardis 0e9e237310 [mips] Honour -mno-odd-spreg for vector splat (again)
Previous the lowering of FILL_FW would use the MSA128W register class when
performing a vector splat. Instead it should be honouring -mno-odd-spreg and
only use the even registers when performing a splat from word to vector
register.

Logical follow-on from r230235.

This fixes PR/31369.

A previous commit was missing the test case and had another differential
in it.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D28373

llvm-svn: 291566
2017-01-10 15:53:10 +00:00
Petar Jovanovic 8a4e63994e [mips] Fix compact branch hazard detection, part 2
Follow up to D27209 fix, this patch now properly handles single transient
instruction in basic block.

Patch by Aleksandar Beserminji.

Differential Revision: https://reviews.llvm.org/D27856

llvm-svn: 290361
2016-12-22 19:29:50 +00:00
Nirav Dave f5bf03c7ef Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
Reverting due to ARM MCJIT and MIPS LLD error.

This reverts commit r289659.

llvm-svn: 289667
2016-12-14 16:43:44 +00:00
Nirav Dave 8527ab0ad2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing after removing load-store factoring through
token factors in favor of improved token factor operand pruning

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289659
2016-12-14 15:44:26 +00:00
Simon Dardis 43b5ce492d [mips] Fix compact branch hazard detection
In certain cases it is possible that transient instructions such as
%reg = IMPLICIT_DEF as a single instruction in a basic block to reach
the MipsHazardSchedule pass. This patch teaches MipsHazardSchedule to
properly look through such cases.

Reviewers: vkalintiris, zoran.jovanovic

Differential Revision: https://reviews.llvm.org/D27209

llvm-svn: 289529
2016-12-13 11:07:51 +00:00
Nirav Dave bedb5d906c Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r289221 which appears to be triggering an assertion

llvm-svn: 289226
2016-12-09 17:18:24 +00:00
Nirav Dave fd51ff4fd8 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing overly aggressive load-store forwarding optimization.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289221
2016-12-09 16:15:12 +00:00